OpenGL pipeline stall with CUDA

I’m doing morph animations on the GPU using CUDA. Each frame, I update the vertex buffer before rendering:

cudaError_t err = cudaGraphicsResourceSetMapFlags(cudaResource, cudaGraphicsMapFlagsWriteDiscard);
ASSERT(err == cudaSuccess);
err = cudaGraphicsMapResources(1, &cudaResource, 0);
ASSERT(err == cudaSuccess);
cudaStreamSynchronize(0);

// Update vertex buffer (device pointer obtained via cudaGraphicsResourceGetMappedPointer)

err = cudaGraphicsUnmapResources(1, &cudaResource, 0);
ASSERT(err == cudaSuccess);
cudaStreamSynchronize(0);
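The update step itself is elided above. For comparison, here is a CPU reference for what a morph update commonly computes, assuming plain linear blending of position-only morph targets. The post does not show the actual kernel or vertex layout, so `Vec3`, `blendMorphTargets`, and the blend formula are illustrative assumptions. Checking the mapped buffer's contents against a reference like this can help confirm the kernel output is correct while investigating the stall.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: position only (the real format is not shown in the post).
struct Vec3 {
    float x, y, z;
};

// Assumed linear morph blending:
//   out = base + sum_i weights[i] * (targets[i] - base)
// Every target is assumed to have the same vertex count as base.
std::vector<Vec3> blendMorphTargets(const std::vector<Vec3>& base,
                                    const std::vector<std::vector<Vec3>>& targets,
                                    const std::vector<float>& weights) {
    std::vector<Vec3> out = base;  // start from the base pose
    for (std::size_t t = 0; t < targets.size() && t < weights.size(); ++t) {
        const float w = weights[t];
        for (std::size_t v = 0; v < base.size(); ++v) {
            // Add the weighted offset of target t from the base pose.
            out[v].x += w * (targets[t][v].x - base[v].x);
            out[v].y += w * (targets[t][v].y - base[v].y);
            out[v].z += w * (targets[t][v].z - base[v].z);
        }
    }
    return out;
}
```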

Afterwards, I render the mesh with glDrawRangeElements.
In Nsight, I can see that the glDrawRangeElements call stalls on the CPU until the GPU actually begins drawing that same mesh.

[Attachment: Nsight screenshot of the stall]
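To put a number on the CPU-side stall independently of Nsight, the draw call can be wrapped in a simple wall-clock timer. This is a minimal sketch: `measureBlockingMs` is a hypothetical helper, and it only measures how long the calling thread is blocked inside the call, not when or how long the GPU executes the draw.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: returns how long `call` blocked the calling (CPU) thread, in ms.
// It does not measure GPU execution time.
//
// Hypothetical usage (minIndex, maxIndex, indexCount are placeholders):
//   double ms = measureBlockingMs([&] {
//       glDrawRangeElements(GL_TRIANGLES, minIndex, maxIndex, indexCount,
//                           GL_UNSIGNED_INT, nullptr);
//   });
template <typename F>
double measureBlockingMs(F&& call) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(call)();  // run the wrapped call synchronously
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Logging this value per frame, with and without the CUDA map/unmap, gives a CPU-side figure to compare against the Nsight timeline.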

The lag does not depend on the computation I’m doing; it is present whenever the resource is mapped and unmapped.
I added cudaStreamSynchronize and cudaDeviceSynchronize calls to make sure the GPU is done, and I also double- and triple-buffered my vertex buffer, but neither changed anything.
I only get the lag when I map the resource with CUDA; otherwise everything runs well.
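For reference, double or triple buffering here usually means rotating through N vertex buffers, each registered with CUDA, so that CUDA writes one slot while OpenGL draws the slot written on the previous frame. A minimal index-rotation sketch, assuming one registered buffer per slot; `VertexBufferRing` is a hypothetical helper, not part of CUDA or OpenGL.

```cpp
#include <cstddef>

// Hypothetical N-way buffering helper (count must be >= 1; count == 1 means no buffering).
class VertexBufferRing {
public:
    explicit VertexBufferRing(std::size_t count) : count_(count) {}

    // Slot CUDA should map and write this frame.
    std::size_t writeSlot() const { return frame_ % count_; }

    // Slot OpenGL should draw this frame: the one written on the previous frame.
    // On frame 0 this slot has not been written yet, so pre-fill it or skip the draw.
    std::size_t drawSlot() const { return (frame_ + count_ - 1) % count_; }

    // Call once per frame after issuing the draw.
    void advance() { ++frame_; }

private:
    std::size_t count_;
    std::size_t frame_ = 0;
};
```

With this arrangement the draw never reads the buffer being mapped in the same frame, which is the situation the buffering was meant to test.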

I’m on Windows 7 with an NVIDIA GTX 480.
I’ve tried updating the drivers, switching CUDA versions (5.5 and 6.0), and swapping the GPU (GTX 680), all to no avail.

Any ideas or pointers would be greatly appreciated.
Thanks!