I'm doing morph animations on the GPU using CUDA. Each frame, I update the vertex buffer before rendering:

Code :
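// Hint that CUDA will overwrite the whole buffer, so its previous contents need not be preserved.
cudaError_t err = cudaGraphicsResourceSetMapFlags(cudaResource, cudaGraphicsMapFlagsWriteDiscard);
ASSERT(err == cudaSuccess);
// Map the registered OpenGL vertex buffer for CUDA access on the default stream.
err = cudaGraphicsMapResources(1, &cudaResource, 0);
ASSERT(err == cudaSuccess);
cudaStreamSynchronize(0);

// Update Vertex Buffer

// Unmap so OpenGL can use the buffer again.
err = cudaGraphicsUnmapResources(1, &cudaResource, 0);
ASSERT(err == cudaSuccess);
cudaStreamSynchronize(0);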

Afterwards, I render using glDrawRangeElements.
Using Nsight, I see that the glDrawRangeElements call stalls until the GPU actually begins to draw the same mesh.

[Attached image: nsight.jpg]

The lag is independent of the computation I'm doing. As long as the resource is mapped and unmapped, the lag is present.
I added cudaStreamSynchronize and cudaDeviceSynchronize to ensure the GPU is done, and I also double- and triple-buffered my vertex buffer, but it didn't change anything.
I get the lag only when I map the resource using CUDA; otherwise everything runs well.
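
To make the double/triple buffering mentioned above concrete, here is a minimal sketch of a round-robin scheme, assuming CUDA writes one buffer per frame while the draw call reads the buffer written on the previous frame. `BufferRing`, `writeSlot`, `drawSlot`, and `advance` are hypothetical names used only for illustration; the slot indices would select among separately registered vertex buffers, and no CUDA or OpenGL calls are shown.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical round-robin selector over N vertex buffers (illustration only).
// Slot indices stand in for the GL buffer / registered CUDA resource pairs.
template <std::size_t N>
class BufferRing {
public:
    static_assert(N >= 2, "need at least two buffers to separate write and draw");

    // Slot that CUDA would map and write during the current frame.
    std::size_t writeSlot() const { return frame_ % N; }

    // Slot that OpenGL would draw during the current frame:
    // the one written on the previous frame.
    std::size_t drawSlot() const { return (frame_ + N - 1) % N; }

    // Call once per frame, after the draw has been issued.
    void advance() { ++frame_; }

private:
    std::size_t frame_ = 0;
};
```

With `BufferRing<3>`, the first frame writes slot 0 and draws slot 2, so on that frame the drawn slot has not been written yet and would need initial data. On every frame the write slot and the draw slot differ, which is the property buffering of this kind is meant to provide.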

I'm on Windows 7 with an NVIDIA GTX 480.
I've tried updating the drivers, changing CUDA versions (5.5 and 6.0), and swapping the GPU (GTX 680), but to no avail.

Any ideas or pointers would be greatly appreciated.
Thanks!