Thread: Performance problems with GL_MAP_PERSISTENT_BIT

  1. #1
    Intern Newbie
    Join Date
    Nov 2008
    Posts
    43

    Performance problems with GL_MAP_PERSISTENT_BIT

    At http://www.slideshare.net/CassEveritt/beyond-porting it is stressed that GL_MAP_UNSYNCHRONIZED_BIT should not be used because it causes a sync between the client and driver threads. So I dropped that. Next, I wanted to avoid making the glMapBufferRange() and glUnmapBuffer() calls for every update, so I tried to use GL_MAP_PERSISTENT_BIT. I used it in two cases, and ran into a problem in the second.

    In the first case, I used it for the uniform buffers that hold the transform matrices, together with GL_MAP_FLUSH_EXPLICIT_BIT and calls to glFlushMappedBufferRange(). It seemed to work without any performance issue, even though I'm uploading per object before each draw call. I assume that any performance difference is small enough that other factors dominate.

    In the second case, I tried it in the following context: I have shared memory where another process draws an HD resolution 32bpp image, on average once per frame. Whenever there's a new image, I upload it to a PBO and from there to a texture (as the latter is asynchronous). There are actually two PBOs that I ping-pong between. When I changed from map-memcpy-unmap to a persistent mapping followed by memcpy-flush, as I had done with the uniform buffers in my first test case, the performance dropped a lot. This happened with every combination of other flags I tried. I tried flushing right after the memcpy, and alternatively right before the data is used to load the texture from the PBO. I tried no explicit flushing. I tried putting GL_MAP_UNSYNCHRONIZED_BIT in again. I tried GL_MAP_COHERENT_BIT. I also tried to use fences (one for each PBO), set after the buffer is used to load the texture, with a corresponding glClientWaitSync() before the memcpy into it. I tried orphaning with GL_MAP_INVALIDATE_BUFFER_BIT (though I'm not sure it makes sense for the large amount of data being transferred). I tried these in various combinations, but in the end, I simply could not get the performance back to what it was with map-memcpy-unmap.
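
    The fence-per-PBO ordering described above can be sketched on the CPU side roughly as follows. This is only a minimal sketch under assumptions: the PingPong struct and the pingpong_acquire()/pingpong_release() helpers are hypothetical names that do not come from the original code, and the GL calls appear only in comments. It models when a wait is needed before the memcpy; it does not explain why the persistent path was slower.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PBO_COUNT 2u  /* the two ping-ponged PBOs from the post */

/* Hypothetical CPU-side state. fence_pending[i] stands in for a GLsync
   created with glFenceSync() after the texture upload that reads PBO i. */
typedef struct {
    bool     fence_pending[PBO_COUNT];
    unsigned next;  /* PBO that receives the next image */
} PingPong;

/* Pick the PBO for the next image. Returns its index and reports through
   *must_wait whether the caller has to glClientWaitSync() on that PBO's
   fence (and then glDeleteSync() it) before the memcpy. */
static unsigned pingpong_acquire(PingPong *pp, bool *must_wait)
{
    unsigned i = pp->next;
    assert(i < PBO_COUNT);
    *must_wait = pp->fence_pending[i];
    pp->fence_pending[i] = false;      /* the caller consumes the fence */
    pp->next = (i + 1u) % PBO_COUNT;   /* alternate 0, 1, 0, 1, ... */
    return i;
}

/* Call right after issuing the PBO-to-texture upload (for example
   glTexSubImage2D() with the PBO bound to GL_PIXEL_UNPACK_BUFFER)
   followed by glFenceSync() for this PBO. */
static void pingpong_release(PingPong *pp, unsigned i)
{
    assert(i < PBO_COUNT);
    pp->fence_pending[i] = true;
}
```

    With two PBOs, the first two images never need a wait; from the third image on, each memcpy is preceded by a wait on the fence of the upload issued two images earlier.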

    What am I missing? I'm running this on an NVIDIA GTX680 with the 332.21 driver (Windows 7 x64).

  2. #2
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,213
    Quote: Originally posted by Prune
    What am I missing? I'm running this on an NVIDIA GTX680 with the 332.21 driver (Windows 7 x64).
    Good question! You got me. I haven't tried that trick Cass and John talk about in that presentation. Thanks for posting a link BTW! I hadn't seen that one yet (fresh off the presses a few weeks ago).

    Currently I'm an UNSYNC+INVALIDATE_RANGE / INVALIDATE_BUFFER addict. But I need to try what they suggest to see if it's truly worth kicking UNSYNC to the curb.

    You might cook a short GLUT test program that can be easily flipped back and forth between the two methods. You're guaranteed to get a number of folks trying it, tweaking it, and posting their results to the forum for you to see.

  3. #3
    Intern Contributor
    Join Date
    Nov 2011
    Posts
    51
    Try GL_MAP_PERSISTENT_BIT with immutable storage rather than normal buffers/PBOs. It should work.

  4. #4
    Intern Newbie
    Join Date
    Nov 2008
    Posts
    43
    Quote: Originally posted by tdname
    Try GL_MAP_PERSISTENT_BIT with immutable storage rather than normal buffers/PBOs. It should work.
    I am using immutable storage: I'm creating the buffers with glNamedBufferStorageEXT().

  5. #5
    Senior Member OpenGL Pro
    Join Date
    Jan 2012
    Location
    Australia
    Posts
    1,117
    You may be interested in these results for a series of line draws.

    I do the following with GL_MAP_UNSYNCHRONIZED_BIT:

    Code :
    repeat 200 times
      map vertex buffer
      copy in 200,000 vertices
      unmap buffer
      draw GL_LINES

    swap render buffer

    glFenceSync
    glWaitSync

    I repeated this with the method suggested by Cass, using a triple-size buffer, and it ran 20-30% faster.
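
    The offset bookkeeping for such a triple-size buffer might look like the sketch below. It assumes that "triple size" means one persistently mapped buffer split into three equal regions that are written round-robin, which the post does not spell out. The TripleBuffer struct and the triple_*() helpers are hypothetical names, and the GL calls are mentioned only in comments.

```c
#include <assert.h>
#include <stddef.h>

#define REGION_COUNT 3u  /* "triple size": three equally sized regions */

/* Hypothetical bookkeeping for one persistently mapped buffer of
   REGION_COUNT * region_size bytes. */
typedef struct {
    size_t   region_size;  /* bytes uploaded per batch */
    unsigned batch;        /* batches submitted so far */
} TripleBuffer;

/* Region that the next batch is written to (0, 1, 2, 0, ...). */
static unsigned triple_region(const TripleBuffer *tb)
{
    return tb->batch % REGION_COUNT;
}

/* Byte offset of that region from the start of the mapped pointer. The
   memcpy destination is mapped_ptr + offset, and the same offset locates
   the data for the draw call. */
static size_t triple_offset(const TripleBuffer *tb)
{
    assert(tb->region_size > 0);
    return (size_t)triple_region(tb) * tb->region_size;
}

/* Call once the draw that reads the region has been issued. In real code
   a glFenceSync() for the region would go here, and the next write to the
   same region would first glClientWaitSync() on that fence. */
static void triple_advance(TripleBuffer *tb)
{
    tb->batch++;
}
```

    The idea is that the CPU writes one region while the GPU may still be reading the other two, so a wait should only be hit when the CPU gets a full three batches ahead.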


    The numbers bounced around a lot more than with GL_MAP_UNSYNCHRONIZED_BIT, but were never slower.

    My timers are a bit crude, but the speed improvement was noticeable.

    My fastest test was with glBegin/glEnd, which was about 40% faster, but I was running in debug mode!

    EDIT:
    I tried these tests in release mode, and GL_MAP_PERSISTENT_BIT and glBegin/glEnd are on par, both about 40% faster than GL_MAP_UNSYNCHRONIZED_BIT.
    Last edited by tonyo_au; 02-11-2014 at 11:11 PM.

  6. #6
    Senior Member OpenGL Guru Dark Photon
    Join Date
    Oct 2004
    Location
    Druidia
    Posts
    3,213
    That's impressive. Thanks for posting your test results.
    Last edited by Dark Photon; 02-12-2014 at 05:30 AM.
