OpenGL.org


Thread: Image load/store mutex problem

  1. #1
    Member Regular Contributor
    Join Date
    Jan 2012
    Location
    Germany
    Posts
    325

    Image load/store mutex problem

    Hello,


    I have a fragment shader in which I need to guarantee one thread exclusive access to some memory locations. Since this involves more than a single operation, I can't just use one atomic operation; I need to lock a part of my code. So far I have tried the following pattern (simplified):

    Code :
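    bool keepWaiting = true;
    int MAX_TRY = 30;   // give up after this many failed attempts
    int try = 0;
    while (keepWaiting && try < MAX_TRY) {
        memoryBarrier();
        // Try to take the per-pixel lock: 0u = free, 1u = held.
        if (imageAtomicExchange( mutex, coord, 1u) == 0u) {
            doWork();
            keepWaiting = false;
            memoryBarrier();   // make doWork()'s writes visible before releasing
            imageAtomicExchange( mutex, coord, 0u);   // release the lock
        } else {
            try++;
        }
    }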

    My problem with this is the following: as doWork() needs some time (basically a couple of image store operations), the waiting threads that run on another warp/wavefront can burn through their tries very quickly and just give up. If I increase MAX_TRY to counter this, performance drops drastically, because each try needs one expensive atomic memory access. If all threads fighting for the same lock ran on the same warp I wouldn't have this problem; sadly, this is not always the case.

    Now my question is: is there a better-suited pattern for this?


    Robert

  2. #2
    Senior Member OpenGL Pro
    Join Date
    Apr 2010
    Location
    Germany
    Posts
    1,129
    It'd be interesting to know what kind of work you do in doWork() - maybe there's a way to avoid syncing altogether. Also, why do you need the first barrier? Do you have the same problems when using a buffer object and non-image atomic functions?

    "If all threads fighting for the same lock would run on the same warp I wouldn't have this problem, sadly, this is not always the case."
    Can you elaborate on that?

  3. #3
    Member Regular Contributor
    Join Date
    Jan 2012
    Location
    Germany
    Posts
    325
    Quote Originally Posted by thokra
    It'd be interesting to know what kind of work you do in doWork() - maybe there's a way to avoid syncing altogether. Also, why do you need the first barrier? Do you have the same problems when using a buffer object and non-image atomics functions?

    Can you elaborate on that?

    Hi,

    doWork() contains write operations to 3 or 4 images and a few read operations (~8), but not many other operations. Sadly, I can't avoid the sync. The first barrier is a relic of an earlier test; it can be deleted ;-)
    I have not tried to do the same thing with buffer objects, as I need one mutex per pixel (read: a lot).
    If a warp of 32 threads all want to perform doWork(), one of them gets the mutex and blocks the other 31 while it executes the if-branch. Only when the first thread finishes do the other 31 run into the else-branch and increase the counter. One of them will then get the mutex in the next loop iteration, and so on. In a SIMD processor the threads in one warp are not independent, so they can't run through the else-branch a couple of times while one thread is in the if-branch. But when the threads that want to access this specific mutex are in different warps, exactly that can happen...

  4. #4
    Senior Member OpenGL Pro
    Join Date
    Apr 2010
    Location
    Germany
    Posts
    1,129
    Incidentally, did you get anywhere with this yet? Would be very interesting to know.

  5. #5
    Member Regular Contributor
    Join Date
    Jan 2012
    Location
    Germany
    Posts
    325
    So far I have not found a solution, mainly due to a vacation :-D
    I will look into the problem again soon, but I believe I will have to change my algorithms.

  6. #6
    Junior Member Newbie
    Join Date
    Mar 2011
    Location
    Australia
    Posts
    25
    You might have already seen it, but there's similar locking code here: http://blog.icare3d.org/2010/07/open...-lists-of.html
    It's a few years old but it might be worth a look.

    So to clarify, this is in a fragment shader. You have a per-pixel mutex and this contention occurs between fragments writing to the same pixel. You say all threads in a warp must converge at an if-statement (maybe designed like this to help thread/later operation coherency).

    I assume it's quite likely for two threads in contention for a pixel to be running in separate warps, since threads in a warp are likely from the same polygon and won't be overlapping.

    Just thinking out loud but the continuous atomicExchanges may be overshadowing the memory operations in doWork(), depending on the amount of contention. Maybe introducing some form of sleep as well as ++try would actually improve performance, reducing the load on memory transfer.
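
    GLSL has no sleep function, so the back-off idea above cannot be written literally. One related way to reduce atomic traffic is to poll the lock with a plain imageLoad and only attempt the atomic exchange when the lock looks free (a test-and-test-and-set loop, which is a swapped-in technique, not something from the thread). The sketch below is untested and rests on assumptions: the binding, the r32ui format, the coherent volatile qualifiers, the name tries, and the empty doWork() stand-in are all hypothetical.

    ```glsl
    #version 420 core
    // Binding and format are assumptions; the thread does not show the declaration.
    layout(binding = 0, r32ui) coherent volatile uniform uimage2D mutex;

    const int MAX_TRY = 30;

    // Hypothetical stand-in for the image reads and writes described in the thread.
    void doWork(ivec2 coord) { }

    void main() {
        ivec2 coord = ivec2(gl_FragCoord.xy);
        bool keepWaiting = true;
        int tries = 0;
        while (keepWaiting && tries < MAX_TRY) {
            // Plain load first: the atomic is only attempted when the lock looks free.
            if (imageLoad(mutex, coord).x == 0u &&
                imageAtomicExchange(mutex, coord, 1u) == 0u) {
                doWork(coord);
                keepWaiting = false;
                memoryBarrier();                        // publish doWork()'s writes
                imageAtomicExchange(mutex, coord, 0u);  // release the lock
            } else {
                tries++;
            }
        }
    }
    ```

    If a failed plain load turns out to be cheaper than a failed atomic on the target hardware, MAX_TRY could be raised at lower cost. Whether that holds would need to be measured.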

    Can the second imageAtomicExchange just be an imageStore (assuming the image is declared 'coherent')? (Possibly not, if another thread's atomic exchange can overwrite the stored value before it is read.)
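
    As a sketch of what that change would look like, the release below is a plain imageStore instead of a second atomic exchange. It is untested, and it does not settle the open question of whether a plain store interacts safely with concurrent atomics on the same texel. The binding, format, qualifiers, the name tries, and the empty doWork() stand-in are assumptions.

    ```glsl
    #version 420 core
    // Binding and format are assumptions; the thread does not show the declaration.
    layout(binding = 0, r32ui) coherent volatile uniform uimage2D mutex;

    const int MAX_TRY = 30;

    // Hypothetical stand-in for the image reads and writes described in the thread.
    void doWork(ivec2 coord) { }

    void main() {
        ivec2 coord = ivec2(gl_FragCoord.xy);
        bool keepWaiting = true;
        int tries = 0;
        while (keepWaiting && tries < MAX_TRY) {
            if (imageAtomicExchange(mutex, coord, 1u) == 0u) {
                doWork(coord);
                keepWaiting = false;
                memoryBarrier();                      // publish doWork()'s writes first
                imageStore(mutex, coord, uvec4(0u));  // release with a plain store
            } else {
                tries++;
            }
        }
    }
    ```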

    Is there anything expensive in doWork() that can be moved outside the lock? For example, in the above link, the lock quickly gets a memory location and returns the lock before going on to do some work and write to the memory location.
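
    A minimal sketch of that idea, assuming the work can be split this way (the original poster may not be able to): the lock is held only long enough to reserve a slot index, and the actual write happens after the lock is released. This is not the linked article's code. Every image name, binding, format, and constant here (slotCount, slots, NO_SLOT, MAX_SLOT, tries) is hypothetical, and the shader is untested.

    ```glsl
    #version 420 core
    // All bindings, formats and names below are assumptions for illustration.
    layout(binding = 0, r32ui)   coherent volatile uniform uimage2D     mutex;
    layout(binding = 1, r32ui)   coherent volatile uniform uimage2D     slotCount;
    layout(binding = 2, rgba32f) writeonly         uniform image2DArray slots;

    const int  MAX_TRY  = 30;
    const uint NO_SLOT  = 0xFFFFFFFFu;  // marker: no slot reserved yet
    const uint MAX_SLOT = 4u;           // assumed number of layers in 'slots'

    void main() {
        ivec2 coord = ivec2(gl_FragCoord.xy);
        uint slot = NO_SLOT;
        int tries = 0;
        while (slot == NO_SLOT && tries < MAX_TRY) {
            if (imageAtomicExchange(mutex, coord, 1u) == 0u) {
                // Critical section: only reserve the next free slot for this pixel.
                slot = imageLoad(slotCount, coord).x;
                imageStore(slotCount, coord, uvec4(slot + 1u));
                memoryBarrier();
                imageAtomicExchange(mutex, coord, 0u);  // release the lock
            } else {
                tries++;
            }
        }
        // The more expensive write happens after the lock has been released.
        if (slot != NO_SLOT && slot < MAX_SLOT) {
            imageStore(slots, ivec3(coord, int(slot)), vec4(gl_FragCoord.xyz, 1.0));
        }
    }
    ```

    In this reduced form the reservation could be a single imageAtomicAdd on slotCount, which returns the previous value and would remove the lock altogether. The lock is kept here only to mirror the pattern discussed in the thread.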
