Part of the Khronos Group
OpenGL.org


Thread: Are two OpenGL contexts still necessary for concurrent copy and render?

  1. #1
    Prune (Intern Newbie)

    Are two OpenGL contexts still necessary for concurrent copy and render?

    Looking at http://on-demand.gputechconf.com/gtc...-Transfers.pdf
    Are the two contexts required? Will rendering not occur while the DMA transfer is proceeding, unless I do the upload in another thread, even with last-gen NVIDIA cards? If so, how does that make sense? It seems an artificial limitation, as the hardware obviously can handle it (even in the single copy engine consumer-level cards) if you have another thread.

    (If it matters, I'm using persistently mapped PBOs.)
    Last edited by Prune; 05-13-2014 at 04:15 PM.

  2. #2
    Agent D (Junior Member, Regular Contributor)
    The OpenGL(R) API is inherently single threaded. To issue OpenGL(R) commands, you need a context bound, and the commands operate on the currently bound context.
    A single context can be bound to exactly one thread at a time, and one thread can have exactly one context bound.

    Except of course if you have the GLX_MESA_multithread_makecurrent extension or something similar.

    Concurrent loading from a second thread while the first keeps on rendering requires that the second thread has its own context and that the two contexts share resources.
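A minimal sketch of that loader-thread pattern, with the GL specifics reduced to comments: in real code the second thread would make a resource-sharing context current (e.g. a WGL/GLX context created with the render context as share partner), fill a shared PBO, and signal completion with glFenceSync. Here a memcpy into a vector stands in for the upload so the sketch runs anywhere; all names (SharedStream, loaderThread, etc.) are illustrative, not GL API.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct SharedStream {
    std::mutex m;
    std::condition_variable cv;
    std::vector<unsigned char> staging;  // stands in for a shared PBO
    bool ready;
    SharedStream() : ready(false) {}
};

void loaderThread(SharedStream& s, const unsigned char* src, std::size_t n) {
    // Real code: make the second, sharing context current on this thread,
    // so buffer objects filled here are visible to the render context.
    std::vector<unsigned char> upload(src, src + n);  // mapped write / glBufferSubData
    {
        std::lock_guard<std::mutex> lk(s.m);
        s.staging = std::move(upload);
        s.ready = true;  // real code: glFenceSync + glFlush here
    }
    s.cv.notify_one();
}

std::vector<unsigned char> renderThreadWaitForUpload(SharedStream& s) {
    // Real code: the render thread keeps drawing and only waits on the
    // fence (glWaitSync) when it actually needs the freshly uploaded data.
    std::unique_lock<std::mutex> lk(s.m);
    s.cv.wait(lk, [&s] { return s.ready; });
    return s.staging;
}
```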

  3. #3
    Prune (Intern Newbie)
    Perhaps my question was not clear, or you did not look at the link. I didn't ask about multithreading but about asynchronous operation--these are different concepts, and one-thread-per-GL-context is a red herring. A DMA transfer doesn't block a CPU thread, because the CPU is only involved in triggering it, so no reason has been given why it should need to be initiated from a different thread.

    The article notes that the last few generations of NVIDIA cards have copy engines, which allow DMA transfers (such as from mapped buffers) to proceed concurrently with the GPU rendering. However, it implies that the transfer is only asynchronous (as in, concurrent with rendering, on the GPU side--nothing to do with client threads) if initiated in another GL context. My question is why that is, and if it's still the case on latest generation hardware.

  4. #4
    Aleksandar (Senior Member, OpenGL Pro)
    To my knowledge it is still so. Let's try to answer why:

    1. The NV Dual Copy Engine is not free; everything in the world has its price. There is initialization and synchronization overhead when it is active, and for small transfers better results are achieved with it off. So, by default, it is off.

    2. Since the Dual Copy Engine is off by default, there has to be a way to activate it when needed.

    3. There is no special command for turning it on. The driver uses heuristics to figure out when to do so, and the trigger is a transfer issued in a separate context (a dedicated context used just for transferring data).

    4. According to an answer on the GeForce forum (there is no official statement in any documentation), and to device queries through the CUDA API, Kepler GeForce cards also have a single copy engine, so the architecture has not changed.

    All of this is according to what I have read; I'm not a driver developer. It would be nice if someone could confirm or correct my post (but that is not likely, since no NV driver developer has posted anything on this forum for a couple of years).

    P.S. I would personally like to know what happens when the copy engine accesses a texture that is currently being used for drawing. It is probably not a common case, but it should be allowed (if it is not already) when the two engines access different locations. That is something that worked on SGI graphics workstations 16 years ago, I believe.
    Last edited by Aleksandar; 05-16-2014 at 12:23 PM.

  5. #5
    Prune (Intern Newbie)
    I thought consumer cards have only a single copy engine, not dual--or at least only one active at a time--going by http://www.nvidia.com/docs/IO/40049/Dual_copy_engines.pdf: "Having two separate threads running on a Quadro graphics card with the consumer NVIDIA Fermi architecture or running on older generations of graphics cards the data transfers will be serialized resulting in a drop in performance." First, I can't tell from this whether the serialization happens only when there are two transfers, or whether transfer/render is serialized as well. Note that their example overlaps upload/render/download.
    Now, I'm not interested in 3-way overlap, just 2-way (upload/render). So I'm not asking about the "Dual Copy Engine" being activated; I'm asking about one copy engine being activated. Is a second GL context/client thread still necessary for that? I can't tell from the information presented so far.

  6. #6
    Aleksandar (Senior Member, OpenGL Pro)
    Quote Originally Posted by Prune View Post
    I thought consumer cards have only a single copy engine, not dual--at least, only one active at a time--from http://www.nvidia.com/docs/IO/40049/Dual_copy_engines.pdf: "Having two separate threads running on a Quadro graphics card with the consumer NVIDIA Fermi architecture or running on older generations of graphics cards the data transfers will be serialized resulting in a drop in performance." First, I can't tell from this if they mean if the serialization is only if there are two transfers, or transfer/render is also serialized. Note that their example is overlapping upload/render/download.
    I'm sorry if I was not clear. Quadro cards have two DMA channels, while GeForce cards and low-end Quadros have just one; the principle is the same. Pre-Fermi cards have none. So on pre-Fermi cards everything is serialized; on GeForce, a transfer and rendering can overlap (2-way overlap); and on Quadros, two transfers (upload and download) and rendering can overlap (3-way overlap).

    Quote Originally Posted by Prune View Post
    Now, I'm not interested in 3-way overlapping, just 2-way (upload/render). So I'm not asking about "Dual Copy Engine" being activated. I'm asking about one copy engine being activated. Is a second GL context/client thread still necessary? I can't tell from the information presented this far.
    3-way overlap is not possible on GeForce cards anyway. As for the second context, it is as necessary today as it was when the copy engine was introduced. How else would you tell the driver to activate the separate DMA channel on the graphics card?

  7. #7
    tdname (Intern Contributor)
    I don't think a second context is a good choice, because all synchronization is left to OpenGL/the driver, and that may not work out well in some cases.
    I prefer to "waste" a bit of time and write small procedures/functions that use asynchronous transfers (double-buffering the object, or orphaning it).

    Note, however, that some OpenGL "objects" are not shareable between contexts, which could be a serious limitation.
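For the "double buffer or orphan" approach, the bookkeeping is simple. Here is a minimal sketch; streamBufferIndex is a hypothetical helper, and in real code each index would name a buffer object that you either rotate through or orphan (glBufferData with a null pointer and the same size) before refilling.

```cpp
#include <cstddef>

// Pick which buffer object of an N-deep ring to fill this frame. With
// count >= 2, the buffer written now is never the one the GPU may still
// be reading from the previous frame(s), so no stall is needed.
std::size_t streamBufferIndex(std::size_t frame, std::size_t count) {
    return frame % count;
}
```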

  8. #8
    Prune (Intern Newbie)
    Quote Originally Posted by tdname View Post
    I prefer to "waste" a bit of time and make small procedures/functions to use Async transfers (double object's buffer or orphaning).
    Do you mean with GL_MAP_UNSYNCHRONIZED_BIT? That is asynchronous between the driver and the GPU, but it forces the client thread and the driver's server thread to synchronize: http://www.slideshare.net/CassEveritt/beyond-porting page 9 "It's quite expensive (almost always needs to be avoided)"

  9. #9
    Newbie
    Quote Originally Posted by Prune View Post
    page 9 "It's quite expensive (almost always needs to be avoided)"
    On NVIDIA drivers specifically. Others don't necessarily have that issue.

  10. #10
    Prune (Intern Newbie)
    Does this copy-engine trigger for overlapping uploads apply only to glTexSubImage()? What about (persistently) mapped buffer objects? Since I use indirect rendering, I keep all transforms, materials, and other per-draw data in possibly large buffer objects. I'd like to know whether the copy-engine/DMA transfer concurrent with rendering applies only to texture uploads and not to buffer-object uploads; whether the same dual-context/thread trick would trigger concurrent uploading for buffer objects as well; or whether buffer-object uploads already DMA concurrently even in the same context/thread as the rendering (up to glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT))?
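For reference, the persistently mapped setup the question assumes is usually a fenced ring: the buffer is created once with glBufferStorage (GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT), mapped once with glMapBufferRange using the same flags, then a different region is written each frame and reuse is guarded with glFenceSync/glClientWaitSync. Below is a sketch of just the region bookkeeping, with the fences mocked by flags so it runs without a GL context; StreamRing and its members are illustrative names, not GL API.

```cpp
#include <array>
#include <cstddef>

const std::size_t kRegions = 3;  // triple-buffered ring

struct StreamRing {
    std::size_t regionSize;
    std::size_t current;
    std::array<bool, kRegions> inFlight;  // stands in for per-region GLsync fences

    explicit StreamRing(std::size_t size)
        : regionSize(size), current(0), inFlight() {}

    // Byte offset into the persistent mapping for this frame's writes.
    // Real code: if a fence exists for this region, glClientWaitSync() it first.
    std::size_t beginWrite() {
        inFlight[current] = false;  // pretend the wait completed
        return current * regionSize;
    }

    // Called after the draws reading this region are issued.
    // Real code: fence[current] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    void endWrite() {
        inFlight[current] = true;
        current = (current + 1) % kRegions;
    }
};
```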
