Name

    NV_occlusion_query

Name Strings

    GL_NV_occlusion_query

Contact

    Matt Craighead, NVIDIA Corporation (mcraighead 'at' nvidia.com)

Notice

    Copyright NVIDIA Corporation, 2001, 2002.

IP Status

    NVIDIA Proprietary.

Status

    Shipping (version 1.0)

Version

    NVIDIA Date: February 6, 2002 (version 1.0)

Number

    261

Dependencies

    Written based on the wording of the OpenGL 1.3 specification.

    Requires support for the HP_occlusion_test extension.

Overview

    The HP_occlusion_test extension defines a mechanism whereby an
    application can query the visibility of an object, where "visible"
    means that at least one pixel passes the depth and stencil tests.

    The HP extension has two major shortcomings.

    - It returns the result as a simple GL_TRUE/GL_FALSE result, when in
      fact it is often useful to know exactly how many pixels passed.
    - It provides only a simple "stop-and-wait" model for using multiple
      queries.  The application begins an occlusion test and ends it;
      then, at some later point, it asks for the result, at which point
      the driver must stop and wait until the result from the previous
      test is back before the application can even begin the next one.
      This is a very simple model, but its performance is mediocre when
      an application wishes to perform many queries, and it eliminates
      most of the opportunities for parallelism between the CPU and GPU.

    This extension solves both of those problems.  It returns as its
    result the number of pixels that pass, and it provides an interface
    conceptually similar to that of NV_fence that allows applications to
    issue many occlusion queries before asking for the result of any
    one.  As a result, they can overlap the time it takes for the
    occlusion query results to be returned with other, more useful work,
    such as rendering other parts of the scene or performing other
    computations on the CPU.

    There are many situations where a pixel count, rather than a boolean
    result, is useful.
    - If the visibility test is an object bounding box being used to
      decide whether to skip the object, sometimes it can be acceptable,
      and beneficial to performance, to skip an object if less than some
      threshold number of pixels could be visible.
    - Knowing the number of pixels visible in the bounding box may also
      help decide what level of detail a model should be drawn with.  If
      only a few pixels are visible, a low-detail model may be
      acceptable.  In general, this allows level-of-detail mechanisms to
      be slightly less ad hoc.
    - "Depth peeling" techniques, such as order-independent
      transparency, would typically like to know when to stop rendering
      more layers; it is difficult to come up with a way to determine a
      priori how many layers to use.  A boolean result allows
      applications to stop when more layers will not affect the image at
      all, but this will likely be unacceptable for performance, with
      minimal gains to image quality.  Instead, it makes more sense to
      stop rendering when the number of pixels goes below a threshold;
      this should provide better results than any of these other
      algorithms.
    - Occlusion queries can be used as a replacement for glReadPixels of
      the depth buffer to determine whether, say, a light source is
      visible for the purposes of a lens flare effect or a halo to
      simulate glare.  Pixel counts allow you to compute the percentage
      of the light source that is visible, and the brightness of these
      effects can be modulated accordingly.

Issues

    * Should we use an object-based interface?

      RESOLVED: Yes, this makes the interface much simpler, and it is
      friendly for indirect rendering.

    * Should we offer an entry point analogous to glTestFenceNV?

      RESOLVED: No, it is sufficient to have glGetOcclusionQueryivNV
      provide a query for whether the occlusion query result is back
      yet.  Whereas it is interesting to poll fence objects, it is
      relatively less interesting to poll occlusion queries.

    * Is glGetOcclusionQueryuivNV necessary?

      RESOLVED: Yes, it makes using a 32-bit pixel count less painful.

    * Should there be a limit on how many queries can be outstanding?

      RESOLVED: No.  This would make the extension much more difficult
      to spec and use.  Allowing this does not add any significant
      implementation burden; and even if drivers have some internal
      limit on the number of outstanding queries, it is not expected
      that applications will need to know this to achieve optimal or
      near-optimal performance.

    * What happens if glBeginOcclusionQueryNV is called when an
      occlusion query is already outstanding for a different object?

      RESOLVED: This is a GL_INVALID_OPERATION error.

    * What happens if HP_occlusion_test and NV_occlusion_query usage is
      overlapped?

      RESOLVED: The two can be overlapped safely.  Counting is enabled
      if we are _either_ inside a glBeginOcclusionQueryNV or if
      GL_OCCLUSION_TEST_HP is enabled.  The alternative (producing an
      error) does not work -- it would require that glPopAttrib be
      capable of producing an error, which would be rather problematic.

      Note that glBeginOcclusionQueryNV, not glEndOcclusionQueryNV,
      resets the pixel counter and occlusion test result.  This can
      avoid certain types of strange behavior where an occlusion query's
      pixel count does not always correspond to the pixels rendered
      during the occlusion query.  The spec would make sense the other
      way, but the behavior would be strange.

    * Does EndOcclusionQuery need to take any parameters?

      RESOLVED: No.  Giving it, for example, an "id" parameter would be
      redundant -- adding complexity for no benefit.  Only one query can
      be active at a time.

    * How many bits should we require the pixel counter to be, at
      minimum?

      RESOLVED: 24.  24 is enough to handle 8.7 full overdraws of a
      1600x1200 window.  That seems quite sufficient.

    * What should we do about overflows?

      RESOLVED: Overflows leave the pixel count undefined.  Saturating
      is recommended but not required.

      The ideal behavior really is to saturate.
      This ensures that you always get a "large" result when you render
      many pixels.  It also ensures that apps which want a boolean test
      can do one on their own, and not worry about the rare case where
      the result ends up exactly at zero from wrapping.

      That being said, with 24 bits of pixel count required, it's not
      clear that this really matters.  It's better to be a bit
      permissive here.  In addition, even if saturation were required,
      the goal of having strictly defined behavior is still not really
      met.

      Applications don't (or at least shouldn't) check for some _exact_
      number of bits.  Imagine if a multitextured app had been written
      that required that the number of texture units supported be
      _exactly_ two!  Implementors of OpenGL would be greatly annoyed to
      find that the app did not run on, say, three-texture or
      four-texture hardware.

      So, we expect apps here to always be doing a "greater than or
      equal to" check.  An app might check for, say, at least 28 bits.
      This doesn't ensure defined behavior -- it only ensures that once
      an overflow occurs (which may happen at any power of two), that
      overflow will be handled with saturation.  This behavior still
      remains sufficiently unpredictable that the reasons for defining
      behavior in even rarely-used cases (preventing compatibility
      problems, for example) are unsatisfied.

      All that having been said, saturation is still explicitly
      recommended in the spec language.

    * What is the interaction with multisample, which was not defined in
      the original spec?

      RESOLVED: The pixel count is the number of samples that pass, not
      the number of pixels.  This is true even if GL_MULTISAMPLE is
      disabled but GL_SAMPLE_BUFFERS is 1.  Note that the depth/stencil
      test optimization whereby implementations may choose to depth test
      at only one of the samples when GL_MULTISAMPLE is disabled does
      not cause this to become ill-specified, because we are counting
      the number of samples that are still alive _after_ the depth test
      stage.
      The mechanism used to decide whether to kill or keep those samples
      is not relevant.

    * Exactly what stage are we counting at?  The original spec said
      depth test; what does stencil test do?

      RESOLVED: We are counting immediately after _both_ the depth and
      stencil tests, i.e., pixels that pass both.  This was the original
      spec's intent.  Note that the depth test comes after the stencil
      test, so to say that it is the number that pass the depth test is
      reasonable; though it is often helpful to think of the depth and
      stencil tests as being combined, because the depth test result
      impacts the stencil operation used.

    * Is it guaranteed that occlusion queries return in order?

      RESOLVED: Yes.  It makes sense to do this.  If occlusion query X
      occurred before occlusion query Y, and the driver informs the app
      that occlusion query Y is done, the app can infer that occlusion
      query X is also done.  For applications that do poll, this allows
      them to do so with less effort.

    * Will polling an occlusion query without a glFlush possibly cause
      an infinite loop?

      RESOLVED: Yes, this is a risk.  If you ask for the result,
      however, any flush required will be done automatically.  It is
      only when you are polling that this is a problem because there is
      no guarantee that a flush has occurred in the time since
      glEndOcclusionQueryNV, and the spec is written to say that the
      result is only "available" if the value could be returned
      _instantaneously_.

      This is different from NV_fence, where FinishFenceNV can cause an
      app hang, and where TestFenceNV was also not guaranteed to ever
      finish.

      There need not be any spec language to describe this behavior
      because it is implied by what is already said.

      In short, if you use GL_PIXEL_COUNT_AVAILABLE_NV, you _must_ use
      glFlush, or your app may hang.

    * The HP_occlusion_test specs did not contain the spec edits that
      explain the exact way the extension works.  Should this spec fill
      in those details?

      RESOLVED: Yes.
      These two extensions are intertwined in so many important ways
      that doing so is not optional.

    * Should there be a "target" parameter to BeginOcclusionQuery?

      RESOLVED: No.  We're not trying to solve the problem of "query
      anything" here.

    * What might an application that uses this extension look like?

      Here is some rough sample code:

        GLuint occlusionQueries[N];
        GLuint pixelCount;
        int i;

        glGenOcclusionQueriesNV(N, occlusionQueries);
        ...
        // before this point, render major occluders
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);
        // also disable texturing and any fancy shading features
        for (i = 0; i < N; i++) {
            glBeginOcclusionQueryNV(occlusionQueries[i]);
            // render bounding box for object i
            glEndOcclusionQueryNV();
        }
        // at this point, if possible, go and do some other computation
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);
        // reenable other state
        for (i = 0; i < N; i++) {
            glGetOcclusionQueryuivNV(occlusionQueries[i], GL_PIXEL_COUNT_NV,
                                     &pixelCount);
            if (pixelCount > 0) {
                // render object i
            }
        }

    * Is this extension useful for saving geometry, fill rate, or both?

      It is expected that it will be most useful for saving geometry
      work, because for the cost of rendering a bounding box you can
      save rendering a normal object.

      It is possible for this extension to help in fill-limited
      situations, but using it may also hurt performance in such
      situations, because rendering the pixels of a bounding box is
      hardly free.  In most situations a bounding box will probably have
      more pixels than the original object.

      One exception is that for objects rendered with multiple passes,
      the first pass can be wrapped with an occlusion query almost for
      free.  That is, render the first pass for all objects in the
      scene, and get the number of pixels rendered on each object.  If
      zero pixels were rendered for an object, you can skip subsequent
      rendering passes.  This trick can be very useful in many cases.
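
      The multi-pass trick above can be sketched as a small decision
      step that runs once the first-pass pixel counts have been read
      back (for example with glGetOcclusionQueryuivNV and
      GL_PIXEL_COUNT_NV).  This is only an illustration: the helper name
      select_objects_for_later_passes, the keep array, and the threshold
      parameter are assumptions and are not part of this extension.  A
      threshold of 0 reproduces the "skip if zero pixels were rendered"
      rule; a larger threshold corresponds to the threshold-based
      skipping described in the Overview.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of NV_occlusion_query).
 * first_pass_counts[i] is the pixel count the application read back for
 * object i after wrapping its first rendering pass in an occlusion query.
 * keep[i] is set to 1 if object i should receive its remaining passes,
 * and to 0 if those passes can be skipped.  Returns the number kept.
 */
static size_t select_objects_for_later_passes(const unsigned *first_pass_counts,
                                              size_t n,
                                              unsigned threshold,
                                              unsigned char *keep)
{
    size_t kept = 0;
    size_t i;

    assert(first_pass_counts != NULL && keep != NULL);

    for (i = 0; i < n; i++) {
        /* Keep the object only if more than 'threshold' pixels passed. */
        keep[i] = (unsigned char)(first_pass_counts[i] > threshold);
        kept += keep[i];
    }
    return kept;
}
```

      An application would then render passes two and later only for
      objects whose keep entry is 1.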

    * What can be said about guaranteeing correctness when using
      occlusion queries, especially as it relates to invariance?

      Invariance is critical to guarantee the correctness of occlusion
      queries.  If occlusion queries go through a different code path
      than standard rendering, the pixels rendered may be different.

      However, the invariance issues are difficult at best to solve.
      Because of the vagaries of floating-point precision, it is
      difficult to guarantee that rendering a bounding box will render
      at least as many pixels with equal or smaller Z values than the
      object itself would have rendered.

      Likewise, many other aspects of rendering state tend to be
      different when performing an occlusion query.  Color and depth
      writes are typically disabled, as are texturing, vertex programs,
      and any fancy per-pixel math.  So unless all these features have
      guarantees of invariance themselves (unlikely at best), requiring
      invariance for NV_occlusion_query would be futile.

      For what it's worth, NVIDIA's implementation is fully invariant
      with respect to whether an occlusion query is active; that is, it
      does not affect the operation of any other stage of the pipeline.
      (When occlusion queries are being emulated on hardware that does
      not support them, via the emulation registry keys, using an
      occlusion query produces a software rasterization fallback, and in
      such cases invariance cannot be guaranteed.)

      Another problem that can threaten correctness is near and far
      clipping.  If the bounding box penetrates the near clip plane, for
      example, it may be clipped away, reducing the number of pixels
      counted, when in fact the original object may have stayed entirely
      beyond the near clip plane.  Whenever you design an algorithm
      using occlusion queries, it is best to be careful about the near
      and far clip planes.

    * How can frame-to-frame coherency help applications using this
      extension get even higher performance?

      Usually, if an object is visible one frame, it will be visible the
      next frame, and if it is not visible, it will not be visible the
      next frame.

      Of course, for most applications, "usually" isn't good enough.  It
      is undesirable, but acceptable, to render an object that wasn't
      visible, because that only costs performance.  It is generally
      unacceptable to not render an object that was visible.

      The simplest approach is that visible objects should be checked
      every N frames (where, say, N=5) to see if they have become
      occluded, while objects that were occluded last frame must be
      rechecked again in the current frame to guarantee that they are
      still occluded.  This will reduce the number of wasteful occlusion
      queries by a factor of almost N.

      It may also pay to do a raycast on the CPU in order to try to
      prove that an object is visible.  After all, occlusion queries are
      only one of many items in your bag of tricks to decide whether
      objects are visible or invisible.  They are not an excuse to skip
      frustum culling, or precomputing visibility using portals for
      static environments, or other standard visibility techniques.

      In general, though, taking advantage of frame-to-frame coherency
      in your occlusion query code is absolutely essential to getting
      the best possible performance.
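
      The "check visible objects every N frames, recheck occluded
      objects every frame" policy above could be tracked with a little
      per-object bookkeeping.  The following is a minimal sketch under
      stated assumptions: the ObjectVisibility structure and the
      functions should_issue_query and record_frame are hypothetical
      application-side names, not part of this extension, and the pixel
      count passed to record_frame is assumed to be the value the
      application read back for that object's query.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-object bookkeeping kept by the application. */
typedef struct {
    bool     visible_last_query;  /* did the most recent query count any pixels? */
    unsigned frames_since_query;  /* frames rendered without a query since then  */
} ObjectVisibility;

/*
 * Decide whether to issue an occlusion query for this object in the
 * current frame.  Objects last found occluded are queried every frame;
 * objects last found visible are queried once every recheck_interval
 * frames.
 */
static bool should_issue_query(const ObjectVisibility *obj,
                               unsigned recheck_interval)
{
    assert(recheck_interval > 0);
    if (!obj->visible_last_query)
        return true;
    return obj->frames_since_query + 1 >= recheck_interval;
}

/* Record what happened to this object in the frame that just finished. */
static void record_frame(ObjectVisibility *obj, bool queried,
                         unsigned pixel_count)
{
    if (queried) {
        obj->visible_last_query = (pixel_count > 0);
        obj->frames_since_query = 0;
    } else {
        obj->frames_since_query++;
    }
}
```

      In frames where no query is issued for an object that was last
      found visible, the application simply renders it, which costs at
      most some wasted rendering and never drops a visible object.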

New Procedures and Functions

    void GenOcclusionQueriesNV(sizei n, uint *ids);
    void DeleteOcclusionQueriesNV(sizei n, const uint *ids);
    boolean IsOcclusionQueryNV(uint id);
    void BeginOcclusionQueryNV(uint id);
    void EndOcclusionQueryNV(void);
    void GetOcclusionQueryivNV(uint id, enum pname, int *params);
    void GetOcclusionQueryuivNV(uint id, enum pname, uint *params);

New Tokens

    Accepted by the cap parameter of Enable, Disable, and IsEnabled, and
    by the pname parameter of GetBooleanv, GetIntegerv, GetFloatv, and
    GetDoublev:

        OCCLUSION_TEST_HP                              0x8165

    Accepted by the pname parameter of GetBooleanv, GetIntegerv,
    GetFloatv, and GetDoublev:

        OCCLUSION_TEST_RESULT_HP                       0x8166
        PIXEL_COUNTER_BITS_NV                          0x8864
        CURRENT_OCCLUSION_QUERY_ID_NV                  0x8865

    Accepted by the pname parameter of GetOcclusionQueryivNV and
    GetOcclusionQueryuivNV:

        PIXEL_COUNT_NV                                 0x8866
        PIXEL_COUNT_AVAILABLE_NV                       0x8867

Additions to Chapter 2 of the OpenGL 1.3 Specification (OpenGL Operation)

    None.

Additions to Chapter 3 of the OpenGL 1.3 Specification (Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 1.3 Specification (Per-Fragment
Operations and the Frame Buffer)

    Add a new section "Occlusion Tests and Queries" between sections
    4.1.6 and 4.1.7:

    "4.1.6A  Occlusion Tests and Queries

    Occlusion testing keeps track of whether any pixels have passed the
    depth test.  Such testing is enabled or disabled with the generic
    Enable and Disable commands using the symbolic constant
    OCCLUSION_TEST_HP.  The occlusion test result is initially FALSE.

    Occlusion queries can be used to track the exact number of fragments
    that pass the depth test.  Occlusion queries are associated with
    occlusion query objects.  The command

      void GenOcclusionQueriesNV(sizei n, uint *ids);

    returns n previously unused occlusion query names in ids.  These
    names are marked as used, but no object is associated with them
    until the first time BeginOcclusionQueryNV is called on them.
    Occlusion queries contain one piece of state, a pixel count result.
    This pixel count result is initialized to zero when the object is
    created.

    Occlusion queries are deleted by calling

      void DeleteOcclusionQueriesNV(sizei n, const uint *ids);

    ids contains n names of occlusion queries to be deleted.  After an
    occlusion query is deleted, its name is again unused.  Unused names
    in ids are silently ignored.

    An occlusion query can be started and finished by calling

      void BeginOcclusionQueryNV(uint id);
      void EndOcclusionQueryNV(void);

    If BeginOcclusionQueryNV is called with an unused id, that id is
    marked as used and associated with a new occlusion query object.  If
    it is called while another occlusion query is active, an
    INVALID_OPERATION error is generated.  If EndOcclusionQueryNV is
    called while no occlusion query is active, an INVALID_OPERATION
    error is generated.  Calling either GenOcclusionQueriesNV or
    DeleteOcclusionQueriesNV while an occlusion query is active causes
    an INVALID_OPERATION error to be generated.

    When EndOcclusionQueryNV is called, the current pixel counter is
    copied into the active occlusion query object's pixel count result.
    BeginOcclusionQueryNV resets the pixel counter to zero and the
    occlusion test result to FALSE.

    Whenever a fragment reaches this stage and OCCLUSION_TEST_HP is
    enabled or an occlusion query is active, the occlusion test result
    is set to TRUE and the pixel counter is incremented.  If the value
    of SAMPLE_BUFFERS is 1, then the pixel counter is incremented by the
    number of samples whose coverage bit is set; otherwise, it is always
    incremented by one.  If the pixel counter overflows, i.e., exceeds
    the value 2^PIXEL_COUNTER_BITS_NV-1, its value becomes undefined.
    It is recommended, but not required, that implementations handle
    this overflow case by saturating at 2^PIXEL_COUNTER_BITS_NV-1 and
    incrementing no further.

    The necessary state is a single bit indicating whether the occlusion
    test is enabled, a single bit indicating whether an occlusion query
    is active, the identifier of the currently active occlusion query, a
    counter of no smaller than 24 bits keeping track of the pixel count,
    and a single bit indicating the occlusion test result."

Additions to Chapter 5 of the OpenGL 1.3 Specification (Special Functions)

    Add to the end of Section 5.4 "Display Lists":

    "DeleteOcclusionQueriesNV, GenOcclusionQueriesNV,
    IsOcclusionQueryNV, GetOcclusionQueryivNV, and
    GetOcclusionQueryuivNV are not compiled into display lists but are
    executed immediately."

Additions to Chapter 6 of the OpenGL 1.3 Specification (State and State
Requests)

    Add a new section 6.1.13 "Occlusion Test and Occlusion Queries":

    "The occlusion test result can be queried using GetBooleanv,
    GetIntegerv, GetFloatv, or GetDoublev with a pname of
    OCCLUSION_TEST_RESULT_HP.  Whenever such a query is performed, the
    occlusion test result is reset to FALSE and the pixel counter is
    reset to zero as a side effect.

    Which occlusion query is active can be queried using GetBooleanv,
    GetIntegerv, GetFloatv, or GetDoublev with a pname of
    CURRENT_OCCLUSION_QUERY_ID_NV.  This query returns the name of the
    currently active occlusion query if one is active, and zero
    otherwise.

    The state of an occlusion query can be queried with the commands

      void GetOcclusionQueryivNV(uint id, enum pname, int *params);
      void GetOcclusionQueryuivNV(uint id, enum pname, uint *params);

    If the occlusion query object named by id is currently active, then
    an INVALID_OPERATION error is generated.  If pname is
    PIXEL_COUNT_NV, then the occlusion query's pixel count result is
    placed in params.

    Often, occlusion query results will be returned asynchronously with
    respect to the host processor's operation.  As a result, sometimes,
    if a pixel count is queried, the host must wait until the result is
    back.
    If pname is PIXEL_COUNT_AVAILABLE_NV, the value placed in params
    indicates whether or not such a wait would occur if the pixel count
    for that occlusion query were to be queried presently.  A result of
    TRUE means no wait would be required; a result of FALSE means that
    some wait would occur.  The length of this wait is potentially
    unbounded.  It must always be true that if the result for one
    occlusion query is available, the result for all previous occlusion
    queries must also be available at that point in time."

GLX Protocol

    Seven new GL commands are added.

    The following two rendering commands are sent to the server as part
    of a glXRender request:

        BeginOcclusionQueryNV
            2           8               rendering command length
            2           ????            rendering command opcode
            4           CARD32          id

        EndOcclusionQueryNV
            2           4               rendering command length
            2           ????            rendering command opcode

    The remaining five commands are non-rendering commands.  These
    commands are sent separately (i.e., not as part of a glXRender or
    glXRenderLarge request), using the glXVendorPrivateWithReply
    request:

        DeleteOcclusionQueriesNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (glXVendorPrivateWithReply)
            2           4+n             request length
            4           ????            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           INT32           n
            n*4         LISTofCARD32    ids

        GenOcclusionQueriesNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (glXVendorPrivateWithReply)
            2           4               request length
            4           ????            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           INT32           n
          =>
            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           n               reply length
            24                          unused
            n*4         LISTofCARD32    queries

        IsOcclusionQueryNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (glXVendorPrivateWithReply)
            2           4               request length
            4           ????            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           CARD32          id
          =>
            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           0               reply length
            4           BOOL32          return value
            20                          unused
            1           1               reply

        GetOcclusionQueryivNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (glXVendorPrivateWithReply)
            2           5               request length
            4           ????            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           CARD32          id
            4           ENUM            pname
          =>
            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           m               reply length, m=(n==1?0:n)
            4                           unused
            4           CARD32          n

            if (n=1) this follows:

            4           INT32           params
            12                          unused

            otherwise this follows:

            16                          unused
            n*4         LISTofINT32     params

        GetOcclusionQueryuivNV
            1           CARD8           opcode (X assigned)
            1           17              GLX opcode (glXVendorPrivateWithReply)
            2           5               request length
            4           ????            vendor specific opcode
            4           GLX_CONTEXT_TAG context tag
            4           CARD32          id
            4           ENUM            pname
          =>
            1           1               reply
            1                           unused
            2           CARD16          sequence number
            4           m               reply length, m=(n==1?0:n)
            4                           unused
            4           CARD32          n

            if (n=1) this follows:

            4           CARD32          params
            12                          unused

            otherwise this follows:

            16                          unused
            n*4         LISTofCARD32    params

Errors

    The error INVALID_VALUE is generated if GenOcclusionQueriesNV is
    called where n is negative.

    The error INVALID_VALUE is generated if DeleteOcclusionQueriesNV is
    called where n is negative.

    The error INVALID_OPERATION is generated if GenOcclusionQueriesNV or
    DeleteOcclusionQueriesNV is called when an occlusion query is
    active.

    The error INVALID_OPERATION is generated if BeginOcclusionQueryNV is
    called when an occlusion query is already active.

    The error INVALID_OPERATION is generated if EndOcclusionQueryNV is
    called when an occlusion query is not active.

    The error INVALID_OPERATION is generated if GetOcclusionQueryivNV or
    GetOcclusionQueryuivNV is called where id is not the name of an
    occlusion query.

    The error INVALID_OPERATION is generated if GetOcclusionQueryivNV or
    GetOcclusionQueryuivNV is called where id is the name of the
    currently active occlusion query.

    The error INVALID_ENUM is generated if GetOcclusionQueryivNV or
    GetOcclusionQueryuivNV is called where pname is not either
    PIXEL_COUNT_NV or PIXEL_COUNT_AVAILABLE_NV.

    The error INVALID_OPERATION is generated if any of the commands
    defined in this extension is executed between the execution of Begin
    and the corresponding execution of End.

New State

    (table 6.18, p. 226)

    Get Value                      Type  Get Command  Initial Value  Description             Sec     Attribute
    ---------                      ----  -----------  -------------  -----------             ------  ---------
    OCCLUSION_TEST_HP              B     IsEnabled    FALSE          occlusion test enable   4.1.6A  enable
    OCCLUSION_TEST_RESULT_HP       B     GetBooleanv  FALSE          occlusion test result   4.1.6A  -
    -                              B     GetBooleanv  FALSE          occlusion query active  4.1.6A  -
    CURRENT_OCCLUSION_QUERY_ID_NV  Z+    GetIntegerv  0              occlusion query ID      4.1.6A  -
    -                              Z+    -            0              pixel counter           4.1.6A  -

New Implementation Dependent State

    (table 6.29, p. 237) Add the following entry:

    Get Value              Type  Get Command  Minimum Value  Description        Sec     Attribute
    ---------------------  ----  -----------  -------------  -----------------  ------  ---------
    PIXEL_COUNTER_BITS_NV  Z+    GetIntegerv  24             Number of bits in  6.1.13  -
                                                             pixel counters

Revision History

    none yet