PBO perf problem

Hi!

I am using PBOs to upload textures.

On NVIDIA hardware I get lower performance with PBOs than with the old-fashioned upload path.

How come? What kind of strategy should you use with PBOs? Do you need to check formats, etc.?

Yes, the input format is important.
Read this: http://developer.nvidia.com/object/fast_texture_transfers.html

Do you need to set up a "feature" matrix for each vendor and driver version to decide whether to use PBOs, or do you need to run benchmarks on each piece of hardware you test?

I downloaded a sample from NVIDIA's website that uses PBOs and also lets you compare against normal operations, and I can clearly see a huge performance drop in the "non-normal" PBO usage compared to plain old OpenGL texture uploads.

I have also been running on some ATI hardware on both PC and OS X, and I can see huge differences in how various hardware shows "performance drops" in certain situations.

My initial assumption was that PBO is always faster than the old way, but this is clearly wrong??

Originally posted by ToolTech:
my initial assumption was that PBO is always faster than the old way but this is clearly wrong ??
It depends on what you do with it. Uploading regular textures should theoretically bring no benefit, since the same underlying system operations are involved; the same holds for copying textures. The operations that benefit are those that go GPU->CPU->GPU, like moving render data into a VBO.

When an app wants to upload texture data without a PBO, the driver must allocate video memory and then copy the texture data into it. When the GPU starts executing the upload command, it copies the data from video memory into "texture memory". So there is a "hidden" copy behind the glTexSubImage2D call.

With a PBO, the driver allocates the video memory and the app can access it directly. The app copies its data into this memory and calls glTexSubImage2D, which the GPU executes as usual.

So imagine the following scenario: loading a JPG as a texture.
Without PBO: the app decodes the JPG into system memory and calls glTexImage2D; the driver allocates video memory and copies the texture data; finally the GPU executes the command and copies the data from video memory to texture memory.
With PBO: the app asks the driver for a chunk of PBO memory and has the JPG decoding library decode the image directly into it. The app then calls glTexImage2D. When the GPU executes the glTexImage2D command, it copies the image from video memory to texture memory.

With a PBO there is one copy operation fewer, and the driver does not spend time on memory management.
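To make the copy count concrete, here is a toy model in C of the two paths described above. It is an assumption-laden sketch, not OpenGL code: plain buffers stand in for driver-owned staging memory (the PBO) and texture memory, each memcpy stands for one whole-image copy, and `ToyDriver`, `fake_decode`, and the two upload functions are hypothetical names. The comments only indicate which real call (glMapBuffer, glTexImage2D) each step corresponds to in this description.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: 'staging' plays driver-owned memory (what
   glMapBuffer would hand out for a PBO), 'texture' plays texture memory. */
typedef struct {
    unsigned char *staging;
    unsigned char *texture;
    size_t size;   /* bytes in one image */
    int copies;    /* whole-image copies made by the current upload */
} ToyDriver;

/* Stand-in for a JPG decoder that writes its output into 'dst'. */
static void fake_decode(unsigned char *dst, size_t size) {
    for (size_t i = 0; i < size; ++i)
        dst[i] = (unsigned char)(i * 7u);
}

static void counted_copy(ToyDriver *d, unsigned char *dst, const unsigned char *src) {
    memcpy(dst, src, d->size);
    d->copies++;
}

/* Without PBO: decode into app memory, glTexImage2D makes the driver copy it
   into its own memory, then the GPU copies it into texture memory. */
int upload_without_pbo(ToyDriver *d) {
    unsigned char *app = malloc(d->size);
    if (app == NULL) return -1;
    d->copies = 0;
    fake_decode(app, d->size);
    counted_copy(d, d->staging, app);        /* driver copy inside glTexImage2D */
    counted_copy(d, d->texture, d->staging); /* GPU copy when the command runs */
    free(app);
    return d->copies;
}

/* With PBO: the decoder writes straight into the mapped PBO memory, so
   glTexImage2D (offset 0 into the bound PBO) only queues the GPU copy. */
int upload_with_pbo(ToyDriver *d) {
    d->copies = 0;
    fake_decode(d->staging, d->size);        /* decode into mapped PBO memory */
    counted_copy(d, d->texture, d->staging); /* GPU copy when the command runs */
    return d->copies;
}
```

Both paths leave identical bytes in `texture`; the model reports 2 copies without a PBO and 1 with one, which is the "one copy operation fewer" claimed above. It says nothing about how fast each copy is, which is where the format and hardware differences in this thread come in.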

Another performance hit might be the texture format. If you can, choose the most suitable format to avoid a conversion during the glTexImage call. So on NV hardware, use the BGR or BGRA format.
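If your source data is RGBA but the driver prefers BGRA, one option is to reorder the channels yourself while writing into the mapped PBO, so the conversion rides along with a copy you make anyway, and then call glTexSubImage2D with GL_BGRA / GL_UNSIGNED_BYTE. A minimal sketch, assuming 8-bit RGBA input and a hypothetical helper name `rgba_to_bgra`; whether this actually beats the driver's own conversion is something to measure on the target hardware, and a decoder that emits BGRA directly is better still.

```c
#include <stddef.h>

/* Copies 'pixels' RGBA8 pixels from 'src' to 'dst', reordering each one to
   BGRA8. 'dst' would typically be memory returned by glMapBuffer for a
   GL_PIXEL_UNPACK_BUFFER; 'src' and 'dst' must not overlap. */
void rgba_to_bgra(unsigned char *dst, const unsigned char *src, size_t pixels) {
    for (size_t i = 0; i < pixels; ++i) {
        const unsigned char *s = src + 4 * i;
        unsigned char *d = dst + 4 * i;
        d[0] = s[2]; /* B */
        d[1] = s[1]; /* G */
        d[2] = s[0]; /* R */
        d[3] = s[3]; /* A */
    }
}
```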

PBO performance on NV hardware is very good. In my app I can upload an HD (1920x1080 RGB) image into a texture, apply a 3D effect, convert RGB to YUV422, read back the result at HD resolution, and send it to a video-out card in real time. All this on a P4 dual core 3 GHz + NV 7600GT.
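For a sense of the data volume in that pipeline, here is the per-frame and per-second arithmetic as two small C helpers. The frame rate is not stated in the post, so 30 fps is an assumption; RGB is taken as 3 bytes per pixel and YUV422 as 2.

```c
/* Bytes in one w x h frame at the given bytes per pixel. */
unsigned long frame_bytes(unsigned long w, unsigned long h, unsigned long bytes_per_pixel) {
    return w * h * bytes_per_pixel;
}

/* Data rate in megabytes (10^6 bytes) per second at 'fps' frames per second. */
double megabytes_per_second(unsigned long bytes_per_frame, double fps) {
    return (double)bytes_per_frame * fps / 1e6;
}
```

`frame_bytes(1920, 1080, 3)` gives 6,220,800 bytes uploaded per frame and `frame_bytes(1920, 1080, 2)` gives 4,147,200 bytes read back; at the assumed 30 fps that is about 186.6 MB/s up and 124.4 MB/s down, roughly 311 MB/s of bus traffic in total, before any extra driver-side copies.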

I agree with yooyo. I do the same thing for video effects in my scene graph, and I see a tremendous gain on most hardware, BUT in some cases I get worse performance, and that is what annoys me!!! I get LOWER performance using PBOs on an NV 6800 GT for various pixel formats. On my ATI OS X system I get much more performance using PBOs than without them.

So my question remains: do you need a cross-reference LUT for different vendors, drivers, and hardware to know whether you should use PBOs??

A PBO just eliminates the copying of texture data. But if you use a pixel transfer function to adjust colors based on a LUT, that is (in most cases) done on the CPU, which leads to slow performance.

So upload your video frame "as is" and adjust the colors using a LUT texture in a shader.

I want to use R2VB (render to vertex buffer) in OpenGL, but I can't find any information on it. Can somebody help me, or point me to a site explaining the correct use of this technique?

I don't use any LUT in that sense. I meant that I need a lookup table, or dictionary, telling me whether to use PBOs depending on hardware, texture format, and driver version. I upload raw data.

OK. Here are some results…

On my 7800 I get tremendous PBO acceleration for BGRA and FP16 formats, but not much for RGB or RGBA8 (about the same as glTex).

On my 6800 I get good acceleration for BGRA and FP16, but not much more than glTex. HOWEVER, for RGB and RGBA formats I get very low performance with PBO but normal performance with glTex. This is not the case on my ATI hardware, where I always get good acceleration compared to glTex.

So my conclusion is: on NV 6800 hardware, do NOT use PBO for RGB and RGBA. On ATI hardware, use it all the time, and on NV 7800 use it for BGRA and FP16, where it runs like lightning (3x faster than glTex).
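The "cross-reference LUT" asked about above can be as simple as a rule table matched against the GL_VENDOR and GL_RENDERER strings. A minimal C sketch, assuming substring matching on those strings is good enough; the rows encode only the measurements in this post, the names (`PboRule`, `pbo_choice`, the `FMT_` values) are hypothetical, and driver version, which was also asked about, would be one more column. Unmatched combinations return `PBO_UNKNOWN`, e.g. as a cue to benchmark both paths at startup.

```c
#include <stddef.h>
#include <string.h>

typedef enum { FMT_ANY, FMT_RGB8, FMT_RGBA8, FMT_BGRA8, FMT_RGBA16F } PixelFormat;
typedef enum { PBO_UNKNOWN, PBO_USE, PBO_AVOID } PboChoice;

/* One row per measured combination; NULL renderer or FMT_ANY is a wildcard. */
typedef struct {
    const char *vendor;    /* substring of the GL_VENDOR string */
    const char *renderer;  /* substring of the GL_RENDERER string */
    PixelFormat format;
    PboChoice choice;
} PboRule;

/* Rows reflect only the 6800 / 7800 / ATI results reported in this thread. */
static const PboRule pbo_rules[] = {
    { "NVIDIA", "6800", FMT_RGB8,    PBO_AVOID },
    { "NVIDIA", "6800", FMT_RGBA8,   PBO_AVOID },
    { "NVIDIA", "6800", FMT_BGRA8,   PBO_USE   }, /* small gain */
    { "NVIDIA", "6800", FMT_RGBA16F, PBO_USE   }, /* small gain */
    { "NVIDIA", "7800", FMT_BGRA8,   PBO_USE   },
    { "NVIDIA", "7800", FMT_RGBA16F, PBO_USE   },
    { "ATI",    NULL,   FMT_ANY,     PBO_USE   },
};

/* Returns the first matching rule's choice, or PBO_UNKNOWN if none matches. */
PboChoice pbo_choice(const char *vendor, const char *renderer, PixelFormat format) {
    for (size_t i = 0; i < sizeof pbo_rules / sizeof pbo_rules[0]; ++i) {
        const PboRule *r = &pbo_rules[i];
        if (strstr(vendor, r->vendor) == NULL) continue;
        if (r->renderer != NULL && strstr(renderer, r->renderer) == NULL) continue;
        if (r->format != FMT_ANY && r->format != format) continue;
        return r->choice;
    }
    return PBO_UNKNOWN;
}
```

First match wins, so more specific rows should come before wildcard rows for the same vendor.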

What do you think about my conclusion ???

@ToolTech:
Yes, you are right. NVIDIA hardware doesn't like RGB or RGBA; it prefers BGR or BGRA.

@yalmar:
Create an RGBA32F FBO, render something into that buffer, create a PBO, read the result back into the PBO, then rebind that PBO as a VBO, set up the vertex pointers, and continue rendering as usual.
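The recipe above can be sketched in C as follows. Assumptions: the FBO and PBO already exist, the GL entry points are reached through a hypothetical `GLDispatch` table (on Windows the pointer types would also need the APIENTRY calling convention), and the enum values are copied from the GL headers under `SK_` names so the sketch builds without them. The logging stand-ins at the end exist only to dry-run the call order; a real program would fill the table with the actual glBindBuffer, glBufferData, glReadPixels, glVertexPointer, glEnableClientState, and glDrawArrays.

```c
#include <stddef.h>
#include <string.h>

/* Enum values as defined in the OpenGL headers, renamed so this compiles
   without them. */
#define SK_PIXEL_PACK_BUFFER 0x88EBu
#define SK_ARRAY_BUFFER      0x8892u
#define SK_STREAM_COPY       0x88E2u
#define SK_RGBA              0x1908u
#define SK_FLOAT             0x1406u
#define SK_POINTS            0x0000u
#define SK_VERTEX_ARRAY      0x8074u

/* The GL entry points the sequence uses (glBindBuffer, glBufferData, ...). */
typedef struct {
    void (*BindBuffer)(unsigned target, unsigned buffer);
    void (*BufferData)(unsigned target, ptrdiff_t size, const void *data, unsigned usage);
    void (*ReadPixels)(int x, int y, int w, int h, unsigned format, unsigned type, void *pixels);
    void (*VertexPointer)(int size, unsigned type, int stride, const void *pointer);
    void (*EnableClientState)(unsigned cap);
    void (*DrawArrays)(unsigned mode, int first, int count);
} GLDispatch;

/* Assumes the RGBA32F FBO (one vertex position per texel) is bound for
   reading and 'pbo' is a buffer name from glGenBuffers. */
void r2vb_draw(const GLDispatch *gl, unsigned pbo, int w, int h) {
    ptrdiff_t bytes = (ptrdiff_t)w * h * 4 * (ptrdiff_t)sizeof(float);

    /* Read the FBO into the buffer object; with a pack buffer bound, the
       last glReadPixels argument is a byte offset into it. */
    gl->BindBuffer(SK_PIXEL_PACK_BUFFER, pbo);
    gl->BufferData(SK_PIXEL_PACK_BUFFER, bytes, NULL, SK_STREAM_COPY);
    gl->ReadPixels(0, 0, w, h, SK_RGBA, SK_FLOAT, (void *)0);
    gl->BindBuffer(SK_PIXEL_PACK_BUFFER, 0);

    /* Rebind the same buffer as a VBO and draw one point per texel. */
    gl->BindBuffer(SK_ARRAY_BUFFER, pbo);
    gl->VertexPointer(4, SK_FLOAT, 0, (const void *)0); /* offset 0 into the VBO */
    gl->EnableClientState(SK_VERTEX_ARRAY);
    gl->DrawArrays(SK_POINTS, 0, w * h);
    gl->BindBuffer(SK_ARRAY_BUFFER, 0);
}

/* Logging stand-ins so the call order can be checked without a GL context. */
static char trace[128];
static void note(const char *s) { strncat(trace, s, sizeof trace - strlen(trace) - 1); }
static void log_bind(unsigned t, unsigned b) {
    note(b == 0 ? "Unbind " : (t == SK_PIXEL_PACK_BUFFER ? "Pack " : "Array "));
}
static void log_data(unsigned t, ptrdiff_t s, const void *d, unsigned u) { note("Data "); }
static void log_read(int x, int y, int w, int h, unsigned f, unsigned ty, void *p) { note("Read "); }
static void log_pointer(int s, unsigned t, int st, const void *p) { note("Pointer "); }
static void log_enable(unsigned c) { note("Enable "); }
static void log_draw(unsigned m, int f, int n) { note("Draw "); }

const char *r2vb_trace(int w, int h) {
    static const GLDispatch logger = { log_bind, log_data, log_read,
                                       log_pointer, log_enable, log_draw };
    trace[0] = '\0';
    r2vb_draw(&logger, 1u, w, h);
    return trace;
}
```

`r2vb_trace(4, 2)` returns `"Pack Data Read Unbind Array Pointer Enable Draw Unbind "`, i.e. the readback into the PBO happens before the same buffer is used as vertex data. The glBufferData call only strictly needs to run when the size changes.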

Originally posted by ToolTech:
What do you think about my conclusion ???
What about DMA scheduling? :stuck_out_tongue:

I think most of these issues are covered in the NVIDIA performance docs and the PBO spec, especially regarding formats. PBO is strictly a performance enhancement, and should be treated as such (with healthy skepticism and careful testing).

Which ATI cards support PBO? I can't find any references to PBO and ATI (no ATI cards are listed at http://www.delphi3d.net/hardware/index.php, for example)…

Originally posted by Hampel:
Which ATI cards do support PBO? Can’t find any references about PBO and ATI (no ATI cards are listed at http://www.delphi3d.net/hardware/index.php for example)…
No ATI cards support PBOs.

But how do I have to interpret ToolTech’s quote:

So my conclusion is that on nv hardware 6800 do NOT use PBO for RGB and RGBA. On ATI hw use it all the time and …

Originally posted by RickA:
No ATI cards support PBOs.
All ATI cards with HW TCL support PBO on Mac OS X.

(Is Delphi up to date?)