Core requires more streamlined assets

Dear All,

I recently started making an OpenGL 2.1 framework more core-like by removing its uses of deprecated features. The framework is educational, and I would like it to be helpful to students interested in mobile graphics. Hence the need for change.

Actually, materials, lights, transformations etc. were simple to deal with since the framework is already shader-based. However, display lists I cannot get rid of. The thing is that I load an object from an obj file which uses 23 materials, 12 textures, and has 60 usemtl instructions.

Obviously I do not want 60 draw calls, so in the end I compiled one display list for all geometry which calls nested display lists that contain the state changes. The geometry list is created just once. The nested material lists are recreated every time the shader changes (in effect, every time the object is drawn). This works well, and it works around the fact that I cannot create a single display list which works with all shaders, since they have different uniform bindings.
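To make that concrete, here is a rough sketch of the nesting (all names - numMaterials, groups, kdLoc and so on - are just illustrative, not the actual framework code):

    GLuint matLists = glGenLists(numMaterials);          // one list per material
    for (int m = 0; m < numMaterials; ++m) {
        glNewList(matLists + m, GL_COMPILE);
        glUniform3fv(kdLoc, 1, materials[m].kd);          // locations belong to the current
        glBindTexture(GL_TEXTURE_2D, materials[m].tex);   // shader, hence the rebuild per shader
        glEndList();
    }
    GLuint geomList = glGenLists(1);                      // compiled only once
    glNewList(geomList, GL_COMPILE);
    for (int g = 0; g < numGroups; ++g) {                 // one group per usemtl instruction
        glCallList(matLists + groups[g].material);        // nested state-change list
        glBegin(GL_TRIANGLES);
        /* ... emit the vertices of group g ... */
        glEnd();
    }
    glEndList();

Because the nested lists are called by name and resolved at execution time, recompiling just the material lists changes what the geometry list does without touching the geometry.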

However, it is not very core’ish. My assumption is that for core GL I need to fix the asset: for instance, make a material array and combine the textures into a texture atlas or a texture array, or fire up a content creation tool and fix the assets there. Banal perhaps, but important to point out to learners :-)

In any case that is fine, but it means that OpenGL compatibility is far more tolerant of ill-structured assets. I am just curious whether you agree with this, or whether you feel I am overlooking something?

  • Andreas

Hi,

You can still draw the model with 23 draw calls after re-arranging the geometry (as you only have 23 materials) so that it only needs 23 ‘usemtl’ instructions. In between you will have to change shaders, textures, etc., but the driver probably does much the same thing inside a display list.
So yes, in core (or OpenGL ES) you have to take care of such low-level details yourself, and you can clearly see which types of assets fit the programming model better / help to reduce state changes.
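Something along these lines, assuming the geometry has been re-sorted so that each material’s triangles are contiguous in one index buffer (‘ranges’ and ‘materials’ are made-up names):

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    /* ... glVertexAttribPointer / glEnableVertexAttribArray setup ... */
    for (int m = 0; m < numMaterials; ++m) {              // 23 materials -> 23 draw calls
        glUseProgram(materials[m].program);               // only needed if the shader differs
        glUniform3fv(materials[m].kdLoc, 1, materials[m].kd);
        glBindTexture(GL_TEXTURE_2D, materials[m].texture);
        glDrawElements(GL_TRIANGLES, ranges[m].count, GL_UNSIGNED_INT,
                       (const GLvoid*)(ranges[m].first * sizeof(GLuint)));
    }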

Thanks. You are right: if I use a VBO and vertex arrays, I would probably get performance similar to what I have now with 23 nested display lists, using 23 draw calls. However, it requires state sorting and more setup, and it is still not as efficient as fixing the asset.

Essentially, the rub is that the slices of an array texture have to have the same dimensions. Otherwise I could “vectorize” the materials and have them all on the GPU at once.

I am not trying to say that compatibility is better than core - but simply trying to verify that the “right core approach” is to fix the asset and that there is no way to deal with poorly optimized assets as conveniently in core as in compatibility. Nor does there have to be!
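For what it is worth, the constraint shows up right at allocation time: a 2D texture array (GL_TEXTURE_2D_ARRAY) is given a single width/height shared by all of its layers, so every material texture would first have to be repacked to that size. A sketch, with illustrative names:

    glBindTexture(GL_TEXTURE_2D_ARRAY, arrayTex);
    glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8,
                 width, height, numMaterials,             // one size for every layer
                 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    for (int m = 0; m < numMaterials; ++m)                // each material texture becomes a layer
        glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0, 0, 0, m,
                        width, height, 1,
                        GL_RGBA, GL_UNSIGNED_BYTE, pixels[m]);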

I’m curious as to what state changes you need between each material and why the textures would have to be the same dimensions.
Can you explain in a bit more detail?

Right. I have an object which is divided into a set of meshes that have different materials. Materials have properties such as specular and diffuse reflectance and also texture maps which I use to modulate the diffuse reflectance.

The fundamental issue is that in core OpenGL one cannot merge all of these meshes and draw in a single go since each mesh has its own material that I need to set using uniforms.

The biggest problem is the textures: one could use uniform arrays for the material reflectance parameters, but I think it would be a bad idea - and sometimes impossible - to allocate a texture unit for each material. Texture arrays could be a solution, but they require the individual slices to be of the same size - a hardware limitation, I guess.

Thus, we either need to switch materials while drawing or somehow merge the materials, or at least the textures. My conjecture is simply that the former strategy works well when using compatibility, while the latter appears vastly preferable when using core.
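For the reflectance parameters alone, the uniform-array route might look roughly like this on the C side, assuming a GLSL declaration such as uniform vec3 kd[NUM_MATERIALS]; and a per-vertex attribute carrying the material index (all names are illustrative). It is the textures that this does not cover:

    glUseProgram(program);                                // uniforms go to the bound program
    GLint kdLoc = glGetUniformLocation(program, "kd");
    GLint ksLoc = glGetUniformLocation(program, "ks");
    glUniform3fv(kdLoc, numMaterials, &kdTable[0][0]);    // all diffuse colours in one call
    glUniform3fv(ksLoc, numMaterials, &ksTable[0][0]);    // all specular colours in one call
    glEnableVertexAttribArray(matIndexAttrib);            // per-vertex material index selects
    glVertexAttribPointer(matIndexAttrib, 1, GL_FLOAT,    // kd[i]/ks[i] in the shader
                          GL_FALSE, stride, matIndexOffset);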

I agree that materials and uniforms need to be set for each mesh. It’s therefore best to organise the model by material to minimise the number of switches required. That’s what I do for my obj models.
However, binding textures per mesh/draw call is a low-cost operation compared to switching shaders. I render an obj model with many materials (perhaps over 30) and bind albedo, normal and specular maps for each mesh/material. I don’t think that this is the bottleneck of the rendering process, so it isn’t on my list of things to optimise. I don’t think that other texture binding schemes are applicable, as texture arrays have dimension constraints which would limit the assets used for the model.
I can’t see any other approach which would render the model any more efficiently without also imposing a rigid constraint - whether fixed function or shader-based core profile.
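For context, the per-mesh binding described above amounts to a few glBindTexture calls on fixed units before each draw, which is why it tends to be cheap compared to a program switch. A sketch with made-up mesh/material names:

    for (int i = 0; i < numMeshes; ++i) {
        const Material& mat = materials[meshes[i].material];
        glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, mat.albedo);
        glActiveTexture(GL_TEXTURE1); glBindTexture(GL_TEXTURE_2D, mat.normal);
        glActiveTexture(GL_TEXTURE2); glBindTexture(GL_TEXTURE_2D, mat.specular);
        // the sampler uniforms (0, 1, 2) are set once at shader setup, not per draw
        glDrawElements(GL_TRIANGLES, meshes[i].count, GL_UNSIGNED_INT,
                       (const GLvoid*)(meshes[i].first * sizeof(GLuint)));
    }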

The fundamental issue is that in core OpenGL one cannot merge all of these meshes and draw in a single go since each mesh has its own material that I need to set using uniforms.

Then you need to ask yourself a question: do you believe that the driver can do that when you cannot?

Just because you package a bunch of stuff in a display list does not mean that executing that display list happens “in a single go”. The driver still has to perform the various texture swaps, material changes, etc., and issue multiple draw calls.

Now granted, it may be able to do so more efficiently than you can (due to not having to error test for certain conditions and so forth). But that doesn’t mean that it isn’t issuing multiple draw calls internally and incurring some overhead for doing so.

My concern is that you seem to be wanting to optimize something that may not be a problem. You haven’t profiled the manual version yet, so you don’t know if it’s fast enough or not.

Try it manually and see what performance you get. Use some basic performance techniques (sorting by state changes and such), and see what you get. If it’s too slow, then you can try things like array textures/texture atlases, uniform arrays, and so forth.
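One simple way to do that sorting is to give every mesh a key built from its most expensive state (program first, then texture) and sort the draw list once; something like this sketch (DrawItem and items are illustrative; GLuint comes from the usual GL header):

    #include <algorithm>
    #include <vector>

    struct DrawItem {                 // one entry per mesh/material range
        GLuint program;               // most expensive switch: primary sort key
        GLuint texture;               // secondary key
        GLuint first, count;          // index range to draw
    };

    static bool cheaperOrder(const DrawItem& a, const DrawItem& b) {
        if (a.program != b.program) return a.program < b.program;
        return a.texture < b.texture;
    }

    // sort once (or whenever the mesh set changes); when drawing, only switch
    // program/texture when the key actually differs from the previous item
    std::sort(items.begin(), items.end(), cheaperOrder);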

Actually, the driver cannot. Now, I have timed it, and - yes - display lists are much slower than vertex arrays in VBO’s.

I am measuring FPS after the framerate has stabilized. The scene is a terrain and three objects which are drawn either using

  • begin/end calls in a display list (59 lines), or
  • vertex arrays in a display list (56 lines), or
  • vertex arrays in vertex buffer objects (63 lines).

The shader does simple forward rendering with per-pixel Phong shading. The timings are as follows:

dpy list (begin-end) ~ 277 FPS
dpy list (vertex arrays) ~ 280 FPS
VBO (vertex arrays) ~ 428 FPS

on my ancient Macbook Pro with a Geforce 8600M GT.

So, VBOs win hands down! I readily admit that the comfortable margin surprises me a bit since, as mentioned, there are 60 calls to glDrawElements for one of the objects … versus a single call to glCallList. I was thinking display lists would have an edge because there is much less CPU-GPU communication, but, as Alfonse points out, I don’t know what the driver actually does.

So, clearly, I could re-index the whole thing and throw in state sorting, making the VBO version even faster.

In any case, my point was that OpenGL compatibility has an edge when it comes to complicated assets. To some extent that edge is a bit dull, since nested display lists do not appear to be faster than even a large number of draw calls - quite the opposite.

When it comes to convenience … well, about the same number of lines is needed either way.

In point of fact, the rendering is a bit wrong for the vertex-array-based methods, since they use indexed arrays but the texture coordinates have different indices from the vertices. I could solve that by de-indexing the geometry.
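De-indexing here just means expanding the OBJ’s separate position/texcoord indices into one flat vertex per face corner; a brute-force sketch, assuming the parsed arrays and index pairs have the (made-up) names below:

    struct Vertex { float px, py, pz, u, v; };         // position + texcoord per corner
    std::vector<Vertex> expanded;
    for (size_t i = 0; i < corners.size(); ++i) {      // one OBJ "v/vt" pair per face corner
        const float* p = &objPositions[3 * corners[i].positionIndex];
        const float* t = &objTexcoords[2 * corners[i].texcoordIndex];
        Vertex v = { p[0], p[1], p[2], t[0], t[1] };
        expanded.push_back(v);                          // drawn with glDrawArrays, no index buffer
    }

A smarter version would keep a map from (position, texcoord) pairs to new indices, so shared corners are not duplicated and indexed drawing still works.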

To some extent I still believe that compatibility is more convenient for dealing efficiently with poorly crafted assets. However, when you actually try it, the differences are smaller than you might expect at first.

I attach the file with the three versions of the rendering code if anyone is interested.

Actually, the driver cannot. Now, I have timed it, and - yes - display lists are much slower than vertex arrays in VBO’s.

I’m curious to know what hardware and drivers you’re running. That’s a pretty surprising result. Even on AMD hardware, buffer objects should only be approximately equal to display lists.

I attach the file with the three versions of the rendering code if anyone is interested.

What is it with people and these PDF attachments? When did people stop using plain text, which is easy to download, view, copy, and run?

I’m curious to know what hardware and drivers you’re running. That’s a pretty surprising result. Even on AMD hardware, buffer objects should only be approximately equal to display lists.

It is an NVIDIA GeForce 8600M GT in a MacBook Pro with an Intel Core 2 Duo at 2.4 GHz, running OS X 10.6.8. The renderer info reports the version as 2.1 NVIDIA-1.6.36.

I have investigated a bit more. The framework also draws a terrain of 7200 triangles. I can draw it in two modes:
  • As a single triangle strip (using degenerate triangles, stitched as sketched below) drawn from an unindexed vertex array in a VBO.
  • As 60 unindexed triangle strips stored in a single display list.
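The stitching for the first mode is just the usual trick of repeating the last vertex of one row-strip and the first vertex of the next, which yields the degenerate triangles; roughly, for a grid of rows x cols vertices (grid() is a made-up helper returning the vertex at a grid position):

    std::vector<Vertex> strip;                               // unindexed: vertices are expanded
    for (int r = 0; r + 1 < rows; ++r) {
        if (r > 0) strip.push_back(grid(r, 0));              // repeat first vertex of the new row
        for (int c = 0; c < cols; ++c) {
            strip.push_back(grid(r, c));                     // current row
            strip.push_back(grid(r + 1, c));                 // row below
        }
        if (r + 2 < rows) strip.push_back(grid(r + 1, cols - 1)); // repeat last vertex of this row
    }
    // upload 'strip' to a VBO and draw with glDrawArrays(GL_TRIANGLE_STRIP, 0, (GLsizei)strip.size())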

If I draw just the terrain, I get 1577 FPS in both modes (with some variation). So, clearly, it depends on the precise type of rendering.

Also, if I render the scene but remove the expensive object that really motivated this thread, the numbers are pretty close: 1225 FPS with the VBO and 1162 FPS with display lists.

What is it with people and these PDF attachments? When did people stop using plain text, which is easy to download, view, copy, and run?

I can only speak for myself, but I also prefer plain text - plain-text attachments, however, are apparently not supported. I guess I should have used a code block; it just seemed a bit long.

/andreas