Is freezing the meshes inside Maya a good choice?

Hi there
As you may know, Maya has an option to freeze meshes. It multiplies the vertices by their transformations and computes the world-space position of each vertex, so there’s no need to apply those transformations on the fly. So, is it a good choice to freeze the meshes inside Maya?
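To make the question concrete, here is a minimal sketch of what “freezing” effectively does: the model matrix is baked into each vertex once, offline, leaving the node’s transform as identity. The `Vec3`/`Mat4`/`freezeMesh` names are my own illustration, not Maya’s API; the matrix is assumed column-major, as OpenGL conventionally stores it.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Column-major 4x4 matrix (OpenGL convention): translation in m[12..14].
using Mat4 = std::array<float, 16>;

// Transform a point (w = 1) by a column-major matrix.
Vec3 transformPoint(const Mat4& m, const Vec3& v) {
    return {
        m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12],
        m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13],
        m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14],
    };
}

// "Freezing": rewrite every vertex into world space once, so the mesh
// no longer needs its model matrix at draw time.
void freezeMesh(const Mat4& model, Vec3* verts, int count) {
    for (int i = 0; i < count; ++i)
        verts[i] = transformPoint(model, verts[i]);
}
```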

If you want world-space in Maya to be exactly world-space in your software, and you don’t want to move the object ever from that location, and your world space is not big enough to cause problems, then this is fine.

Or, to put it another way, don’t do this. It doesn’t get you anything. You still need to transform the vertices by at least one matrix in the vertex shader, so it doesn’t save you any performance. And it makes working with that mesh more difficult, since it’s in world space now.

If it helps you to render a bigger batch, it is a good idea, otherwise not.

Yes, you’re correct. I always transform the vertices inside my vertex shader, so freezing the meshes is not a good idea for me.

Alfonse, you’ve forgotten the cost of switching from one batch to another. There may be a VBO switch or a VAO switch involved, so while you always need to do a transform, you might not need to switch from one batch to another if you “freeze” the geometry. At least the “Batch, Batch, Batch” paper says so.

Or maybe I’ve gotten something wrong again?

I always prefer to use global-space coordinates for purely static scenes, mainly if I need to do collision detection (the calculations are then faster).

ugluk: isn’t it the same thing, with or without batches? If you have 1 batch or 10 batches, whether you need to do the transform or not, you’ll have to bind a new VBO for each. Or did I miss what you said?

What I wanted to say is that it may be possible to reduce the number of batches (say from 10 to 5) if the geometry is frozen, hence requiring fewer VBO or VAO binds for the same scene.

At the very least a new matrix has to be uploaded, which is cheap, but it requires another draw call and a smaller batch size.

There may be a VBO switch or a VAO switch involved, so while you always need to do a transform, you might not need to switch from one batch to another if you “freeze” the geometry. At least the “Batch, Batch, Batch” paper says so.

That doesn’t make any sense. If you have multiple objects, where the only possible state difference between them is what transform they use (which is why “freezing” it would mean that there is no state difference between them), then by definition, the objects all use the same vertex format.

Also, the objects are all coming from the same Maya file. So your exporting/asset conditioning pipeline has access to all of the same data.

Which means that there is no reason why you couldn’t just put them in the same buffer object(s) anyway. So there would be no need for a VAO or VBO switch between glDraw* calls. Only the necessary glUniform changes needed for the transforms.
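The “same buffer anyway” point can be sketched as follows: meshes sharing a vertex format are appended into one vertex/index array pair (destined for a single VBO/IBO), with each mesh’s indices rebased, so the per-mesh draws need no buffer rebinding between them. This is an illustrative sketch; `Vertex`, `DrawRange`, and `appendMesh` are names of my own, not from any library.

```cpp
#include <cstdint>
#include <vector>

// One vertex format shared by all meshes being merged.
struct Vertex { float px, py, pz; float nx, ny, nz; };

// One glDrawElements-style call per mesh within the shared buffers.
struct DrawRange {
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
};

// Append a mesh to the shared vertex/index arrays, rebasing its indices
// so they point into the combined vertex array.
DrawRange appendMesh(std::vector<Vertex>& vtx, std::vector<std::uint32_t>& idx,
                     const std::vector<Vertex>& meshVtx,
                     const std::vector<std::uint32_t>& meshIdx) {
    const std::uint32_t base  = static_cast<std::uint32_t>(vtx.size());
    const std::uint32_t first = static_cast<std::uint32_t>(idx.size());
    vtx.insert(vtx.end(), meshVtx.begin(), meshVtx.end());
    for (std::uint32_t i : meshIdx)
        idx.push_back(base + i);   // rebase into the shared buffer
    return { first, static_cast<std::uint32_t>(meshIdx.size()) };
}
```

Each `DrawRange` then becomes one draw call, with only the glUniform (matrix) change in between, and no VAO/VBO switch.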

Also, if you’ve read the “batch, batch, batch” paper, you know that it specifically refers to Direct3D. It also specifically mentions that NVIDIA’s OpenGL implementation works rather differently. Namely, that batch size, while important, isn’t everything the way it is for D3D. This doesn’t mean that batching is unimportant, but it does mean that you should benchmark before optimizing.

Also, the most advanced card mentioned in the paper is the GeForceFX. Which was, what, 5+ years ago?

Lastly, don’t forget that freezing geometry is nothing you couldn’t do in your asset conditioning pipeline if it does in fact help performance. Having the modellers freeze geometry makes it more difficult for them to work. There’s no reason to force them to do something you couldn’t do yourself automatically.

You can have many instances of the same object in one Maya file. After you export, you can instantiate the objects yourself (along with other objects) or you could batch a number of them together. If you instantiate you need to upload separate modelview matrices for each instance. True, there is a nice glDrawElementsInstanced() call, but we don’t know if it is available. The main thing I want to say is: you don’t need to follow Maya’s object boundaries, but if you don’t freeze you have to.

AFAIK, batching did not lose in importance over the years. I am still reading about the topic.

You can have many instances of the same object in one Maya file. After you export, you can instantiate the objects yourself (along with other objects) or you could batch a number of them together.

What does that have to do with freezing things? Unless the modeler replicated a bunch of objects to make the overall model (which isn’t how modelers model), that won’t help. Plus, freezing objects removes instancing information. It bakes the meshes directly into the scene without transforms.

AFAIK, batching did not lose in importance over the years.

Except that batching, as the document in question clearly indicates, wasn’t of paramount importance for OpenGL to begin with. Generally, when NVIDIA talks OpenGL vertex transfer performance, they’re emphasizing the number of binds of buffer objects (and the use of NV_vertex_buffer_unified_memory). They don’t talk about the need to make fewer glDraw* calls (ie: batching).

Except that batching, as the document in question clearly indicates, wasn’t of paramount importance for OpenGL to begin with.

How can you possibly make a statement like this? There are huge numbers of different GPUs now. How could this statement apply to them all? Maybe it is true for NVIDIA GPUs, but to generalize is wrong IMO.

How can you possibly make a statement like this? There are huge numbers of different GPUs now. How could this statement apply to them all? Maybe it is true for NVIDIA GPUs, but to generalize is wrong IMO.

It may not apply to all of them. However, there is no evidence that batching is of paramount importance on any OpenGL implementation.

Let’s define some terms here. By “batching,” I mean specifically optimizations based on reducing the number of glDraw* calls issued. And by “paramount importance,” I mean exactly that: paramount importance. That is, the most overriding concern with regard to vertex transfer. More important than buffer object binding, vertex formats, or anything else.

If batching were of “paramount importance”, then you would easily be able to justify going to all kinds of lengths to reduce the number of batches sent. Even if it means that you increase the granularity of culling, so that you don’t cull as much. Even if it means that sometimes, you have to have empty vertex attributes for certain meshes that get rendered alongside certain others. Even if it means resorting to uber-shaders. And so forth.

The document in question indicates that batching is of paramount importance for D3D8/9 on that hardware. The document just as clearly indicates that batching is not of paramount importance for OpenGL on that hardware. The document suggests going to great lengths to increase batch size in D3D. It does not suggest the same for OpenGL.

Note that there is a big difference between “not of paramount importance” and “not important”.

All right, this has gotten ridiculous. You’re muddying simple performance optimization here with silly posturing.

Fact: When optimizing a pipelined system, always optimize the bottleneck.

Fact: DX9 was much more expensive to submit batches to than GL at the time, meaning you had to optimize for this more often on DX9 than on OpenGL. The “Batch, Batch, Batch” paper speaks to this era.

Fact: You could still hit scenarios where batch submission was the bottleneck under OpenGL then.

Fact: You can “still” hit scenarios where batch submission is the bottleneck under OpenGL today (more often, since CPU cores have “hit the wall”).

This “paramount importance” rhetoric is ridiculous. Please stop it. Batching is one of the most important factors for GPU performance. And it’s becoming more important all the time.

If batching were of “paramount importance”, then you would easily be able to justify going to all kinds of lengths to reduce the number of batches sent.

Yeah, you do, if you’re trying to pump the max geometry in a given frame time. It consumes valuable developer time and is annoying to have to mess with, but it is nevertheless a necessary evil nowadays for max performance.

This is one of the cool things about NV bindless. Just use it, and instantly there’s less batch-batch-batch “head banging” required. It just renders faster, without as much repacking shenanigans, by getting rid of pointless CPU work in the driver.

For you to say this, I suspect your app(s) aren’t quite as demanding.

Thanks a lot Dark Photon for coming to the rescue, like always. Let me use this opportunity to ask about the most important things for perf. For now I only have this list:

  • minimize various buffer switches (using bindless, for example, or VBO caching)
  • batch (not just because of the CPU bottleneck, but also because of the expensive texture and shader switches and other state switches)
  • ?

Is the ? already in the algorithmic domain, because the switches and batching are in the “API” domain?

Is the ? already in the algorithmic domain, because the switches and batching are in the “API” domain?

No, definitely not after just these two points.

Consider: with your two points, we’re talking about how we can just drive the API/GPU differently to obtain the same visual result (same triangles, same textures on the triangles, etc.) with potentially different performance (preferably “better” performance :) ).

There are lots of other simple optimizations in that category before you get to more complex algorithmic rework. For instance:

  • using state tracking to eliminate needless state switching
  • sorting your rendering work by “expensive” state changes before submission
  • using indexed primitives and optimizing triangle order to reduce vertex transform work on the GPU
  • increasing batch sizes through a number of techniques

and so on. There are lots of these, and SIGGRAPH/GDC performance presentations do a great job of rounding them up, with vendor docs providing more insight, so I won’t even try listing them all. It helps to classify them by which pipeline stage they optimize, because again, it can be not only fruitless but counterproductive to optimize a stage you aren’t bottlenecked on.
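The “sort by expensive state changes” idea can be sketched with a packed sort key: the most expensive switches (here I’m assuming shader > texture > mesh, which is app-dependent) occupy the most significant bits, so after sorting they change least often across the draw list. `DrawItem` and `stateKey` are illustrative names of my own.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One pending draw, identified by the state it needs bound.
struct DrawItem {
    std::uint32_t shader;
    std::uint32_t texture;
    std::uint32_t mesh;
};

// Pack the state into a single key: most expensive state in the high bits,
// so the sort groups draws that share shaders first, then textures.
std::uint64_t stateKey(const DrawItem& d) {
    return (std::uint64_t(d.shader)  << 40) |
           (std::uint64_t(d.texture) << 20) |
            std::uint64_t(d.mesh);
}

// Sort the frame's draw list so that state switches are minimized when the
// items are submitted in order.
void sortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return stateKey(a) < stateKey(b);
              });
}
```

Submitting the sorted list, you only rebind a shader or texture when the corresponding key field actually changes from the previous item.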

Then there are typically some mid-ground things you can do, such as precull/pregenerate/prerender some things to eliminate useless or duplicate work for each view or frame (this is a balancing act based on CPU and GPU perf). And this all doesn’t even touch on major algorithm changes, which are also an option.