[QUOTE=mhagain;1261591]If you’ve got so much data that you run out of video memory, there are other ways of dealing with the situation.
From the mention of blocks it looks like you’re writing a Minecraft clone/voxel engine, so that can make things really easy.
If you want normals, and you’re drawing blocks so that every vertex of a face has the same normal, then instead of including them in your vertex data you can calculate them in a geometry shader.
If you don’t care about block rotation you can use instancing. Here you store the full fat vertex for only a single block and then for each block you draw the data is reduced to 4 floats: position and size (if you’re not drawing cubes you’ll need 6 floats per block).
On the other hand if performance falls off a cliff you shouldn’t just assume it’s due to memory usage. You may be doing something else that’s triggering a slow path in your driver, maybe even a software emulated path.[/QUOTE]
What’s the advantage of calculating normals in a geometry shader rather than in the vertex shader? Aren’t geometry shaders less performant and less well-supported? The vertex shader is where I’m doing it now: the one I posted above just reads the normal from a uniform array, which I assume is faster than branching, though I haven’t investigated extensively (I’d really hope it’s faster, since branching definitely shouldn’t be the most performant option). Such small performance differences in the vertex shader probably aren’t going to be a bottleneck either way, I suppose. At any rate, I want to apply a similar technique to the rest of the duplicated block data, and there should absolutely be a performant way to do that.
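To make the lookup idea concrete, here is a rough model of it in Python rather than GLSL. It is only a sketch: the face ordering, the 4-vertices-per-face layout and the names `FACE_NORMALS` and `normal_for_vertex` are assumptions for illustration, not the shader posted above. In a real vertex shader the `vertex_id` argument would presumably come from something like `gl_VertexID`.

```python
# Model of a per-face normal lookup, written in Python for illustration.
# Assumption: each cube face uses 4 consecutive vertices and faces are
# stored in the order +X, -X, +Y, -Y, +Z, -Z (a hypothetical layout).
FACE_NORMALS = (
    (1.0, 0.0, 0.0),
    (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0),
    (0.0, 0.0, -1.0),
)
VERTS_PER_FACE = 4
FACES_PER_BLOCK = 6


def normal_for_vertex(vertex_id):
    """Return the face normal for a vertex by table lookup, with no branching."""
    face = (vertex_id // VERTS_PER_FACE) % FACES_PER_BLOCK
    return FACE_NORMALS[face]
```

The point is that the normal is derived from the vertex's position in the stream, so it never has to be stored per vertex.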
Instancing is definitely on the list of things to do, but I’m not intent on limiting myself to cubes; slopes will be included in the blocks in the future, which greatly reduces the opportunity for, and advantage of, instancing.
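For a sense of scale on what instancing would save for plain cubes, here is a back-of-the-envelope estimate. The 4 and 6 floats per block come from the quoted suggestion; the 24 vertices per cube and 8 floats per vertex (position, normal, texture coordinates) are assumptions picked for illustration, not my actual vertex format.

```python
FLOAT_BYTES = 4  # size of a 32-bit float


def per_vertex_bytes(blocks, verts_per_block=24, floats_per_vertex=8):
    """Bytes needed when every block stores all of its own vertices."""
    return blocks * verts_per_block * floats_per_vertex * FLOAT_BYTES


def instanced_bytes(blocks, floats_per_instance=4,
                    verts_per_block=24, floats_per_vertex=8):
    """Bytes needed with one shared block mesh plus a small per-block record."""
    shared_mesh = verts_per_block * floats_per_vertex * FLOAT_BYTES
    return shared_mesh + blocks * floats_per_instance * FLOAT_BYTES


# One million cubes under the assumptions above.
print(per_vertex_bytes(1_000_000))    # 768000000
print(instanced_bytes(1_000_000))     # 16000768
print(instanced_bytes(1_000_000, 6))  # 24000768 (non-cube boxes, 6 floats each)
```

Slopes would need their own shared mesh (and their own draw call or some per-instance shape selector), which is where the saving starts to erode.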
As for the performance tanking, I suspect VRAM usage because simply increasing the buffer capacity was, by itself, enough to cause the same dramatic FPS drop.
Edit: You’re right, at this point it’s probably worth just throwing the block data into a VBO and having an index buffer that looks like (0,0,0,…0,1,1,1,…1,2,2,2,…). It bothers me immensely, but it seems like optimizing further is going to take much more work than it’s worth right now, and getting an “optimal” result (i.e. one that takes full advantage of the assumptions my use case allows, like not even bothering to fetch the block type again for the next vertex of a block) might require hacking around OpenGL altogether using something like OpenCL. Again, WAY more work than it’s worth for the yield it would bring at this point. Plus, there’s an upper bound on how many things I should even be trying to render; good optimizations at the game logic level might completely eliminate my risk of maxing out VRAM, who knows?
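The (0,0,0,…,1,1,1,…) stream is easy to generate on the CPU side. A minimal sketch, assuming every block contributes the same number of vertices; `block_index_stream` and `fetch_block_data` are hypothetical helper names, and the block records here are just placeholder strings:

```python
def block_index_stream(num_blocks, verts_per_block):
    """Build the per-vertex block index: 0,0,...,0,1,1,...,1,2,2,..."""
    return [block for block in range(num_blocks)
            for _ in range(verts_per_block)]


def fetch_block_data(block_data, indices):
    """Model what each vertex does: look up its block's record by index."""
    return [block_data[i] for i in indices]


# Two blocks with 4 vertices each, to keep the output short.
indices = block_index_stream(2, 4)
print(indices)                                     # [0, 0, 0, 0, 1, 1, 1, 1]
print(fetch_block_data(["stone", "dirt"], indices))
```

Note that every vertex still does its own lookup here, which is exactly the redundant fetch mentioned above; the per-block data is stored once, but it is read once per vertex.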