loops and arrays: performance cost?

Recently, while trying to clean up shader code, I turned a big mass of repetitive code into a loop and an array, and my performance absolutely TANKED. Unrolling the loop again, but keeping the array, got it back to around normal, although it still seemed a little slower. My render speed fluctuates quite a bit anyway, so it might’ve just been noise, but this brings me to my questions:

[ul]
[li]Are loops just slow, or was I using them wrong? Should I always unroll when I can?
[/li][li]Is creating a const array and indexing into it with a const index the same as having several variables and using the correct one “manually”?
[/li][/ul]

Thanks!

My guess would be that a loop includes a condition which has to be evaluated on every iteration. If the compiler doesn't unroll the loop, that check and the resulting branch are extra work for your GPU.

That makes sense, it just makes me sad that it didn’t get unrolled. Although I’m realizing I may have been trickier with the iterator incrementing than I should have been.

Does anybody know if a really obviously unrollable loop will get unrolled, e.g.


for(int i = 0; i < 36; ++i) {
  ...
}
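To make the question concrete, here is a minimal sketch of an "obviously unrollable" loop next to its hand-unrolled equivalent. It assumes GLSL 3.30, and every name in it (kWeights, uTex, vUV, the 0.01 offset) is made up for illustration, not taken from the shader discussed above. Whether the loop form actually gets unrolled depends on the driver's compiler, so treat this as something to time on your own hardware rather than a guarantee.

```glsl
#version 330 core

// Hypothetical inputs, for illustration only.
uniform sampler2D uTex;
in vec2 vUV;
out vec4 fragColor;

// Compile-time constant table indexed by the loop counter.
const float kWeights[4] = float[4](0.1, 0.2, 0.3, 0.4);

// Loop form: constant bounds, plain ++i, no break/continue.
// This is the shape a compiler can most easily unroll, if it chooses to.
vec4 sumLoop() {
    vec4 sum = vec4(0.0);
    for (int i = 0; i < 4; ++i) {
        sum += kWeights[i] * texture(uTex, vUV + vec2(float(i) * 0.01, 0.0));
    }
    return sum;
}

// Hand-unrolled form: same arithmetic, every array index is a literal.
vec4 sumUnrolled() {
    vec4 sum = vec4(0.0);
    sum += kWeights[0] * texture(uTex, vUV);
    sum += kWeights[1] * texture(uTex, vUV + vec2(0.01, 0.0));
    sum += kWeights[2] * texture(uTex, vUV + vec2(0.02, 0.0));
    sum += kWeights[3] * texture(uTex, vUV + vec2(0.03, 0.0));
    return sum;
}

void main() {
    // Swap in sumUnrolled() and compare frame times to see what your driver does.
    fragColor = sumLoop();
}
```

Modifying the counter inside the loop body, or using a bound that is not a compile-time constant, makes unrolling harder for the compiler to justify.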

You can always just try to see if the performance changes.
By the way, shaders are compiled by the driver at runtime, typically every time you start your application. This means the same loop might get unrolled on some hardware and not on other hardware.
It's better to avoid that difference in behavior, so I suggest you unroll it by hand just to be 100% sure.

[QUOTE=Cornix;1261526]You can always just try to see if the performance changes.
By the way, shaders are compiled by the driver at runtime, typically every time you start your application. This means the same loop might get unrolled on some hardware and not on other hardware.
It's better to avoid that difference in behavior, so I suggest you unroll it by hand just to be 100% sure.[/QUOTE]

Good points, thanks.

I’ve discovered that (at least on my Intel graphics card with Mesa/DRI drivers) using an if is SIGNIFICANTLY faster than indexing into a const array when each shader invocation only actually uses one element (the difference between ~100FPS and oh-god-why FPS with ~40k triangles). It’s still faster to just pass the data in directly, but that hugely bloats the amount of data I’m storing in VRAM, so my next try is going to be loading the array into a uniform and indexing into that.
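For anyone comparing the same three approaches, here is a rough sketch of what they might look like side by side. It assumes GLSL 3.30; the names (kColors, uColors, vIndex) and the three-entry table are hypothetical, not the actual data from my shader. Which one is fastest is driver-specific, so this is only a starting point for measuring.

```glsl
#version 330 core

// Hypothetical per-primitive index, assumed to be in the range 0..2.
flat in int vIndex;
out vec4 fragColor;

// Variant 1: const array indexed with a value only known at run time.
const vec3 kColors[3] = vec3[3](
    vec3(1.0, 0.0, 0.0),
    vec3(0.0, 1.0, 0.0),
    vec3(0.0, 0.0, 1.0)
);

vec3 pickConstArray(int i) {
    return kColors[i];
}

// Variant 2: if chain selecting between the same constants.
vec3 pickIfChain(int i) {
    if (i == 0) return vec3(1.0, 0.0, 0.0);
    if (i == 1) return vec3(0.0, 1.0, 0.0);
    return vec3(0.0, 0.0, 1.0);
}

// Variant 3: uniform array, filled by the application once per draw
// (for example with glUniform3fv) and indexed the same way.
uniform vec3 uColors[3];

vec3 pickUniformArray(int i) {
    return uColors[i];
}

void main() {
    // Swap the call to compare the variants on your own hardware.
    fragColor = vec4(pickUniformArray(vIndex), 1.0);
}
```

All three variants return the same color for a given index as long as the application uploads the same values to uColors, so any difference in frame time comes from how the driver compiles each lookup.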