I remember reading that mat4x3 takes one more register than mat3x4 because it is stored as 4 columns of vec3, but I can't seem to find anything to confirm this anymore. Has this changed in the spec at all? Is a mat4x3 automatically stored in 3 registers as well, or do I have to use mat3x4 instead?
Also:
// assuming these do the same thing...
mat3x4 a;
result = transpose(a) * vec4(somevalue, 1); // better as maintains "order"
result = vec4(somevalue, 1) * a; // similar performance as above?
Just tried it, cross-compiling GLSL to NV assembly (gp5vp profile), and it looks like what you say is the case. mat4x3 takes 4 uniform slots. mat3x4 takes 3. Which makes intuitive sense. GLSL is column major by default, so it’s all about the number of column vectors.
In a test I just did, passing in a mat4x3, postmultiplying it by a vec4 directly, and outputting the vec3 from the shader takes 8 instructions (w/ 2 R-regs). However, if I pass in a mat3x4, transpose it, and postmultiply that by the vec4 to output a vec3, I get 19 instructions (3 R-regs). Lots of extra moves. So that's a penalty of 11 instructions and 1 R-reg to save one uniform slot while keeping the v2 = A*v multiplication order.
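For reference, the two variants compared above might look roughly like this. This is only a sketch, not the exact test shader: the names `xformA`, `xformB`, `position`, `resultA` and `resultB` are made up for illustration, and a real shader would use only one of the two uniforms.

```glsl
#version 330 core

// Hypothetical names. xformB is assumed to hold the transpose of xformA.
uniform mat4x3 xformA;   // 4 columns of vec3
uniform mat3x4 xformB;   // 3 columns of vec4

in vec3 position;
out vec3 resultA;
out vec3 resultB;

void main()
{
    vec4 p = vec4(position, 1.0);

    // Variant 1: mat4x3 used directly, column-vector order (v2 = A * v).
    resultA = xformA * p;

    // Variant 2: mat3x4 transposed back to a mat4x3 first, same order.
    resultB = transpose(xformB) * p;

    gl_Position = vec4(resultA, 1.0);
}
```

Both outputs should hold the same vec3 if the application uploads xformB as the transpose of xformA. How many instructions each variant costs will depend on your compiler and driver, so check the generated assembly yourself.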
You can try using the row_major layout qualifier (it applies to matrices inside uniform blocks). Then ideally it should be all about the number of rows.
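A minimal sketch of what that could look like, assuming you are OK with moving the matrix into a uniform block. The block name `Transforms` and the member name `xform` are hypothetical.

```glsl
#version 330 core

// Hypothetical block and member names, for illustration only.
layout(std140, row_major) uniform Transforms
{
    // Still 4 columns x 3 rows as far as the shader code is concerned,
    // but laid out in the buffer as 3 rows of vec4 (48 bytes under std140)
    // rather than 4 padded vec3 columns (64 bytes).
    mat4x3 xform;
};

in vec3 position;

void main()
{
    // The multiplication order in the shader does not change.
    vec3 result = xform * vec4(position, 1.0);
    gl_Position = vec4(result, 1.0);
}
```

Note that row_major only changes how the matrix is laid out in the buffer, so the application has to write the data row by row. Whether the compiler then generates code as tight as the plain mat4x3 case is something I'd verify in the assembly rather than assume.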
// assuming these do the same thing…
mat3x4 a;
result = transpose(a) * vec4(somevalue, 1); // better as maintains “order”
result = vec4(somevalue, 1) * a; // similar performance as above?
Yep. The cost of the latter is 8 instructions (w/ 2 R-regs), so that's definitely one way to go. As I said, the former consumes considerably more assembly instructions here, so it probably isn't your best bet. But try it on your own setup and see.
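To spell out why the two forms give the same result: in GLSL, vector * matrix treats the vector as a row vector, so each output component is the dot product of the vector with one column of the matrix. A small sketch, with the uniform name `xform` and the attribute `position` being hypothetical:

```glsl
#version 330 core

uniform mat3x4 xform;   // hypothetical name; 3 columns of vec4

in vec3 position;

void main()
{
    vec4 p = vec4(position, 1.0);

    // Row vector times matrix: same value as transpose(xform) * p,
    // with no explicit transpose in the source.
    vec3 result = p * xform;

    // The same thing written out per component:
    // vec3 result = vec3(dot(p, xform[0]), dot(p, xform[1]), dot(p, xform[2]));

    gl_Position = vec4(result, 1.0);
}
```

Since each xform[i] is already a full vec4 column, this boils down to three 4-component dot products, which would fit with it being as cheap as the mat4x3 version.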