Using Uniform Buffer Objects (UBO) in Fragment Shaders

Hello,

recently I’ve been toying with UBOs to pass light data (position, color, radius) to my fragment shader. After getting terrible performance results I’m starting to wonder if what I did makes sense at all.
Maybe someone here has more insights…

In my fragment shader I declared a uniform block to hold the light data:


#version 330
layout(std140) uniform light {
  vec3  position[16];
  vec3  color[16];
  float radius[16];
} u_light;

And the GL UBO setup code:


static GLuint uidx[3];
static GLint offset[3];
static GLint stride[3];
static const GLchar* unames[3] = {
  "light.position",
  "light.color",
  "light.radius"
};
// query the index, byte offset and array stride of each block member
glGetUniformIndices(pgm, 3, unames, uidx);
blockIndex = glGetUniformBlockIndex(pgm, "light");
glGetActiveUniformsiv(pgm, 3, uidx, GL_UNIFORM_OFFSET, offset);
glGetActiveUniformsiv(pgm, 3, uidx, GL_UNIFORM_ARRAY_STRIDE, stride);
glGetActiveUniformBlockiv(pgm, blockIndex, GL_UNIFORM_BLOCK_DATA_SIZE, &blocksize);
// allocate the buffer and attach it to uniform binding point 1
glGenBuffers(1, &buffer);
glBindBuffer(GL_UNIFORM_BUFFER, buffer);
glBufferData(GL_UNIFORM_BUFFER, blocksize * maxlights, NULL, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, 1, buffer);
glUniformBlockBinding(pgm, blockIndex, 1);


And the GL code to fill the UBO every frame:


      glBindBuffer(GL_UNIFORM_BUFFER, buffer);
      glUniformBlockBinding(pgm, blockIndex, bindingPoint);
      GLint i, off;
      // position (the stride is already in bytes, so each call uploads one vec3)
      for(off = offset[0], i = 0; i < numlights; i++) {
        ....
        glBufferSubData(GL_UNIFORM_BUFFER, off, sizeof(vec3f32_t), ((float32*)(data + off)));
        off += stride[0];
      }
      // color
      for(off = offset[1], i = 0; i < numlights; i++) {
        ....
        glBufferSubData(GL_UNIFORM_BUFFER, off, sizeof(vec3f32_t), ((float32*)(data + off)));
        off += stride[1];
      }
      // radius
      for(off = offset[2], i = 0; i < numlights; i++) {
        // ....
        glBufferSubData(GL_UNIFORM_BUFFER, off, sizeof(float32), ((float32*)(data + off)));
        off += stride[2];
      }
      glBindBufferBase(GL_UNIFORM_BUFFER, bindingPoint, buffer);


Offset and stride report that the UBO size is 768 bytes. Maybe I’m mistaken, but looking up 768 bytes for every pixel means huge(!) memory traffic. I guess this is why the frame rate drops?!
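For reference, the 768 bytes are consistent with the std140 rules, which round the stride of every array up to a multiple of 16 bytes. A small sketch of that arithmetic (the helper names are made up for illustration, and it only covers scalar and vector elements up to vec4, assuming 4-byte floats):

```c
#include <assert.h>
#include <stddef.h>

/* std140 pads each array element to a multiple of 16 bytes (one vec4),
   so vec3[] and float[] both end up with a stride of 16.
   Only meant for scalar/vector elements of at most 16 bytes. */
static size_t std140_array_stride(size_t element_size)
{
    assert(element_size > 0 && element_size <= 16);
    return (element_size + 15u) / 16u * 16u;
}

/* Size of the "light" block for a given array length (hypothetical helper). */
static size_t light_block_size(size_t num_lights)
{
    size_t position = num_lights * std140_array_stride(3 * sizeof(float)); /* vec3[]  */
    size_t color    = num_lights * std140_array_stride(3 * sizeof(float)); /* vec3[]  */
    size_t radius   = num_lights * std140_array_stride(sizeof(float));     /* float[] */
    return position + color + radius;
}
```

With 16 lights that gives 3 arrays of 16 elements at 16 bytes each, i.e. 768 bytes, of which the radius array alone takes 256 because every float is padded to 16 bytes.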

Is this an elegant way to pass light data to the fragment shader or are there better ways to do this?

Regards
Saski

Hmm, I think you should first try to determine where the performance problem comes from; do you need a lot more CPU or GPU time, do you consume more memory bandwidth on the GPU than before? Why are you saying every pixel accesses the full UBO? Are you rendering with 16 active light sources? That sounds like a lot of per-pixel computation independent of the UBO…
A couple of things you may want to experiment with: use 2 or 3 buffers and rotate them, i.e. frame i uses buffer (i%3), frame i+1 uses buffer ((i+1)%3), etc. Map the UBO if you are filling it piecewise, or construct a block of memory with the correct layout and copy it with a single glBufferSubData call.

You shouldn’t use glBufferSubData to update individual fields in your UBO. Create a structure in memory that matches your UBO layout and then upload it with a single call per frame.
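A minimal sketch of what such a structure could look like for the block posted above, assuming the std140 layout (16-byte array stride, arrays in declaration order). The type and function names here are made up for illustration, and it is worth checking the struct against the queried offsets and strides before relying on it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LIGHTS 16

/* std140 array elements: vec3 and float are both padded to 16 bytes. */
typedef struct { float x, y, z, pad; } std140_vec3;
typedef struct { float value, pad[3]; } std140_float;

/* CPU-side mirror of the "light" uniform block (assumed std140 layout). */
typedef struct {
    std140_vec3  position[MAX_LIGHTS];
    std140_vec3  color[MAX_LIGHTS];
    std140_float radius[MAX_LIGHTS];
} light_block;

/* Hypothetical application-side light description. */
typedef struct {
    float position[3];
    float color[3];
    float radius;
} light;

/* Pack up to MAX_LIGHTS lights into the block; unused slots are zeroed. */
static void light_block_fill(light_block *dst, const light *src, size_t count)
{
    size_t i;
    assert(count <= MAX_LIGHTS);
    memset(dst, 0, sizeof *dst);
    for (i = 0; i < count; i++) {
        dst->position[i].x = src[i].position[0];
        dst->position[i].y = src[i].position[1];
        dst->position[i].z = src[i].position[2];
        dst->color[i].x = src[i].color[0];
        dst->color[i].y = src[i].color[1];
        dst->color[i].z = src[i].color[2];
        dst->radius[i].value = src[i].radius;
    }
}

/* Per frame, with the UBO bound to GL_UNIFORM_BUFFER:
 *   light_block_fill(&block, lights, numlights);
 *   glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof block, &block);
 */
```

That replaces up to 48 small glBufferSubData calls with one 768-byte upload.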

Sadly, you have to find a way to work around the buffer upload bottleneck. I had a similar issue, namely that frequent buffer uploads perform terribly on all hardware. It doesn’t matter whether you use glBufferSubData or map/unmap the buffer: both methods only work well if the number of updates remains small.
So unless you target only recent hardware (see below) you will have to find a way to generate the buffer data up front and upload it all at once - or at least with a relatively small number of calls.

This problem has since been addressed in OpenGL with persistently mapped buffers (ARB_buffer_storage, core in OpenGL 4.4), which allow buffer updates with virtually no overhead at all, but this is a recent feature that’s not yet widely available. It is supported by most GL 4.x hardware on Windows (provided you are not stuck with an outdated OEM driver), on Linux it’s spotty, and on Mac OS X it’s not available at all.
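A rough sketch of how a persistently mapped UBO is often organised: one buffer holding several regions, with each frame writing to and binding a different region. The helper names and the choice of three regions are assumptions for illustration. Only the offset arithmetic is real code here; the GL calls are shown as comments because they need a 4.4 (or ARB_buffer_storage) context.

```c
#include <assert.h>
#include <stddef.h>

#define RING_FRAMES 3u  /* assumed: three regions, one per frame in flight */

/* Round size up to the next multiple of alignment. */
static size_t align_up(size_t size, size_t alignment)
{
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

/* Byte offset of the region a given frame writes and binds. `alignment` is
   the value queried with glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT). */
static size_t ring_region_offset(unsigned long frame, size_t block_size,
                                 size_t alignment)
{
    return (size_t)(frame % RING_FRAMES) * align_up(block_size, alignment);
}

/* Setup (once), with the buffer bound to GL_UNIFORM_BUFFER:
 *   GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
 *   size_t total = RING_FRAMES * align_up(blocksize, alignment);
 *   glBufferStorage(GL_UNIFORM_BUFFER, total, NULL, flags);
 *   char *mapped = glMapBufferRange(GL_UNIFORM_BUFFER, 0, total, flags);
 *
 * Per frame (block = CPU-side copy of the light data, hypothetical):
 *   size_t off = ring_region_offset(frame, blocksize, alignment);
 *   memcpy(mapped + off, &block, blocksize);
 *   glBindBufferRange(GL_UNIFORM_BUFFER, 1, buffer, off, blocksize);
 */
```

A region must not be overwritten while the GPU may still be reading it, so a real implementation would also guard each region with a fence (glFenceSync / glClientWaitSync) before reusing it.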