How fast is point rendering?

Specifically, if I have a decently fast shader (e.g. one that just does linear interpolation from a texture), can I sanely run it on every pixel on the screen?

Thanks

Yeah, why not? Deferred shading works similarly: first, the framebuffer is filled with rasterized raw input values, then a screen-aligned quad is drawn a few times to perform different kinds of calculations for each of the pixels and shade them. At the end, the final colors are copied into the default framebuffer.
Depends on how many times you want to run through your framebuffer’s contents and how many pixels it contains, though.

Well I’d expect it to be fast, since you should be able to run a separate GPU thread for each pixel, but when I say “run the shader on every pixel” it sounds scary.

What I want to do is render to a framebuffer object, then render to the screen by interpolating every pixel from the neighbors it has in the FBO. So how can I say “run this shader for every pixel on the screen” without creating a buffer that contains the coordinates of every pixel on the screen?

Well, if you want to just copy contents from the custom FBO to the default window’s framebuffer, you may consider using glBlitFramebuffer. That should be the fastest way to copy because it is made exactly for that.
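
A minimal sketch of that call, assuming a custom framebuffer object fbo and a window of size width × height (those names are mine):

glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo); // source: the custom FBO
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);   // destination: the default framebuffer
glBlitFramebuffer(0, 0, width, height,       // source rectangle
                  0, 0, width, height,       // destination rectangle
                  GL_COLOR_BUFFER_BIT, GL_NEAREST);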

Well I specifically want the step of “run my arbitrary shader on every pixel” in there, e.g. for whatever kind of interpolation I want.

Can I just create some polygons that span the entire screen and render them? Would that cause the shader to run for every pixel?

Specifically, if I have a decently fast shader (e.g. one that just does linear interpolation from a texture), can I sanely run it on every pixel on the screen?

Just think, any immersive 3D world must draw something in every pixel, and hence must run a shader at least once for every pixel.

[QUOTE=BenFoppa;1261338]Well I’d expect it to be fast, since you should be able to run a separate GPU thread for each pixel, but when I say “run the shader on every pixel” it sounds scary.

What I want to do is render to a framebuffer object, then render to the screen by interpolating every pixel from the neighbors it has in the FBO. So how can I say “run this shader for every pixel on the screen” without creating a buffer that contains the coordinates of every pixel on the screen?[/QUOTE]
Not every fragment gets its own GPU thread; how the work is divided is determined by how the hardware is set up and the number of pixel pipelines it has. But there are many hardware units that do nothing but handle fragment (pixel) shading.

What I want to do is render to a framebuffer object, then render to the screen by interpolating every pixel from the neighbors it has in the FBO. So how can I say “run this shader for every pixel on the screen” without creating a buffer that contains the coordinates of every pixel on the screen?

This is just normal texture mapping with “linear interpolation” for minification or magnification.

Here’s how you set it up:


glGenTextures(1, &name);
glBindTexture(GL_TEXTURE_2D, name); // bind by texture id, not by pointer
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR); // << HERE (linear interpolation)
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); // << HERE
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

When you draw your textured quad to fill the whole screen, the interpolation happens as part of the texture lookup.

This is the quad data I use for full-screen quads. The first two numbers in each line are positions, the last two are UV texture coordinates. You don’t need to give coordinates for every pixel, because the UV coordinates of the vertices are interpolated across the quad and used to look up the associated values in your texture. In a sense, every pixel in your texture is picked by looking up a UV of (x / imageWidth, y / imageHeight).


    GLfloat quadData[16] = {
        -1,-1,    0,0,
        1,-1,     1,0,
        -1,1,     0,1,
        1,1,      1,1
    };
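
To draw it, a sketch along these lines should work, assuming the data is already in a bound VBO and posLoc / uvLoc (my names) hold the queried locations of the position and texCoord2 attributes:

GLsizei stride = 4 * sizeof(GLfloat); // 2 position floats + 2 UV floats per vertex
glVertexAttribPointer(posLoc, 2, GL_FLOAT, GL_FALSE, stride, (void*)0);
glVertexAttribPointer(uvLoc, 2, GL_FLOAT, GL_FALSE, stride, (void*)(2 * sizeof(GLfloat)));
glEnableVertexAttribArray(posLoc);
glEnableVertexAttribArray(uvLoc);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4); // the 4 vertices above, in strip order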

Then in your shader, you don’t do any transform:


attribute mediump vec2 position;
attribute mediump vec2 texCoord2;

varying mediump vec2 texCoord2Varying;

// This assumes that the quad position coordinates are being input as in the
// range [-1,1] in both X and Y directions, so no transformation is necessary.
void main() {
    texCoord2Varying = texCoord2;
    gl_Position      = vec4( position.x, position.y, 0, 1 );
}
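
For completeness, a matching fragment shader can be as small as this sketch (the sampler uniform name srcTex is mine; bind your FBO’s color texture to it):

varying mediump vec2 texCoord2Varying;
uniform sampler2D srcTex; // the texture rendered into the FBO

void main() {
    // The GL_LINEAR filter performs the interpolation during this lookup.
    gl_FragColor = texture2D(srcTex, texCoord2Varying);
}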

Makes sense. Again, I’d expect it to be fast, since there’s no hardware reason it wouldn’t be. But saying the hardware can do it and saying it’s straightforward and performant in OpenGL are two different things, which was why I asked.

Makes sense that there are other constraints, but am I right in thinking that on “ideal” hardware (e.g. infinite pixel pipelines), OpenGL would be able to run a separate GPU thread for each pixel? Is there a reason you can’t parallelize to that degree?

So if I create two triangles that span the size of the screen and draw them using my custom shader, the fragment shader will be called for every pixel in the shape? If my shader called on pixel (x,y) were to just read the pixel (x, y) from some texture, would this be comparably performant to blitting from the texture to the screen?

Thanks

The fragment shader will run for every fragment (not pixel; the two are not 100% the same). The vertex shader will run for every vertex.
If you have 2 triangles spanning the whole screen, then your vertex shader will run 2 * 3 times. Your fragment shader will run for as many fragments as are needed to fill your whole screen.
If you have a lot of points instead, your vertex shader will run for each of them, but your fragment shader will most likely run just as often as before. So the work for the fragment shader will be almost the same, but the vertex shader will run much more often.

Testing on my hardware, it is much slower to draw 6 triangles filling the screen then to draw 1 point for every pixel on my monitor.

[QUOTE=Cornix;1261348]The fragment shader will run for every fragment (not pixel; the two are not 100% the same). The vertex shader will run for every vertex.
If you have 2 triangles spanning the whole screen, then your vertex shader will run 2 * 3 times. Your fragment shader will run for as many fragments as are needed to fill your whole screen.
If you have a lot of points instead, your vertex shader will run for each of them, but your fragment shader will most likely run just as often as before. So the work for the fragment shader will be almost the same, but the vertex shader will run much more often.

Testing on my hardware, it is much slower to draw 6 triangles filling the screen then to draw 1 point for every pixel on my monitor.[/QUOTE]

Okay, good to know. I’ve been wondering about the difference…

To draw one point for every pixel, did you have to create a buffer to hold the coordinates of every pixel? How can I run a shader on every pixel without having to do that?

I tested with VBOs. You can simply test for yourself and compare the performance on your own hardware; this shouldn’t be too hard to do.
If you just want the shader output, you can also use offscreen rendering; read up on FBOs. This might possibly be even faster.
If you just want the fragment shader output (and don’t care about the vertex shader), you can also just draw a giant quad (2 triangles) across the screen. That does not make a difference to the fragment shader.

Did your VBO hold the coordinates of every pixel? How can I run the shader on every pixel (and output to the screen) without having to do that?

There might be ways, but I simply uploaded them all to a VBO.
Perhaps you could use geometry / tessellation shaders, but I have never worked with those, so I can’t help you.

How can I run the shader on every pixel (and output to the screen) without having to do that?

You draw two triangles that together fill the screen.

[QUOTE=Cornix;1261353]There might be ways, but I simply uploaded them all to a VBO.
Perhaps you could use geometry / tessellation shaders, but I have never worked with those, so I can’t help you.[/QUOTE]

Alright, I’m sure I’ll figure the rest out. Thanks so much for your help!

It sounds like that doesn’t create exactly the behavior I want:

[QUOTE=Cornix;1261348]The fragment shader will run for every fragment (not pixel; the two are not 100% the same). The vertex shader will run for every vertex.
If you have 2 triangles spanning the whole screen, then your vertex shader will run 2 * 3 times. Your fragment shader will run for as many fragments as are needed to fill your whole screen.
If you have a lot of points instead, your vertex shader will run for each of them, but your fragment shader will most likely run just as often as before. So the work for the fragment shader will be almost the same, but the vertex shader will run much more often.

Testing on my hardware, it is much slower to draw 6 triangles filling the screen then to draw 1 point for every pixel on my monitor.[/QUOTE]

Testing on my hardware, it is much slower to draw 6 triangles filling the screen then to draw 1 point for every pixel on my monitor.

I haven’t tried it myself, but I have a bit of a hard time believing there is not a typo that inverts the meaning in this sentence (also, why 6 triangles?). At the very least, transforming all these points consumes more memory bandwidth on the graphics card than just transforming 6 vertices (4 when using a triangle strip) and letting the rasterizer do its job.
Every description of post-processing effects (essentially shaders that take as input the image produced by a previous rendering step and transform it in some way) I’ve seen calls for rendering a screen-filling rectangle to get the fragment shader executed on each pixel, but I suppose the only way to be really sure is to measure it yourself :wink:

If you want to run a fragment shader on each pixel of your rendering surface once and only once, you can do that by simply rendering two triangles that fill the screen. You don’t even need to pass any data to the vertex shader at all if you just hardcode the values of the transformed vertices.

If you need actual proof that this is the behavior that will be observed, try this: make a fragment shader that uses an atomic uint counter and increases the atomic once in the shader. Once the rendering of a single frame is done, query the data from the atomic counter’s buffer and see what it says. It should match your width * height.
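
A sketch of such a counting shader, assuming a GL 4.2+ context (atomic counters require it) and a buffer bound to GL_ATOMIC_COUNTER_BUFFER binding point 0 that you read back afterwards, e.g. with glGetBufferSubData:

#version 420

layout (binding = 0, offset = 0) uniform atomic_uint fragCount;

out vec4 color;

void main() {
    atomicCounterIncrement(fragCount); // exactly one increment per fragment
    color = vec4(1.0);                 // the color itself doesn't matter for this test
}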

Whoops, yes, there actually ARE not just one but several typos in that sentence, I am sorry.
This would be what I wanted to say:

Testing on my hardware, it isn’t much slower to draw 1 point for every pixel on my monitor than to draw 2 triangles filling the screen.

I can’t understand how I could have screwed that up. I must have been very, very tired…

[QUOTE=Ed Daenar;1261382]…try this: make a fragment shader that uses an atomic uint counter and increases the atomic once in the shader…[/QUOTE]There is a simpler way. Create a VAO and, while it is bound (no attributes enabled), call glDrawArrays(GL_TRIANGLE_FAN, 0, 4). This way your vertex shader will execute 4 times, and by checking the value of gl_VertexID you will find it to be 0 for the first vertex, 1 for the second, 2 for the third and 3 for the last vertex. Based on those values you set gl_Position to the desired corner coordinate (±1, ±1, 0, 1). As simple as that. Just ensure that some VAO is bound (that is a requirement, even if the VAO is not actually used).
The easy way is to define an array of constants (4 vectors carrying the corner coordinates) and pick the right one by indexing with gl_VertexID.
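
A sketch of such a vertex shader; the fan order below is one choice that works with glDrawArrays(GL_TRIANGLE_FAN, 0, 4):

#version 330

// No vertex attributes at all: gl_VertexID (0..3) selects the corner directly.
const vec2 corners[4] = vec2[4](vec2(-1.0, -1.0),
                                vec2( 1.0, -1.0),
                                vec2( 1.0,  1.0),
                                vec2(-1.0,  1.0));

void main() {
    gl_Position = vec4(corners[gl_VertexID], 0.0, 1.0);
}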

Kudos on an interesting way to do a screen-filling fan. I’m going to have to try that.

Thanks for all the clarifications and help! I’m happy the simpler solution turned out to be no less performant after all haha.