Code Examples

From GPGPU.org Wiki


GPGPU Code Examples

This page collects minimal examples that either demonstrate a technique that has proven useful or solve a specific problem. To keep the examples short, leave out setup code unless it is necessary.


Ping-Pong / Double Buffering between surfaces

[TODO: No explanation required... --woody 17:17, 15 Aug 2005 (EDT)] [TODO: Decided to split this one up into several subtopics Dom 16:17, 16 Aug 2005 (EDT)]

OpenGL FBO example

  • Declare texture handles and management vars:
GLuint pingpongTexIDs[2];
static const GLenum source[] = { GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT };
int writeTex = 0;
int readTex = 1;
  • Set up textures and create/bind the FBO [...]
  • Attach textures to FBO:
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, source[writeTex], GL_TEXTURE_2D, pingpongTexIDs[writeTex], 0);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, source[readTex], GL_TEXTURE_2D, pingpongTexIDs[readTex], 0);
  • Set "write-only" destination texture:
glDrawBuffer(source[writeTex]);
  • Bind input texture:
glBindTexture(GL_TEXTURE_2D, pingpongTexIDs[readTex]);
  • Render [...]
  • Swap writeTex and readTex [...]
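The swap step can be modelled on the CPU to make the index bookkeeping explicit. The following is only a sketch under stated assumptions: two plain arrays stand in for the two textures, and `N`, `surfaces`, `render_pass`, `swap_surfaces` and `run_passes` are hypothetical names that do not appear in the example above. No OpenGL calls are made.

```c
#include <assert.h>
#include <stddef.h>

#define N 8  /* hypothetical surface size */

/* Two CPU arrays stand in for the two textures in pingpongTexIDs[]. */
static float surfaces[2][N];
static int writeTex = 0;
static int readTex  = 1;

/* Stand-in for "Render": reads surfaces[readTex], writes surfaces[writeTex]. */
static void render_pass(void)
{
    for (size_t i = 0; i < N; ++i)
        surfaces[writeTex][i] = surfaces[readTex][i] * 0.5f + 1.0f;
}

/* Swap the roles of the two surfaces. */
static void swap_surfaces(void)
{
    int tmp  = writeTex;
    writeTex = readTex;
    readTex  = tmp;
}

/* Run `passes` iterations and return the index of the surface
   that holds the most recent result. */
static int run_passes(int passes)
{
    assert(passes >= 0);
    for (int p = 0; p < passes; ++p) {
        render_pass();
        swap_surfaces();
    }
    return readTex;  /* after the swap, the last written surface is the read surface */
}
```

In the FBO version both textures stay attached, so after each swap only the glDrawBuffer(source[writeTex]) and glBindTexture(GL_TEXTURE_2D, pingpongTexIDs[readTex]) calls shown above would need to be issued again before the next render.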

[Is this sufficient? Dom 16:39, 16 Aug 2005 (EDT)]

OpenGL pBuffer example

OpenGL pBuffers are best managed with the RenderTexture class by Mark Harris. The class must be set up correctly in double-buffered mode. This example works only on Windows because it uses WGL extensions.

  • Declare some management vars and macros:
const GLenum glsurf[2] = {GL_FRONT_LEFT, GL_BACK_LEFT};
const GLenum wglsurf[2] = {WGL_FRONT_LEFT_ARB, WGL_BACK_LEFT_ARB}; 
int SOURCE_BUFFER = 0;
#define DESTINATION_BUFFER (!SOURCE_BUFFER)
#define SWAPBUFFERS() (SOURCE_BUFFER = DESTINATION_BUFFER)
  • Set the "write-only" surface as destination:
glDrawBuffer(glsurf[DESTINATION_BUFFER]);
  • Bind the "read-only" surface as input texture:
glActiveTexture(GL_TEXTURE0 + i); // i is the index of the texture image unit you want to bind to
myRenderTexture->BindBuffer(wglsurf[SOURCE_BUFFER]); 
  • Do some rendering...
  • Swap the surfaces:
SWAPBUFFERS();
  • Start over.

[TODO: More elaborate? Dom 16:31, 16 Aug 2005 (EDT)]

DirectX example

Texture creation

[ I split up woody's suggestion into several examples Dom 17:40, 16 Aug 2005 (EDT)]

OpenGL

  • 1D
  • 2D
  • 3D
  • fixed-point
  • floating point

DirectX

[same...]


Multitexturing

OpenGL

  • Enabling texturing
  • Accessing textures from shaders

DirectX

[same...]


Data transfer CPU<->GPU

[TODO fill out all of this: Dom 17:40, 16 Aug 2005 (EDT) ]

From CPU to GPU textures (download)

OpenGL

DirectX

From GPU to CPU (upload/readback)

OpenGL

DirectX

Using the Depth Buffer

Early-z

Early-Z: The per-pixel depth test, which normally runs after the shader, is moved so that it runs before the shader. The key point is that this test is per-pixel and exact. It cannot run before the shader if shader-dependent data (such as shader-computed depth) is needed to determine whether the pixel is killed.

Z-Cull: This is a conservative attempt to cull large blocks of pixels (possibly hierarchically) before the shader runs, typically during rasterization. It requires some compressed representation of blocks of pixels. The idea is to build up an occluder representation in on-chip memory that can be used to trivially accept or reject whole blocks of pixels. This can reduce bandwidth considerably because the occluder information is in on-chip RAM, while the depth test requires accessing the z-buffer, which is stored in the frame buffer. (Note that trivially accepting can reduce bandwidth too!) ATI calls this HyperZ.

Note that these two are compatible: Z-Cull can reject whole blocks that are conservatively behind an existing occluder, while Early-Z can still save shader work in a block that is only partially occluded.
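How the two tests cooperate can be illustrated with a small CPU model. This is a sketch under stated assumptions, not a description of any real hardware: a one-dimensional "scanline" of `WIDTH` pixels is split into blocks of `BLOCK` pixels, the coarse occluder representation is simply the farthest depth per block, and all names (`zbuf`, `block_max`, `write_occluder`, `draw_span`, and so on) are hypothetical.

```c
#include <assert.h>

#define WIDTH   8                 /* hypothetical scanline width */
#define BLOCK   4                 /* hypothetical block (tile) size */
#define NBLOCKS (WIDTH / BLOCK)

static float zbuf[WIDTH];         /* exact per-pixel depth, as in the frame buffer */
static float block_max[NBLOCKS];  /* coarse farthest depth per block, as in on-chip RAM */
static int   blocks_culled;       /* blocks rejected without reading zbuf */
static int   pixels_shaded;       /* pixels that would reach the shader */

static void reset_depth(float z)
{
    for (int x = 0; x < WIDTH; ++x) zbuf[x] = z;
    for (int b = 0; b < NBLOCKS; ++b) block_max[b] = z;
    blocks_culled = 0;
    pixels_shaded = 0;
}

/* Write an occluder at depth z over pixels [x0, x1) and refresh the coarse data. */
static void write_occluder(int x0, int x1, float z)
{
    assert(x0 >= 0 && x1 <= WIDTH);
    for (int x = x0; x < x1; ++x)
        if (z < zbuf[x]) zbuf[x] = z;
    for (int b = 0; b < NBLOCKS; ++b) {
        float far_z = zbuf[b * BLOCK];
        for (int x = b * BLOCK + 1; x < (b + 1) * BLOCK; ++x)
            if (zbuf[x] > far_z) far_z = zbuf[x];
        block_max[b] = far_z;
    }
}

/* Draw pixels [x0, x1) at constant depth z with a LESS test and no depth writes. */
static void draw_span(int x0, int x1, float z)
{
    assert(x0 >= 0 && x1 <= WIDTH);
    for (int b = 0; b < NBLOCKS; ++b) {
        int lo = b * BLOCK > x0 ? b * BLOCK : x0;
        int hi = (b + 1) * BLOCK < x1 ? (b + 1) * BLOCK : x1;
        if (lo >= hi) continue;                 /* span does not touch this block */

        /* Z-cull: z is not nearer than the farthest stored depth in the block,
           so every pixel in the block would fail LESS. Reject the whole block. */
        if (z >= block_max[b]) { ++blocks_culled; continue; }

        /* Early-z: exact per-pixel test before the (imaginary) shader runs. */
        for (int x = lo; x < hi; ++x)
            if (z < zbuf[x]) ++pixels_shaded;
    }
}
```

With an occluder at depth 0.2 covering pixels 0 to 5 and a span drawn at depth 0.5 over the whole scanline, the first block is rejected by the coarse test alone, and the second block, which is only partially occluded, falls through to the exact per-pixel test.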

Both of these features exist on the latest GPUs from both NVIDIA and ATI.

This is currently the most complete list of things that can disable early-z for the remainder of the frame:

NV3X and NV4X:

  • Changing the depth test direction invalidates z-cull for the remainder of the frame (so change only late in frame). I believe "frame" is defined by when the z-buffer is cleared, because this resets the z-cull surface. Note that it's OK to change from GL_LESS to GL_ALWAYS and back (or GL_GREATER to GL_ALWAYS and back), but changing from GL_LESS to GL_GREATER is bad.

NV3X only:

  • Alpha test
  • Alpha-to-coverage
  • User clip planes
  • Pixel kill in the shader
  • Shader alters z (depth replace)

NV4X:

  • Writing stencil while rejecting based on stencil (so reject color/Z in one pass; run a separate pass to write/update the stencil buffer).
  • Changing stencil func/ref/mask invalidates z-cull for the remainder of the frame (so change only late in the frame).
  • I think shader alters Z also effectively invalidates, even though the hardware could support shader altering z (monotonically) if the API exposed a user hint.

Once z-cull has been invalidated, it stays that way until it is reset by a clear of the depth buffer.
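The depth-direction rule for NV3X and NV4X can be written down as a small state tracker, for example as a debugging aid that mirrors an application's depth-state changes. This is a sketch of the rule as stated above, not of the driver or the hardware; the type and function names (`ZCullModel`, `on_depth_clear`, `on_depth_func`) are hypothetical, and only the LESS, GREATER and ALWAYS functions mentioned in the text are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEPTH_LESS, DEPTH_GREATER, DEPTH_ALWAYS } DepthFunc;

typedef struct {
    bool      zcull_valid;    /* false once invalidated, until the next depth clear */
    bool      has_direction;  /* has LESS or GREATER been set since the clear? */
    DepthFunc direction;      /* last directional function (LESS or GREATER) */
} ZCullModel;

/* A depth-buffer clear resets the model. */
static void on_depth_clear(ZCullModel *m)
{
    assert(m != NULL);
    m->zcull_valid   = true;
    m->has_direction = false;
    m->direction     = DEPTH_LESS;
}

/* Record a change of the depth function. */
static void on_depth_func(ZCullModel *m, DepthFunc f)
{
    assert(m != NULL);
    if (f == DEPTH_ALWAYS)
        return;                      /* switching to ALWAYS and back is OK */
    if (m->has_direction && f != m->direction)
        m->zcull_valid = false;      /* direction flipped: invalid until next clear */
    m->has_direction = true;
    m->direction     = f;
}
```

Calling on_depth_func with LESS, ALWAYS, LESS leaves zcull_valid set, while a later GREATER clears it until on_depth_clear is called again.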

[TODO: Add information for ATI hardware? --woody 20:04, 17 Aug 2005 (EDT)]


The following works on GeForce 6800. I'm not sure about ATI cards, and it doesn't work on GeForce FX GPUs. (Mark)

  • Clear the depth buffer to 1.
  • 1. Render a full-screen quad with depth 0. In the shader for this pass, use KIL (discard, in Cg) to cancel the writing of the fragments that you DO want to process in the next pass.
  • 2. Enable the depth test and set it to LESS. Render your next passes at a depth between 0 and 1, and do not write depth in the shader. Pixels that were written in the first pass have a depth of 0 and are therefore culled. Pixels that were not written still have a depth of 1 and are therefore overwritten by pixels in the second pass. If you want to cull in multiple subsequent passes, disable depth writes for them.
  • Note: the more spatially coherent the regions of depth 0 and depth 1 are, the better the culling.
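The logic of the steps above can be checked with a CPU model of the depth buffer. This is only a sketch: it models which fragments survive, not the performance benefit, and it assumes that the first pass is allowed to write depth. The names `NPIX`, `depth`, `color`, `clear_depth_to_one`, `mask_pass` and `compute_pass` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NPIX 6  /* hypothetical number of pixels */

static float depth[NPIX];  /* model of the depth buffer */
static float color[NPIX];  /* model of the render target */

/* Step 0: clear the depth buffer to 1. */
static void clear_depth_to_one(void)
{
    for (int i = 0; i < NPIX; ++i) depth[i] = 1.0f;
}

/* Pass 1: full-screen quad at depth 0. Fragments that should be processed
   later (active[i] == true) are discarded, so their depth stays at 1.
   All other fragments are written and end up with depth 0. */
static void mask_pass(const bool active[NPIX])
{
    for (int i = 0; i < NPIX; ++i) {
        if (active[i]) continue;   /* KIL / discard: nothing is written */
        depth[i] = 0.0f;
    }
}

/* Pass 2: quad at depth z with 0 < z < 1, depth test LESS, depth writes
   disabled. Returns how many fragments reached the "shader". */
static int compute_pass(float z)
{
    assert(z > 0.0f && z < 1.0f);
    int shaded = 0;
    for (int i = 0; i < NPIX; ++i) {
        if (!(z < depth[i])) continue;  /* culled by the depth test */
        color[i] += 1.0f;               /* stand-in for the real shader work */
        ++shaded;
    }
    return shaded;
}
```

Because depth writes are disabled in compute_pass, the mask survives and the same set of pixels is processed again in every subsequent pass.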

This information was sourced from the GPGPU forums, from these postings:

http://www.gpgpu.org/forums/viewtopic.php?t=361
http://www.gpgpu.org/forums/viewtopic.php?t=256
http://www.gpgpu.org/forums/viewtopic.php?t=367

Fragment Discarding

[TODO: Add methods for using depth buffer to selectively process fragments. --woody 17:17, 15 Aug 2005 (EDT)]
