A Short Guide to 3D Graphics Performance Testing

From Wikiid
Revision as of 03:14, 21 October 2010 by SteveBaker (Talk | contribs) (Eliminate the bottleneck)

Understand the system

In most 3D applications - whether under OpenGL, OpenGL ES, WebGL or Direct3D - there are four principal places where speed bottlenecks can happen:

  1. The CPU - you have to calculate what meshes you're going to draw and set them up for rendering. This tends to be more or less a fixed cost per mesh.
  2. The transmission link between CPU and GPU. This cost depends on the number of vertices you send multiplied by the number of per-vertex attributes - plus the cost of updating any textures and shaders that you change during that frame.
  3. The GPU's vertex processor. This is the per-vertex cost of transforming and lighting your vertex data. Without shaders, the cost roughly depends on the number of vertices times the number of lights you have turned on. With shaders, it roughly depends on the number of vertices times the complexity of your vertex shader.
  4. The GPU's pixel/fragment processor. This is the per-pixel cost for pixels that pass clipping. The cost roughly depends on the number of pixels you draw onto the screen times the number of textures you use and/or the complexity of your fragment shader.
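The four costs above can be written down as a toy model. This is only a sketch: the `FrameLoad` fields and the per-unit cost constants are made-up names and numbers for illustration, texture and shader updates are ignored, and it assumes the slowest stage sets the frame time.

```python
from dataclasses import dataclass


@dataclass
class FrameLoad:
    meshes: int            # meshes the CPU sets up for rendering
    vertices: int          # total vertices sent this frame
    attrs_per_vertex: int  # per-vertex attributes (position, normal, UV, ...)
    pixels: int            # pixels that pass clipping and get shaded


# Hypothetical per-unit costs in microseconds - measure your own system.
COST_PER_MESH = 20.0
COST_PER_ATTR = 0.002
COST_PER_VERTEX = 0.005
COST_PER_PIXEL = 0.001


def stage_costs(load):
    """Estimate the time (in microseconds) each of the four stages needs."""
    return {
        "cpu": load.meshes * COST_PER_MESH,
        "transmission": load.vertices * load.attrs_per_vertex * COST_PER_ATTR,
        "vertex": load.vertices * COST_PER_VERTEX,
        "pixel": load.pixels * COST_PER_PIXEL,
    }


def bottleneck(load):
    """Return the name of the most expensive stage."""
    costs = stage_costs(load)
    return max(costs, key=costs.get)
```

With these invented constants, a frame of many small meshes comes out CPU-bound, while a frame of a few meshes covering a large window comes out pixel-bound.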

Find the bottleneck

If your application is running slower than you'd hoped - then you need to establish which of these four things is the problem.

GPU Pixel processing

Pixel processing is the easiest to test - reduce the size of the window you're rendering to (keeping everything else the same). If your program goes faster in rough proportion to the reduction in window area (width x height) - then pixel processing is the bottleneck.
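As a rough sketch of that comparison, assuming you have already measured average frame times at two window sizes yourself (the function name and the 25% tolerance are arbitrary choices, not a standard):

```python
def looks_pixel_bound(full_ms, small_ms, full_size, small_size, tolerance=0.25):
    """Compare frame times (in ms) measured at two window sizes.

    full_size and small_size are (width, height) tuples. Returns True when
    the speedup is within `tolerance` (a fraction) of the area ratio.
    """
    area_ratio = (full_size[0] * full_size[1]) / (small_size[0] * small_size[1])
    speedup = full_ms / small_ms
    return abs(speedup - area_ratio) <= tolerance * area_ratio
```

For example, going from 1280x720 to 640x360 cuts the area by a factor of four, so a drop from 32 ms to 9 ms per frame would count as pixel-bound, while a drop from 32 ms to 30 ms would not.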

CPU processing

If you have eliminated pixel processing, test the CPU next. CPU time generally doesn't depend on the number of vertices you draw, so (just as a test) keep rendering to a tiny window (to more or less eliminate pixel processing costs) and deliberately halve the number of triangles in each mesh. If your application's performance increases by roughly a factor of two, then you were not limited by the CPU's per-mesh costs. If your performance hardly changes, then you're probably drawing too many objects or doing too much per-mesh work in the CPU, and you need to reduce that work.
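The halving test can be summarized in a few lines. This is a sketch only: the cut-off values 1.7 and 1.15 are arbitrary assumptions standing in for "roughly a factor of two" and "hardly changes", and the frame times are whatever you measured in the tiny window.

```python
def classify_after_halving(before_ms, after_ms):
    """Interpret frame times measured before and after halving triangle counts."""
    speedup = before_ms / after_ms
    if speedup >= 1.7:
        return "vertex"  # close to 2x faster: per-vertex costs dominate
    if speedup <= 1.15:
        return "cpu"     # hardly any change: per-mesh CPU costs dominate
    return "mixed"       # in between: more than one stage is contributing
```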

GPU Vertex transmission/processing

Figuring out whether the transmission costs (2) or the GPU's vertex processing costs (3) are your problem is tricky. But since both depend mostly on the number of vertices, and the same remedies help with both, you probably don't need to tell them apart.

Eliminate the bottleneck

If CPU time is the culprit

You have three options: optimize your code so that other CPU time-sinks are reduced (e.g. make physics, collision, AI, etc. faster), improve your field-of-view culling so you don't spend time drawing meshes that are off-screen, or reduce the number of meshes in your art (e.g. by combining multiple parts of an object into a single mesh using tricks like texture atlasing).

In my experience, the last of these is the first thing most people should be looking at.
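For the culling option, a common approach is to test each mesh's bounding sphere against the planes of the view frustum. The sketch below assumes each mesh is a dict with hypothetical `center` and `radius` keys, and that each plane is given as `(nx, ny, nz, d)` with a unit normal pointing into the frustum. Extracting those planes from your camera is not shown.

```python
def sphere_in_frustum(center, radius, planes):
    """Return False only when the sphere lies entirely outside some plane.

    A point p counts as inside a plane when dot(n, p) + d >= 0.
    """
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False  # whole sphere is behind this plane
    return True


def visible_meshes(meshes, planes):
    """Keep only the meshes whose bounding sphere touches the frustum."""
    return [m for m in meshes
            if sphere_in_frustum(m["center"], m["radius"], planes)]
```

The test is conservative: a sphere that straddles a plane is kept, so a few off-screen meshes may still be drawn, but nothing visible is dropped.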

If GPU vertex transmission/processing is the culprit

Then your meshes are too complex or you have too many (fixed function) light sources or your vertex shader is too complex. Use level-of-detail to reduce the number of vertices in meshes that are further from the eyepoint. Consider doing occlusion queries to reduce the number of meshes you draw. Optimize light source culling.
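A minimal sketch of distance-based level-of-detail selection, assuming each mesh comes in several pre-built versions and that you pick the switch distances yourself (the values used in the comment are invented):

```python
import bisect


def pick_lod(distance, switch_distances):
    """Return the level-of-detail index for a mesh at `distance` from the eye.

    switch_distances must be sorted ascending, e.g. [10, 50, 200]. Index 0 is
    the most detailed version; each distance passed moves to a coarser one.
    """
    return bisect.bisect_right(switch_distances, distance)
```

With switch distances of `[10, 50, 200]` there are four versions of the mesh: index 0 is used closer than 10 units, and index 3 is used from 200 units outwards.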

If GPU pixel processing is the culprit

Simplify your pixel shaders. Consider doing a depth-only pass before your 'beauty' pass. Can you reduce the resolution of your textures? Can you draw to a smaller window? Can you make better use of approximate front-to-back rendering order?
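Approximate front-to-back ordering can be as simple as sorting meshes by the distance of their centers from the eyepoint, so that nearer surfaces fill the depth buffer first and hidden pixels behind them can be rejected by the depth test. The sketch assumes opaque meshes stored as dicts with a hypothetical `center` key.

```python
import math


def front_to_back(meshes, eye):
    """Return the meshes sorted nearest-first by center distance from `eye`.

    This is only approximate: large or overlapping meshes can still be
    drawn in the wrong order relative to each other.
    """
    return sorted(meshes, key=lambda m: math.dist(m["center"], eye))
```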

Rinse and repeat

When you've done some kind of optimization - re-do the testing phase to see if you've sped things up - and also (very important) to see if you've moved the bottleneck somewhere else. If the CPU was the limiting factor and you improved that, then perhaps the GPU is now the limiting factor. If so, speeding up your CPU code still further probably won't help - and (worse still) the extra effort you put in there won't result in better frame rates. So after each round of optimization, see which part of the system needs work next.

Also, if you're getting good frame rates - then you can probably improve the quality of your graphics by drawing more meshes, more polygons, or using more complicated lighting. When things are humming along fast enough, you can use "reverse optimization" - finding the stages that are not the bottleneck - to understand where you could be making things look nicer at little to no cost to performance.

This is an over-simplification

There are times when (for example) your CPU is spending too long getting to the point in the frame cycle where it starts drawing meshes. Then, once it starts drawing them, the meshes are too complicated and the CPU is held up waiting for the GPU to get done. In such cases you have multiple bottlenecks, and reducing either the CPU work or the vertex count will improve performance.