Matrices can be your friends

From Wikiid

By Steve Baker

What stops most novice graphics programmers from getting friendly with matrices is that they look like 16 utterly random numbers. However, a little mental picture that I have seems to help most people to make sense of what's going on. Most programmers are visual thinkers and don't take kindly to piles of abstract math.

Take an OpenGL matrix:

 float m [ 16 ] ;

Consider this as a 4x4 array with its elements laid out in four columns like this:

  m[0]  m[4]  m[ 8]  m[12]
  m[1]  m[5]  m[ 9]  m[13]
  m[2]  m[6]  m[10]  m[14]
  m[3]  m[7]  m[11]  m[15]

WARNING: Mathematicians like to see their matrices laid out on paper this way (with the array indices increasing down the columns instead of across the rows as a programmer would usually write them). Look CAREFULLY at the order of the matrix elements in the layout above! ...but we are OpenGL programmers - not mathematicians - right?! The reason OpenGL arrays are laid out in what some people would consider to be the opposite direction to mathematical convention is somewhat lost in the mists of time. However, it turns out to be a happy accident as we will see later.

So graphics engineers think like programmers - and lay the elements out like this:

  m[ 0]  m[ 1]  m[ 2]  m[ 3]
  m[ 4]  m[ 5]  m[ 6]  m[ 7]
  m[ 8]  m[ 9]  m[10]  m[11]
  m[12]  m[13]  m[14]  m[15]

I'm going to continue to talk about them as rows and columns "the programmer way" - and not "the mathematician's way".
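To make the two readings concrete, here is a minimal sketch of how a (row, column) pair maps onto the flat array in each layout. The helper names mat_get and mat_get_math are made up for this illustration and are not part of OpenGL.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only - not OpenGL functions. */

/* "The programmer way": the array index increases across each row. */
static float mat_get(const float m[16], size_t row, size_t col)
{
    return m[row * 4 + col];
}

/* "The mathematician's way": the array index increases down each column. */
static float mat_get_math(const float m[16], size_t row, size_t col)
{
    return m[col * 4 + row];
}
```

The same 16 floats are read either way; only the picture on paper is transposed. For example, m[12] is row 3, column 0 in the programmer's picture and row 0, column 3 in the mathematician's.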

If you are dealing with a matrix that only describes rigid-body motion (i.e. no scale, shear, squash, etc.), then the elements of the last column (array elements 3, 7, 11 and 15) are always 0, 0, 0 and 1 respectively, and so long as they keep those values, we can safely forget about them... at least for now.

The first three elements of the bottommost row of the matrix are just the overall translation. If you imagine some kind of neat little compact object (like a teapot), then array elements 12, 13 and 14 tell you where it is in the world. It doesn't matter what combination of rotations and translations it took to produce the matrix, the bottom row tells you where the object basically is. It is often fortunate that the OpenGL matrix array is laid out the way it is, because it results in those three elements being consecutive in memory.
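Because elements 12, 13 and 14 sit next to each other in memory, pulling the position out of a matrix can be a single copy. A small sketch, with mat_get_translation being a name invented here rather than anything OpenGL provides:

```c
#include <string.h>

/* Hypothetical helper: copy the translation (elements 12, 13, 14)
   out of an OpenGL-style 16-float matrix into out_xyz. */
static void mat_get_translation(const float m[16], float out_xyz[3])
{
    /* The three values are consecutive, so one memcpy is enough. */
    memcpy(out_xyz, &m[12], 3 * sizeof(float));
}
```

In code that already treats positions as three consecutive floats, you could equally just pass &m[12] around as the position without copying anything.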

OK, so now we are down to only nine random-looking numbers. These are the top three elements of each of the first three columns - and collectively they represent the rotation of the object.

The easy way to decode those numbers is to imagine what happens to four points near to the origin after they are transformed by the matrix:


  (0,1,0)
       |  /(0,0,1)
       | /
       |/___(1,0,0)
  (0,0,0)

These are four vertices on a 1x1x1 cube that has one corner at the origin. After the matrix has transformed this cube, where does it end up?

Well, if we neglect the translation part, then the pure rotation part simply describes the new location of the points on the cube:


   (1,0,0)  --->  ( m[0], m[1], m[2] )
   (0,1,0)  --->  ( m[4], m[5], m[6] )
   (0,0,1)  --->  ( m[8], m[9], m[10])
   (0,0,0)  --->  ( 0, 0, 0 )

After that, you just add the translation onto each point so that:

   (1,0,0)  --->  ( m[0], m[1], m[2] ) + ( m[12], m[13], m[14] )
   (0,1,0)  --->  ( m[4], m[5], m[6] ) + ( m[12], m[13], m[14] )
   (0,0,1)  --->  ( m[8], m[9], m[10]) + ( m[12], m[13], m[14] )
   (0,0,0)  --->  ( 0, 0, 0 ) + ( m[12], m[13], m[14] )
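The same recipe works for any point, not just the cube's corners: scale each of the first three rows by the point's x, y and z, add them up, then add the translation row. Here is a minimal sketch of that. The function name mat_transform_point is made up for this example, and it assumes the last column really is 0, 0, 0, 1 so that it can be ignored.

```c
/* Hypothetical helper: transform the point p by the matrix m.
   Assumes elements 3, 7, 11, 15 are 0, 0, 0, 1 (no perspective).
   out must not be the same array as p. */
static void mat_transform_point(const float m[16], const float p[3],
                                float out[3])
{
    for (int i = 0; i < 3; ++i) {
        out[i] = p[0] * m[0 + i]    /* x times where the X axis ended up */
               + p[1] * m[4 + i]    /* y times where the Y axis ended up */
               + p[2] * m[8 + i]    /* z times where the Z axis ended up */
               + m[12 + i];         /* plus the translation              */
    }
}
```

Feed it (1,0,0) and you get ( m[0], m[1], m[2] ) + ( m[12], m[13], m[14] ) back, exactly as in the table above.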

Once you know this, it becomes quite easy to use matrices to position objects exactly where you need them without messing around with multiple calls to glRotate (which is just as well because this is obsolete in modern OpenGL!).

Just imagine a little cube at the origin - pretend it's firmly attached to your model. Think about where the cube ends up as the model moves - write down where its vertices would end up and there is your matrix.

So, if I gave you this matrix:

  0.707,  0.707,  0,  0
 -0.707,  0.707,  0,  0
  0    ,  0    ,  1,  0
 10    , 10    ,  0,  1

...you could easily see that the X axis of that little cube is now pointing somewhere between the X and Y axes, the Y axis is pointing somewhere between Y and negative X, and the Z axis is unchanged. The entire cube has been moved 10 units off in X and Y. This is a 45-degree rotation about Z and a 10,10,0 translation! You didn't need any hard math - just a mental picture of what the little cube did - and no concerns about the order of operations or anything hard like that. What would have happened to something out at 100,100,0? Well, just imagine it was glued to the cube (on the end of a long stick)... as the cube rotated, the thing at 100,100 would have moved quite a bit too - in fact, you can see that the rotation would put it onto the Y axis and the translation would have moved it 10 units up and to the right. With practice, you can figure out what that last column of numbers does to the little cube too.
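"Write down where the cube ends up" translates into code almost word for word. The sketch below fills in a matrix from the three transformed axes and the new position of the origin corner. The name mat_from_axes and its parameters are invented for this illustration; it always writes 0, 0, 0, 1 into the last column.

```c
/* Hypothetical helper: build a matrix from where the little cube's
   X, Y and Z axes point after the transform, and where its origin
   corner ends up. */
static void mat_from_axes(float m[16],
                          const float x_axis[3],
                          const float y_axis[3],
                          const float z_axis[3],
                          const float position[3])
{
    for (int i = 0; i < 3; ++i) {
        m[0 + i]  = x_axis[i];    /* first row: the new X axis   */
        m[4 + i]  = y_axis[i];    /* second row: the new Y axis  */
        m[8 + i]  = z_axis[i];    /* third row: the new Z axis   */
        m[12 + i] = position[i];  /* bottom row: the translation */
    }
    /* Last column: 0, 0, 0, 1 */
    m[3] = m[7] = m[11] = 0.0f;
    m[15] = 1.0f;
}
```

Calling it with axes (0.707, 0.707, 0), (-0.707, 0.707, 0), (0, 0, 1) and position (10, 10, 0) reproduces the matrix above. Working the point at 100,100,0 through by hand gives 100 times the first row plus 100 times the second row plus the bottom row, which is roughly (10, 151.4, 0): on the Y axis after the rotation, then shifted by 10 in X and Y.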

So, would you like to know how to use a matrix to squash, stretch, shear, etc? Just think about where the axes of that little cube end up - write them down and you are done. What does a cube of jello look like when there is a strong wind blowing from X=-infinity?

 1  , 0  , 0, 0
 0.3, 0.9, 0, 0
 0  , 0  , 1, 0
 0  , 0  , 0, 1

Look - the Y axis is leaning a third of a unit to the right and the cube got a bit shorter. Suppose your cartoon character is going to jump vertically, and you want to do a bit of pre-squash before the jump... and post-stretch during the jump. Just gradually vary the matrix from:


 1  , 0  , 0, 0         1  , 0  , 0, 0
 0  , 0.8, 0, 0         0  , 1.2, 0, 0
 0  , 0  , 1, 0   ===>  0  , 0  , 1, 0
 0  , 0  , 0, 1         0  , 0  , 0, 1

Not bad - he got shorter then longer - how about getting a bit fatter too (conservation of cartoon volume) ?

 1.2, 0  , 0  , 0         0.9, 0  , 0  , 0
 0  , 0.8, 0  , 0         0  , 1.2, 0  , 0
 0  , 0  , 1.2, 0   ===>  0  , 0  , 0.9, 0
 0  , 0  , 0  , 1         0  , 0  , 0  , 1

Now the cube got smaller in Y and bigger in X and Z, then got bigger in Y and smaller in X/Z... easy! Not only is it easier to think transforms out this way, but it's usually more efficient too. By seeing the entire transformation as one whole operation on a unit cube, you save a long sequence of rotate, translate and scale matrix multiplications - each of which implies a complicated set of multiply/add steps to concatenate the new transform with whatever was there before.
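One simple way to "gradually vary" the squash matrix into the stretch matrix is to blend the two element by element, with a parameter t running from 0 to 1 over the course of the jump. This is only a sketch: mat_lerp is a made-up name, and a straight blend like this is an assumption that suits scale and shear matrices such as the ones above. Blending two rotation matrices this way does not, in general, give you a rotation.

```c
/* Hypothetical helper: element-wise blend between matrices a and b.
   t = 0 gives a, t = 1 gives b, values in between give a mix. */
static void mat_lerp(const float a[16], const float b[16], float t,
                     float out[16])
{
    for (int i = 0; i < 16; ++i) {
        out[i] = a[i] + t * (b[i] - a[i]);
    }
}
```

Halfway through (t = 0.5) the fatter squash-and-stretch pair above gives a Y scale of about 1.0 and an X and Z scale of about 1.05, so the character passes through something close to its normal shape on the way up.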

Finally, there is one matrix that we all need to know - the "Identity" matrix:


 1, 0, 0, 0
 0, 1, 0, 0
 0, 0, 1, 0
 0, 0, 0, 1

As you can see, this matrix leaves all the axes completely alone and performs no translation. This is a "do nothing" matrix. Matrices are really easy - it's just a matter of looking at them pictorially.