A Short Guide to Data Structures

From Wikiid
Jump to: navigation, search

This document makes a quick trip through the most common data structures encountered in day-to-day programming. I'll use C++ syntax here - but the same principles apply in other C-like languages.

== Built-in data structures: I'll skim quickly through these since you should already know them:

POD

Plain Old Data. This is a term that covers the basic, built-in data types in C-like languages, this pretty much boils down to:

 char, unsigned char
 short, unsigned short
 int, unsigned int
 long, unsigned long
 bool
 float
 double

There are variations on this basic set in some languages - but these are the essentials. POD also includes pointers to these basic types.

Pointers

A pointer is a small variable (typically 4 or 8 bytes depending on whether you're using a 32 or 64 bit computer) that points to some other variable. The pointer contains the address of whatever it's pointing at:

 int *x ;
 int y ;
 x = & y ;   // X equals the address of Y.

Diagrammatically:

 X --------> Y 

Structure

Generally refers to the idea (in C) that you can group together a bunch of objects (which could be POD, pointers, other data structures, etc) and wrap them in a 'structure'. eg:

 struct MyStruct
 {
   int A ;
   bool B ;
   double C ;
 } ;

Class

A structure that also contains some functions that operate on the data elements of the class objects. In C++, 'structures' and 'classes' are really the same thing - except that the contents of a 'struct' are public by default and those of a 'class' are private by default. eg:

 class MyClass
 {
   public:
     int A ;
     void DoSomeWork () { A = A + 1 ; }
 }

Most of the more complex data structures will be implemented as some kind of a class object.

Arrays

A simple linear collection of variables, all of the same type, stored consecutively in memory so that they can be indexed with a simple integer:

  float X [ 10 ] ;

Notice that you can have arrays of more complex types too:

  MyStruct X [ 10 ] ;

...declares an array of ten 'struct XYZ' objects.

  X [ 6 ]

...picks the SEVENTH object out of the array - because we start counting at zero.

Stack

Sometimes known as a LIFO (Last-In, First-Out), a stack is like a pile of objects which you can pile more objects on top of - or remove objects, but ONLY from the top. It generally comprises an array to hold the objects and a pointer (called the 'stack pointer') that shows where the top of the pile is.

                      +-------------+ Top of stack
                      |             |
  StackPointer ------>|             |
                      |/////////////|
                      |/////////////|
                      |/////////////|
                      |/////////////|
                      |/////////////|
                      |/////////////|
                      +-------------+ Bottom of stack

It is conventional to have the stack pointer point to the first EMPTY space in the array because that allows you to initialise it to point to the first slot in the array.

   StackPointer = & Array [ 0 ] ;

The two most common operations you can do to a stack are called 'push' and 'pop':

  • Push adds an object to the stack: It writes the new item of data where the stack pointer is pointing and increments the stack pointer to the next free slot.
   void Push ( int x )   // For a stack of integers
   {
      *StackPointer = x ;
      StackPointer++ ;
   }
  • Pop removes the topmost objects from the stack: It decrements the stack pointer (so it's now pointing at the most recently pushed object) and returns whatever it points at. That location is now 'free' again.
   int Pop ()
   {
     StackPointer-- ;
     return *StackPointer ;
   }

Additionally, there needs to be a way to check how many items there are in the stack and to handle the error conditions when too many items are pushed and you run out of space in the array - or when someone tries to pop something when the stack is empty.

Queue (aka Ring buffer, aka Circular buffer, aka FIFO)

A Queue (or FIFO...First-In, First-Out) is similar in concept to a stack - except that you when you remove something from the pile, you always take from the bottom (not from the top). You can think of it as a queue of people waiting in line at a bank counter. As the teller becomes free, a person can be popped off the 'head' of the queue - when more people arrive, they politely add themselves to the tail of the queue. If you use a structure similar to a stack - you need two pointers - one where the data will be removed (called the 'head' pointer) and one where new data will be added (the 'tail' pointer).

                      +-------------+ Top of stack
                      |             |
  TailPointer ------->|             |
                      |/////////////|
                      |/////////////|
                      |/////////////|
  HeadPointer ------->|/////////////|
                      |             |
                      +-------------+ Bottom of stack

As with a stack, the TailPointer points to the next empty space. The HeadPointer points to the next data item to remove. When the queue is empty, the head and tail pointer point to the same address.

  HeadPointer = TailPointer = & Array [ 0 ] ;

We also have operations called push and pop that add and remove data. The 'push' operation is almost identical to the stack example - but 'pop' has to take data from where the HeadPointer points and increment it to point at the next data item.

However, there is a problem. Because we keep adding data at one end and removing it from the other - the section of the array with data in it creeps upwards - and sooner or later, the TailPointer will be pointing off the end of the array. When this happens with a stack - the stack is full and that's probably an error condition. But in a queue, there may well be a ton of space at the bottom of the array where we'd previously popped a bunch of stuff from. So what we have to do is to pretend that the top of the array is wrapped around and glued to the bottom end. That's why this data structure is sometimes called a 'ring buffer' or a 'circular buffer'. Hence, you might at some time see this:

                      +-------------+ Top of stack
                      |/////////////|
                      |/////////////|
  HeadPointer ------->|/////////////|
                      |             |
                      |             |
  TailPointer ------->|             |
                      |/////////////|
                      |/////////////|
                      +-------------+ Bottom of stack

This is easy enough to implement - after you increment either HeadPointer and TailPointer when you pop and push - you check to see if they are pointing off the end of the array - and if so, wrap them back to the beginning. That, in turn leads to some additional complexity with error checking because if the queue fills up completely, the head and tail pointers will wind up pointing to the same location...which is exactly the same as they are when the queue is completely empty. There are several ways to avoid this problem - the simplest is to have another variable that contains the number of items in the queue - increment it on every 'push' and decrement it on every 'pop'.

Singly Linked List

This is the first of many similar data structures that involves a 'struct' or 'class' object that links to others of it's own kind. Declare your data class with an extra member called something like "Next":

 class MyClass
 {
   MyClass *Next ;
   ...other stuff...
 } ;

Additionally, declare a pointer outside of the class called "First":

 MyClass *First = NULL ;

When First is NULL, the list is empty:

 First ----> NULL

When First points at something there can be any number of nodes - with the last one pointing at NULL:

              _________         _________
             | MyClass |       | MyClass |
 First ----> |     Next| ----> |     Next| ----> NULL
             |_________|       |_________|

With a singly-linked list, it is most common to add new items at the front of the list:

 void MyClass::Add ()  // Add yourself to the list
 {
   Next = First ;
   First = this ;
 }

You can remove items from the front of the linked list (making it a LIFO...like a stack):

 MyClass *RemoveFirstItem ()
 {
   if ( First == NULL ) return NULL ;   // The list was already empty.
   MyClass *result = First ;
   First = First -> Next ;  // Make 'First' point to the rest of the list after the first object.
   return result ;
 }

If you need to remove items from the end of the list (making it a FIFO - like a queue) then it's a bit of a pain because you don't have a handy pointer to the last object in the chain. Instead you have to write a loop:

 void RemoveLastItem ()
 {
   if ( First == NULL ) return NULL ;  // The list was already empty.
   MyClass *result ;
   // It the list only has one item - then now it's empty.
   if ( First -> Next == NULL )
   {
      result = First ;
      First = NULL ;
      return result ;
   }
   // If there are two or more things in the list - then we need to find the next-to-last one:
   // Start at the front:
   MyClass *nextToLast = First ;
   // While your next's next isn't empty - you aren't the next to last...
   while ( thing -> Next -> Next != NULL )
     thing = thing -> Next ;  // Move on to the next one
   // OK - so now nextToLast really is the next-to-last node,
   // so set it's next to be null so it's now the last node.
   result = thing -> Next ;
   thing -> Next = NULL ;
   return result ;
 }

But, there is a quicker way:

Doubly-linked list

Similar to a singly-linked list but we add links that point back in the reverse direction:

 class MyClass 
 {
   MyClass *Next ;
   MyClass *Prev ;
   ...data...
 } ;

We still need a First pointer - but perhaps we'd like a Last pointer too:

 MyClass *First = NULL ;
 MyClass *Last  = NULL ;

...and diagrammatically, we have:

              _________         _________
             | MyClass |       | MyClass |
 First ----> |     Next| ----> |     Next| ----> NULL
  NULL <---- |Prev     | <---- |Prev     | <---- Last
             |_________|       |_________|

As you can see, there is a kind of symmetry here - the list is linked in both directions. This takes a bit of bookkeeping - but if you take care, it's not hard:

 void MyClass::AddToFront ()   // Add this object to the front
 {
   Next = First ;
   First = this ;
   Prev = NULL ;
   if ( Next != NULL )
   {
     Next -> Prev = this ;
   }
   else
   {
     Last = this ;
   }
 }