GNU Astronomy Utilities



12.3.8 Linked lists (list.h)

An array is a contiguous region of memory that is very efficient and easy to use for recording and later accessing any random element as fast as any other. This makes array the primary data container when you have many elements (for example, an image which has millions of pixels). One major problem with an array is that the number of elements that go into it must be known in advance and adding or removing an element will require a re-set of all the other elements. For example, if you want to remove the 3rd element in a 1000 element array, all 997 subsequent elements have to pulled back by one position, the reverse will happen if you need to add an element.

In many contexts such situations never come up, for example, you do not want to shift all the pixels in an image by one or two pixels from some random position in the image: their positions have scientific value. But in other contexts you will find yourself frequently adding/removing an a-priori unknown number of elements. Linked lists (or lists for short) are the data-container of choice in such situations. As in a chain, each node in a list is an independent C structure, keeping its own data along with pointer(s) to its immediate neighbor(s). Below, you can see one simple linked list node structure along with an ASCII art schematic of how we can use the next pointer to add any number of elements to the list that we want. By convention, a list is terminated when next is the NULL pointer.

struct list_float          /*     ---------    ---------           */
{                          /*     | Value |    | Value |           */
  float             value; /*     |  ---  |    |  ---  |           */
  struct list_float *next; /*     |  next-|--> |  next-|--> NULL   */
}                          /*     ---------    ---------           */

The schematic shows another great advantage of linked lists: it is very easy to add or remove/pop a node anywhere in the list. If you want to modify the first node, you just have to change one pointer. If it is in the middle, you just have to change two. You initially define a variable of this type with a NULL pointer as shown below:

struct list_float *list=NULL;

To add or remove/pop a node from the list you can use functions provided for the respective type in the sections below.

When you add an element to the list, it is conventionally added to the “top” of the list: the general list pointer will point to the newly created node, which will point to the previously created node and so on. So when you “pop” from the top of the list, you are actually retrieving the last value you put in and changing the list pointer to the next youngest node. This is thus known as a “last-in-first-out” list. This is the most efficient type of linked list (easier to implement and faster to process). Alternatively, you can add each newly created node at the end of the list. If you do that, you will get a “first-in-first-out” list. But that will force you to go through the whole list for each new element that is created (this will slow down the processing)263.

The node example above creates the simplest kind of a list. We can define each node with two pointers to both the next and previous neighbors, this is called a “Doubly linked list”. In general, lists are very powerful and simple constructs that can be very useful. But going into more detail would be out of the scope of this short introduction in this book. Wikipedia has a nice and more thorough discussion of the various types of lists. To appreciate/use the beauty and elegance of these powerful constructs even further, see Chapter 2 (Information Structures, in volume 1) of Donald Knuth’s “The art of computer programming”.

In this section we will review the functions and structures that are available in Gnuastro for working on lists. They differ by the type of data that each node can keep. For each linked-list node structure, we will first introduce the structure, then the functions for working on the structure. All these structures and functions are defined and declared in gnuastro/list.h.


Footnotes

(263)

A better way to get a first-in-first-out is to first keep the data as last-in-first-out until they are all read. Afterwards, reverse the list by popping each node and immediately add it to the new list. This practically reverses the last-in-first-out list to a first-in-first-out one. All the list types discussed in this chapter have a function with a _reverse suffix for this job.