In many contexts, it is desirable to slice a dataset into subsets or tiles (overlapping or not) so that you can work on each tile independently.
One method would be to copy each region into a separately allocated space, but in many contexts this is not necessary and can in fact be a big burden on CPU and memory usage.
The block pointer in Gnuastro’s Generic data container (gal_data_t) is defined for such situations, where allocation is not necessary.
You just want to read the data or write to it, independently of (or in coordination with) other regions of the dataset.
Combined with parallel processing, this can greatly reduce time and memory consumption.
See the figure below for an example: assume the larger dataset is a contiguous block of memory that you are interpreting as a 2D array.
However, you only want to work on the smaller tile region.
                 larger
        ---------------------------------
        |                               |
        |    tile                       |
        |    ----------                 |
        |    |        |                 |
        |    |_       |                 |
        |    |*|      |                 |
        |    ----------                 |
        |    tile->block = larger       |
        |_                              |
        |*|                             |
        ---------------------------------
To use gal_data_t’s block concept, you allocate a gal_data_t *tile which is initialized with the pointer to the first element in the sub-array (as its array argument).
Note that this is not necessarily the first element in the larger array.
You can set the size of the tile as you please during the initialization.
Recall that, when given a non-NULL pointer as array, gal_data_initialize (and thus gal_data_alloc) does not allocate any space and just uses the given pointer for the new array element of the gal_data_t.
So your tile data structure will not be pointing to a separately allocated space.
After the allocation is done, you just point tile->block to the larger dataset which hosts the full block of memory.
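The steps above can be sketched in plain C as follows. This is only an illustration of the idea and does not use Gnuastro itself: toy_data is a hypothetical, stripped-down stand-in for gal_data_t (it keeps just an array pointer, a 2D size and a block pointer), and toy_tile_define is a hypothetical helper, not a library function. For simplicity it returns the tile by value and assumes a row-major 2D host that owns its memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for gal_data_t (NOT the real structure):
   only the fields needed to illustrate the 'block' concept. */
typedef struct toy_data
{
  float           *array;     /* First element of this dataset or tile.  */
  size_t           dsize[2];  /* Number of rows, then number of columns. */
  struct toy_data *block;     /* Hosting dataset; NULL when allocated.   */
} toy_data;

/* Describe a tile of 'nrow' x 'ncol' elements whose first element is at
   (row, col) of 'host'. No pixel values are copied: the tile only points
   into the memory of the host. */
toy_data
toy_tile_define(toy_data *host, size_t row, size_t col,
                size_t nrow, size_t ncol)
{
  toy_data tile;

  /* Assumption of this sketch: the host owns its memory (it is not a
     tile itself), so its row width is the memory increment per row. */
  assert(host->block == NULL);

  /* The tile must fit inside the host. */
  assert(row + nrow <= host->dsize[0]);
  assert(col + ncol <= host->dsize[1]);

  /* Pointer to the first element of the sub-array (row-major order). */
  tile.array    = host->array + row * host->dsize[1] + col;
  tile.dsize[0] = nrow;
  tile.dsize[1] = ncol;

  /* Record which dataset hosts the full block of memory. */
  tile.block    = host;
  return tile;
}
```

Because the tile shares memory with its host, writing through tile.array changes the corresponding element of the larger dataset.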
Where relevant, Gnuastro’s library functions will check the block pointer of their input dataset to see how to deal with dimensions and increments so they can always remain within the tile.
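To make "dimensions and increments" concrete, here is a minimal sketch of how a function might stay within a 2D tile. It is not Gnuastro's implementation: toy_data is a hypothetical, simplified stand-in for gal_data_t and toy_tile_sum is a hypothetical function. The key point is that consecutive rows of a tile are separated in memory by the row width of the allocated dataset, not by the width of the tile.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-in for gal_data_t (NOT the real structure). */
typedef struct toy_data
{
  float           *array;     /* First element of this dataset or tile.  */
  size_t           dsize[2];  /* Number of rows, then number of columns. */
  struct toy_data *block;     /* Hosting dataset; NULL when allocated.   */
} toy_data;

/* Sum the elements of a 2D dataset or tile without leaving its region. */
double
toy_tile_sum(const toy_data *tile)
{
  size_t i, j, increment;
  double sum = 0.0;
  const toy_data *root = tile;

  /* Follow the block pointers to the dataset that owns the memory. */
  while(root->block != NULL) root = root->block;

  /* Memory increment between the starts of two consecutive rows. */
  increment = root->dsize[1];

  for(i = 0; i < tile->dsize[0]; ++i)
    {
      /* First element of row 'i' of the tile. */
      const float *rowstart = tile->array + i * increment;

      /* Only visit the columns that belong to the tile. */
      for(j = 0; j < tile->dsize[1]; ++j)
        sum += rowstart[j];
    }

  return sum;
}
```

When the input is the allocated dataset itself (its block is NULL), the increment equals its own width, so the same loop covers the whole array.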
The tools introduced in this section are designed to help in defining and working with tiles that are created in this manner.
Since the block structure is defined as a pointer, arbitrary levels of tessellation/grid-ing are possible (tile->block may itself be a tile in an even larger allocated space).
Therefore, just like a linked-list (see Linked lists (list.h)), it is important to have the block pointer of the largest (allocated) dataset set to NULL.
Normally, you will not have to worry about this, because gal_data_initialize (and thus gal_data_alloc) will set the block element to NULL by default; just remember not to change it.
You should then only change the block element of the tiles that you define over the allocated space.
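The linked-list analogy can be illustrated with a short sketch that follows the block pointers until it reaches NULL. As before, toy_data is a hypothetical, simplified stand-in for gal_data_t and toy_block_root is a hypothetical helper written for this illustration only. It also shows why the allocated dataset must keep its block set to NULL: that is the only thing that ends the loop.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-in for gal_data_t (NOT the real structure). */
typedef struct toy_data
{
  float           *array;     /* First element of this dataset or tile.  */
  size_t           dsize[2];  /* Number of rows, then number of columns. */
  struct toy_data *block;     /* Hosting dataset; NULL when allocated.   */
} toy_data;

/* Return the dataset that owns the memory a (possibly nested) tile points
   into. This relies on the allocated dataset having 'block == NULL'; if
   it did not, the walk would never terminate correctly. */
toy_data *
toy_block_root(toy_data *input)
{
  while(input->block != NULL)
    input = input->block;
  return input;
}
```

Given a tile defined inside another tile, the function returns the outermost allocated dataset. Given the allocated dataset, it returns that dataset unchanged.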
Below, we will first review constructs for Independent tiles and then define the current approach to fully tessellating a dataset (or covering every pixel/data-element with a non-overlapping tile grid) in Tile grid.
This approach to dealing with parts of a larger block was inspired by a similarly named concept in the GNU Scientific Library (GSL); see its “Vectors and Matrices” chapter for their implementation.
GNU Astronomy Utilities 0.23 manual, July 2024.