gal_data_t
)
To be able to deal with any dataset (various dimensions, numeric data types, units and higher-level structures), Gnuastro defines the gal_data_t
type which is the input/output container of choice for many of Gnuastro library’s functions.
It is defined in gnuastro/data.h.
If you will be using (‘# include
’ing) those libraries, you do not need to include this header explicitly: it is already included by any library header that uses gal_data_t
.
struct
): gal_data_t
The main container for datasets in Gnuastro.
It can host data of any dimensions, with any numeric data type.
It is actually a structure, but typedef
’d as a new type to avoid having to write the struct
before any declaration.
The actual structure is shown below, followed by a description of each element.
typedef struct gal_data_t
{
  void *restrict array;       /* Basic array information.   */
  uint8_t type;
  size_t ndim;
  size_t *dsize;
  size_t size;
  int quietmmap;
  char *mmapname;
  size_t minmapsize;

  int nwcs;                   /* WCS information.           */
  struct wcsprm *wcs;

  uint8_t flag;               /* Content description.       */
  int status;
  char *name;
  char *unit;
  char *comment;

  int disp_fmt;               /* For text printing.         */
  int disp_width;
  int disp_precision;

  struct gal_data_t *next;    /* For higher-level datasets. */
  struct gal_data_t *block;
} gal_data_t;
The list below contains a description for each gal_data_t
element.
void *restrict array
This is the pointer to the main array of the dataset containing the raw data (values).
All the other elements in this data-structure are actually meta-data enabling us to use/understand the series of values in this array.
It must allow data of any type (see Numeric data types), so it is defined as a void *
pointer.
A void *
array is not directly usable in C, so you have to cast it to the proper type before using it. Please see Library demo - reading a FITS image for a demonstration.
The restrict
keyword was formally introduced in C99 and is used to tell the compiler that at any moment only this pointer will modify what it points to, for example a pixel in an image (260).
This extra piece of information can greatly help in compiler optimizations and thus the running time of the program.
But older compilers might not have this capability, so at ./configure
time, Gnuastro checks this feature and if the user’s compiler does not support restrict
, it will be removed from this definition.
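As a minimal sketch of the two points above (casting a void * buffer before use, and what restrict promises), the following standalone C fragment may help. It does not use Gnuastro's headers: the function names demo_sum_float and demo_scale are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a buffer that is known to hold 'size' float values.  The buffer
   arrives as 'void *' (like the 'array' element), so it is cast to
   'float *' before any element is read. */
double
demo_sum_float(const void *array, size_t size)
{
  size_t i;
  double sum = 0.0;
  const float *f = (const float *)array;   /* void * -> float *. */

  assert(array != NULL || size == 0);
  for (i = 0; i < size; ++i) sum += f[i];
  return sum;
}

/* 'restrict' promises the compiler that 'out' and 'in' never refer to
   overlapping memory, so it is free to reorder or vectorize the loop.
   The caller is responsible for keeping that promise. */
void
demo_scale(float *restrict out, const float *restrict in,
           size_t size, float factor)
{
  size_t i;
  for (i = 0; i < size; ++i) out[i] = in[i] * factor;
}
```

Calling demo_scale with the same buffer for both out and in would break the promise made by restrict, so the two arguments should always be distinct arrays.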
uint8_t type
A fixed code (integer) used to identify the type of data in array
(see Numeric data types).
For the list of acceptable values to this variable, please see Library data types (type.h).
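The role of the type code can be sketched as follows: a single integer selects how the untyped buffer is interpreted. The DEMO_TYPE_* codes and both function names below are stand-ins invented for this example; the codes that Gnuastro actually uses are those defined in gnuastro/type.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes, for this sketch only. */
enum demo_type
{
  DEMO_TYPE_UINT8 = 1,
  DEMO_TYPE_INT32,
  DEMO_TYPE_FLOAT32,
  DEMO_TYPE_FLOAT64
};

/* Number of bytes in one element of the given type (0 if unknown). */
size_t
demo_type_sizeof(uint8_t type)
{
  switch (type)
    {
    case DEMO_TYPE_UINT8:   return sizeof(uint8_t);
    case DEMO_TYPE_INT32:   return sizeof(int32_t);
    case DEMO_TYPE_FLOAT32: return sizeof(float);
    case DEMO_TYPE_FLOAT64: return sizeof(double);
    default:                return 0;
    }
}

/* Read element 'index' of an untyped buffer as a double, using the
   type code to choose the correct cast. */
double
demo_element_as_double(const void *array, uint8_t type, size_t index)
{
  switch (type)
    {
    case DEMO_TYPE_UINT8:   return ((const uint8_t *)array)[index];
    case DEMO_TYPE_INT32:   return ((const int32_t *)array)[index];
    case DEMO_TYPE_FLOAT32: return ((const float   *)array)[index];
    case DEMO_TYPE_FLOAT64: return ((const double  *)array)[index];
    }
  assert(0 && "unknown type code");
  return 0.0;
}
```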
size_t ndim
The dataset’s number of dimensions.
size_t *dsize
The size of the dataset along each dimension.
This is an array (with ndim
elements) of positive integers in row-major order (261), as in C.
When a data file is read into memory with Gnuastro’s libraries, this array is dynamically allocated based on the number of dimensions that the dataset has.
It is important to remember that C’s row-major ordering is the opposite of the FITS standard which is in column-major order: in the FITS standard the fastest dimension’s size is specified by NAXIS1
, and slower dimensions follow.
The FITS standard was defined mainly based on the FORTRAN language, whose approach to multi-dimensional arrays is the opposite of C’s (and which also starts counting from 1, not 0).
Hence if a FITS image has NAXIS1==20
and NAXIS2==50
, the dsize
array must be filled with dsize[0]==50
and dsize[1]==20
.
The fastest dimension is the one that is contiguous in memory: to increment by one along that dimension, just go to the next element in the array. As we go to slower dimensions, the number of memory cells we have to skip for an increment along that dimension becomes larger.
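The ordering described above can be sketched in plain C as follows. Both helper functions are hypothetical (they are not part of Gnuastro): the first reverses FITS NAXISn values into C order, and the second computes the position of an element in the contiguous array from its coordinates.

```c
#include <assert.h>
#include <stddef.h>

/* Fill 'dsize' (C order: slowest dimension first) from the FITS NAXISn
   values, where naxis[0] holds NAXIS1 (the fastest dimension). */
void
demo_dsize_from_naxis(const size_t *naxis, size_t ndim, size_t *dsize)
{
  size_t i;
  for (i = 0; i < ndim; ++i) dsize[i] = naxis[ndim - 1 - i];
}

/* Row-major position of the element at 'coord' (counting from 0). */
size_t
demo_flat_index(const size_t *coord, const size_t *dsize, size_t ndim)
{
  size_t i, index = 0;

  assert(ndim > 0);
  for (i = 0; i < ndim; ++i)
    {
      assert(coord[i] < dsize[i]);
      index = index * dsize[i] + coord[i];
    }
  return index;
}
```

With NAXIS1==20 and NAXIS2==50, a step of one along the fastest dimension (dsize[1]) moves one element in memory, while a step of one along the slower dimension (dsize[0]) skips 20 elements.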
size_t size
The total number of elements in the dataset.
This is actually the product of all the values in the dsize
array, so it is not an independent parameter.
However, low-level operations with the dataset (irrespective of its dimensions) commonly need this number, so this element is designed to avoid calculating it every time.
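A short sketch of this relation, using hypothetical helper names: the total number of elements is the product of the sizes along each dimension, and once it is known, an operation that ignores dimensionality only needs a single loop.

```c
#include <assert.h>
#include <stddef.h>

/* Product of the size along each dimension. */
size_t
demo_total_size(const size_t *dsize, size_t ndim)
{
  size_t i, size = 1;

  assert(dsize != NULL && ndim > 0);
  for (i = 0; i < ndim; ++i) size *= dsize[i];
  return size;
}

/* A dimension-agnostic operation: it only needs the total size. */
void
demo_fill(float *array, size_t size, float value)
{
  size_t i;
  for (i = 0; i < size; ++i) array[i] = value;
}
```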
int quietmmap
When this value is zero and the dataset is not allocated in RAM but memory-mapped to a file (see mmapname
and minmapsize
below), a warning will be printed to inform the user when the file is created and when it is deleted.
The warning includes the filename, the size in bytes, and the fact that they can toggle this behavior through the --minmapsize
option in Gnuastro’s programs.
char *mmapname
Name of file hosting the mmap
’d contents of array
.
If the value of this variable is NULL
, then the contents of array
are actually stored in RAM, not in a file on the HDD/SSD.
See the description of minmapsize
below for more.
If a file is used, it will be kept in the gnuastro_mmap directory of the running directory.
Its name is randomly selected to allow multiple arrays at the same time, see description of --minmapsize in Processing options.
When gal_data_free
is called the randomly named file will be deleted.
size_t minmapsize
The minimum size of an array (in bytes) to store the contents of array
as a file (on the non-volatile HDD/SSD), not in RAM.
This can be very useful for large datasets which can be very memory intensive and the user’s RAM might not be sufficient to keep/process it.
A random filename is assigned to the array which is available in the mmapname
element of gal_data_t
(above), see there for more.
minmapsize
is stored in each gal_data_t
, so it can be passed on to subsequent/derived datasets.
See the description of the --minmapsize option in Processing options for more on using this value.
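The decision that minmapsize controls might be sketched as below. This is only an illustration of the rule as worded above ("minimum size ... in bytes"), so the use of >= is an assumption, and demo_needs_mmap is a hypothetical name, not a Gnuastro function.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 when an array with 'size' elements of 'bytes_per_element'
   bytes each would be kept in a file, and 0 when it would stay in RAM.
   Assumption of this sketch: "minimum size" means the byte count has
   to be at least 'minmapsize'. */
int
demo_needs_mmap(size_t size, size_t bytes_per_element, size_t minmapsize)
{
  assert(bytes_per_element > 0);
  return size * bytes_per_element >= minmapsize;
}
```

Under this reading, setting minmapsize to the largest possible size_t value keeps any realistic array in RAM.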
int nwcs
The number of WCS coordinate representations (for WCSLIB).
struct wcsprm *wcs
The main WCSLIB structure keeping all the relevant information necessary for WCSLIB to do its processing and convert data-set positions into real-world positions.
When it is given a NULL
value, all possible WCS calculations/measurements will be ignored.
uint8_t flag
Bitwise flags to describe general properties of the dataset.
The number of bytes available in this flag is stored in the GAL_DATA_FLAG_SIZE
macro.
Note that you should use bitwise operators (such as & and |) to check these flags.
The currently recognized bits are stored in these macros:
GAL_DATA_FLAG_BLANK_CH
Marks whether the dataset has been checked for blank values.
When a dataset does not have any blank values, the GAL_DATA_FLAG_HASBLANK
bit will be zero.
But upon initialization, all bits also get a value of zero.
Therefore, a checker needs this flag to see if the value in GAL_DATA_FLAG_HASBLANK
is reliable (dataset has actually been parsed for a blank value) or not.
Also, if it is necessary to re-check the presence of blank values, you just have to set this flag to zero and call gal_blank_present
for example, to parse the dataset and check for blank values.
Note that for improved efficiency, when this flag is set, gal_blank_present
will not actually parse the dataset, it will just use GAL_DATA_FLAG_HASBLANK
.
GAL_DATA_FLAG_HASBLANK
This bit has a value of 1
when the given dataset has blank values.
If this bit is 0
and GAL_DATA_FLAG_BLANK_CH
is 1
, then the dataset has been checked and it did not have any blank values, so there is no more need for further checks.
GAL_DATA_FLAG_SORT_CH
Marks that the dataset has already been checked for being sorted, and thus that any 0
values in GAL_DATA_FLAG_SORTED_I
and GAL_DATA_FLAG_SORTED_D
are meaningful.
The logic behind this is similar to that in GAL_DATA_FLAG_BLANK_CH
.
GAL_DATA_FLAG_SORTED_I
This bit has a value of 1
when the given dataset is sorted in an increasing manner.
If this bit is 0
and GAL_DATA_FLAG_SORT_CH
is 1
, then the dataset has been checked and was not sorted (increasing), so there is no more need for further checks.
GAL_DATA_FLAG_SORTED_D
This bit has a value of 1
when the given dataset is sorted in a decreasing manner.
If this bit is 0
and GAL_DATA_FLAG_SORT_CH
is 1
, then the dataset has been checked and was not sorted (decreasing), so there is no more need for further checks.
The macro GAL_DATA_FLAG_MAXFLAG
contains the largest internally used bit-position.
Higher-level flags can be defined with the bitwise shift operators using this macro to define internal flags for libraries/programs that depend on Gnuastro without causing any possible conflict with the internal flags discussed above or having to check the values manually on every release.
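The "checked" and "result" bit pairs described above, and the idea of defining a private flag above the internal ones, can be sketched as follows. The DEMO_FLAG_* values are stand-ins chosen for this example (the real values are in gnuastro/data.h), MYAPP_FLAG_CALIBRATED is a hypothetical application flag, and treating the MAXFLAG macro as a bit mask that can be shifted left by one is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in bit values, for this sketch only. */
#define DEMO_FLAG_BLANK_CH  0x1
#define DEMO_FLAG_HASBLANK  0x2
#define DEMO_FLAG_SORT_CH   0x4
#define DEMO_FLAG_SORTED_I  0x8
#define DEMO_FLAG_SORTED_D  0x10
#define DEMO_FLAG_MAXFLAG   DEMO_FLAG_SORTED_D

/* Hypothetical application flag, one bit above the internal ones. */
#define MYAPP_FLAG_CALIBRATED (DEMO_FLAG_MAXFLAG << 1)

/* 1: has blank values, 0: checked and has none, -1: never checked
   (so the HASBLANK bit cannot be trusted yet). */
int
demo_blank_status(uint8_t flag)
{
  if ( !(flag & DEMO_FLAG_BLANK_CH) ) return -1;
  return (flag & DEMO_FLAG_HASBLANK) ? 1 : 0;
}

/* Record the result of having parsed the dataset for blank values. */
uint8_t
demo_record_blank_check(uint8_t flag, int found_blank)
{
  flag |= DEMO_FLAG_BLANK_CH;
  if (found_blank) flag |= DEMO_FLAG_HASBLANK;
  else             flag &= (uint8_t)~DEMO_FLAG_HASBLANK;
  return flag;
}

/* Clear the "checked" bit so that the next query parses the data again. */
uint8_t
demo_forget_blank_check(uint8_t flag)
{
  return flag & (uint8_t)~DEMO_FLAG_BLANK_CH;
}
```

The same three-state logic applies to the sorted bits: a 0 in either sorted bit is only meaningful once the corresponding "checked" bit is 1.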
int status
A context-specific status value for this data-structure. This integer will not be set by Gnuastro’s libraries. You can use it to keep some additional information about the dataset (with integer constants) depending on your application.
char *name
The name of the dataset.
If the dataset is a multi-dimensional array and read/written as a FITS image, this will be the value in the EXTNAME
FITS keyword.
If the dataset is a one-dimensional table column, this will be the column name.
If it is set to NULL
(by default), it will be ignored.
char *unit
The units of the dataset (for example, BUNIT
in the standard FITS keywords) that will be read from or written to files/tables along with the dataset.
If it is set to NULL
(by default), it will be ignored.
char *comment
Any further explanation about the dataset which will be written to any output file if present.
int disp_fmt
Format to use for printing each element of the dataset to a plain text file. The acceptable values for this element are defined in Table input output (table.h).
Based on C’s printf
standards.
int disp_width
Width of printing each element of the dataset to a plain text file. The acceptable values for this element are defined in Table input output (table.h).
Based on C’s printf
standards.
int disp_precision
Precision of printing each element of the dataset to a plain text file. The acceptable values for this element are defined in Table input output (table.h).
Based on C’s printf
standards.
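Since these three elements follow C's printf conventions, their effect can be illustrated with printf's own '*' width and precision arguments. The sketch below assumes a fixed-point display format; the actual codes accepted by disp_fmt are not reproduced here, and demo_format_float is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Write 'value' into 'out' as fixed-point text, using at least 'width'
   characters and 'precision' digits after the decimal point.  Returns
   what snprintf returns (the length of the full formatted text). */
int
demo_format_float(char *out, size_t outsize, double value,
                  int width, int precision)
{
  assert(out != NULL && outsize > 0);
  return snprintf(out, outsize, "%*.*f", width, precision, value);
}
```

Note that in printf the width is a minimum: a value that needs more characters than the requested width is still printed in full.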
gal_data_t *next
Through this pointer, you can link a gal_data_t
with other related datasets. For example, the different columns in a table each have one gal_data_t
associated with them, and they are linked to each other using this element.
There are several functions described below to facilitate using gal_data_t
as a linked list.
See Linked lists (list.h) for more on these wonderful high-level constructs.
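How a next pointer turns separate datasets into a list can be sketched with a reduced stand-in structure. The type demo_data_t and both functions are hypothetical and keep only the elements needed here; for real work, the functions described in Linked lists (list.h) should be preferred.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the container: only what this sketch needs. */
typedef struct demo_data_t
{
  const char *name;            /* For example, a column name.   */
  size_t size;                 /* Number of elements.           */
  struct demo_data_t *next;    /* Next dataset in the list.     */
} demo_data_t;

/* Count the datasets in a list (a NULL list has zero elements). */
size_t
demo_list_number(const demo_data_t *list)
{
  size_t num = 0;
  for (; list != NULL; list = list->next) ++num;
  return num;
}

/* Put 'node' at the front of the list that '*list' points to. */
void
demo_list_prepend(demo_data_t **list, demo_data_t *node)
{
  assert(list != NULL && node != NULL);
  node->next = *list;
  *list = node;
}
```

Because each call puts the new node at the front, adding the columns in reverse order leaves the list in the original column order.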
gal_data_t *block
Pointer to the start of the complete allocated block of memory.
When this pointer is not NULL
, the dataset is not treated as a contiguous patch of memory.
Rather, it is seen as covering only a portion of the larger patch of memory that block
points to.
See Tessellation library (tile.h) for a more thorough explanation and functions to help work with tiles that are created from this pointer.
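The difference between a contiguous dataset and one that covers part of a larger block can be sketched in two dimensions as below. The structure demo_data_t and the function demo_sum_2d are hypothetical, and the sketch assumes that a tile's array points to the tile's first element inside the block's memory; the functions in Tessellation library (tile.h) remain the reference for real tiles.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced 2D stand-in for the container. */
typedef struct demo_data_t
{
  float *array;                /* First element of this dataset.      */
  size_t dsize[2];             /* Size along each dimension (C order). */
  struct demo_data_t *block;   /* NULL: owns one contiguous patch.    */
} demo_data_t;

/* Sum all the elements of a 2D dataset, whether it is contiguous or a
   tile over a larger block. */
double
demo_sum_2d(const demo_data_t *d)
{
  size_t i, j, stride;
  double sum = 0.0;

  assert(d != NULL && d->array != NULL);

  /* Moving to the next row skips one full row of the memory that was
     actually allocated: the block's width for a tile, otherwise the
     dataset's own width. */
  stride = d->block ? d->block->dsize[1] : d->dsize[1];

  for (i = 0; i < d->dsize[0]; ++i)
    for (j = 0; j < d->dsize[1]; ++j)
      sum += d->array[i * stride + j];
  return sum;
}
```

For a 3 by 4 image, a 2 by 2 tile starting at row 1, column 1 has its array pointing 5 elements into the image, and its rows are 4 elements apart in memory rather than 2.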
(260) Also see https://en.wikipedia.org/wiki/Restrict.
(261) Also see https://en.wikipedia.org/wiki/Row-_and_column-major_order.
GNU Astronomy Utilities 0.23 manual, July 2024.