GNU Astronomy Utilities



12.3.23 Fitting functions (fit.h)

After doing a measurement, it is usually necessary to parameterize the relation that has been found. The functions in this section are wrappers over the GNU Scientific Library (GSL) Linear Least-Squares Fitting, to make them easily accessible using Gnuastro’s Generic data container (gal_data_t). The respective GSL function is mentioned under each function.

Global integer: GAL_FIT_INVALID
Global integer: GAL_FIT_LINEAR
Global integer: GAL_FIT_LINEAR_WEIGHTED
Global integer: GAL_FIT_LINEAR_NO_CONSTANT
Global integer: GAL_FIT_LINEAR_NO_CONSTANT_WEIGHTED
Global integer: GAL_FIT_POLYNOMIAL
Global integer: GAL_FIT_POLYNOMIAL_WEIGHTED
Global integer: GAL_FIT_POLYNOMIAL_NUMBER

Identifiers for the various types of fitting functions. These can be used by the callers of these functions to select between various fitting types. They can easily be converted to, and from, fixed human-readable strings using the gal_fit_name_* functions below. The last one GAL_FIT_ROBUST_NUMBER is the total number of available fitting methods (can be used to add more macros in the calling program and to avoid overlaps with existing codes).

Global integer: GAL_FIT_ROBUST_INVALID
Global integer: GAL_FIT_ROBUST_DEFAULT
Global integer: GAL_FIT_ROBUST_BISQUARE
Global integer: GAL_FIT_ROBUST_CAUCHY
Global integer: GAL_FIT_ROBUST_FAIR
Global integer: GAL_FIT_ROBUST_HUBER
Global integer: GAL_FIT_ROBUST_OLS
Global integer: GAL_FIT_ROBUST_WELSCH
Global integer: GAL_FIT_ROBUST_NUMBER

Identifiers for the various types of robust polynomial fitting functions. For a description of each, see https://www.gnu.org/s/gsl/doc/html/lls.html#c.gsl_multifit_robust_alloc. The last one GAL_FIT_ROBUST_NUMBER is the total number of available functions (can be used to add more macros in the calling program and to avoid overlaps with existing codes).

Function:
uint8_t
gal_fit_name_to_id (char *name)

Return the internal code of a standard human-readable name for the various fitting functions. If the name is not recognized, the returned value will be GAL_FIT_INVALID.

Function:
char *
gal_fit_name_from_id (uint8_t fitid)

Return a standard human-readable name for the fitting function identified with the fitid (read as “fitting ID”). If the fitting ID couldn’t be recognized, a NULL pointer is returned.

Function:
uint8_t
gal_fit_name_robust_to_id (char *name)

Return the internal code of a standard human-readable name for the various robust fitting types. If the name is not recognized, the returned value will be GAL_FIT_INVALID.

Function:
char *
gal_fit_name_robust_from_id (uint8_t robustid)

Return a standard human-readable name for the input robust fitting type. If the fitting ID couldn’t be recognized, a NULL pointer is returned.

Function:
gal_data_t *
gal_fit_1d_linear (gal_data_t *xin, gal_data_t *yin, gal_data_t *ywht)

Preform a 1D linear regression fit with a constant term272 in the form of \(Y=c_0+c_1X\). The input xin contains the independent variable values and yin contains the measured variable values for each independent variable. When ywht!=NULL, it is assumed to contain the “weight” of each Y measurement (if you don’t have weights on your measured values, simply set this to NULL). The weight of each measurement is the inverse of its variance. For a Gaussian error distribution with standard deviation \(\sigma\), the weight is therefore \(1/\sigma^2\).

If any of the values in any of the inputs is blank (NaN in floating point), the final fitted parameters will all be NaN. To remove rows with a NaN/blank, you can use gal_blank_remove_rows (which will remove all rows with a blank values in any of the columns with a single call).

The output is a single dataset with a GAL_TYPE_FLOAT64 type with 6 elements:

  1. \(c_0\): the constant in \(Y=c_0+c_1X\).
  2. \(c_1\): the multiple in \(Y=c_0+c_1X\).
  3. First element of variance-covariance matrix.
  4. Second and third (which are equal) elements of the variance-covariance matrix.
  5. Fourth element of the variance-covariance matrix.
  6. The reduced \(\chi^2\) of the fit.
Function:
gal_data_t *
gal_fit_1d_linear_no_constant (gal_data_t *xin, gal_data_t *yin, gal_data_t *ywht)

Preform a 1D linear regression fit without a constant term273, formally: \(Y=c_1X\). The input xin contains the independent variable values and yin contains the measured variable values for each independent variable. When ywht!=NULL, it is assumed to contain the “weight” of each Y measurement (if you don’t have weights on your measured values, simply set this to NULL). The weight of each measurement is the inverse of its variance. For a Gaussian error distribution with standard deviation \(\sigma\), the weight is therefore \(1/\sigma^2\).

If any of the values in any of the inputs is blank (NaN in floating point), the final fitted parameters will all be NaN. To remove rows with a NaN/blank, you can use gal_blank_remove_rows (which will remove all rows with a blank values in any of the columns with a single call).

The output is a single dataset with a GAL_TYPE_FLOAT64 type with 3 elements:

  1. \(c_1\): the multiple in \(Y=c_0+c_1X\).
  2. Variance of \(c_1\).
  3. The reduced \(\chi^2\) of the fit.
Function:
gal_data_t *
gal_fit_1d_linear_estimate (gal_data_t *fit, gal_data_t *xin)

Given a linear least squares fit output (fit), estimate the fit on an arbitrary number of independent variable (horizontal axis, or X, in an X-Y plot) within xin. fit is assumed to be the output of either gal_fit_1d_linear or gal_fit_1d_linear_no_constant. In case you haven’t used those functions to obtain the constants and covariance matrix elements, see the description of those functions for the expected format of fit.

This function returns two columns (as a List of gal_data_t): The top node of the list is the estimated values at the input X-axis positions, and the next node is the errors in the estimation. Naturally, both have the same number of elements as xin. Being a list, helps in easily printing the output columns to a table (see Table input output (table.h)).

Function:
gal_data_t *
gal_fit_1d_polynomial (gal_data_t *xin, gal_data_t *yin, gal_data_t *ywht, size_t maxpower, double *redchisq)

Preform a 1D polynomial fit, formally: \(Y=c+0+c_1X+c_2X^2+\cdots+c_nX^n\) (using GSL’s multi-parameter regression274). The largest power of \(X\) is determined with the maxpower argument (which is \(n\) in the equation above). The reduced \(\chi^2\) of the fit is written in the space that *redchisq points to.

The input xin contains the independent variable values and the input yin contains the measured variable values for each independent variable. When ywht!=NULL, it is assumed to contain the “weight” of each Y measurement (if you don’t have weights on your measured values, simply set this to NULL). The weight of each measurement is the inverse of its variance. For a Gaussian error distribution with standard deviation \(\sigma\), the weight is therefore \(1/\sigma^2\).

If any of the values in any of the inputs is blank (NaN in floating point), the final fitted parameters will all be NaN. To remove rows with a NaN/blank, you can use gal_blank_remove_rows (which will remove all rows with a blank values in any of the columns with a single call).

The output of this function is a list of two datasets, linked as a list (as a List of gal_data_t). Both have a GAL_TYPE_FLOAT64 type, and are described below (in order).

  1. A one dimensional and contains \(n+1\) elements (for the \(n+1\) constants that have been found \((c_0, c_1, c_2, \cdots, c_n)\).
  2. A two dimensional variance-covariance matrix with \((n+1)\times(n+1)\) elements.
Function:
gal_data_t *
gal_fit_1d_polynomial_robust (gal_data_t *xin, gal_data_t *yin, size_t maxpower, uint8_t robustid, double *redchisq)

Preform a 1D robust polynomial fit, formally: \(Y=c+0+c_1X+c_2X^2+\cdots+c_nX^n\) (using GSL’s robust linear regression275). See the description there for the details.

The inputs and outputs of this function are almost identical to gal_fit_1d_polynomial, with the difference that you need to specify the function to reject outliers through the robustid input argument. You can pass any of the GAL_FIT_ROBUST_* codes defined at the top of this section to this (the names are identical to the names in GSL).

Function:
gal_data_t *
gal_fit_1d_polynomial_estimate (gal_data_t *fit, gal_data_t *xin)

Given a 1D polynomial fit output (fit), estimate the fit on an arbitrary number of independent variable (horizontal axis, or X, in an X-Y plot) within xin. fit is assumed to be the output of gal_fit_1d_polynomial. In case you haven’t used this function to obtain the constants and covariance matrix, see the description of that function for the expected format of fit.

This function returns two columns (as a List of gal_data_t): The top node of the list is the estimated values at the input X-axis positions, and the next node is the errors in the estimation. Naturally, both have the same number of elements as xin. Being a list, helps in easily printing the output columns to a table (see Table input output (table.h)).


Footnotes

(272)

https://www.gnu.org/s/gsl/doc/html/lls.html#linear-regression-with-a-constant-term

(273)

https://www.gnu.org/s/gsl/doc/html/lls.html#linear-regression-without-a-constant-term

(274)

https://www.gnu.org/s/gsl/doc/html/lls.html#multi-parameter-regression

(275)

https://www.gnu.org/software/gsl/doc/html/lls.html#robust-linear-regression