When clipping outliers, it is important that the used measure of dispersion is itself not strongly affected by the outliers. Previously (in Sigma clipping), we saw that the standard deviation is not a good measure of dispersion because of its strong dependency on outliers. In this section, we’ll introduce clipping operators that are based on the median absolute deviation (MAD).
The median absolute deviation is defined as the median of the differences from the median (MAD requires taking two rounds of medians). As mathematically derived in the Wikipedia page above, for a pure Gaussian distribution, the median absolute deviation will be roughly \(0.67449\sigma\). We can confirm this numerically from the images with pure noise that we created previously in Building inputs and analysis without clipping. With the first command below we can see the raw standard deviation and median absolute deviation values and the second command shows their division:
$ aststatistics build/in-1.fits --std --mad 1.00137568561776e+01 6.74662296703343e+00 $ aststatistics build/in-1.fits --std --mad | awk '{print $2/$1}' 0.673735
The algorithm of MAD-clipping is identical to \(\sigma\)-clipping, except that instead of \(\sigma\), it uses the median absolute deviation. Since the median absolute deviation is smaller than the standard deviation by roughly 0.67, if you regularly use \(3\sigma\) there, you should use \((3/0.67)\rm{MAD}=(4.48)\rm{MAD}\) when doing MAD-clipping. The usual tolerance should also be changed due to the differing (discrete) nature of the median absolute deviation (based on sorted differences) in relation to the standard deviation (based on the sum of squared differences; which is more smooth). A tolerance of 0.01 is better suited to the termination criteria of MAD-clipping.
To demonstrate the steps in practice, let’s assume you have the original script in Building inputs and analysis without clipping with the changes shown in the first command below and With the second command we’ll execute the script.
$ grep '^clip_\|^profsum' script.sh profsum=1e5 clip_operator=madclip clip_multiple=4.5 clip_tolerance=0.01 $ bash ./script.sh
Let’s start by applying MAD-clipping on the image with the bright circle:
$ aststatistics build/in-9.fits --madclip Statistics (GNU Astronomy Utilities) 0.23 ------- Input: build/in-9.fits (hdu: 1) ------- 4.5-MAD clipping steps until relative change in MAD (median absolute deviation) is less than 0.01: round number median MAD 1 40401 1.09295e+01 7.38609e+00 2 38812 1.04313e+01 7.04036e+00 3 38567 1.03497e+01 6.98680e+00 ------- Statistics (after clipping): Number of input elements: 40401 Number of clips: 2 Final number of elements: 38567 Median: 1.034968e+01 Mean: 1.070246e+01 Standard deviation: 1.063998e+01 Median Absolute Deviation: 6.986797e+00
We see that the median, mean and standard deviation after MAD-clipping are much better than the \(\sigma\)-clipping (see Sigma clipping): the median is now 10.3 (was 10.5 in \(\sigma\)-clipping), mean is 10.7 (was 10.11) and the standard deviation is 10.6 (was 10.12).
Let’s compare the MAD-clipped stacks with the results of the previous section.
Since we want the images shown in a certain order, we’ll first construct the list of images (with a for
loop that will fill the imgs variable).
Note that this assumes you have ran and carefully read/understand all the commands in the previous sections (Building inputs and analysis without clipping and Sigma clipping).
$ imgs="" $ p=build/stack # 'p' is short for "prefix" $ for m in std mean median mad; do \ imgs="$imgs $p-$m.fits $p-sigclip-$m.fits $p-madclip-$m.fits"; \ done $ astscript-fits-view $imgs --ds9extra="-tile grid layout 3 4"
The first column shows the non-clipped stacks for each statistic (generated in Building inputs and analysis without clipping), the second column are \(\sigma\)-clipped stacks (generated in Sigma clipping), and the third column shows the newly created MAD-clipped stacks. We see that the circle is much more weaker in the MAD-clipped stacks than in the \(\sigma\)-clipped stacks in all rows (different statistics). Let’s confirm this with the numbers images of the two clipping methods:
$ astscript-fits-view -g2 \ build/stack-sigclip-median.fits \ build/stack-madclip-median.fits -g2 \ --ds9extra="-scale limits 1 9 -lock scalelimits yes"
In the numbers image of the MAD-clipped stack, you see the circle much more clearly. However, you also see that in the regions outside the circle, many random pixels have also been stacked with less than 9 input images! This is a caveat of MAD clipping and is expected: by nature MAD clipping is much more “noisy”. With the command below, let’s have a look at the statistics of the numbers image of the MAD-clipping. With the second, let’s see how many pixels used fewer than 5 input images:
$ aststatistics build/stack-madclip-median.fits -h2 \ --numasciibins=60 Statistics (GNU Astronomy Utilities) 0.23 ------- Input: build/stack-madclip-median.fits (hdu: 2) ------- Number of elements: 40401 Minimum: 2 Maximum: 9 Median: 9 Mean: 8.500284646 Standard deviation: 1.1275244 ------- | * | * | * | * | * | * | * | * | * * | * * * |* * * * * * * * |------------------------------------------------------------ $ aststatistics build/stack-madclip-median.fits -h2 \ --lessthan=5 --number 686
Almost 700 pixels were made with less than 5 inputs! The large number of random pixels that have been clipped is not good and can make it hard to understand the noise statistics of the stack.
Ultimately, even MAD-clipping is not perfect and even though the circle is weaker, we still see the circle in all four cases, even with the MAD-clipped median (more clearly: after smoothing/blocking).
The reason is similar to what was described in \(\sigma\)-clipping (using the original profsum=3e5
: the “holes” in the numbers image.
Because the circle’s pixel values are not too distant from the noise and the noisy nature of the MAD, some of its elements do not get clipped, and their stacked value gets systematically higher than the rest of the image.
Fortunately all is not lost! In Gnuastro, we have a fix for such contiguous outliers that is described fully in the next section (Contiguous outliers).
JavaScript license information
GNU Astronomy Utilities 0.23 manual, July 2024.