GNU Astronomy Utilities



5.2.2 Recognized file formats

The standards and file name extensions recognized by ConvertType are listed below. For a review of the difference between raster and vector graphics, see Raster and Vector graphics. For a review of the concept of color and channels, see Color. Currently, except for the FITS format, Gnuastro uses the file name’s suffix to identify the format, so if a file’s name does not end with one of the suffixes mentioned below, its format will not be recognized.

FITS or IMH

Astronomical data are commonly stored in the FITS format (or the older IRAF .imh data format). A list of the file name suffixes that indicate a file is in this format is given in Arguments. FITS is a raster graphics format.

Each image extension of a FITS file only has one value per pixel/element. Therefore, when used as input, each input FITS image contributes as one color channel. If you want multiple extensions in one FITS file for different color channels, you have to repeat the file name multiple times and use the --hdu, --hdu2, --hdu3 or --hdu4 options to specify the different extensions.
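The repetition described above can be sketched by building the command line programmatically. This is only an illustration: the file name image.fits, the HDU numbers and the output name rgb.jpg are hypothetical, and the helper function is not part of Gnuastro. The sketch only assembles and prints the command, it does not run it.

```python
import shlex


def multi_hdu_command(fits_name, hdus, output):
    """Build an astconvertt command that reads several HDUs of one FITS file.

    The file name is repeated once per color channel, and the extension of
    the N-th channel is selected with --hdu, --hdu2, --hdu3 or --hdu4.
    """
    hdu_options = ["--hdu", "--hdu2", "--hdu3", "--hdu4"]
    if not 1 <= len(hdus) <= len(hdu_options):
        raise ValueError("between one and four HDUs are required")
    args = ["astconvertt"]
    args += [fits_name] * len(hdus)  # one repetition per channel
    args += [f"{opt}={hdu}" for opt, hdu in zip(hdu_options, hdus)]
    args.append(f"--output={output}")
    return args


# Hypothetical example: three extensions of one file as three channels.
command = multi_hdu_command("image.fits", [1, 2, 3], "rgb.jpg")
print(shlex.join(command))
```

The printed line can be pasted into a shell once the file and extension names are replaced with real ones.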

JPEG

The JPEG standard was created by the Joint Photographic Experts Group. It is currently one of the most commonly used image formats. Its major advantage is the compression algorithm that is defined by the standard. Like the FITS standard, this is a raster graphics format, which means that it is pixelated.

A JPEG file can have 1 (gray-scale), 3 (RGB) or 4 (CMYK) color channels. Converting a single JPEG image into another format poses no problem. However, if you want to use it in combination with other input files, make sure that the final number of color channels does not exceed four. If it does, ConvertType will abort and notify you.
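The four-channel limit amounts to a simple sum over the inputs. The following is a minimal sketch of that check, not ConvertType's own code: the function name and the error message are hypothetical.

```python
def total_channels(channels_per_input):
    """Sum the color channels of all inputs; reject more than four in total."""
    total = sum(channels_per_input)
    if total > 4:
        raise ValueError(
            f"{total} color channels requested; at most 4 are allowed"
        )
    return total


# One single-channel input plus one RGB JPEG (3 channels): within the limit.
print(total_channels([1, 3]))

# Two RGB JPEG files would give 6 channels, so this combination is rejected.
try:
    total_channels([3, 3])
except ValueError as error:
    print("rejected:", error)
```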

The file name endings that are recognized as a JPEG file for input are: .jpg, .JPG, .jpeg, .JPEG, .jpe, .jif, .jfif and .jfi.

TIFF

TIFF (or Tagged Image File Format) was originally designed as a common format for scanners in the early 1990s and since then it has grown to become very general. In many aspects, the TIFF standard is similar to the FITS image standard: it allows data of many types (see Numeric data types), and also allows multiple images to be stored in a single file (like a FITS extension: each image in the file is called a ‘directory’ in the TIFF standard). However, unlike FITS, it can only store images: it has no constructs for tables. Also unlike FITS, each ‘directory’ of a TIFF file can have a multi-channel (e.g., RGB) image. Another (inconvenient) difference with the FITS standard is that keyword names are stored as numbers, not human-readable text.

Outside of astronomy, many fields use TIFF images for accurate imaging data (for example, 16-bit integer or floating point) because of its support for different numeric data types.

EPS

The Encapsulated PostScript (EPS) format is essentially a one-page PostScript file which has a specified size. PostScript is used to store a full document like this whole Gnuastro book. PostScript therefore also includes non-image data, for example, lines and texts. It is a fully functional programming language to describe a document. A PostScript file is a plain text file that can be edited like any program source with any plain-text editor. Therefore in ConvertType, EPS is only an output format and cannot be used as input. Contrary to the FITS or JPEG formats, PostScript is not a raster format, but is categorized as vector graphics.

With these features in mind, you can see that when you are compiling a document with TeX or LaTeX, using an EPS file is much lower level than using a JPEG, giving you much greater control and therefore quality. Since EPS also supports vector graphic lines, ConvertType uses such lines to draw a thin border around the image, which improves its appearance in the document. Furthermore, through EPS, you can add marks over the image in many shapes and colors. No matter the resolution of the display or printer, these lines will always be clear and not pixelated. However, this can be done better with tools within TeX or LaTeX such as PGF/TikZ (133).

If the final input image (possibly after all operations on the flux explained below) is a binary image or only has the two colors of black and white (in segmentation maps for example), then PostScript has another great advantage compared to other formats: it allows 1-bit pixels (pixels with a value of 0 or 1), which can decrease the output file size by a factor of 8. So if a gray-scale image is binary, ConvertType will exploit this property in the EPS and PDF (see below) outputs.
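The factor of 8 follows from storing eight pixels in each byte instead of one. The sketch below illustrates that arithmetic only. It is not how ConvertType writes its PostScript output, and the function name is hypothetical.

```python
def pack_bits(pixels):
    """Pack a flat sequence of 0/1 pixel values into bytes, 8 pixels per byte."""
    packed = bytearray()
    for start in range(0, len(pixels), 8):
        chunk = pixels[start:start + 8]
        byte = 0
        for value in chunk:
            byte = (byte << 1) | (1 if value else 0)
        byte <<= 8 - len(chunk)  # pad a short final chunk with zero bits
        packed.append(byte)
    return bytes(packed)


pixels = [0, 1] * 32         # 64 binary pixels
as_bytes = bytes(pixels)     # one byte per pixel
as_bits = pack_bits(pixels)  # one bit per pixel
print(len(as_bytes), len(as_bits))
```

For these 64 pixels, the byte-per-pixel form needs 64 bytes while the packed form needs 8.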

The standard suffixes for an EPS file are .eps, .EPS, .epsf and .epsi. The EPS outputs of ConvertType have the .eps suffix.

PDF

The Portable Document Format (PDF) is currently the most common format for documents. It is a vector graphics format, allowing abstract constructs like marks or borders.

The PDF format is based on PostScript, so it shares all the features mentioned above for EPS. To display its programmed content or to print it, a PostScript file needs to pass through a processor or compiler. A PDF file can be thought of as the processed output of the PostScript compiler. PostScript, EPS and PDF were created and are registered by Adobe Systems.

A PDF document is a static document description format, so viewing its result is much faster and more efficient than PostScript (which, as explained under EPS above, is a programming language). To create a PDF output, ConvertType will make an EPS file and convert that to PDF using GPL Ghostscript. The suffixes recognized for a PDF file are: .pdf, .PDF. If GPL Ghostscript cannot be run on the PostScript file, the EPS will remain and a warning will be printed (see Optional dependencies).

blank

This is not actually a file type, but it can be used to fill one color channel with a blank value. If this argument is given for any color channel, that channel will not be used in the output.

Plain text

The value of each pixel in a 2D image can be written as a 2D matrix in a plain-text file. Therefore, for the purpose of ConvertType, plain-text files are a single-channel raster graphics file format.

Plain text files have the advantage that they can be viewed with any text editor or on the command-line. Most programs also support input as plain text files. As input, each plain text file is considered to contain one color channel.

In ConvertType, the recognized extensions for plain text files are .txt and .dat. As described in Invoking ConvertType, if you just give one of these extensions (and not a full file name) as output, then automatic output will be performed to determine the final output name (see Automatic output). Besides these, when the format of a file cannot be recognized from its name, ConvertType will fall back to plain text mode. So you can use any name (even without an extension) for a plain text input or output. Just note that when the suffix is not recognized, automatic output will not be performed.
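The suffix rules listed in this section, together with the plain-text fallback, can be summarized in a short sketch. It is an illustration under stated assumptions rather than ConvertType's implementation: the function and table names are hypothetical, and FITS and TIFF are left out because their suffix lists are not given in this section (so a FITS or TIFF name would wrongly fall through to the fallback here).

```python
# Suffixes exactly as listed in this section; FITS and TIFF are omitted.
SUFFIXES = {
    "JPEG": (".jpg", ".JPG", ".jpeg", ".JPEG", ".jpe", ".jif", ".jfif", ".jfi"),
    "EPS": (".eps", ".EPS", ".epsf", ".epsi"),
    "PDF": (".pdf", ".PDF"),
    "plain text": (".txt", ".dat"),
}


def guess_format(name):
    """Return (format, suffix_recognized) for a file name, using suffixes only."""
    for label, suffixes in SUFFIXES.items():
        if name.endswith(suffixes):  # str.endswith accepts a tuple
            return label, True
    # Unrecognized names fall back to plain text mode.
    return "plain text", False


for name in ("m51.jpeg", "figure.epsi", "pixels.dat", "pixels"):
    print(name, guess_format(name))
```

The second element of the returned pair separates a recognized .txt or .dat suffix from the fallback case, which matters because automatic output is only performed for recognized suffixes.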

The basic input/output on plain text images is very similar to how tables are read/written as described in Gnuastro text table format. Simply put, the restrictions are very loose, and there is a convention to define a name, units, data type (see Numeric data types), and comments for the data in a commented line. The only difference is that as a table, a text file can contain many datasets (columns), but as a 2D image, it can only contain one dataset. As a result, only one information comment line is necessary for a 2D image, and instead of the starting ‘# Column N’ (N is the column number), the information line for a 2D image must start with ‘# Image 1’. When ConvertType is asked to output to a plain text file, this information comment line is written before the image pixel values.
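As a rough illustration of this layout, the sketch below reads a plain-text image by keeping the ‘# Image 1’ line as metadata, skipping other comment lines, and parsing every remaining line as one image row. It is not Gnuastro's reader. The function name is hypothetical, and the text following ‘# Image 1’ in the sample is an assumed layout modeled on the text table convention, so consult Gnuastro text table format for the exact syntax.

```python
def read_text_image(text):
    """Split a plain-text image into its information line and pixel rows."""
    info = None
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore empty lines
        if stripped.startswith("#"):
            if stripped.startswith("# Image 1"):
                info = stripped  # the single information comment line
            continue  # any other comment line is skipped
        rows.append([float(token) for token in stripped.split()])
    return info, rows


# Hypothetical 2x3 image; the metadata after '# Image 1' is an assumed layout.
sample = """\
# Image 1: demo [counts, f32] Hypothetical metadata layout
1.0 2.0 3.0
4.0 5.0 6.0
"""
info, rows = read_text_image(sample)
print(info)
print(rows)
```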

When converting an image to plain text, consider the fact that if the image is large, the number of columns in each line will become very large, possibly making it very hard to open in some text editors.

Standard output (command-line)

This is very similar to the plain text output, but instead of creating a file to keep the printed values, they are printed on the command-line. This can be very useful when you want to redirect the results directly to another program in one command with no intermediate file. The only difference is that only the pixel values are printed (with no information comment line). To print to the standard output, set the output name to ‘stdout’.


Footnotes

(133)

http://sourceforge.net/projects/pgf/