The standard units data file is in Unicode, using UTF-8 encoding. Most definitions use only ASCII characters (i.e., code points U+0000 through U+007F); definitions using non-ASCII characters appear in blocks beginning with ‘!utf8’ and ending with ‘!endutf8’.
The non-ASCII definitions are loaded only if the platform and the locale support UTF-8. Platform support is determined when units is compiled; the locale is checked at every invocation of units. To see whether your version of units includes Unicode support, invoke the program with the --version option.
When Unicode support is available, units checks every line within UTF-8 blocks in all of the units data files for invalid or non-printing UTF-8 sequences; if such sequences occur, units ignores the entire line. In addition to checking validity, units determines the display width of non-ASCII characters to ensure proper positioning of the pointer in some error messages and to align columns for the ‘search’ and ‘?’ commands.
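The two checks described above can be illustrated with a minimal Python sketch. This is not how units itself is implemented; the function names `is_valid_printable_utf8` and `display_width` are hypothetical, and the width rule (two columns for wide East Asian characters, none for combining marks) is only an approximation of what a terminal does.

```python
import unicodedata


def is_valid_printable_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 and has no non-printing characters."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Allowing tab as ordinary whitespace is an assumption of this sketch.
    return all(ch == "\t" or ch.isprintable() for ch in text)


def display_width(text: str) -> int:
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks add no column of their own
        # Wide and fullwidth East Asian characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width
```

A line that fails the first check would be skipped as a whole; the second function gives the kind of column count needed to place an error pointer under the right character.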
Microsoft Windows supports UTF-8 in console applications running in Windows Terminal; UTF-8 is not supported in applications running in the older Windows Console Host (see Unicode Support on Windows). The UTF-16 and UTF-32 encodings are not supported on any platform.
If Unicode support is available and definitions that contain non-ASCII UTF-8 characters are added to a units data file, those definitions should be enclosed within ‘!utf8’ … ‘!endutf8’ to ensure that they are only loaded when Unicode support is available. As usual, the ‘!’ must appear as the first character on the line. As discussed in Units Data Files, it’s usually best to put such definitions in supplemental data files linked by an ‘!include’ command or in a personal units data file.
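As a rough illustration of why the enclosing block matters, the following Python sketch mimics the loading rule: lines between ‘!utf8’ and ‘!endutf8’ are kept only when UTF-8 is supported. The function `load_definitions` and the sample definitions (`reflength`, `λref`) are hypothetical and are not taken from the standard units data file or from the units source code.

```python
SAMPLE = """\
reflength  500 nm
!utf8
λref  reflength
!endutf8
"""


def load_definitions(lines, utf8_supported):
    """Yield definition lines, skipping !utf8 blocks when UTF-8 is unsupported."""
    in_utf8_block = False
    for line in lines:
        # A directive is recognized only when '!' is the first character.
        if line.rstrip() == "!utf8":
            in_utf8_block = True
        elif line.rstrip() == "!endutf8":
            in_utf8_block = False
        elif in_utf8_block and not utf8_supported:
            continue  # non-ASCII definition dropped without UTF-8 support
        else:
            yield line
```

With `utf8_supported=False`, only the ASCII definition survives, so the same data file can be shared between systems with and without Unicode support.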
When Unicode support is not available, units makes no assumptions about character encoding, except that characters in the range 00–7F hexadecimal correspond to ASCII encoding. Non-ASCII characters are simply sequences of bytes, and have no special meanings; for definitions in supplementary units data files, you can use any encoding consistent with this assumption. For example, if you wish to use non-ASCII characters in definitions when running units under Windows, you can use a character set such as Windows “ANSI” (code page 1252 in the US and Western Europe); if this is done, the console code page must be set to the same encoding for the characters to display properly. You can even use UTF-8, though some messages may be improperly aligned, and units will not detect invalid UTF-8 sequences. If you use UTF-8 encoding when Unicode support is not available, you should place any definitions with non-ASCII characters outside ‘!utf8’ … ‘!endutf8’ blocks; otherwise, they will be ignored.
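The need for the data file and the console to agree on an encoding can be seen in a short Python sketch. It uses the micro sign as an arbitrary example character; the helper names are hypothetical, and the code only shows how the same bytes read under two encodings, not anything units does internally.

```python
MICRO = "\u00b5"  # MICRO SIGN


def encoded_forms(ch: str) -> tuple[bytes, bytes]:
    """Return the code page 1252 and UTF-8 byte sequences for ch."""
    return ch.encode("cp1252"), ch.encode("utf-8")


def shown_on_cp1252_console(raw: bytes) -> str:
    """Interpret raw bytes the way a console set to code page 1252 would."""
    return raw.decode("cp1252")


cp1252_bytes, utf8_bytes = encoded_forms(MICRO)
# One byte in code page 1252, two bytes in UTF-8.
print(cp1252_bytes, utf8_bytes)
# UTF-8 bytes viewed through a code page 1252 console show two characters.
print(shown_on_cp1252_console(utf8_bytes))
# ASCII text is byte-for-byte identical in both encodings.
print("meter".encode("cp1252") == "meter".encode("utf-8"))
```

The ASCII range is the same in both encodings, which is why the single assumption about bytes 00–7F is enough for the ASCII definitions to work everywhere.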
Except for code examples, typeset material usually uses the Unicode symbols for mathematical operators. To facilitate copying and pasting from such sources, several typographical characters are converted to the ASCII operators used in units: the figure dash (U+2012), minus (‘−’; U+2212), and en dash (‘–’; U+2013) are converted to the operator ‘-’; the multiplication sign (‘×’; U+00D7), N-ary times operator (U+2A09), dot operator (‘⋅’; U+22C5), and middle dot (‘·’; U+00B7) are converted to the operator ‘*’; the division sign (‘÷’; U+00F7) is converted to the operator ‘/’; and the fraction slash (U+2044) is converted to the operator ‘|’.
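The conversions listed above amount to a character-for-character translation table. The following Python sketch shows one way to express it; the names `OPERATOR_MAP` and `to_ascii_operators` are hypothetical, and the code is an illustration of the mapping rather than the routine units actually uses.

```python
# Typographical characters and the ASCII operators they are converted to.
OPERATOR_MAP = str.maketrans({
    "\u2012": "-",  # figure dash
    "\u2212": "-",  # minus
    "\u2013": "-",  # en dash
    "\u00d7": "*",  # multiplication sign
    "\u2a09": "*",  # N-ary times operator
    "\u22c5": "*",  # dot operator
    "\u00b7": "*",  # middle dot
    "\u00f7": "/",  # division sign
    "\u2044": "|",  # fraction slash
})


def to_ascii_operators(expr: str) -> str:
    """Replace typographical operator characters with ASCII operators."""
    return expr.translate(OPERATOR_MAP)


print(to_ascii_operators("3 \u00d7 4 \u2212 2"))
```

Characters outside the table pass through unchanged, so an expression already written with ASCII operators is left as it is.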