Using Symbols (The GNU Troff Manual)

5.19.4 Using Symbols

A glyph is a graphical representation of a character. While a character is an abstraction of semantic information, a glyph is something that can be seen on screen or paper. A character has many possible representation forms (for example, the character ‘A’ can be written in an upright or slanted typeface, producing distinct glyphs). Sometimes, a sequence of characters maps to a single glyph: this is a ligature, the most common being ‘fi’.

Space characters never become glyphs in GNU troff. If not discarded (as when trailing on text lines), they are represented by horizontal motions in the output.

A symbol is simply a named glyph. Within gtroff, all glyph names of a particular font are defined in its font file. If the user requests a glyph not available in this font, gtroff looks up an ordered list of special fonts. By default, the PostScript output device supports the two special fonts ‘SS’ (slanted symbols) and ‘S’ (symbols) (the former is looked up before the latter). Other output devices use different names for special fonts. Fonts mounted with the fonts keyword in the DESC file are globally available. To install additional special fonts locally (i.e., for a particular font), use the fspecial request.

Here are the exact rules by which gtroff searches for a given symbol:

If the symbol has been defined with the char request, use it. This hides a symbol with the same name in the current font.
Check the current font.
If the symbol has been defined with the fchar request, use it.
Check whether the current font has a font-specific list of special fonts; test all fonts in the order of appearance in the last fspecial call if appropriate.
If the symbol has been defined with the fschar request for the current font, use it.
Check all fonts in the order of appearance in the last special call.
If the symbol has been defined with the schar request, use it.
As a last resort, consult all fonts loaded up to now for special fonts and check them, starting with the lowest font number. This can sometimes lead to surprising results since the fonts line in the DESC file often contains empty positions, which are filled later on. For example, consider the following:
```
fonts 3 0 0 FOO
```
This mounts font FOO at font position 3. We assume that FOO is a special font, containing glyph foo, and that no font has been loaded yet. The line
```
.fspecial BAR BAZ
```
makes font BAZ special only if font BAR is active. We further assume that BAZ is really a special font, i.e., its font description file contains the special keyword, and that it also contains glyph foo with a special shape that fits font BAR. After executing fspecial, font BAR is loaded at font position 1, and BAZ at position 2.

We now switch to a new font XXX, trying to access glyph foo that is assumed to be missing. There are neither font-specific special fonts for XXX nor any other fonts made special with the special request, so gtroff starts the search for special fonts in the list of already mounted fonts, with increasing font positions. Consequently, it finds BAZ before FOO even for XXX, which is not the intended behaviour.
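The search order and the pitfall just described can be modelled in a few lines of Python. This is a rough sketch, not gtroff's implementation: the `Font` and `State` classes and the `find_glyph` function are hypothetical names, and font XXX is assumed to be mounted at position 4.

```python
from dataclasses import dataclass, field


@dataclass
class Font:
    name: str
    special: bool = False                      # 'special' keyword in font file
    glyphs: set = field(default_factory=set)   # glyph names the font defines


@dataclass
class State:
    fonts: dict                                   # font name -> Font
    mounted: dict                                 # font position -> font name
    char: set = field(default_factory=set)        # names defined with .char
    fchar: set = field(default_factory=set)       # names defined with .fchar
    fschar: dict = field(default_factory=dict)    # font name -> names (.fschar)
    schar: set = field(default_factory=set)       # names defined with .schar
    fspecial: dict = field(default_factory=dict)  # font name -> special fonts
    special: list = field(default_factory=list)   # fonts from last .special


def find_glyph(name, current, st):
    """Return (rule number, where found) for the first match, else None."""
    if name in st.char:
        return 1, "char"
    if name in st.fonts[current].glyphs:
        return 2, current
    if name in st.fchar:
        return 3, "fchar"
    for f in st.fspecial.get(current, []):
        if name in st.fonts[f].glyphs:
            return 4, f
    if name in st.fschar.get(current, set()):
        return 5, "fschar"
    for f in st.special:
        if name in st.fonts[f].glyphs:
            return 6, f
    if name in st.schar:
        return 7, "schar"
    for pos in sorted(st.mounted):             # lowest font position first
        f = st.fonts[st.mounted[pos]]
        if f.special and name in f.glyphs:
            return 8, f.name
    return None


st = State(
    fonts={
        "FOO": Font("FOO", special=True, glyphs={"foo"}),
        "BAR": Font("BAR"),
        "BAZ": Font("BAZ", special=True, glyphs={"foo"}),
        "XXX": Font("XXX"),
    },
    mounted={1: "BAR", 2: "BAZ", 3: "FOO", 4: "XXX"},
    fspecial={"BAR": ["BAZ"]},
)

print(find_glyph("foo", "BAR", st))   # found through BAR's font-specific list
print(find_glyph("foo", "XXX", st))   # falls through to the mounted fonts
```

With BAR current, the glyph comes from BAZ through rule 4, as intended. With XXX current, the search reaches rule 8 and still picks BAZ, because position 2 is examined before position 3.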

See Device and Font Description Files, and Special Fonts, for more details.

The groff_char(7) man page houses a complete list of predefined special character names, but the availability of any as a glyph is device- and font-dependent. For example, say

man -Tdvi groff_char > groff_char.dvi

to obtain those available with the DVI device and default font configuration.⁷⁷ If you want to use an additional macro package to change the fonts used, groff (or gtroff) must be run directly.

groff -Tdvi -mec -man groff_char.7 > groff_char.dvi

Special character names not listed in groff_char(7) are derived algorithmically, using a simplified version of the Adobe Glyph List (AGL) algorithm, which is described in https://github.com/adobe-type-tools/agl-aglfn. The (frozen) set of names that can’t be derived algorithmically is called the groff glyph list (GGL).

A glyph for Unicode character U+XXXX[X[X]] that is not a composite character is named uXXXX[X[X]]. Each X must be an uppercase hexadecimal digit. Examples: u1234, u008E, u12DB8. The largest Unicode value is 0x10FFFF. There must be at least four X digits; if necessary, add leading zeroes (after the ‘u’). No zero padding is allowed for character codes greater than 0xFFFF. Surrogates (i.e., Unicode values greater than 0xFFFF represented with character codes from the surrogate area U+D800–U+DFFF) are not allowed either.
A glyph representing more than a single input character is named
```
‘u’ component1 ‘_’ component2 ‘_’ component3 …
```
Example: u0045_0302_0301.

For simplicity, all Unicode characters that are composites must be maximally decomposed to NFD;⁷⁸ for example, u00CA_0301 is not a valid glyph name since U+00CA (LATIN CAPITAL LETTER E WITH CIRCUMFLEX) can be further decomposed into U+0045 (LATIN CAPITAL LETTER E) and U+0302 (COMBINING CIRCUMFLEX ACCENT). u0045_0302_0301 is thus the glyph name for U+1EBE, LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND ACUTE.
groff itself maintains a table to decompose all algorithmically derived glyph names that are composites. For example, u0100 (LATIN CAPITAL LETTER A WITH MACRON) is automatically decomposed into u0041_0304. Additionally, a glyph name of the GGL is preferred to an algorithmically derived glyph name; groff performs this mapping automatically as well. Example: The glyph u0045_0302 is mapped to ^E.
Glyph names of the GGL can’t be used in composite glyph names; for example, ^E_u0301 is invalid.
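The naming rules above lend themselves to a small check. The following Python sketch is an illustration only: the function names are hypothetical, the validity test covers syntax but not whether a composite is fully decomposed, and `unicodedata` follows the Unicode version bundled with Python, which may differ from the decomposition table groff ships.

```python
import re
import unicodedata

_COMPONENT = re.compile(r"[0-9A-F]{4,6}")


def is_valid_component(hex_digits):
    """Check one XXXX[X[X]] group against the rules listed above."""
    if not _COMPONENT.fullmatch(hex_digits):
        return False                    # uppercase hex, four to six digits
    value = int(hex_digits, 16)
    if value > 0x10FFFF:
        return False                    # beyond the largest Unicode value
    if 0xD800 <= value <= 0xDFFF:
        return False                    # surrogate area
    if len(hex_digits) > 4 and hex_digits[0] == "0":
        return False                    # zero padding only up to four digits
    return True


def is_valid_glyph_name(name):
    """Syntax check only; NFD decomposition of the parts is not verified."""
    if not name.startswith("u"):
        return False
    return all(is_valid_component(part) for part in name[1:].split("_"))


def glyph_name_for(text):
    """Derive a uXXXX[_XXXX...] name by decomposing the text to NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    return "u" + "_".join(f"{ord(ch):04X}" for ch in decomposed)


print(is_valid_glyph_name("u12DB8"))    # five digits, no padding
print(is_valid_glyph_name("u012DB8"))   # padded beyond four digits
print(glyph_name_for("\u1EBE"))         # E with circumflex and acute
print(glyph_name_for("\u0100"))         # A with macron
```

Note that `is_valid_glyph_name("u00CA_0301")` passes this syntactic test even though the name is rejected by the NFD rule; `glyph_name_for` shows the fully decomposed spelling instead.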

Escape sequence: \(nm

Escape sequence: \[name]

Escape sequence: \[base-glyph combining-component …]

Typeset a special character name (two-character name nm) or a composite glyph consisting of base-glyph overlaid with one or more combining-components. For example, ‘\[A ho]’ is a capital letter “A” with a “hook accent” (ogonek).

There is no special syntax for one-character names: the analogous form ‘\n’ would collide with other escape sequences. However, the four escape sequences \', \-, \_, and \` are translated on input to the special character escape sequences \[aa], \[-], \[ul], and \[ga], respectively.

A special character name of length one is not the same thing as an ordinary character: that is, the character a is not the same as \[a].

If name is undefined, a warning in category ‘char’ is produced and the escape is ignored. See Warnings, for information about the enablement and suppression of warnings.

GNU troff resolves \[…] with more than a single component as follows:

Any component that is found in the GGL is converted to the uXXXX form.
Any component uXXXX that is found in the list of decomposable glyphs is decomposed.
The resulting elements are then concatenated with ‘_’ in between, dropping the leading ‘u’ in all elements but the first.

No check for the existence of any component is done (similar to the tr request).

Examples:

\[A ho]: ‘A’ maps to u0041, ‘ho’ maps to u02DB, thus the final glyph name would be u0041_02DB. This is not the expected result: the ogonek glyph ‘ho’ is a spacing ogonek, but for a proper composite a non-spacing ogonek (U+0328) is necessary. Looking into the file composite.tmac, one can find ‘.composite ho u0328’, which changes the mapping of ‘ho’ while a composite glyph name is constructed, causing the final glyph name to be u0041_0328.
\[^E u0301]
\[^E aa]
\[E a^ aa]
\[E ^ ']: ‘^E’ maps to u0045_0302, thus the final glyph name is u0045_0302_0301 in all forms (assuming proper calls of the composite request).
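The three resolution steps and the examples above can be traced with a short Python sketch. Everything here is illustrative: the three tables are tiny hand-picked excerpts containing only the mappings mentioned or implied in this section, `resolve` is a hypothetical name, and the sketch assumes that composite rewrites affect only the combining components, not the base glyph.

```python
# Excerpts only; the real tables in groff are far larger.
GGL = {"A": "u0041", "E": "u0045", "^E": "u0045_0302", "ho": "u02DB"}
DECOMPOSE = {"u00CA": "u0045_0302", "u0100": "u0041_0304"}
COMPOSITE = {"ho": "u0328", "aa": "u0301", "'": "u0301",
             "a^": "u0302", "^": "u0302"}


def resolve(spec, composite=COMPOSITE):
    """Build the glyph name for the space-separated components in spec."""
    base, *accents = spec.split()
    # Rewrites set up with the composite request come first (assumed to
    # apply to the combining components only).
    parts = [base] + [composite.get(a, a) for a in accents]
    parts = [GGL.get(p, p) for p in parts]          # step 1: GGL -> uXXXX
    parts = [DECOMPOSE.get(p, p) for p in parts]    # step 2: decompose
    for p in parts:
        if not p.startswith("u"):
            raise ValueError(f"cannot convert component {p!r}")
    # Step 3: join with '_', keeping the leading 'u' on the first part only.
    return "u" + "_".join(p[1:] for p in parts)


print(resolve("A ho", composite={}))   # spacing ogonek, not what is wanted
print(resolve("A ho"))                 # with the rewrite from composite.tmac
for form in ("^E u0301", "^E aa", "E a^ aa", "E ^ '"):
    print(form, "->", resolve(form))
```

The first call reproduces the unwanted name u0041_02DB; the remaining calls show how the rewrites lead to u0041_0328 and to u0045_0302_0301 for all four spellings.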

It is not possible to define glyphs with names like ‘A ho’ within a groff font file. This is not really a limitation; instead, you have to define u0041_0328.

Escape sequence: \C'xxx': Typeset the glyph of the special character xxx. Normally, it is more convenient to use \[xxx], but \C has some advantages: it is compatible with AT&T device-independent troff (and therefore available in compatibility mode⁷⁹) and can interpolate special characters with ‘]’ in their names. The delimiter need not be a neutral apostrophe; see Delimiters.

Request: .composite id1 id2: Map special character name id1 to id2 if id1 is used in \[...] with more than one component. See above for examples. This is a strict rewriting of the special character name; no check is performed for the existence of a glyph for either. A set of default mappings for many accents can be found in the file composite.tmac, loaded by the default troffrc at startup.

Escape sequence: \N'n'

Typeset the glyph with code n in the current font (n is not the input character code). The number n can be any non-negative decimal integer. Most devices only have glyphs with codes between 0 and 255; the Unicode output device uses codes in the range 0–65535. If the current font does not contain a glyph with that code, special fonts are not searched. The \N escape sequence can be conveniently used in conjunction with the char request:

.char \[phone] \f[ZD]\N'37'

The code of each glyph is given in the fourth column in the font description file after the charset command. It is possible to include unnamed glyphs in the font description file by using a name of ‘---’; the \N escape sequence is the only way to use these.

No kerning is applied to glyphs accessed with \N. The delimiter need not be a neutral apostrophe; see Delimiters.
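To find the value to pass to \N, one can read the fourth column of the charset section. The Python sketch below shows the idea on made-up data: the sample lines are hypothetical and not taken from any real font description file, `glyph_codes` is a hypothetical helper, codes are assumed to be written in decimal, and lines with fewer than four fields are simply skipped.

```python
SAMPLE = """\
name EXAMPLE
charset
a1   974,692,35  3  33
a2   961,692,35  3  34
---  690,692,14  3  35
a4   719,692     2  37
"""


def glyph_codes(font_text):
    """Map each code in the charset section to its name (None if unnamed)."""
    codes = {}
    in_charset = False
    for line in font_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "charset":
            in_charset = True
            continue
        if fields[0] == "kernpairs":       # a later section ends the charset
            in_charset = False
            continue
        if in_charset and len(fields) >= 4:
            name, code = fields[0], int(fields[3])
            codes.setdefault(code, None if name == "---" else name)
    return codes


codes = glyph_codes(SAMPLE)
print(codes[37])   # a named glyph, also reachable by name
print(codes[35])   # unnamed: only \N'35' can select it
```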

A few escape sequences are also special characters.

Escape sequence: \': An escaped neutral apostrophe is a synonym for \[aa] (acute accent).

Escape sequence: \`: An escaped grave accent is a synonym for \[ga] (grave accent).

Escape sequence: \-: An escaped hyphen-minus is a synonym for \[-] (minus sign).

Escape sequence: \_: An escaped underscore (“low line”) is a synonym for \[ul] (underrule). On typesetting devices, the underrule is font-invariant and drawn lower than the underscore ‘_’.

Request: .cflags n c1 c2 …

Assign properties encoded by the number n to characters c1, c2, and so on.

Input characters, including special characters introduced by an escape, have certain properties associated with them.⁸⁰ These properties can be modified with this request. The first argument is the sum of the desired flags and the remaining arguments are the characters to be assigned those properties. Spaces between the cn arguments are optional. Any argument cn can be a character class defined with the class request rather than an individual character. See Character Classes.

The non-negative integer n is the sum of any of the following. Some combinations are nonsensical, such as ‘33’ (1 + 32).

1

Recognize the character as ending a sentence if followed by a newline or two spaces. Initially, characters ‘.?!’ have this property.

2

Enable breaks before the character. A line is not broken at a character with this property unless the characters on each side both have non-zero hyphenation codes. This exception can be overridden by adding 64. Initially, no characters have this property.

4

Enable breaks after the character. A line is not broken at a character with this property unless the characters on each side both have non-zero hyphenation codes. This exception can be overridden by adding 64. Initially, characters ‘\-\[hy]\[em]’ have this property.

8

Mark the glyph associated with this character as overlapping other instances of itself horizontally. Initially, characters ‘\[ul]\[rn]\[ru]\[radicalex]\[sqrtex]’ have this property.

16

Mark the glyph associated with this character as overlapping other instances of itself vertically. Initially, the character ‘\[br]’ has this property.

32

Mark the character as transparent for the purpose of end-of-sentence recognition. In other words, an end-of-sentence character followed by any number of characters with this property is treated as the end of a sentence if followed by a newline or two spaces. This is the same as having a zero space factor in TeX. Initially, characters ‘"')]*\[dg]\[dd]\[rq]\[cq]’ have this property.

64

Ignore hyphenation codes of the surrounding characters. Use this in combination with values 2 and 4 (initially, no characters have this property).

For example, if you need an automatic break point after the en-dash in numeric ranges like “3000–5000”, insert

.cflags 68 \[en]

into your document. However, this practice can lead to bad layout if done thoughtlessly; in most situations, a better solution than changing the cflags value is to insert \: right after the en-dash at the places that really need a break point.

The remaining values were implemented for East Asian language support; those who use alphabetic scripts exclusively can disregard them.

128: Prohibit a line break before the character, but allow a line break after the character. This works only in combination with flags 256 and 512 and has no effect otherwise. Initially, no characters have this property.
256: Prohibit a line break after the character, but allow a line break before the character. This works only in combination with flags 128 and 512 and has no effect otherwise. Initially, no characters have this property.
512: Allow line break before or after the character. This works only in combination with flags 128 and 256 and has no effect otherwise. Initially, no characters have this property.

In contrast to values 2 and 4, the values 128, 256, and 512 work pairwise. If, for example, the left character has value 512, and the right character 128, no break will be automatically inserted between them. If we use value 6 instead for the left character, a break after the character can’t be suppressed since the neighboring character on the right doesn’t get examined.
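Because the first argument of cflags is a sum of powers of two, it can be composed and decoded mechanically. The following Python sketch is only a reading aid: the short descriptions paraphrase the list above, and `describe_cflags` is a hypothetical helper, not part of groff.

```python
CFLAGS = {
    1: "ends a sentence",
    2: "break allowed before",
    4: "break allowed after",
    8: "overlaps horizontally",
    16: "overlaps vertically",
    32: "transparent for end-of-sentence recognition",
    64: "ignore surrounding hyphenation codes",
    128: "no break before, break after allowed",
    256: "no break after, break before allowed",
    512: "break allowed before or after",
}


def describe_cflags(n):
    """List the properties encoded in the cflags value n."""
    if not 0 <= n < 1024:
        raise ValueError("value is not a sum of the documented flags")
    return [text for bit, text in CFLAGS.items() if n & bit]


print(4 + 64)                  # the value used for the en-dash example
print(describe_cflags(68))
print(describe_cflags(33))     # the nonsensical combination 1 + 32
```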

Request: .char c [contents]

Request: .fchar c [contents]

Request: .fschar f c [contents]

Request: .schar c [contents]

Define a new character or glyph c to be contents, which can be empty. More precisely, char defines a groff object (or redefines an existing one) that is accessed with the name c on input, and produces contents on output. Every time glyph c needs to be printed, contents is processed in a temporary environment and the result is wrapped up into a single object. Compatibility mode is turned off and the escape character is set to \ while contents is processed. Any emboldening, constant spacing, or track kerning is applied to this object rather than to individual glyphs in contents.

An object defined by these requests can be used just like a normal glyph provided by the output device. In particular, other characters can be translated to it with the tr or trin requests; it can be made the leader character with the lc request; repeated patterns can be drawn with it using the \l and \L escape sequences; and words containing c can be hyphenated correctly if the hcode request is used to give the object a hyphenation code.

There is a special anti-recursion feature: use of the object within its own definition is handled like a normal character (not defined with char).

The tr and trin requests take precedence if char accesses the same symbol.

.tr XY
X
    ⇒ Y
.char X Z
X
    ⇒ Y
.tr XX
X
    ⇒ Z
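The interaction shown above can be pictured as two table lookups in a fixed order. The Python sketch below is a toy model under that assumption, not a description of gtroff internals; the `Input` class and its methods are hypothetical.

```python
class Input:
    def __init__(self):
        self.tr = {}      # translations set up with .tr
        self.char = {}    # definitions set up with .char

    def translate(self, src, dst):       # models '.tr src dst'
        if src == dst:
            self.tr.pop(src, None)       # '.tr XX' cancels the translation
        else:
            self.tr[src] = dst

    def define(self, name, contents):    # models '.char name contents'
        self.char[name] = contents

    def emit(self, c):
        c = self.tr.get(c, c)            # the translation is applied first
        return self.char.get(c, c)       # then any char definition


doc = Input()
doc.translate("X", "Y")
print(doc.emit("X"))    # Y
doc.define("X", "Z")
print(doc.emit("X"))    # still Y: tr takes precedence
doc.translate("X", "X")
print(doc.emit("X"))    # Z: the char definition is now visible
```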

The fchar request defines a fallback glyph: gtroff only checks for glyphs defined with fchar if it cannot find the glyph in the current font. gtroff carries out this test before checking special fonts.

fschar defines a fallback glyph for font f: gtroff checks for glyphs defined with fschar after the list of fonts declared as font-specific special fonts with the fspecial request, but before the list of fonts declared as global special fonts with the special request.

Finally, the schar request defines a global fallback glyph: gtroff checks for glyphs defined with schar after the list of fonts declared as global special fonts with the special request, but before the already mounted special fonts.

See Character Classes.

Request: .rchar c …

Request: .rfschar f c …

Remove definition of each ordinary or special character c, undoing the effect of a char, fchar, or schar request. Those supplied by font description files cannot be removed. Spaces and tabs may separate c arguments.

The request rfschar removes glyph definitions defined with fschar for font f.