Next: Manipulating Spacing, Previous: Manipulating Filling and Adjustment, Up: GNU troff Reference [Contents][Index]
When filling, GNU troff
hyphenates words as needed at
user-specified and automatically determined hyphenation points. The
machine-driven determination of hyphenation points in words requires
algorithms and data, and is susceptible to conventions and preferences.
Before tackling such automatic hyphenation, let us consider how
hyphenation points can be set explicitly.
Explicitly hyphenated words such as “mother-in-law” are eligible for
breaking after each of their hyphens. Relatively few words in a
language offer such obvious break points, however, and automatic
detection of syllabic (or phonetic) boundaries for hyphenation is not
perfect, particularly for
unusual words found in technical literature. We can instruct GNU
troff
how to hyphenate specific words if the need arises.
The hw request defines hyphenation exception words; each hyphen ‘-’ in a word indicates a hyphenation point. For example, the request
.hw in-sa-lub-rious alpha
marks potential hyphenation points in “insalubrious”, and prevents “alpha” from being hyphenated at all.
Besides the space character, any character whose hyphenation code is
zero can be used to separate the arguments of hw
(see the
hcode
request below). In addition, this request can be used more
than once.
Hyphenation points specified with hw
are not subject to the
within-word placement restrictions imposed by the hy
request (see
below).
Hyphenation exceptions specified with the hw
request are
associated with the hyphenation language (see the hla
request
below) and environment (see Environments); invoking the hw
request in the absence of a hyphenation language is an error.
The request is ignored if there are no parameters.
These are known as hyphenation exceptions in the expectation
that most users will avail themselves of automatic hyphenation; these
exceptions override any rules that would normally apply to a word
matching a hyphenation exception defined with hw.
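The following Python sketch models how an argument list for hw can be read. Each hyphen marks a permissible break after the letters that precede it, and a word with no hyphens gets no break points at all. The helper names parse_hw and show_breaks are hypothetical. The sketch splits only on spaces, so it ignores the zero-hyphenation-code separators described above, and it makes no claim about how GNU troff stores exceptions internally.

```python
def parse_hw(arguments: str) -> dict[str, list[int]]:
    """Map each exception word to the offsets after which it may break.

    Hypothetical model of the hw argument syntax: a '-' inside a word
    marks a hyphenation point; a word with no '-' is never hyphenated.
    Only spaces are treated as argument separators here.
    """
    exceptions: dict[str, list[int]] = {}
    for token in arguments.split():
        letters: list[str] = []
        points: list[int] = []
        for ch in token:
            if ch == "-":
                points.append(len(letters))  # break after this many letters
            else:
                letters.append(ch)
        exceptions["".join(letters)] = points
    return exceptions


def show_breaks(word: str, points: list[int]) -> str:
    """Render a word with a '-' at every permitted break offset."""
    pieces, start = [], 0
    for offset in points:
        pieces.append(word[start:offset])
        start = offset
    pieces.append(word[start:])
    return "-".join(pieces)


if __name__ == "__main__":
    for word, points in parse_hw("in-sa-lub-rious alpha").items():
        print(word, points, show_breaks(word, points))
```

With the request from the example, “insalubrious” gets break offsets 2, 4, and 7, and “alpha” gets an empty list, which means it is never hyphenated.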
Situations also arise when only a specific occurrence of a word needs its hyphenation altered or suppressed, or when a URL or similar string needs to be breakable in sensible places without hyphenation.
To tell GNU troff
how to hyphenate words as they occur in input,
use the \%
escape sequence; it is the default hyphenation
character. Each instance within a word indicates to GNU troff
that the word may be hyphenated at that point, while prefixing a word
with this escape sequence prevents it from being otherwise hyphenated.
This mechanism affects only that occurrence of the word; to change the
hyphenation of a word for the remainder of input processing, use the
hw
request.
GNU troff
regards the escape sequences \X
and \Y
as
starting a word; that is, the \%
escape sequence in, say,
‘\X'...'\%foobar’ or ‘\Y'...'\%foobar’ no longer
prevents hyphenation of ‘foobar’ but inserts a hyphenation point
just prior to it; most likely this isn’t what you want.
See Postprocessor Access.
\:
inserts a non-printing break point; that is, a word can break
there, but the soft hyphen glyph (see below) is not written to the
output if it does. This escape sequence is an input word boundary, so
the remainder of the word is subject to hyphenation as normal.
You can combine \:
and \%
to control breaking of a file
name or URL, or to permit hyphenation only after certain explicit
hyphens within a word.
The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce was, in retrospect, inevitable once the contents of \%/var/log/\:\%httpd/\:\%access_log on the family web server came to light, revealing visitors from Hogwarts.
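The Python sketch below models where an input word like those in the example may break. It covers only the rules stated above. \: is a break point that adds no soft hyphen and starts a new input word. A \% inside a word is a hyphenation point. An explicit hyphen permits a break after it. As the Lethbridge-Stewart example suggests, the sketch assumes that a \% prefix also suppresses breaks at that input word's explicit hyphens. The function name break_points is hypothetical, and automatic (pattern-based) hyphenation is deliberately not modeled.

```python
def break_points(word: str) -> list[tuple[int, bool]]:
    r"""List (offset, adds_soft_hyphen) for each permitted break in a word.

    Hypothetical model, not GNU troff's implementation:
    - '\:' is a break point without a soft hyphen and starts a new input word;
    - '\%' at the start of an input word suppresses breaks at its explicit
      hyphens (an assumption drawn from the example above);
    - '\%' elsewhere in a word is a hyphenation point (soft hyphen added);
    - otherwise an explicit '-' permits a break after it.
    Offsets count printed characters; the escapes themselves take no room.
    """
    points: list[tuple[int, bool]] = []
    printed = 0
    for index, segment in enumerate(word.split("\\:")):
        if index > 0:
            points.append((printed, False))  # break at \:, no soft hyphen
        protected = segment.startswith("\\%")
        if protected:
            segment = segment[2:]
        for part_index, part in enumerate(segment.split("\\%")):
            if part_index > 0:
                points.append((printed, True))  # interior \%: soft hyphen
            for ch in part:
                printed += 1
                if ch == "-" and not protected:
                    points.append((printed, False))  # explicit hyphen
    return points


if __name__ == "__main__":
    print(break_points(r"\%Lethbridge-Stewart-\:\%Sackville-Baggins"))
    print(break_points(r"\%/var/log/\:\%httpd/\:\%access_log"))
```

Under these assumptions, the double-barreled surname can break only after “Lethbridge-Stewart-”, and the file name can break only after “/var/log/” and “httpd/”, with no soft hyphen added in either case.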
The hc request changes the hyphenation character to char. This
character then works as the \% escape sequence normally does, and
thus no longer appears in the output. Without an argument, hc
resets the hyphenation character to \% (the default). The hyphenation
character is associated with the environment (see Environments).
The shc request sets the soft hyphen character, inserted when a word is
hyphenated automatically or at a hyphenation character, to the ordinary
or special character c. If the argument is omitted, the soft hyphen
character is set to the default, \[hy]. If no glyph for c exists in
the font in use at a potential hyphenation point, then the line is not
broken there. Neither character definitions (specified with the char
and similar requests) nor translations (specified with the tr request)
are applied to c.
Several requests influence automatic hyphenation. Because conventions
vary, a variety of hyphenation modes is available to the hy
request; these determine whether hyphenation will apply to a
word prior to breaking a line at the end of a page (more or less; see
below for details), and at which positions within that word
automatically determined hyphenation points are permissible. The places
within a word that are eligible for hyphenation are determined by
language-specific data and lettercase relationships. Furthermore,
hyphenation of a word might be suppressed due to a limit on
consecutive hyphenated lines (hlm), a minimum line length
threshold (hym), or because the line can instead be adjusted with
additional inter-word space (hys).
The hy request sets the automatic hyphenation mode to mode, an integer
encoding conditions for hyphenation; if mode is omitted, ‘1’ is implied.
The hyphenation mode is available in the read-only register ‘.hy’; it
is associated with the environment (see Environments). The default
hyphenation mode depends on the localization file loaded when GNU troff
starts up; see the hpf request below.
Typesetting practice generally does not avail itself of every
opportunity for hyphenation, but the details differ by language and site
mandates. The hyphenation modes of AT&T troff
were
implemented with English-language publishing practices of the 1970s in
mind, not a scrupulous enumeration of conceivable parameters. GNU
troff
extends those modes such that finer-grained control is
possible, favoring compatibility with older implementations over a more
intuitive arrangement. The means of hyphenation mode control is a set
of numbers that can be added up to encode the behavior
sought. The entries in the
following table are termed values; the sum of the desired
values is the mode.
0
disables hyphenation.
1
enables hyphenation except after the first and before the last character of a word.
The remaining values “imply” 1; that is, they enable hyphenation under the same conditions as ‘.hy 1’, and then apply or lift restrictions relative to that basis.
2
disables hyphenation of the last word on a page, even for explicitly hyphenated words.
4
disables hyphenation before the last two characters of a word.
8
disables hyphenation after the first two characters of a word.
16
enables hyphenation before the last character of a word.
32
enables hyphenation after the first character of a word.
Apart from value 2, restrictions imposed by the hyphenation mode
are not respected for words whose hyphenations have been
specified with the hyphenation character (‘\%’ by default) or the
hw
request.
Nonzero values in the previous table are additive. For example,
mode 12 causes GNU troff
to hyphenate neither the last two
nor the first two characters of a word. Some values cannot be used
together because they contradict; for instance, values 4 and 16,
and values 8 and 32. As noted, it is superfluous to add 1 to any
non-zero even mode.
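To show how the values combine, this Python sketch lists the offsets inside a word (a break after the first i characters) that a given mode leaves open. It encodes only the positional rules from the table: value 1 as the baseline, 4 and 8 as extra restrictions, and 16 and 32 as relaxations. Value 2 concerns the last word on a page and does not affect positions. The function name allowed_positions is hypothetical. Rejecting contradictory combinations reflects the advice above and is not a claim about what GNU troff does with them.

```python
def allowed_positions(mode: int, length: int) -> list[int]:
    """Offsets i (break after the first i characters) that a hyphenation
    mode permits in a word of the given length.

    Models only the positional values from the table; value 2 (last word
    on a page) does not change positions within a word.
    """
    if (mode & 4 and mode & 16) or (mode & 8 and mode & 32):
        raise ValueError("contradictory values combined")
    if mode == 0:
        return []              # hyphenation disabled
    first = 2                  # value 1: not after the first character
    last = length - 2          # value 1: not before the last character
    if mode & 32:
        first = 1              # allow a break after the first character
    if mode & 8:
        first = 3              # also forbid after the first two characters
    if mode & 16:
        last = length - 1      # allow a break before the last character
    if mode & 4:
        last = length - 3      # also forbid before the last two characters
    return list(range(first, last + 1))


if __name__ == "__main__":
    for mode in (1, 4, 12, 48):
        print(mode, allowed_positions(mode, len("splitting")))
```

For a nine-letter word such as “splitting”, mode 12 leaves offsets 3 through 6 open, while mode 48 opens every offset from 1 to 8.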
The automatic placement of hyphens in words is determined by pattern files, which are derived from TeX and available for several languages. The number of characters at the beginning of a word after which the first hyphenation point may be inserted is determined by the patterns themselves; it can’t be reduced further without introducing additional, invalid hyphenation points. Unfortunately, this information is not part of a pattern file; you have to know it in advance. The same is true of the number of characters at the end of a word before which the last hyphenation point may be inserted. For example, you can supply the following input to ‘echo $(nroff)’.
.ll 1
.hy 48
splitting
You will get
s- plit- t- in- g
instead of the correct ‘split- ting’. English patterns as distributed
with GNU troff
need two characters at the beginning and three
characters at the end; this means that value 4 of hy
is
mandatory. Value 8 is possible as an additional restriction, but
values 16 and 32 should be avoided, as should mode 1.
Modes 4 and 6 are typical.
A table of left and right minimum character counts for hyphenation as
needed by the patterns distributed with GNU troff
follows; see
the groff_tmac(5) man page for more information on GNU troff’s
language macro files.
language | pattern name | left min | right min |
---|---|---|---|
Czech | cs | 2 | 2 |
English | en | 2 | 3 |
French | fr | 2 | 3 |
German traditional | det | 2 | 2 |
German reformed | den | 2 | 2 |
Italian | it | 2 | 2 |
Swedish | sv | 1 | 2 |
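The table and the mode values can be checked against each other mechanically. The Python sketch below computes the fewest characters a mode keeps before the first and after the last automatic hyphen, then compares them with the minima listed above. The names MINIMA, mode_margins, and mode_suits are hypothetical. The check reflects only the arithmetic described in this section and does not inspect any actual pattern file.

```python
# Left and right minima from the table above, keyed by pattern name.
MINIMA = {"cs": (2, 2), "en": (2, 3), "fr": (2, 3), "det": (2, 2),
          "den": (2, 2), "it": (2, 2), "sv": (1, 2)}


def mode_margins(mode: int) -> tuple[int, int]:
    """Fewest characters a nonzero mode keeps before the first hyphen
    and after the last one."""
    if (mode & 4 and mode & 16) or (mode & 8 and mode & 32):
        raise ValueError("contradictory values combined")
    left = 1 if mode & 32 else 3 if mode & 8 else 2
    right = 1 if mode & 16 else 3 if mode & 4 else 2
    return left, right


def mode_suits(mode: int, pattern: str) -> bool:
    """True if the mode never opens a position the patterns cannot handle."""
    if mode == 0:
        return True  # no automatic hyphenation at all
    left_min, right_min = MINIMA[pattern]
    left, right = mode_margins(mode)
    return left >= left_min and right >= right_min


if __name__ == "__main__":
    for mode in (1, 4, 6, 12, 48):
        print(mode, mode_suits(mode, "en"))
```

For English, modes 4, 6, and 12 pass and modes 1 and 48 fail, which matches the advice above that value 4 is mandatory and that values 16 and 32, and mode 1 on its own, should be avoided.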
Hyphenation exceptions within pattern files (i.e., the words within a
TeX \hyphenation
group) obey the hyphenation restrictions
given by hy.
The nh request disables automatic hyphenation; that is, it sets the
hyphenation mode to 0 (see above). The hyphenation mode of the last
call to hy is not remembered.
The hpf request reads hyphenation patterns from pattern-file, which is
sought in the same way that macro files are with the mso request or the
-mname command-line option to groff. The pattern-file should have the
same format as (simple) TeX pattern files. More specifically, the
following scanning rules are implemented.
Escapes such as \$ are not supported.
^^xx (where each x is 0–9 or a–f) and ^^c (character c in the code point range 0–127 decimal) are recognized; other uses of ^ cause an error.
hpf checks for the expression \patterns{…} (possibly with whitespace before or after the braces). Everything between the braces is taken as hyphenation patterns. Consequently, { and } are not allowed in patterns.
Similarly, \hyphenation{…} gives a list of hyphenation exceptions.
\endinput is recognized also.
If \patterns is missing, the whole file is treated as a list of hyphenation patterns (except that the % character is recognized as the start of a comment).
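The Python sketch below applies a subset of these scanning rules to the text of a pattern file and returns its patterns and exceptions. It is an approximation built on several assumptions. It strips % comments in every case, although the rules above spell this out only for files without \patterns. It decodes only the ^^xx hexadecimal form and leaves ^^c and error reporting for stray ^ unmodeled. The function and constant names are hypothetical.

```python
import re

HEX_ESCAPE = re.compile(r"\^\^([0-9a-f]{2})")
PATTERNS = re.compile(r"\\patterns\s*\{([^{}]*)\}")
EXCEPTIONS = re.compile(r"\\hyphenation\s*\{([^{}]*)\}")


def scan_pattern_file(text: str) -> tuple[list[str], list[str]]:
    """Return (patterns, exceptions) from TeX-style pattern file text."""
    # Drop '%' comments up to the end of each line (assumed in all cases).
    text = "\n".join(line.split("%", 1)[0] for line in text.splitlines())
    # Ignore everything after \endinput.
    end = text.find("\\endinput")
    if end != -1:
        text = text[:end]
    # Decode ^^xx hexadecimal escapes; the ^^c form is not modeled.
    text = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    patterns = PATTERNS.search(text)
    if patterns is None:
        # No \patterns group: treat the whole file as a list of patterns.
        return text.split(), []
    exceptions = EXCEPTIONS.search(text)
    return (patterns.group(1).split(),
            exceptions.group(1).split() if exceptions else [])


if __name__ == "__main__":
    sample = ("% header\n\\patterns{ .ab1c 2b1d } % trailing\n"
              "\\hyphenation{ ta-ble }\n\\endinput\nignored")
    print(scan_pattern_file(sample))
```

Because the brace groups are matched with [^{}]*, a { or } inside a pattern would end the group early. This mirrors the restriction stated above.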
The hpfa
request appends a file of patterns to the current list.
The hpfcode request defines mapping values for character codes in
pattern files. It is an older mechanism no longer used by GNU troff’s
own macro files; for its successor, see hcode below. The hpf and hpfa
requests apply the mapping after reading the patterns but before
replacing or appending to the active list of patterns. The arguments of
hpfcode are pairs of character codes, integers from 0 to 255. The
request maps character code a to code b, code c to code d, and so on.
Character codes that would otherwise be invalid in GNU troff
can
be used. By default, every code maps to itself except those for letters
‘A’ to ‘Z’, which map to those for ‘a’ to ‘z’.
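As a concrete reading of that description, this Python sketch builds a 256-entry code map from hpfcode-style argument pairs, starting from the default in which ‘A’ to ‘Z’ map to ‘a’ to ‘z’. It then applies the map to freshly read pattern bytes. The names hpfcode_table and apply_mapping are hypothetical, and the sketch does not model whether successive hpfcode calls accumulate.

```python
def hpfcode_table(*pairs: int) -> list[int]:
    """Build a 256-entry code map from hpfcode-style pairs (a b c d ...)."""
    if len(pairs) % 2:
        raise ValueError("arguments must come in pairs")
    table = list(range(256))                       # every code maps to itself
    for code in range(ord("A"), ord("Z") + 1):
        table[code] = code + (ord("a") - ord("A"))  # except A-Z -> a-z
    for source, target in zip(pairs[::2], pairs[1::2]):
        if not (0 <= source <= 255 and 0 <= target <= 255):
            raise ValueError("character codes must be in 0..255")
        table[source] = target
    return table


def apply_mapping(table: list[int], pattern_bytes: bytes) -> bytes:
    """Map every byte of freshly read pattern data through the table."""
    return bytes(table[b] for b in pattern_bytes)


if __name__ == "__main__":
    # Hypothetical pair: map code 0xC4 to 0xE4 in addition to the defaults.
    print(apply_mapping(hpfcode_table(0xC4, 0xE4), bytes([0xC4, 0x41])))
```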
The set of hyphenation patterns is associated with the language set by
the hla
request (see below). The hpf
request is usually
invoked by a localization file loaded by the troffrc file.
A second call to hpf
(for the same language) replaces the
hyphenation patterns with the new ones. Invoking hpf
or
hpfa
causes an error if there is no hyphenation language. If no
hpf
request is specified (either in the document, in a file
loaded at startup, or in a macro package), GNU troff
won’t
automatically hyphenate at all.
The hcode request sets the hyphenation code of character c1 to code1, that of c2 to code2, and so on. A hyphenation code must be an ordinary character (not a special character escape sequence) other than a digit or a space. The request is ignored if given no arguments.
For hyphenation to work, hyphenation codes must be set up. At
startup, GNU troff
assigns hyphenation codes to the letters
‘a’–‘z’ (mapped to themselves), to the letters
‘A’–‘Z’ (mapped to ‘a’–‘z’), and zero to all other
characters. Normally, hyphenation patterns contain only lowercase
letters, which should be applied regardless of case. In other words,
they assume that the words ‘FOO’ and ‘Foo’ should be hyphenated exactly
as ‘foo’ is. The hcode
request extends this principle to letters
outside the Unicode basic Latin alphabet; without it, words containing
such letters won’t be hyphenated properly even if the corresponding
hyphenation patterns contain them.
For example, the following hcode
requests are necessary to assign
hyphenation codes to the letters ‘ÄäÖöÜüß’, needed for German.
.hcode ä ä Ä ä
.hcode ö ö Ö ö
.hcode ü ü Ü ü
.hcode ß ß
Without these assignments, GNU troff
treats the German word
‘Kindergärten’ (the plural form of ‘kindergarten’) as two words
‘kinderg’ and ‘rten’ because the hyphenation code of the
umlaut a is zero by default, just like a space. There is a German
hyphenation pattern that covers ‘kinder’, so GNU troff
finds
the hyphenation ‘kin-der’. The other two hyphenation points
(‘kin-der-gär-ten’) are missed.
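The Kindergärten effect can be reproduced with a small Python model of hyphenation codes. Characters absent from the table stand for code zero and, like a space, end the word that the hyphenator sees. All other characters are replaced by their codes. The names default_hcodes, hyphenation_words, and german are hypothetical. The startup table follows the description above: ‘a’–‘z’ map to themselves and ‘A’–‘Z’ to ‘a’–‘z’.

```python
import string


def default_hcodes() -> dict[str, str]:
    """Startup table: a-z map to themselves, A-Z to a-z, all else is zero."""
    codes = {c: c for c in string.ascii_lowercase}
    codes.update({c: c.lower() for c in string.ascii_uppercase})
    return codes


def hyphenation_words(text: str, hcodes: dict[str, str]) -> list[str]:
    """Split text into the words the hyphenator would see.

    A character missing from hcodes models hyphenation code zero: like a
    space, it ends the current word. Others are replaced by their codes.
    """
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        code = hcodes.get(ch)
        if code is None:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(code)
    if current:
        words.append("".join(current))
    return words


# Equivalent of the hcode requests for German shown above.
german = default_hcodes()
german.update({"ä": "ä", "Ä": "ä", "ö": "ö", "Ö": "ö",
               "ü": "ü", "Ü": "ü", "ß": "ß"})

if __name__ == "__main__":
    print(hyphenation_words("Kindergärten", default_hcodes()))
    print(hyphenation_words("Kindergärten", german))
```

Without the German assignments, the model yields the two fragments ‘kinderg’ and ‘rten’. With them, it yields the single lowercased word ‘kindergärten’, which the patterns can then treat as a whole.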
The hla request sets the hyphenation language to lang. Hyphenation
exceptions specified with the hw request and hyphenation patterns and
exceptions specified with the hpf and hpfa requests are associated with
the hyphenation language. The hla request is usually invoked by a
localization file, which is in turn loaded by the troffrc or troffrc-end
file; see the hpf request above.
The hyphenation language is available in the read-only string-valued register ‘.hla’; it is associated with the environment (see Environments).
The hlm request sets the maximum quantity of consecutive hyphenated
lines to n. If n is negative, there is no maximum. If n is omitted,
it is -1. This value is associated with the environment
(see Environments). Only lines output from a given environment
count toward the maximum associated with that environment. Hyphens
resulting from \% are counted; explicit hyphens are not.
The .hlm read-only register stores this maximum. The count of
immediately preceding consecutive hyphenated lines is available in the
read-only register .hlc.
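The interplay of the limit and the running count can be simulated line by line. In the Python sketch below, hlc plays the role of the .hlc register. A line that would end at a soft hyphen is hyphenated only if the limit is negative or fewer than hlm immediately preceding lines were hyphenated. Any unhyphenated line resets the count. The function name simulate_hlm and the "would hyphenate" input are hypothetical simplifications of line breaking.

```python
def simulate_hlm(hlm: int, would_hyphenate: list[bool]) -> list[bool]:
    """Decide, line by line, whether a wanted hyphenation is allowed.

    would_hyphenate[k] is True when line k would otherwise end at a soft
    hyphen (automatic or from \\%); explicit hyphens are not counted.
    hlc mirrors the .hlc register: consecutive hyphenated lines so far.
    """
    hlc = 0
    result: list[bool] = []
    for wants in would_hyphenate:
        allowed = hlm < 0 or hlc < hlm  # negative hlm means no maximum
        hyphenated = wants and allowed
        hlc = hlc + 1 if hyphenated else 0
        result.append(hyphenated)
    return result


if __name__ == "__main__":
    print(simulate_hlm(2, [True, True, True, True]))
```

With a limit of 2, the third of four lines that all want a hyphen is left unhyphenated, and the count starts over for the fourth.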
The hym request sets the (right) hyphenation margin to length. If the adjustment mode is not ‘b’ or ‘n’, the line is not hyphenated if it is shorter than length. Without an argument, the hyphenation margin is reset to its default value, 0. The default scaling unit is ‘m’. The hyphenation margin is associated with the environment (see Environments).
A negative argument resets the hyphenation margin to zero, emitting a warning in category ‘range’.
The hyphenation margin is available in the .hym
read-only
register.
The hys request suppresses hyphenation of the line in adjustment modes ‘b’ or ‘n’ if the line can be justified by adding no more than hyphenation-space extra space to each inter-word space. Without an argument, the hyphenation space adjustment threshold is set to its default value, 0. The default scaling unit is ‘m’. The hyphenation space adjustment threshold is associated with the environment (see Environments).
A negative argument resets the hyphenation space adjustment threshold to zero, emitting a warning in category ‘range’.
The hyphenation space adjustment threshold is available in the
.hys
read-only register.
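The hys test reduces to simple arithmetic: divide the space left on the line by the number of inter-word spaces, then compare the result with the threshold. The Python sketch below assumes all widths are given in one common unit and that text_width is the width of the line as set without hyphenating, with normal spaces included. The function name hys_suppresses, and the choice to never suppress when there are no inter-word spaces, are assumptions made for illustration.

```python
from fractions import Fraction


def hys_suppresses(line_length: int, text_width: int, gaps: int,
                   hys: int, adjust_mode: str = "b") -> bool:
    """Return True if the hys threshold would suppress hyphenation.

    Widths share one unit; text_width is the line as set without
    hyphenating, normal inter-word spaces included.
    """
    if text_width > line_length:
        raise ValueError("text must fit on the line")
    if adjust_mode not in ("b", "n"):
        return False  # the threshold applies only in modes b and n
    if gaps == 0:
        return False  # no inter-word space to stretch (assumption)
    extra_per_gap = Fraction(line_length - text_width, gaps)
    return extra_per_gap <= hys  # "no more than" the threshold


if __name__ == "__main__":
    print(hys_suppresses(1000, 960, 4, 10))  # 10 units extra per space
    print(hys_suppresses(1000, 900, 4, 10))  # 25 units extra per space
```

With the default threshold of 0, any line that needs some extra space per gap is still hyphenated. This matches the default behavior of not suppressing hyphenation.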