Until this chapter, the only escapes we have encountered are of the form ‘\^’, which tell sed not to interpret the circumflex as a special character but to take it literally. For example, ‘\*’ matches a single asterisk rather than zero or more backslashes.
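A minimal illustration of this kind of escape, assuming a POSIX shell and any POSIX sed (the input strings are arbitrary): the escaped character matches itself instead of acting as an operator.

```shell
# '\*' is a literal asterisk, not a repetition operator,
# so only the '*' in the input is replaced.
echo '2*3' | sed 's/\*/ times /'
# -> 2 times 3

# '\^' matches a literal circumflex instead of anchoring
# the match at the start of the line.
echo 'a^c' | sed 's/\^/-/'
# -> a-c
```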
This chapter introduces another kind of escape: escapes that are applied to a character or sequence of characters that is ordinarily taken literally, and that sed replaces with a special character. This provides a way of encoding non-printable characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters in a sed script, but when a script is being prepared in the shell or by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents.
The list of these escapes is:
\a
Produces or matches a BEL character, that is, an “alert” (ASCII 7).
\f
Produces or matches a form feed (ASCII 12).
\n
Produces or matches a newline (ASCII 10).
\r
Produces or matches a carriage return (ASCII 13).
\t
Produces or matches a horizontal tab (ASCII 9).
\v
Produces or matches a so-called “vertical tab” (ASCII 11).
\cx
Produces or matches CONTROL-x, where x is any character. The precise effect of ‘\cx’ is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex 3B, while ‘\c;’ becomes hex 7B.
\dxxx
Produces or matches a character whose decimal ASCII value is xxx.
\oxxx
Produces or matches a character whose octal ASCII value is xxx.
\xxx
Produces or matches a character whose hexadecimal ASCII value is xx (that is, ‘\x’ followed by two hexadecimal digits).
‘\b’ (backspace) was omitted because of the conflict with the existing “word boundary” meaning.
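The sketch below exercises several of the escapes listed above. It assumes GNU sed (all of these except \n are GNU extensions) and uses od only to make the non-printable control byte visible; the sample inputs are arbitrary.

```shell
# \t matches a horizontal tab in the input.
printf 'name\tvalue\n' | sed 's/\t/ | /'
# -> name | value

# \d, \o and \x produce the same character in the replacement:
# 'A' is decimal 65, octal 101 and hexadecimal 41.
echo x | sed 's/x/\d065 \o101 \x41/'
# -> A A A

# \n in the replacement inserts a newline, splitting the line in two.
echo 'left,right' | sed 's/,/\n/'
# -> left
#    right

# \cz: 'z' is converted to 'Z' (hex 5A), then bit 6 (hex 40) is inverted,
# giving hex 1A (CONTROL-Z). od prints the byte values: 1a, then the newline 0a.
echo x | sed 's/x/\cz/' | od -An -tx1
```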
GNU sed processes escape sequences before passing the text on to the regular-expression matching of the s/// command and to address matching. Thus the following two commands are equivalent (‘0x5e’ is the hexadecimal ASCII value of the character ‘^’):
$ echo 'a^c' | sed 's/^/b/'
ba^c
$ echo 'a^c' | sed 's/\x5e/b/'
ba^c
As are the following (‘0x5b’ and ‘0x5d’ are the hexadecimal ASCII values of ‘[’ and ‘]’, respectively):
$ echo abc | sed 's/[a]/x/'
xbc
$ echo abc | sed 's/\x5ba\x5d/x/'
xbc
However, producing such special characters through escapes is best avoided, because of unexpected edge cases. For example, the following two commands are not equivalent:
$ echo 'a^c' | sed 's/\^/b/'
abc
$ echo 'a^c' | sed 's/\\\x5e/b/'
a^c
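Because escape processing also applies to address matching, an escape can be used to select lines. A minimal sketch, assuming GNU sed; the input text is arbitrary. A tab is a safe target here since it has no special meaning in a regular expression.

```shell
# The address /\t/ selects only lines containing a tab;
# -n suppresses automatic printing, so p prints just those lines.
printf 'one\ttwo\nthree\n' | sed -n '/\t/p'
# prints "one", a tab, then "two"

# The hexadecimal form \x09 (ASCII 9, a tab) selects the same line.
printf 'one\ttwo\nthree\n' | sed -n '/\x09/p'
```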
All the escapes introduced here are GNU extensions, with the exception of \n. In basic regular expression mode, setting the POSIXLY_CORRECT environment variable disables them inside bracket expressions.