Escapes (sed, a stream editor)

5.8 Escape Sequences - specifying special characters

Until this chapter, we have only encountered escapes of the form ‘\^’, which tell sed not to interpret the circumflex as a special character, but rather to take it literally. For example, ‘\*’ matches a single asterisk rather than zero or more backslashes.

This chapter introduces another kind of escape⁶: one applied to a character or sequence of characters that is ordinarily taken literally, and that sed replaces with a special character. This provides a way of encoding non-printable characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters in a sed script, but when a script is being prepared in the shell or by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents.

The list of these escapes is:

\a: Produces or matches a BEL character, that is an “alert” (ASCII 7).
\f: Produces or matches a form feed (ASCII 12).
\n: Produces or matches a newline (ASCII 10).
\r: Produces or matches a carriage return (ASCII 13).
\t: Produces or matches a horizontal tab (ASCII 9).
\v: Produces or matches a so-called “vertical tab” (ASCII 11).
\cx: Produces or matches CONTROL-x, where x is any character. The precise effect of ‘\cx’ is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex 3B, while ‘\c;’ becomes hex 7B.
\dxxx: Produces or matches a character whose decimal ASCII value is xxx.
\oxxx: Produces or matches a character whose octal ASCII value is xxx.
\xHH: Produces or matches a character whose hexadecimal ASCII value is HH.
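
The ‘\cx’ rule can be checked with plain shell arithmetic. The following is a minimal sketch that does not call sed at all: the helper name ctrl_code is hypothetical, and it assumes a POSIX shell whose printf accepts the leading-quote form for obtaining a character code.

```shell
# Hypothetical helper (not part of sed): apply the \cx rule to one character.
ctrl_code() {
  # Step 1: convert a lower case letter to upper case.
  ch=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  # Step 2: obtain the character's numeric code (the 'X form of printf).
  code=$(printf '%d' "'$ch")
  # Step 3: invert bit 6 (hex 40) and print the result in hexadecimal.
  printf '%02X\n' $(( code ^ 0x40 ))
}

ctrl_code z     # prints 1A
ctrl_code '{'   # prints 3B
ctrl_code ';'   # prints 7B
```

The three calls reproduce the values given for ‘\cz’, ‘\c{’ and ‘\c;’.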

‘\b’ (backspace) was omitted because of the conflict with the existing “word boundary” meaning.
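
As a brief illustration of the escapes listed above, the sketch below uses a few of them on both sides of the s command. It assumes GNU sed, since other implementations may not recognize these escapes. The input is built with printf and echo, and the variable names are illustrative only.

```shell
# \t in the regex matches a horizontal tab (ASCII 9).
tab_in=$(printf 'a\tb\n' | sed 's/\t/<TAB>/')
echo "$tab_in"                      # a<TAB>b

# \x41, \o101 and \d065 all denote the letter A (ASCII 65).
hex_out=$(echo 'A' | sed 's/\x41/hex/')
oct_out=$(echo 'A' | sed 's/\o101/octal/')
dec_out=$(echo 'A' | sed 's/\d065/decimal/')
echo "$hex_out $oct_out $dec_out"   # hex octal decimal

# In the replacement, \t produces a tab; od -c makes it visible.
tab_out=$(echo 'a b' | sed 's/ /\t/')
printf '%s\n' "$tab_out" | od -c
```

Writing the escape rather than the raw character keeps the script readable when it is typed in a shell or a text editor.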

5.8.1 Escaping Precedence

GNU sed processes escape sequences before passing the text on to the regular-expression matching of the s/// command and address matching. Thus the following two commands are equivalent (‘0x5e’ is the hexadecimal ASCII value of the character ‘^’):

$ echo 'a^c' | sed 's/^/b/'
ba^c

$ echo 'a^c' | sed 's/\x5e/b/'
ba^c

As are the following (‘0x5b’ and ‘0x5d’ are the hexadecimal ASCII values of ‘[’ and ‘]’, respectively):

$ echo abc | sed 's/[a]/x/'
xbc
$ echo abc | sed 's/\x5ba\x5d/x/'
xbc

However, it is recommended to avoid producing such special characters with escape sequences, because doing so can lead to unexpected edge cases. For example, the following are not equivalent:

$ echo 'a^c' | sed 's/\^/b/'
abc

$ echo 'a^c' | sed 's/\\\x5e/b/'
a^c
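
In line with the recommendation above, a special character that should match literally can be written with an ordinary backslash escape instead of a numeric escape. A minimal sketch, reusing the ‘^’ and ‘[’ characters from the examples in this section (the variable names are illustrative only):

```shell
# \^ matches a literal circumflex anywhere in the line.
caret=$(echo 'a^c' | sed 's/\^/b/')
echo "$caret"     # abc

# \[ matches a literal opening bracket instead of starting a bracket expression.
bracket=$(echo 'a[c' | sed 's/\[/b/')
echo "$bracket"   # abc
```

This form says directly which character is meant, so it does not depend on the order in which escapes are processed.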

Footnotes

(6)

All the escapes introduced here are GNU extensions, with the exception of \n. In basic regular expression mode, setting POSIXLY_CORRECT disables them inside bracket expressions.