A bracket expression is a list of characters enclosed by ‘[’ and ‘]’. It matches any single character in that list; if the first character of the list is the caret ‘^’, then it matches any character not in the list. For example, the following command replaces the words ‘gray’ or ‘grey’ with ‘blue’:
sed 's/gr[ae]y/blue/'
Bracket expressions can be used in both basic and extended regular expressions (that is, with or without the -E/-r options).
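As a small illustration of the two points above, the following sketch shows a negated list and a bracket expression used with -E. The sample input strings and the variable names (neg, ere) are made up for this example.

```shell
# Negated list: replace every character that is NOT a lower-case vowel.
neg=$(echo 'sed' | sed 's/[^aeiou]/_/g')
echo "$neg"    # prints: _e_

# The bracket expression is written the same way in an extended
# regular expression (-E).
ere=$(echo 'gray grey' | sed -E 's/gr[ae]y/blue/g')
echo "$ere"    # prints: blue blue
```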
Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive. In the default C locale, the sorting sequence is the native character order; for example, ‘[a-d]’ is equivalent to ‘[abcd]’.
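A quick way to check the equivalence is to run both forms on the same input. This sketch sets LC_ALL=C for the sed process so that the range follows the native character order; the input string and variable names are illustrative only.

```shell
# Range expression and explicit list, both evaluated in the C locale.
range=$(echo 'abcdef' | LC_ALL=C sed 's/[a-d]/X/g')
list=$(echo 'abcdef' | LC_ALL=C sed 's/[abcd]/X/g')
echo "$range"    # prints: XXXXef
echo "$list"     # prints: XXXXef
```

In other locales the sorting sequence may differ, so a range such as ‘[a-d]’ is not guaranteed to match exactly these four characters.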
Finally, certain named classes of characters are predefined within bracket expressions, as follows.
These named classes must be used inside brackets themselves. Correct usage:
$ echo 1 | sed 's/[[:digit:]]/X/'
X
Incorrect usage is rejected by newer sed versions. Older versions accepted it but treated it as a single bracket expression (which is equivalent to ‘[dgit:]’, that is, only the characters d/g/i/t/:):
# current GNU sed versions - incorrect usage rejected
$ echo 1 | sed 's/[:digit:]/X/'
sed: character class syntax is [[:space:]], not [:space:]

# older GNU sed versions
$ echo 1 | sed 's/[:digit:]/X/'
1
‘[:alnum:]’
Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.
‘[:alpha:]’
Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[A-Za-z]’.
‘[:blank:]’
Blank characters: space and tab.
‘[:cntrl:]’
Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In other character sets, these are the equivalent characters, if any.
‘[:digit:]’
Digits: 0 1 2 3 4 5 6 7 8 9.
‘[:graph:]’
Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.
‘[:lower:]’
Lower-case letters; in the ‘C’ locale and ASCII character encoding, this is a b c d e f g h i j k l m n o p q r s t u v w x y z.
‘[:print:]’
Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.
‘[:punct:]’
Punctuation characters; in the ‘C’ locale and ASCII character encoding, this is ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
‘[:space:]’
Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space.
‘[:upper:]’
Upper-case letters: in the ‘C’ locale and ASCII character encoding, this is A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.
‘[:xdigit:]’
Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.
Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.
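The following sketch shows how named classes combine with each other inside a single bracket expression. It assumes the C locale (set through LC_ALL=C); the input strings and variable names are illustrative only.

```shell
# Two classes in one bracket expression: upper-case letters and digits.
masked=$(echo 'Order 66, Unit B' | LC_ALL=C sed 's/[[:upper:][:digit:]]/#/g')
echo "$masked"      # prints: #rder ##, #nit #

# Squeeze each run of space characters (spaces and tabs here) to one space.
squeezed=$(printf 'a \t  b\n' | LC_ALL=C sed 's/[[:space:]][[:space:]]*/ /g')
echo "$squeezed"    # prints: a b
```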
Most meta-characters lose their special meaning inside bracket expressions. The following characters are still special, depending on where they appear:
‘]’ ends the bracket expression if it’s not the first list item. So, if you want to make the ‘]’ character a list item, you must put it first.
‘-’ represents the range if it’s not first or last in a list or the ending point of a range.
‘^’ represents the characters not in the list. If you want to make the ‘^’ character a list item, place it anywhere but first.
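The placement rules for ‘]’, ‘-’ and ‘^’ can be tried out with a short sketch like the one below. The input strings and variable names are made up for illustration. The second command relies on the POSIX rule that ‘]’ is also taken literally when it immediately follows a leading ‘^’.

```shell
# ']' first, '^' not first, '-' last: all three are ordinary list items.
literal=$(echo 'a]b-c^d' | sed 's/[]^-]/_/g')
echo "$literal"    # prints: a_b_c_d

# A leading '^' negates the list; the ']' right after it is a list item,
# so this replaces every character except ']'.
negated=$(echo 'a]b' | sed 's/[^]]/_/g')
echo "$negated"    # prints: _]_
```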
The characters ‘$’, ‘*’, ‘.’, ‘[’, and ‘\’ are normally not special within a list. For example, ‘[\*]’ matches either ‘\’ or ‘*’, because the ‘\’ is not special here. However, strings like ‘[.ch.]’, ‘[=a=]’, and ‘[:space:]’ are special within a list and represent collating symbols, equivalence classes, and character classes, respectively, and ‘[’ is therefore special within a list when it is followed by ‘.’, ‘=’, or ‘:’. Also, when not in POSIXLY_CORRECT mode, special escapes like ‘\n’ and ‘\t’ are recognized within a list. See Escapes.
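A minimal sketch of the first point: inside a list, the backslash and the usual regular expression meta-characters match only themselves. The input strings and variable names are illustrative. printf '%s\n' is used instead of echo so that the shell does not interpret the backslash in the sample input.

```shell
# Inside the list, '\' and '*' are both ordinary characters.
star=$(printf '%s\n' 'a\b*c' | sed 's/[\*]/_/g')
echo "$star"     # prints: a_b_c

# '.', '$' and '*' likewise match only themselves within a list.
plain=$(printf '%s\n' 'a.b$c*d' | sed 's/[.$*]/_/g')
echo "$plain"    # prints: a_b_c_d
```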
‘[.’ represents the open collating symbol.
‘.]’ represents the close collating symbol.
‘[=’ represents the open equivalence class.
‘=]’ represents the close equivalence class.
‘[:’ represents the open character class symbol, and should be followed by a valid character class name.
‘:]’ represents the close character class symbol.
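The sketch below shows the collating-symbol and equivalence-class delimiters in their simplest form. It assumes the C locale, where a collating symbol or equivalence class naming a single character is expected to match just that character; results in other locales, or with other regex libraries, may differ. The input strings and variable names are illustrative only.

```shell
# Collating symbol for the single character '-'.
coll=$(echo 'a-b' | LC_ALL=C sed 's/[[.-.]]/_/')
echo "$coll"     # prints: a_b

# Equivalence class of 'a'; in the C locale this is just 'a' itself.
equiv=$(echo 'banana' | LC_ALL=C sed 's/[[=a=]]/A/g')
echo "$equiv"    # prints: bAnAnA
```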