8.5.6 ‘awk’ regular expression syntax

The character ‘.’ matches any single character except the null character.

‘+’

indicates that the regular expression should match one or more occurrences of the previous atom or regexp.

‘?’

indicates that the regular expression should match zero or one occurrence of the previous atom or regexp.

‘\+’

matches a ‘+’.

‘\?’

matches a ‘?’.
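
As a rough illustration of the four items above, the following sketch runs GNU ‘find’ with ‘-regextype awk’ against a scratch directory. It assumes GNU findutils (for ‘-regextype’ and ‘-printf’); the file names and the ‘awk_match’ helper are made up for this example.

```shell
# Sketch only: assumes GNU find; file names are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/ab" "$tmp/aaab" "$tmp/b" "$tmp/a+b"

# Hypothetical helper: print the base names of files whose whole path
# matches the awk-syntax regexp in $1, sorted and joined by spaces.
awk_match() {
  find "$tmp" -type f -regextype awk -regex "$1" -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -
}

# -regex must match the entire path, hence the leading '.*/'.
plus=$(awk_match '.*/a+b')      # one or more 'a', then 'b'
quest=$(awk_match '.*/a?b')     # zero or one 'a', then 'b'
literal=$(awk_match '.*/a\+b')  # 'a', a literal '+', then 'b'

printf 'a+b  matches: %s\n' "$plus"
printf 'a?b  matches: %s\n' "$quest"
printf 'a\\+b matches: %s\n' "$literal"

rm -rf "$tmp"
```

With these files, the first pattern should select ‘aaab’ and ‘ab’, the second ‘ab’ and ‘b’, and the third only the file named ‘a+b’.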

Bracket expressions are used to match ranges of characters. Bracket expressions where the range is backward, for example ‘[z-a]’, are invalid. Within square brackets, ‘\’ can be used to quote the following character. Character classes are supported; for example ‘[[:digit:]]’ will match a single decimal digit.
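
The bracket-expression rules can be tried out with a sketch along these lines. It assumes GNU ‘find’; the file names and the ‘awk_match’ helper are invented for illustration, and the exact error text printed for the backward range may vary between versions.

```shell
# Sketch only: assumes GNU find; file names are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/file1" "$tmp/file9" "$tmp/fileA" "$tmp/file]"

# Hypothetical helper: base names of files whose whole path matches $1.
awk_match() {
  find "$tmp" -type f -regextype awk -regex "$1" -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -
}

digits=$(awk_match '.*/file[[:digit:]]')  # character class
bracket=$(awk_match '.*/file[\]]')        # '\' quotes ']' inside brackets

# A backward range is rejected when the expression is compiled,
# so find exits with a non-zero status.
if find "$tmp" -regextype awk -regex '.*/[z-a]' >/dev/null 2>&1; then
  backward=accepted
else
  backward=rejected
fi

printf 'digit class: %s\n' "$digits"
printf 'quoted ]:    %s\n' "$bracket"
printf '[z-a]:       %s\n' "$backward"

rm -rf "$tmp"
```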

GNU extensions are not supported and so ‘\w’, ‘\W’, ‘\<’, ‘\>’, ‘\b’, ‘\B’, ‘\`’, and ‘\'’ match ‘w’, ‘W’, ‘<’, ‘>’, ‘b’, ‘B’, ‘`’, and ‘'’ respectively.
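
To see the practical effect, one can compare the ‘awk’ syntax with a syntax that does support the GNU operators. The sketch below assumes GNU ‘find’ and uses ‘-regextype emacs’ purely as a contrast; the single-letter file names and the ‘type_match’ helper are hypothetical.

```shell
# Sketch only: assumes GNU find; file names are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/w" "$tmp/x" "$tmp/b"

# Hypothetical helper: $1 is the regextype, $2 the regexp.
type_match() {
  find "$tmp" -type f -regextype "$1" -regex "$2" -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -
}

awk_w=$(type_match awk '.*/\w')      # '\w' is just the letter 'w' here
emacs_w=$(type_match emacs '.*/\w')  # '\w' is a word-character operator here

printf 'awk   \\w matches: %s\n' "$awk_w"
printf 'emacs \\w matches: %s\n' "$emacs_w"

rm -rf "$tmp"
```

Under the ‘awk’ syntax only the file named ‘w’ should be listed, whereas the ‘emacs’ syntax is expected to list all three single-letter names.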

Grouping is performed with parentheses ‘()’. An unmatched ‘)’ matches just itself. A backslash followed by a digit matches that digit.
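
A short sketch of grouping, and of the fact that a backslash followed by a digit is not a back-reference in this syntax. It assumes GNU ‘find’; the file names and the ‘awk_match’ helper are invented for the example.

```shell
# Sketch only: assumes GNU find; file names are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/ab" "$tmp/abab" "$tmp/a1" "$tmp/aa"

# Hypothetical helper: base names of files whose whole path matches $1.
awk_match() {
  find "$tmp" -type f -regextype awk -regex "$1" -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -
}

group=$(awk_match '.*/(ab)+')  # the group 'ab' repeated one or more times
digit=$(awk_match '.*/(a)\1')  # '\1' is the digit '1', not a back-reference

printf '(ab)+ matches: %s\n' "$group"
printf '(a)\\1 matches: %s\n' "$digit"

rm -rf "$tmp"
```

If ‘\1’ were a back-reference, the second pattern would select ‘aa’; with the ‘awk’ syntax it should select ‘a1’ instead.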

The alternation operator is ‘|’.
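
For instance, alternation inside a group can select files by one of several names. This is a minimal sketch assuming GNU ‘find’; the file names and the ‘awk_match’ helper are hypothetical.

```shell
# Sketch only: assumes GNU find; file names are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/foo.txt" "$tmp/bar.txt" "$tmp/baz.txt"

# Hypothetical helper: base names of files whose whole path matches $1.
awk_match() {
  find "$tmp" -type f -regextype awk -regex "$1" -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -
}

# Either 'foo' or 'bar', followed by a literal '.txt'.
alt=$(awk_match '.*/(foo|bar)\.txt')

printf '(foo|bar)\\.txt matches: %s\n' "$alt"

rm -rf "$tmp"
```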

The characters ‘^’ and ‘$’ always represent the beginning and end of a string respectively, except within square brackets. Within brackets, ‘^’ can be used to invert the membership of the character class being specified.
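
The sketch below illustrates both uses of ‘^’: as an anchor even in the middle of a pattern, and as the inversion marker inside brackets. It assumes GNU ‘find’; the file names and the ‘awk_match’ helper are made up for illustration.

```shell
# Sketch only: assumes GNU find; file names are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/a1" "$tmp/ab" "$tmp/a^b"

# Hypothetical helper: base names of files whose whole path matches $1.
awk_match() {
  find "$tmp" -type f -regextype awk -regex "$1" -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -
}

negated=$(awk_match '.*/a[^0-9]')  # 'a' followed by any non-digit
middle=$(awk_match '.*/a^b')       # '^' is still an anchor, so nothing matches
escaped=$(awk_match '.*/a\^b')     # '\^' is a literal caret

printf 'a[^0-9] matches: %s\n' "$negated"
printf 'a^b     matches: %s\n' "$middle"
printf 'a\\^b    matches: %s\n' "$escaped"

rm -rf "$tmp"
```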

‘*’, ‘+’ and ‘?’ are special at any point in a regular expression except:

  1. At the beginning of a regular expression
  2. After an open-group, signified by ‘(’
  3. After the alternation operator ‘|’
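
Cases 2 and 3 of the list above can be observed with a sketch like the following, in which the operator characters are expected to be taken literally. It assumes GNU ‘find’; the file names and the ‘awk_match’ helper are hypothetical. Case 1 is not shown, because ‘-regex’ matches against the whole path and the pattern therefore starts with ‘.*/’.

```shell
# Sketch only: assumes GNU find; file names are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/+a" "$tmp/*a" "$tmp/a"

# Hypothetical helper: base names of files whose whole path matches $1.
awk_match() {
  find "$tmp" -type f -regextype awk -regex "$1" -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -
}

after_group=$(awk_match '.*/(+a)')  # '+' right after '(' is a literal '+'
after_alt=$(awk_match '.*/(x|*a)')  # '*' right after '|' is a literal '*'

printf '(+a)   matches: %s\n' "$after_group"
printf '(x|*a) matches: %s\n' "$after_alt"

rm -rf "$tmp"
```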

The longest possible match is returned; this applies to the regular expression as a whole and (subject to this constraint) to subexpressions within groups.