If the syntax bit RE_NO_BK_REF isn’t set, then Regex recognizes back-references. A back-reference matches a specified preceding group. The back-reference operator is represented by ‘\digit’ anywhere after the end of a regular expression’s digit-th group (see Grouping Operators, ‘( … )’ or ‘\( … \)’).
digit must be between ‘1’ and ‘9’. The matcher assigns numbers 1 through 9 to the first nine groups it encounters. By using one of ‘\1’ through ‘\9’ after the corresponding group’s close-group operator, you can match a substring identical to the one that the group does.
Back-references match according to the following (in all examples below, ‘(’ represents the open-group, ‘)’ the close-group, ‘{’ the open-interval and ‘}’ the close-interval operator):
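If the group matches a substring, the back-reference matches an identical substring. For example, ‘(.*)\1’ matches any (newline-free if the syntax bit RE_DOT_NEWLINE isn’t set) string that is composed of two identical halves; the ‘(.*)’ matches the first half and the ‘\1’ matches the second half.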
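The two-identical-halves behavior of ‘(.*)\1’ can be tried out with a minimal sketch like the following. It assumes the POSIX regcomp and regexec interface from <regex.h> and uses POSIX basic syntax, where the open-group and close-group operators are written ‘\(’ and ‘\)’. The helper name bre_matches is hypothetical and not part of Regex.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if TEXT contains a match for the POSIX basic regular
   expression PATTERN, 0 if it does not, and -1 if PATTERN fails to
   compile.  Hypothetical helper for illustration only.  */
int
bre_matches (const char *pattern, const char *text)
{
  regex_t re;

  /* Flags of 0 select basic (not extended) syntax.  */
  if (regcomp (&re, pattern, 0) != 0)
    return -1;

  /* No substring positions are requested, so nmatch is 0.  */
  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

With the pattern anchored at both ends, bre_matches ("^\\(.*\\)\\1$", "abcabc") should return 1, because ‘abc’ is repeated exactly, while the same pattern applied to "abcabd" should return 0.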
You can use a back-reference as an argument to a repetition operator. For example, ‘(a(b))\2*’ matches ‘a’ followed by one or more ‘b’s: the group itself matches ‘ab’, and ‘\2*’ then matches zero or more further ‘b’s. Similarly, ‘(a(b))\2{3}’ matches ‘abbbb’.
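A back-reference used as the argument of a repetition operator can be checked with a small sketch such as this one. It assumes the POSIX regcomp and regexec interface and POSIX basic syntax, in which groups are written ‘\( … \)’ and intervals ‘\{ … \}’. The helper name bre_full_match is hypothetical, and the ‘^’ and ‘$’ anchors are added here so that the whole string must match.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if TEXT contains a match for the POSIX basic regular
   expression PATTERN, 0 if not, -1 if PATTERN does not compile.
   Hypothetical helper for illustration only.  */
int
bre_full_match (const char *pattern, const char *text)
{
  regex_t re;

  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* '(a(b))\2*' in basic syntax: "ab" followed by any number of "b".  */
const char *const star_pattern = "^\\(a\\(b\\)\\)\\2*$";

/* '(a(b))\2{3}' in basic syntax: "ab" followed by exactly three "b".  */
const char *const interval_pattern = "^\\(a\\(b\\)\\)\\2\\{3\\}$";
```

Under these assumptions, star_pattern should accept "ab" and "abbb" but reject "a", and interval_pattern should accept "abbbb" but reject "abbb".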
If there is no preceding digit-th subexpression, the regular expression is invalid.
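The invalid case can be observed at compile time. The sketch below assumes the POSIX regcomp interface and basic syntax; the helper name bre_compile_status is hypothetical. POSIX defines the error code REG_ESUBREG for an invalid back-reference number, but the sketch only relies on the return value being nonzero.

```c
#include <regex.h>

/* Compile PATTERN as a POSIX basic regular expression and return
   regcomp's status: 0 on success, a nonzero REG_* code on failure.
   Hypothetical helper for illustration only.  */
int
bre_compile_status (const char *pattern)
{
  regex_t re;
  int status = regcomp (&re, pattern, 0);

  /* Only a successfully compiled pattern needs to be freed.  */
  if (status == 0)
    regfree (&re);
  return status;
}
```

Here bre_compile_status ("\\(a\\)\\1") should return 0, since group 1 exists, whereas "\\(a\\)\\2" and "\\1" refer to groups that do not precede the back-reference and should be rejected with a nonzero status.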
Back-references can greatly slow down matching, as they can generate exponentially many matching possibilities that can consume both time and memory to explore. Also, the POSIX specification for back-references is at times unclear. Furthermore, many regular expression implementations have back-reference bugs that can cause programs to return incorrect answers or even crash, and fixing these bugs has often been low-priority: for example, as of 2020 the GNU C library bug database contained back-reference bugs 52, 10844, 11053, 24269 and 25322, with little sign of forthcoming fixes. Luckily, back-references are rarely useful and it should be little trouble to avoid them in practical applications.
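As one illustration of avoiding a back-reference, the property tested by an anchored ‘(.*)\1’ (a string made of two identical halves) can be checked directly in linear time. This is only a sketch of one possible replacement, not a general technique; the function name has_identical_halves is hypothetical, and unlike ‘.’ it does not treat newlines specially.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if TEXT consists of two identical halves, else 0.
   Hypothetical helper for illustration only.  */
int
has_identical_halves (const char *text)
{
  size_t len = strlen (text);

  /* A string of odd length cannot be split into equal halves.  */
  if (len % 2 != 0)
    return 0;

  size_t half = len / 2;
  return memcmp (text, text + half, half) == 0;
}
```

For instance, has_identical_halves ("abcabc") returns 1 and has_identical_halves ("abcabd") returns 0, with no regular expression matcher involved.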