The various forms in rx regexps are described below. The shorthand rx represents any rx form. rx… means zero or more rx forms and, unless stated otherwise, matches these forms in sequence as if wrapped in a (seq …) subform.
These are all valid arguments to the rx macro. All forms are defined by their described semantics; the corresponding string regexps are provided for ease of understanding only. A, B, … denote (suitably bracketed) string regexp subexpressions therein.
"some-string"
Match the string ‘some-string’ literally. There are no characters with special meaning, unlike in string regexps.
?C
Match the character ‘C’ literally.
(seq rx…)
(sequence rx…)
(: rx…)
(and rx…)
Match the rxs in sequence. Without arguments, the expression matches the empty string.
Corresponding string regexp: ‘AB…’ (subexpressions in sequence).
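As a small illustration of string and character literals combined with seq, the following sketch shows that regexp metacharacters in a string argument are matched literally. The name demo-literal-rx is hypothetical and exists only for this example.

```elisp
(require 'rx)

;; Hypothetical name, used only for this illustration.
(defconst demo-literal-rx
  (rx (seq "1+1" ?= "2"))
  "Matches the text `1+1=2' literally; `+' is not a repetition operator here.")

(string-match-p demo-literal-rx "so 1+1=2")   ; => 3
(string-match-p demo-literal-rx "so 11=2")    ; => nil
```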
(or rx…)
(| rx…)
Match exactly one of the rxs. If all arguments are strings, characters, or or-forms so constrained, the longest possible match will always be used. Otherwise, either the longest match or the first (in left-to-right order) will be used. Without arguments, the expression will not match anything at all.
Corresponding string regexp: ‘A\|B\|…’.
unmatchable
Refuse any match. Equivalent to (or). See regexp-unmatchable.
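A minimal sketch of the two behaviours just described: with only string arguments, or prefers the longest alternative, and unmatchable (like an empty or) never matches. The sample strings are arbitrary.

```elisp
(require 'rx)

;; All arguments are strings, so the longest alternative is preferred.
(let ((s "foobar"))
  (string-match (rx (or "foo" "foobar")) s)
  (match-string 0 s))                          ; => "foobar"

;; `unmatchable' (equivalent to `(or)') never matches, even on empty input.
(string-match-p (rx unmatchable) "")           ; => nil
(string-match-p (rx (or)) "anything")          ; => nil
```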
Normally, repetition forms are greedy, in that they attempt to match as many times as possible. Some forms are non-greedy; they try to match as few times as possible (see Non-greedy repetition).
(zero-or-more rx…)
(0+ rx…)
Match the rxs zero or more times. Greedy by default.
Corresponding string regexp: ‘A*’ (greedy), ‘A*?’ (non-greedy)
(one-or-more rx…)
(1+ rx…)
Match the rxs one or more times. Greedy by default.
Corresponding string regexp: ‘A+’ (greedy), ‘A+?’ (non-greedy)
(zero-or-one rx…)
(optional rx…)
(opt rx…)
Match the rxs once or an empty string. Greedy by default.
Corresponding string regexp: ‘A?’ (greedy), ‘A??’ (non-greedy).
(* rx…)
Match the rxs zero or more times. Greedy.
Corresponding string regexp: ‘A*’
(+ rx…)
Match the rxs one or more times. Greedy.
Corresponding string regexp: ‘A+’
(? rx…)
Match the rxs once or an empty string. Greedy.
Corresponding string regexp: ‘A?’
(*? rx…)
Match the rxs zero or more times. Non-greedy.
Corresponding string regexp: ‘A*?’
(+? rx…)
Match the rxs one or more times. Non-greedy.
Corresponding string regexp: ‘A+?’
(?? rx…)
Match the rxs once or an empty string. Non-greedy.
Corresponding string regexp: ‘A??’
(= n rx…)
(repeat n rx)
Match the rxs exactly n times.
Corresponding string regexp: ‘A\{n\}’
(>= n rx…)
Match the rxs n or more times. Greedy.
Corresponding string regexp: ‘A\{n,\}’
(** n m rx…)
(repeat n m rx…)
Match the rxs at least n but no more than m times. Greedy.
Corresponding string regexp: ‘A\{n,m\}’
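The difference between greedy and non-greedy repetition, and the counted forms, can be sketched as follows. The input strings are arbitrary examples, not taken from the manual.

```elisp
(require 'rx)

(let ((s "<a><b>"))
  (list
   ;; Greedy: `*' consumes as much as possible.
   (progn (string-match (rx "<" (* anything) ">") s) (match-string 0 s))
   ;; Non-greedy: `*?' stops at the first `>'.
   (progn (string-match (rx "<" (*? anything) ">") s) (match-string 0 s))))
;; => ("<a><b>" "<a>")

;; Counted repetition: exactly three digits, then two to four letters.
(string-match-p (rx bos (= 3 digit) (** 2 4 alpha) eos) "123abc")   ; => 0
(string-match-p (rx bos (= 3 digit) (** 2 4 alpha) eos) "12abc")    ; => nil
```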
The greediness of some repetition forms can be controlled using the following constructs. However, it is usually better to use the explicit non-greedy forms above when such matching is required.
(minimal-match rx)
Match rx, with zero-or-more, 0+, one-or-more, 1+, zero-or-one, opt and optional using non-greedy matching.
(maximal-match rx)
Match rx, with zero-or-more, 0+, one-or-more, 1+, zero-or-one, opt and optional using greedy matching. This is the default.
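A short sketch of how minimal-match changes the behaviour of one-or-more; the input string is an arbitrary example.

```elisp
(require 'rx)

(let ((s "aaa"))
  (list
   ;; Default (greedy): `one-or-more' takes all three characters.
   (progn (string-match (rx (one-or-more "a")) s) (match-string 0 s))
   ;; Inside `minimal-match', the same form becomes non-greedy.
   (progn (string-match (rx (minimal-match (one-or-more "a"))) s)
          (match-string 0 s))))
;; => ("aaa" "a")
```

The second pattern behaves like the explicit form (+? "a"), which is usually the clearer way to write it.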
(any set…)
(char set…)
(in set…)
Match a single character from one of the sets. Each set is a character, a string representing the set of its characters, a range or a character class (see below). A range is either a hyphen-separated string like "A-Z", or a cons of characters like (?A . ?Z).
Note that hyphen (-) is special in strings in this construct, since it acts as a range separator. To include a hyphen, add it as a separate character or single-character string.
Corresponding string regexp: ‘[…]’
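To illustrate the different kinds of set, here is a sketch of a pattern for lower-case "slugs". It combines a string range, a cons range and a separate hyphen character. The name demo-slug-rx is hypothetical.

```elisp
(require 'rx)

;; Hypothetical "slug" pattern: lower-case letters, digits and hyphens.
;; The hyphen is given as a separate character so it is not read as a range.
(defconst demo-slug-rx
  (rx bos (+ (any "a-z" (?0 . ?9) ?-)) eos))

(string-match-p demo-slug-rx "foo-bar-42")   ; => 0
(string-match-p demo-slug-rx "foo_bar")      ; => nil
```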
(not charspec)
Match a character not included in charspec. charspec can be a character, a single-character string, an any, not, or, intersection, syntax or category form, or a character class. If charspec is an or form, its arguments have the same restrictions as those of intersection; see below.
Corresponding string regexp: ‘[^…]’, ‘\Scode’, ‘\Ccode’
(intersection charset…)
Match a character included in all of the charsets. Each charset can be a character, a single-character string, an any form without character classes, or an intersection, or, or not form whose arguments are also charsets.
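As a sketch of not and intersection working together, the following pattern matches runs of lower-case ASCII consonants: letters in a-z that are not vowels. The name demo-consonants-rx is hypothetical, and case-fold-search is bound to nil so that case folding does not affect the result.

```elisp
(require 'rx)

;; Hypothetical name: letters a-z that are not vowels.
(defconst demo-consonants-rx
  (rx bos (+ (intersection (any "a-z") (not (any "aeiou")))) eos))

(let ((case-fold-search nil))
  (list (string-match-p demo-consonants-rx "rhythm")    ; no vowels
        (string-match-p demo-consonants-rx "rhyme")))   ; contains `e'
;; => (0 nil)
```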
not-newline, nonl
Match any character except a newline.
Corresponding string regexp: ‘.’ (dot)
anychar, anything
Match any character.
Corresponding string regexp: ‘.\|\n’ (for example)
Match a character from a named character class:
alpha, alphabetic, letter
Match alphabetic characters. More precisely, match characters whose Unicode ‘general-category’ property indicates that they are alphabetic.
alnum, alphanumeric
Match alphabetic characters and digits. More precisely, match characters whose Unicode ‘general-category’ property indicates that they are alphabetic or decimal digits.
digit, numeric, num
Match the digits ‘0’–‘9’.
xdigit, hex-digit, hex
Match the hexadecimal digits ‘0’–‘9’, ‘A’–‘F’ and ‘a’–‘f’.
cntrl, control
Match any character whose code is in the range 0–31.
blank
Match horizontal whitespace. More precisely, match characters whose Unicode ‘general-category’ property indicates that they are spacing separators.
space, whitespace, white
Match any character that has whitespace syntax (see Table of Syntax Classes).
lower, lower-case
Match anything lower-case, as determined by the current case table. If case-fold-search is non-nil, this also matches any upper-case letter.
upper, upper-case
Match anything upper-case, as determined by the current case table. If case-fold-search is non-nil, this also matches any lower-case letter.
graph, graphic
Match any character except whitespace, ASCII and non-ASCII control characters, surrogates, and codepoints unassigned by Unicode, as indicated by the Unicode ‘general-category’ property.
print, printing
Match whitespace or a character matched by graph.
punct, punctuation
Match any punctuation character. (At present, for multibyte characters, anything that has non-word syntax.)
word, wordchar
Match any character that has word syntax (see Table of Syntax Classes).
ascii
Match any ASCII character (codes 0–127).
nonascii
Match any non-ASCII character (but not raw bytes).
Corresponding string regexp: ‘[[:class:]]’
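The named classes can be used on their own or as sets inside any. The following sketch, using arbitrary sample strings, also shows the effect of case-fold-search on lower noted in its description.

```elisp
(require 'rx)

;; A named class used directly.
(let ((s "42 apples"))
  (string-match (rx (+ alpha)) s)
  (match-string 0 s))                                        ; => "apples"

;; A named class inside `any', together with a plain character.
(string-match-p (rx bos (+ (any alnum ?_)) eos) "foo_42")    ; => 0

;; `lower' is sensitive to `case-fold-search'.
(let ((case-fold-search nil)) (string-match-p (rx lower) "A"))   ; => nil
(let ((case-fold-search t))   (string-match-p (rx lower) "A"))   ; => 0
```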
(syntax syntax)
Match a character with syntax syntax, being one of the following names:
Syntax name | Syntax character |
---|---|
whitespace | - |
punctuation | . |
word | w |
symbol | _ |
open-parenthesis | ( |
close-parenthesis | ) |
expression-prefix | ' |
string-quote | " |
paired-delimiter | $ |
escape | \ |
character-quote | / |
comment-start | < |
comment-end | > |
string-delimiter | | |
comment-delimiter | ! |
For details, see Table of Syntax Classes. Please note that (syntax punctuation) is not equivalent to the character class punctuation.
Corresponding string regexp: ‘\schar’ where char is the syntax character.
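The translation to string regexps can be inspected directly. This sketch shows the generated strings for a few syntax names, and how the syntax form for punctuation differs from the character class of the same name.

```elisp
(require 'rx)

;; The generated string regexps use the syntax character from the table.
(rx (syntax whitespace))    ; => "\\s-"
(rx (syntax word))          ; => "\\sw"

;; The syntax form and the character class of the same name differ.
(rx (syntax punctuation))   ; => "\\s."
(rx punctuation)            ; => "[[:punct:]]"
```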
(category category)
Match a character in category category, which is either one of the names below or its category character.
Category name | Category character |
---|---|
space-for-indent | space |
base | . |
consonant | 0 |
base-vowel | 1 |
upper-diacritical-mark | 2 |
lower-diacritical-mark | 3 |
tone-mark | 4 |
symbol | 5 |
digit | 6 |
vowel-modifying-diacritical-mark | 7 |
vowel-sign | 8 |
semivowel-lower | 9 |
not-at-end-of-line | < |
not-at-beginning-of-line | > |
alpha-numeric-two-byte | A |
chinese-two-byte | C |
greek-two-byte | G |
japanese-hiragana-two-byte | H |
indian-two-byte | I |
japanese-katakana-two-byte | K |
strong-left-to-right | L |
korean-hangul-two-byte | N |
strong-right-to-left | R |
cyrillic-two-byte | Y |
combining-diacritic | ^ |
ascii | a |
arabic | b |
chinese | c |
ethiopic | e |
greek | g |
korean | h |
indian | i |
japanese | j |
japanese-katakana | k |
latin | l |
lao | o |
tibetan | q |
japanese-roman | r |
thai | t |
vietnamese | v |
hebrew | w |
cyrillic | y |
can-break | | |
For more information about currently defined categories, run the command M-x describe-categories RET. For how to define new categories, see Categories.
Corresponding string regexp: ‘\cchar’ where char is the category character.
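As a brief sketch, a category can be given by name or by its category character, and negated with not. Only the generated string regexps are shown here.

```elisp
(require 'rx)

;; A category may be named or given by its category character.
(rx (category greek))         ; => "\\cg"
(rx (category ?g))            ; => "\\cg"

;; Negation produces the upper-case `\C' operator.
(rx (not (category latin)))   ; => "\\Cl"
```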
These all match the empty string, but only in specific places.
line-start, bol
Match at the beginning of a line.
Corresponding string regexp: ‘^’
line-end, eol
Match at the end of a line.
Corresponding string regexp: ‘$’
string-start, bos, buffer-start, bot
Match at the start of the string or buffer being matched against.
Corresponding string regexp: ‘\`’
string-end, eos, buffer-end, eot
Match at the end of the string or buffer being matched against.
Corresponding string regexp: ‘\'’
point
Match at point.
Corresponding string regexp: ‘\=’
word-start, bow
Match at the beginning of a word.
Corresponding string regexp: ‘\<’
word-end, eow
Match at the end of a word.
Corresponding string regexp: ‘\>’
word-boundary
Match at the beginning or end of a word.
Corresponding string regexp: ‘\b’
not-word-boundary
Match anywhere but at the beginning or end of a word.
Corresponding string regexp: ‘\B’
symbol-start
Match at the beginning of a symbol.
Corresponding string regexp: ‘\_<’
symbol-end
Match at the end of a symbol.
Corresponding string regexp: ‘\_>’
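The zero-width assertions above can be sketched with a few string matches. The sample strings are arbitrary, and the word-boundary example assumes a syntax table in which letters are word constituents and space is not.

```elisp
(require 'rx)

;; Whole-word match: the "cat" inside "concat" is skipped.
(string-match-p (rx bow "cat" eow) "concat cat")   ; => 7

;; `bol'/`eol' match at any line boundary; `bos'/`eos' only at the
;; ends of the whole string.
(string-match-p (rx bol "two" eol) "one\ntwo")     ; => 4
(string-match-p (rx bos "two" eos) "one\ntwo")     ; => nil
```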
(group rx…)
(submatch rx…)
Match the rxs, making the matched text and position accessible in the match data. The first group in a regexp is numbered 1; subsequent groups will be numbered one above the previously highest-numbered group in the pattern so far.
Corresponding string regexp: ‘\(…\)’
(group-n n rx…)
(submatch-n n rx…)
Like group, but explicitly assign the group number n. n must be positive.
Corresponding string regexp: ‘\(?n:…\)’
(backref n)
Match the text previously matched by group number n. n must be in the range 1–9.
Corresponding string regexp: ‘\n’
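A sketch of groups and back-references in use. The date and sentence strings are made-up inputs chosen for illustration.

```elisp
(require 'rx)

;; Three numbered groups pick apart an ISO-style date.
(let ((s "2024-05-17"))
  (when (string-match (rx (group (= 4 digit)) "-"
                          (group (= 2 digit)) "-"
                          (group (= 2 digit)))
                      s)
    (list (match-string 1 s) (match-string 2 s) (match-string 3 s))))
;; => ("2024" "05" "17")

;; `backref' matches a repeat of what group 1 matched: a doubled word.
(let ((s "it was the the cat"))
  (string-match (rx bow (group (+ alpha)) " " (backref 1) eow) s)
  (match-string 0 s))
;; => "the the"

;; `group-n' assigns the group number explicitly.
(rx (group-n 3 "x"))   ; => "\\(?3:x\\)"
```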
(literal expr)
Match the literal string that is the result from evaluating the Lisp expression expr. The evaluation takes place at call time, in the current lexical environment.
(regexp expr)
(regex expr)
Match the string regexp that is the result from evaluating the Lisp expression expr. The evaluation takes place at call time, in the current lexical environment.
(eval expr)
Match the rx form that is the result from evaluating the Lisp expression expr. The evaluation takes place at macro-expansion time for rx, at call time for rx-to-string, in the current global environment.
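The three forms that embed Lisp values can be sketched as follows. The helper demo-exact-name-regexp and the variable digits are hypothetical names introduced only for this example.

```elisp
(require 'rx)

;; Hypothetical helper: build a whole-string matcher for NAME at call time.
(defun demo-exact-name-regexp (name)
  "Return a regexp matching exactly the string NAME."
  (rx bos (literal name) eos))

(string-match-p (demo-exact-name-regexp "a.b") "a.b")   ; => 0
(string-match-p (demo-exact-name-regexp "a.b") "axb")   ; => nil

;; `regexp' splices in an existing string regexp instead of quoting it.
(let ((digits "[0-9]+"))
  (string-match-p (rx bos "v" (regexp digits) eos) "v12"))   ; => 0

;; `eval' computes an rx form at macro-expansion time; here the form is
;; built from constants only, so no global variables are needed.
(string-match-p (rx bos (eval (list 'or "yes" "no")) eos) "no")   ; => 0
```

Because literal quotes its argument, the dot in "a.b" matches only a dot, whereas regexp would treat the same string as a pattern.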