Next: Parsing Program Source, Previous: Syntax Tables, Up: Emacs Lisp [Contents][Index]
Emacs Lisp provides several tools for parsing and matching text, from regular expressions (see Regular Expressions) to full left-to-right (a.k.a. LL) grammar parsers (see (bovine)Bovine parser development). Parsing Expression Grammars (PEG) are another approach to text parsing that offer more structure and composability than regular expressions, but less complexity than context-free grammars.
A Parsing Expression Grammar (PEG) describes a formal language
in terms of a set of rules for recognizing strings in the language. In
Emacs, a PEG parser is defined as a list of named rules, each
of which matches text patterns and/or contains references to other
rules. Parsing is initiated with the function peg-run
or the
macro peg-parse
(see below), and parses text after point in the
current buffer, using a given set of rules.
Each rule in a PEG is referred to as a parsing expression (PEX), and can be specified a literal string, a regexp-like character range or set, a peg-specific construct resembling an Emacs Lisp function call, a reference to another rule, or a combination of any of these. A grammar is expressed as a tree of rules in which one rule is typically treated as a “root” or “entry-point” rule. For instance:
((number sign digit (* digit)) (sign (or "+" "-" "")) (digit [0-9]))
Once defined, grammars can be used to parse text after point in the
current buffer, in a number of ways. The peg-parse
macro is the
simplest:
Match pexs at point.
(peg-parse (number sign digit (* digit)) (sign (or "+" "-" "")) (digit [0-9]))
While this macro is simple it is also inflexible, as the rules must be written directly into the source code. More flexibility can be gained by using a combination of other functions and macros.
Execute body with rules, a list of PEXs, in
effect. Within BODY, parsing is initiated with a call to
peg-run
.
This function accepts a single peg-matcher, which is the result of
calling peg
(see below) on a named rule, usually the entry-point
of a larger grammar.
At the end of parsing, one of failure-function or success-function is called, depending on whether the parsing succeeded or not. If success-function is provided, it should be a function that receives as its only argument an anonymous function that runs all the actions collected on the stack during parsing. By default this anonymous function is simply executed. If parsing fails, a function provided as failure-function will be called with a list of PEG expressions that failed during parsing. By default this list is discarded.
The peg-matcher passed to peg-run
is produced by a call to
peg
:
Convert pexs into a single peg-matcher suitable for passing to
peg-run
.
The peg-parse
example above expands to a set of calls to these
functions, and could be written in full as:
(with-peg-rules ((number sign digit (* digit)) (sign (or "+" "-" "")) (digit [0-9])) (peg-run (peg number)))
This approach allows more explicit control over the “entry-point” of parsing, and allows the combination of rules from different sources.
Individual rules can also be defined using a more defun
-like
syntax, using the macro define-peg-rule
:
Define name as a PEG rule that accepts args and matches pexs at point.
For instance:
(define-peg-rule digit () [0-9])
Arguments can be supplied to rules by the funcall
PEG rule
(see PEX Definitions).
Another possibility is to define a named set of rules with
define-peg-ruleset
:
Define name as an identifier for rules.
(define-peg-ruleset number-grammar '((number sign digit (* digit)) digit ;; A reference to the definition above. (sign (or "+" "-" ""))))
Rules and rulesets defined this way can be referred to by name in
later calls to peg-run
or with-peg-rules
:
(with-peg-rules number-grammar (peg-run (peg number)))
By default, calls to peg-run
or peg-parse
produce no
output: parsing simply moves point. In order to return or otherwise
act upon parsed strings, rules can include actions, see
Parsing Actions.
Next: Parsing Program Source, Previous: Syntax Tables, Up: Emacs Lisp [Contents][Index]