By default the process of parsing simply moves point in the current
buffer, ultimately returning t if the parsing succeeds, and nil if it
doesn’t. It’s also possible to define parsing actions that can run
arbitrary Elisp at certain points in the parsed text. These actions can
optionally affect something called the parsing stack, which is a list of
values returned by the parsing process. These actions only run (and only
return values) if the parsing process ultimately succeeds; if it fails
the action code is not run at all.
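The deferred execution of actions can be observed with a small experiment. The following is a minimal sketch, assuming peg.el is loadable with `(require 'peg)` and that the code is evaluated with lexical binding enabled; `example-peg-action-ran-p` is a hypothetical helper, not part of peg.el, and it uses the backquote action syntax described in the next paragraph.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)  ; assumed available (bundled with recent Emacs, else GNU ELPA)

(defun example-peg-action-ran-p (text)
  "Parse TEXT with a tiny grammar and return (MATCHED ACTION-RAN).
Hypothetical helper for illustration; not part of peg.el."
  (let ((ran nil))
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      ;; The action sits between "a" and "b".  It records that it ran
      ;; and pushes the value of the `setq' form onto the stack.
      (with-peg-rules ((ab "a" `(-- (setq ran t)) "b"))
        (let ((result (peg-run (peg ab))))
          ;; Normalize RESULT to a boolean so only success is reported.
          (list (and result t) ran))))))

(example-peg-action-ran-p "ab")  ; → (t t): the parse succeeds, the action runs
(example-peg-action-ran-p "ac")  ; → (nil nil): "b" fails, the action never runs
```

In the second call the parser gets past "a", and therefore past the action, before failing on "b"; the action is still not executed because the parse as a whole fails.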
Actions can be added anywhere in the definition of a rule. They are distinguished from parsing expressions by an initial backquote (‘`’), followed by a parenthetical form that must contain a pair of hyphens (‘--’) somewhere within it. Symbols to the left of the hyphens are bound to values popped from the stack (they are somewhat analogous to the argument list of a lambda form). Values produced by code to the right of the hyphens are pushed onto the stack (analogous to the return value of the lambda). For instance, the previous grammar can be augmented with actions to return the parsed number as an actual integer:
(with-peg-rules ((number sign digit (* digit
                                       `(a b -- (+ (* a 10) b)))
                         `(sign val -- (* sign val)))
                 (sign (or (and "+" `(-- 1))
                           (and "-" `(-- -1))
                           (and ""  `(-- 1))))
                 (digit [0-9] `(-- (- (char-before) ?0))))
  (peg-run (peg number)))
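To try the grammar above on a string rather than on the current buffer, it can be wrapped in a function that parses in a temporary buffer. This is only a sketch: `example-parse-number` is a hypothetical name, and the code assumes lexical binding and a loadable peg.el.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)  ; assumed available (bundled with recent Emacs, else GNU ELPA)

(defun example-parse-number (text)
  "Return the stack produced by parsing TEXT with the `number' grammar.
Hypothetical wrapper for illustration; not part of peg.el."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))   ; parsing starts at point
    (with-peg-rules
        ((number sign digit (* digit `(a b -- (+ (* a 10) b)))
                 `(sign val -- (* sign val)))
         (sign (or (and "+" `(-- 1))
                   (and "-" `(-- -1))
                   (and "" `(-- 1))))
         (digit [0-9] `(-- (- (char-before) ?0))))
      (peg-run (peg number)))))

(example-parse-number "-42")  ; → (-42)
(example-parse-number "7")    ; → (7)
```

For "-42" the stack evolves as follows: `sign` pushes -1, the first `digit` pushes 4, the second pushes 2, the inner action folds 4 and 2 into 42, and the final action multiplies by the sign, leaving the single value -42.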
There must be values on the stack before they can be popped and
returned; if there aren’t enough stack values to bind to an action’s
left-hand terms, they will be bound to nil. An action with only
right-hand terms will push values to the stack; an action with only
left-hand terms will consume (and discard) values from the stack.
At the end of parsing, stack values are returned as a flat list.
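These stack rules can be illustrated with rules that consist only of actions, so they succeed even on an empty buffer. The sketch below assumes lexical binding and a loadable peg.el; both function names are hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)  ; assumed available (bundled with recent Emacs, else GNU ELPA)

(defun example-push-then-discard ()
  "Push three values, then pop and discard the one pushed last.
Hypothetical helper for illustration; not part of peg.el."
  (with-temp-buffer
    (with-peg-rules ((demo `(-- 1 2 3)  ; right-hand terms only: push
                           `(_ --)))    ; left-hand term only: pop and drop
      (peg-run (peg demo)))))

(defun example-underfull-stack ()
  "Bind three left-hand terms when only two values are on the stack.
Hypothetical helper for illustration; not part of peg.el."
  (with-temp-buffer
    (with-peg-rules ((demo `(-- 1 2)
                           `(a b c -- (list a b c))))
      (peg-run (peg demo)))))

(example-push-then-discard)  ; a flat list holding the surviving values 1 and 2
(example-underfull-stack)    ; → ((nil 1 2)): `a' found no value to bind
```

In the second function the last left-hand term takes the most recently pushed value, which is the same convention that lets `(a b -- (+ (* a 10) b))` treat `b` as the newest digit.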
To return the string matched by a PEX (instead of simply moving point over it), a grammar can use a rule like this:
(one-word `(-- (point)) (+ [word]) `(start -- (buffer-substring start (point))))
The first action above pushes the initial value of point to the stack.
The intervening PEX moves point over the next word. The second action
pops the previous value from the stack (binding it to the variable
start), then uses that value to extract a substring from the buffer and
push it to the stack. This pattern is so common that PEG provides a
shorthand function that does exactly the above, along with a few other
shorthands for common scenarios:
(substring e)
Match PEX e and push the matched string onto the stack.
(region e)
Match e and push the start and end positions of the matched region onto the stack.
(replace e replacement)
Match e and replace the matched region with the string replacement.
(list e)
Match e, collect all values produced by e (and its sub-expressions) into a list, and push that list onto the stack. Stack values are typically returned as a flat list; this is a way of “grouping” values together.
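The four shorthands can be compared side by side in a temporary buffer. The following is a sketch under the same assumptions as before (lexical binding, a loadable peg.el); every `example-` function and every rule name is hypothetical and chosen only for illustration.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'peg)  ; assumed available (bundled with recent Emacs, else GNU ELPA)

(defun example-first-word (text)
  "Return the stack after matching the first word of TEXT with `substring'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-peg-rules ((one-word (substring (+ [word]))))
      (peg-run (peg one-word)))))

(defun example-word-region (text)
  "Return the stack after matching the first word of TEXT with `region'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-peg-rules ((word-span (region (+ [word]))))
      (peg-run (peg word-span)))))

(defun example-word-list (text)
  "Return the stack after grouping space-separated words of TEXT with `list'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-peg-rules ((words (list word (* " " word)))
                     (word (substring (+ [word]))))
      (peg-run (peg words)))))

(defun example-replace-foo (text)
  "Replace a leading \"foo\" in TEXT with \"bar\"; return the new buffer text."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-peg-rules ((fix (replace "foo" "bar")))
      (peg-run (peg fix))
      (buffer-string))))

(example-first-word "hello world")   ; → ("hello")
(example-word-region "hello world")  ; the two buffer positions 1 and 6
(example-word-list "one two three")  ; → (("one" "two" "three"))
(example-replace-foo "foo baz")      ; → "bar baz"
```

Without `list`, the `words` rule would return the three strings as separate entries of the flat stack; with it, they arrive as one nested list.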