Parsing Actions (GNU Emacs Lisp Reference Manual)

37.2 Parsing Actions

By default, parsing simply moves point in the current buffer, ultimately returning t if the parsing succeeds, and nil if it doesn’t. It’s also possible to define parsing actions that run arbitrary Elisp at certain points in the parsed text. These actions can optionally affect the parsing stack, which is a list of values returned by the parsing process. Actions only run (and only return values) if the parsing process ultimately succeeds; if it fails, the action code is not run at all.

Actions can be added anywhere in the definition of a rule. They are distinguished from parsing expressions by an initial backquote (‘`’), followed by a parenthetical form that must contain a pair of hyphens (‘--’) somewhere within it. Symbols to the left of the hyphens are bound to values popped from the stack (they are somewhat analogous to the argument list of a lambda form). Values produced by code to the right of the hyphens are pushed onto the stack (analogous to the return value of the lambda). For instance, the previous grammar can be augmented with actions to return the parsed number as an actual integer:

(with-peg-rules ((number sign digit (* digit
                                       `(a b -- (+ (* a 10) b)))
                         `(sign val -- (* sign val)))
                 (sign (or (and "+" `(-- 1))
                           (and "-" `(-- -1))
                           (and ""  `(-- 1))))
                 (digit [0-9] `(-- (- (char-before) ?0))))
  (peg-run (peg number)))
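As a rough illustration of how this grammar might be exercised, the sketch below wraps it in a helper that inserts a string into a temporary buffer, moves point to the start, and runs the parser. The name my-parse-number is hypothetical and not part of PEG. The sketch assumes the peg library is available and enables lexical binding on its first line, which the rule-defining macros are assumed to need.

```elisp
;; -*- lexical-binding: t; -*-
(require 'peg)

;; Hypothetical helper, not part of PEG: parse STRING from its
;; first character using the `number' grammar shown above.
(defun my-parse-number (string)
  "Return the parsing stack for STRING, or nil if parsing fails."
  (with-temp-buffer
    (insert string)
    ;; Parsing starts at point, so move to the start of the text.
    (goto-char (point-min))
    (with-peg-rules ((number sign digit (* digit
                                           `(a b -- (+ (* a 10) b)))
                             `(sign val -- (* sign val)))
                     (sign (or (and "+" `(-- 1))
                               (and "-" `(-- -1))
                               (and ""  `(-- 1))))
                     (digit [0-9] `(-- (- (char-before) ?0))))
      (peg-run (peg number)))))

(my-parse-number "-42")   ; ⇒ (-42)
(my-parse-number "123")   ; ⇒ (123)
(my-parse-number "abc")   ; ⇒ nil
```

The result is a list because the return value is the parsing stack, which here holds a single integer.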

Values must already be on the stack before an action can pop them: if there aren’t enough stack values to bind to an action’s left-hand terms, the remaining terms are bound to nil. An action with only right-hand terms pushes values onto the stack; an action with only left-hand terms consumes (and discards) values from the stack. At the end of parsing, stack values are returned as a flat list.
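The stack behavior described above can be sketched with three small experiments on the text "ab". The function name my-peg-stack-demo and the rule name r are hypothetical, chosen only for illustration. The sketch assumes the peg library is available and that the returned list has the most recently pushed value first.

```elisp
;; -*- lexical-binding: t; -*-
(require 'peg)

;; Hypothetical demonstration function, not part of PEG.
(defun my-peg-stack-demo ()
  "Return the results of three stack experiments on the text \"ab\"."
  (with-temp-buffer
    (insert "ab")
    (list
     ;; Two push-only actions: both values remain on the stack.
     (progn (goto-char (point-min))
            (with-peg-rules ((r "a" `(-- 1) "b" `(-- 2)))
              (peg-run (peg r))))
     ;; A pop-only action consumes and discards the top value.
     (progn (goto-char (point-min))
            (with-peg-rules ((r "a" `(-- 1) "b" `(-- 2) `(_top --)))
              (peg-run (peg r))))
     ;; Popping from an empty stack binds the variable to nil.
     (progn (goto-char (point-min))
            (with-peg-rules ((r "ab" `(x -- (list 'got x))))
              (peg-run (peg r)))))))

(my-peg-stack-demo)
;; ⇒ ((2 1) (1) ((got nil)))
```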

To return the string matched by a PEX (instead of simply moving point over it), a grammar can use a rule like this:

(one-word
  `(-- (point))
  (+ [word])
  `(start -- (buffer-substring start (point))))
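For illustration, the one-word rule might be run as follows. This is a minimal sketch: my-first-word is a hypothetical helper, and it assumes the peg library is available and that the text is in a buffer whose syntax table treats letters as word constituents (as a fresh temporary buffer does).

```elisp
;; -*- lexical-binding: t; -*-
(require 'peg)

;; Hypothetical helper, not part of PEG.
(defun my-first-word (string)
  "Return a list holding the word at the start of STRING, or nil."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules ((one-word
                      ;; Remember where the word starts.
                      `(-- (point))
                      ;; Move point over one or more word characters.
                      (+ [word])
                      ;; Pop the start position, push the matched text.
                      `(start -- (buffer-substring start (point)))))
      (peg-run (peg one-word)))))

(my-first-word "hello world")   ; ⇒ ("hello")
(my-first-word " hello")        ; ⇒ nil
```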

The first action above pushes the initial value of point to the stack. The intervening PEX moves point over the next word. The second action pops the previous value from the stack (binding it to the variable start), then uses that value to extract a substring from the buffer and push it to the stack. This pattern is so common that PEG provides a shorthand function that does exactly the above, along with a few other shorthands for common scenarios:

(substring e): Match PEX e and push the matched string onto the stack.
(region e): Match e and push the start and end positions of the matched region onto the stack.
(replace e replacement): Match e and replace the matched region with the string replacement.
(list e): Match e, collect all values produced by e (and its sub-expressions) into a list, and push that list onto the stack. Stack values are typically returned as a flat list; this is a way of “grouping” values together.
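
A short sketch of how some of these shorthands might be combined. The helper names my-split-words and my-fix-spelling and the rule names inside them are hypothetical. The sketch assumes the peg library is available.

```elisp
;; -*- lexical-binding: t; -*-
(require 'peg)

;; Hypothetical helper: `substring' pushes each matched word, and
;; `list' groups everything pushed inside it into a single value.
(defun my-split-words (string)
  "Return a one-element stack holding the space-separated words of STRING."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules ((words (list word (* " " word)))
                     (word (substring (+ [word]))))
      (peg-run (peg words)))))

;; Hypothetical helper: `replace' edits the buffer rather than
;; pushing a value, so the buffer text is what gets returned here.
(defun my-fix-spelling (string)
  "Return STRING with a leading \"colour\" replaced by \"color\"."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (with-peg-rules ((fix (replace "colour" "color")))
      (peg-run (peg fix)))
    (buffer-string)))

(my-split-words "foo bar baz")   ; ⇒ (("foo" "bar" "baz"))
(my-fix-spelling "colour")       ; ⇒ "color"
```

Without the list wrapper, the three strings would come back as separate elements of the flat stack rather than as one grouped value.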