The parser language is a declarative language for specifying a
parser procedure. A parser procedure is a procedure that
accepts a single parser-buffer argument and parses some of the input
from the buffer. If the parse is successful, the procedure returns a
vector of objects that are the result of the parse, and the internal
pointer of the parser buffer is advanced past the input that was
parsed. If the parse fails, the procedure returns #f and the
internal pointer is unchanged. This interface is much like that of a
matcher procedure, except that on success the parser procedure returns
a vector of values rather than #t.
The *parser special form, written (*parser pexp), is the interface
between the parser language and Scheme. The operand pexp is an
expression in the parser language. The *parser expression expands
into Scheme code that implements a parser procedure.
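As a minimal sketch of the interface described above, the following defines a parser procedure and calls it twice on the same buffer. The names parse-word and buffer are hypothetical, and the load-option call is an assumption: it may be required in some releases before *parser is available.

```scheme
;; Assumption: the parser language may need to be loaded as an option.
(load-option '*parser)

;; parse-word is a hypothetical parser procedure: it accepts one or more
;; alphabetic characters and returns them as a single string.
(define parse-word
  (*parser (match (+ (char-set char-set:alphabetic)))))

(define buffer (string->parser-buffer "hello world"))

(parse-word buffer)   ; success: a one-element vector holding "hello"
(parse-word buffer)   ; failure: #f, because the next character is a space
```

After the failed second call the buffer's internal pointer still sits just after "hello", so a different parser can be tried from the same position.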
There are several primitive expressions in the parser language. The first two provide a bridge to the matcher language (see *Matcher):
The match expression, written (match mexp), performs a match on the
parser buffer. The match to be performed is specified by mexp, which
is an expression in the matcher language. If the match is successful,
the result of the match expression is a vector of one element: a
string containing the matched text.
The noise expression, written (noise mexp), performs a match on the
parser buffer. The match to be performed is specified by mexp, which
is an expression in the matcher language. If the match is successful,
the result of the noise expression is a vector of zero elements.
(In other words, the text is matched and then thrown away.)
The mexp operand is often a known character or string, so when mexp
is a character or string literal, the noise expression can be
abbreviated as the literal itself. In other words, ‘(noise "foo")’
can be abbreviated as just ‘"foo"’.
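To illustrate how match keeps text while noise discards it, here is a small sketch that parses input such as "x = 42". The name parse-assignment is hypothetical, the load-option call is an assumption about the installation, and the seq combinator used to chain the pieces is the one described later in this section.

```scheme
;; Assumption: the parser language may need to be loaded as an option.
(load-option '*parser)

;; Hypothetical parser: a name, an equals sign, and a run of digits.
(define parse-assignment
  (*parser
   (seq (match (+ (char-set char-set:alphabetic)))   ; kept as a string
        (noise (* (char-set char-set:whitespace)))   ; matched, discarded
        "="                                ; abbreviation for (noise "=")
        (noise (* (char-set char-set:whitespace)))
        (match (+ (char-set char-set:numeric))))))   ; kept as a string

(parse-assignment (string->parser-buffer "x = 42"))
;; the result vector holds only the two matched strings, "x" and "42"
```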
Sometimes it is useful to be able to insert arbitrary values into the
parser result. The values expression, written (values expression …),
supports this. The expression arguments are arbitrary Scheme
expressions that are evaluated at run time and returned in a vector.
The values expression always succeeds and never modifies the internal
pointer of the parser buffer.
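One common use of values is to tag a result. The sketch below prepends a symbol to the parsed text; parse-tagged-word is a hypothetical name, the load-option call is an assumption, and seq is the sequencing combinator described later in this section.

```scheme
;; Assumption: the parser language may need to be loaded as an option.
(load-option '*parser)

;; Hypothetical parser: produce the symbol WORD followed by the text.
(define parse-tagged-word
  (*parser
   (seq (values 'word)        ; consumes no input, contributes one value
        (match (+ (char-set char-set:alphabetic))))))

(parse-tagged-word (string->parser-buffer "abc"))
;; the result vector holds the symbol word and the string "abc"
```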
The discard-matched expression always succeeds, returning a
vector of zero elements. In all other respects it is identical to the
discard-matched expression in the matcher language.
Next there are several combinator expressions. Parameters named pexp are arbitrary expressions in the parser language. The first few combinators are direct equivalents of those in the matcher language.
The seq expression parses each of the pexp operands in
order. If all of the pexp operands successfully match, the
result is the concatenation of their values (by vector-append).
The alt expression attempts to parse each pexp operand in
order from left to right. The first one that successfully parses
produces the result for the entire alt expression.
Like the alt expression in the matcher language, this
expression participates in backtracking.
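The following sketch combines seq and alt to recognize one of two keywords and produce a Scheme boolean. The name parse-boolean is hypothetical and the load-option call is an assumption. Note that a successful parse of "false" yields a vector containing #f, which is distinct from the bare #f that signals failure.

```scheme
;; Assumption: the parser language may need to be loaded as an option.
(load-option '*parser)

;; Hypothetical parser: "true" or "false", converted to a boolean.
(define parse-boolean
  (*parser
   (alt (seq "true"  (values #t))     ; tried first
        (seq "false" (values #f)))))  ; tried only if the first fails

(parse-boolean (string->parser-buffer "false"))  ; vector containing #f
(parse-boolean (string->parser-buffer "maybe"))  ; #f: no alternative parses
```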
The * expression parses zero or more occurrences of pexp.
The results of the parsed occurrences are concatenated together (by
vector-append) to produce the expression’s result.
Like the * expression in the matcher language, this expression
participates in backtracking.
The + expression parses one or more occurrences of pexp.
It is equivalent to
(seq pexp (* pexp))
The ? expression parses zero or one occurrences of pexp.
It is equivalent to
(alt pexp (seq))
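The repetition combinators can be sketched together: below, ? (zero or one) makes a sign optional and + (one or more) collects digits, each digit becoming its own element of the result. The name parse-signed-digits is hypothetical and the load-option call is an assumption.

```scheme
;; Assumption: the parser language may need to be loaded as an option.
(load-option '*parser)

;; Hypothetical parser: an optional sign, then one or more digits.
(define parse-signed-digits
  (*parser
   (seq (? (match (alt (char #\+) (char #\-))))     ; zero or one sign
        (+ (match (char-set char-set:numeric))))))  ; one value per digit

(parse-signed-digits (string->parser-buffer "-12"))
;; three values: "-", "1" and "2"
(parse-signed-digits (string->parser-buffer "7"))
;; one value: "7" (the optional sign contributes nothing)
```

Placing the repetition inside a single match instead, as in (match (+ (char-set char-set:numeric))), would return all the digits as one string.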
The next three expressions do not have equivalents in the matcher language. Each accepts a single pexp argument, which is parsed in the usual way. These expressions perform transformations on the returned values of a successful match.
The transform expression, written (transform expression pexp),
performs an arbitrary transformation of the values returned by
parsing pexp. Expression is a Scheme expression that must evaluate
to a procedure at run time. If pexp is successfully parsed, the
procedure is called with the vector of values as its argument, and
must return a vector or #f. If it returns a vector, the parse is
successful, and the elements of that vector are the resulting values.
If it returns #f, the parse fails and the internal pointer of the
parser buffer is returned to what it was before pexp was parsed.
For example:
(transform (lambda (v) (if (= 0 (vector-length v)) #f v)) …)
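To show that fragment in context, the sketch below wraps a zero-or-more repetition so that an empty result counts as a failure. The name parse-words is hypothetical and the load-option call is an assumption.

```scheme
;; Assumption: the parser language may need to be loaded as an option.
(load-option '*parser)

;; Hypothetical parser: whitespace-separated words, at least one required.
(define parse-words
  (*parser
   (transform (lambda (v) (if (= 0 (vector-length v)) #f v))
     (* (seq (match (+ (char-set char-set:alphabetic)))
             (noise (* (char-set char-set:whitespace))))))))

(parse-words (string->parser-buffer "ab cd"))  ; vector of "ab" and "cd"
(parse-words (string->parser-buffer "123"))    ; #f: the * parsed nothing
```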
The encapsulate expression, written (encapsulate expression pexp),
transforms the values returned by parsing pexp into a single value.
Expression is a Scheme expression that must evaluate to a procedure
at run time. If pexp is successfully parsed, the procedure is called
with the vector of values as its argument, and may return any Scheme
object. The result of the encapsulate expression is a vector of
length one containing that object. (Consequently, encapsulate
doesn’t change the success or failure of pexp, only its value.)
For example:
(encapsulate vector->list …)
The map expression, written (map expression pexp), performs a
per-element transform on the values returned by parsing pexp.
Expression is a Scheme expression that must evaluate to a procedure
at run time. If pexp is successfully parsed, the procedure is mapped
(by vector-map) over the values returned from the parse. The mapped
values are returned as the result of the map expression.
(Consequently, map doesn’t change the success or failure of
pexp, nor the number of values returned.)
For example:
(map string->symbol …)
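The difference between encapsulate and map is easiest to see when both are applied to the same inner parser. In this sketch all three procedure names are hypothetical, the load-option call is an assumption, and the inner parser is referenced by its symbol, which is the abbreviation for sexp described later in this section.

```scheme
;; Assumption: the parser language may need to be loaded as an option.
(load-option '*parser)

;; Hypothetical inner parser: zero or more whitespace-separated words.
(define parse-words
  (*parser
   (* (seq (match (+ (char-set char-set:alphabetic)))
           (noise (* (char-set char-set:whitespace)))))))

;; encapsulate: all values collapse into ONE value (here, a list).
(define parse-word-list
  (*parser (encapsulate vector->list parse-words)))

;; map: each value is converted separately; the count is unchanged.
(define parse-word-symbols
  (*parser (map string->symbol parse-words)))

(parse-word-list (string->parser-buffer "ab cd"))
;; one value: the list ("ab" "cd")
(parse-word-symbols (string->parser-buffer "ab cd"))
;; two values: the symbols ab and cd
```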
Finally, as in the matcher language, we have sexp and
with-pointer to support embedding Scheme code in the parser.
The sexp expression, written (sexp expression), allows arbitrary
Scheme code to be embedded inside a parser. The expression operand
must evaluate to a parser procedure at run time; the procedure is
called to parse the parser buffer. This is the parser-language
equivalent of the sexp expression in the matcher language.
The case in which expression is a symbol is so common that it has an abbreviation: ‘(sexp symbol)’ may be abbreviated as just symbol.
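In practice sexp is how one parser procedure is built out of others. The sketch below uses both the full form and the symbol abbreviation; parse-word and parse-pair are hypothetical names and the load-option call is an assumption.

```scheme
;; Assumption: the parser language may need to be loaded as an option.
(load-option '*parser)

;; Hypothetical building block: one alphabetic word.
(define parse-word
  (*parser (match (+ (char-set char-set:alphabetic)))))

;; Hypothetical composite: two words separated by a comma.
(define parse-pair
  (*parser
   (seq parse-word              ; abbreviation for (sexp parse-word)
        ","                     ; abbreviation for (noise ",")
        (sexp parse-word))))    ; the same thing written out in full

(parse-pair (string->parser-buffer "a,b"))  ; vector of "a" and "b"
```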
The with-pointer expression, written (with-pointer identifier pexp),
fetches the parser buffer’s internal pointer (using
get-parser-buffer-pointer), binds it to identifier, and then parses
the pattern specified by pexp. Identifier must be a symbol. This is
the parser-language equivalent of the with-pointer expression in the
matcher language.
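A typical reason to capture the pointer is to record where a construct began, for example for later error reporting. The sketch below simply returns the captured pointer alongside the parsed text; parse-word-with-start and start are hypothetical names, and the load-option call is an assumption.

```scheme
;; Assumption: the parser language may need to be loaded as an option.
(load-option '*parser)

;; Hypothetical parser: a word plus the pointer at which it started.
(define parse-word-with-start
  (*parser
   (with-pointer start
     (seq (match (+ (char-set char-set:alphabetic)))
          (values start)))))    ; start is visible to embedded Scheme code

(define result
  (parse-word-with-start (string->parser-buffer "abc")))

(vector-ref result 0)   ; the matched text, "abc"
(vector-ref result 1)   ; the buffer pointer captured before the match
```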