The error recovery mechanism of the Wisent parser conforms to the one Bison uses. See (bison)Error Recovery in the Bison manual for details.
To recover from a syntax error, you must write rules to recognize the special token error. This is a terminal symbol that is automatically defined and reserved for error handling.
When the parser encounters a syntax error, it pops the state stack until it finds a state that allows shifting the error token. After the error token has been shifted, if the old lookahead token cannot be shifted next, the parser reads and discards tokens until it finds one that is acceptable.
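The steps just described can be illustrated with a toy model in Emacs Lisp. This is only a sketch under simplifying assumptions, not Wisent's implementation: states and tokens are plain symbols, and the function name my-recover and both predicate arguments are invented for the illustration.

```elisp
;; Toy model of the recovery steps; not Wisent's actual code.
(defun my-recover (stack shifts-error-p tokens acceptable-p)
  "Model one error recovery.
STACK is a list of states, most recent first.  SHIFTS-ERROR-P is a
predicate telling whether a state can shift the error token.
TOKENS is the remaining input, starting with the old lookahead.
ACCEPTABLE-P tells whether a token can be shifted after error.
Return (STACK . TOKENS) after recovery, or nil if no state on the
stack can shift the error token."
  ;; Step 1: pop states until one can shift the error token.
  (while (and stack (not (funcall shifts-error-p (car stack))))
    (setq stack (cdr stack)))
  (when stack
    ;; Step 2: shift the error token (modeled by pushing a marker).
    (setq stack (cons 'error-shifted stack))
    ;; Step 3: discard tokens until one is acceptable.
    (while (and tokens (not (funcall acceptable-p (car tokens))))
      (setq tokens (cdr tokens)))
    (cons stack tokens)))
```

In this model, a nil result corresponds to the case where the stack is exhausted and no recovery is possible.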
Strategies for error recovery depend on the choice of error rules in the grammar. A simple and useful strategy is to skip the rest of the current statement when an error is detected:
(statement
 (( error ?; )) ;; on error, skip until ';' is read
 )
It is also useful to recover to the matching close-delimiter of an opening-delimiter that has already been parsed:
(primary
 (( ?{ expr ?} ))
 (( ?{ error ?} ))
 ...
 )
Note that error recovery rules may have actions, just as any other rules can. Here are some predefined hooks, variables, functions or macros, useful in such actions:
Variable: wisent-nerrs
The number of parse errors encountered so far.
Variable: wisent-recovering
Non-nil means that the parser is recovering. This variable only has meaning in the scope of wisent-parse.
Function: wisent-error msg
Call the user-supplied error reporting function with message msg (see The error reporting function).
For an example of use, see wisent-skip-token.
Function: wisent-errok
Resume generating error messages immediately for subsequent syntax errors.
The parser suppresses error messages for syntax errors that happen shortly after the first, until three consecutive input tokens have been successfully shifted. Calling wisent-errok in an action makes error messages resume immediately. No error messages will be suppressed if you call it in an error rule’s action.
For an example of use, see wisent-skip-token.
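The suppression rule can be modeled with a small counter. The following Emacs Lisp sketch is a toy model, not Wisent's implementation; the function name my-count-reported-errors and the event symbols shift, error and errok are made up for the illustration.

```elisp
;; Toy model of error message suppression; not Wisent's actual code.
(defun my-count-reported-errors (events)
  "Return how many errors in EVENTS would produce a message.
EVENTS is a list of the symbols `shift', `error' and `errok'."
  (let ((recovering 0)  ; shifts still needed before messages resume
        (reported 0))
    (dolist (ev events reported)
      (cond
       ((eq ev 'error)
        ;; Report only when not recovering from an earlier error.
        (when (zerop recovering)
          (setq reported (1+ reported)))
        ;; Three successful shifts are required from now on.
        (setq recovering 3))
       ((eq ev 'shift)
        (when (> recovering 0)
          (setq recovering (1- recovering))))
       ((eq ev 'errok)
        ;; Models the effect of `wisent-errok': resume at once.
        (setq recovering 0))))))
```

In this model, a second error that follows the first after only one shift stays silent, whereas an errok event between the two errors makes both count as reported.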
Function: wisent-clearin
Discard the current lookahead token. This will cause a new lexical token to be read.
In an error rule’s action, the previous lookahead token is reanalyzed immediately. wisent-clearin may be called to clear this token.
For example, suppose that on a parse error, an error handling routine is called that advances the input stream to some point where parsing should once again commence. The next symbol returned by the lexical scanner is probably correct. The previous lookahead token ought to be discarded with wisent-clearin.
For an example of use, see wisent-skip-token.
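The scenario above might be written as follows, in the grammar notation used earlier in this section. This is only a sketch: my-resync-input is a hypothetical helper, not part of Wisent, standing for whatever routine advances the input to a safe restart point.

```elisp
(statement
 ;; ... ordinary statement rules go here ...
 ((error)
  (progn
    (my-resync-input)  ; hypothetical: advance the input to a restart point
    (wisent-clearin)   ; discard the stale lookahead token
    nil)))
```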
Function: wisent-abort
Abort parsing and save the lookahead token.
Function: wisent-set-region start end
Change the region of text matched by the current nonterminal.
start and end are respectively the beginning and end positions of the region occupied by the group of components associated with this nonterminal. If start or end is not a valid position, the region is set to nil.
For an example of use, see wisent-skip-token.
Variable: wisent-discarding-token-functions
List of functions to be called when discarding a lexical token.
These functions receive the discarded lexical token as their argument.
When the parser encounters unexpected tokens, it can discard them as directed by the error recovery rules: either while it reads tokens until it finds one that can be shifted, or when a semantic action calls the function wisent-skip-token or wisent-skip-block.
For language-specific hooks, make sure you define this as a local hook.
For example, in Semantic, this hook is set to the function wisent-collect-unmatched-syntax to collect unmatched lexical tokens (see Useful functions).
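As an illustration of a buffer-local setup, the following Emacs Lisp sketch records every discarded token in a buffer-local list. The names my-discarded-tokens, my-record-discarded-token and my-setup-discard-hook are hypothetical and not part of Wisent or Semantic; only the hook variable itself and the standard add-hook function are assumed.

```elisp
(require 'semantic/wisent/wisent)  ; defines wisent-discarding-token-functions

(defvar-local my-discarded-tokens nil
  "Tokens discarded by the parser in this buffer (hypothetical variable).")

(defun my-record-discarded-token (token)
  "Remember the discarded lexical TOKEN (hypothetical function)."
  (push token my-discarded-tokens))

(defun my-setup-discard-hook ()
  "Install `my-record-discarded-token' as a buffer-local hook."
  ;; The final t makes the hook local to the current buffer.
  (add-hook 'wisent-discarding-token-functions
            #'my-record-discarded-token nil t))
```

A language mode would typically call a setup function like this from its own initialization code, so that other buffers using a Wisent parser are left unaffected.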
Function: wisent-skip-token
Skip the lookahead token in order to resume parsing. Return nil.
Must be used in error recovery semantic actions.
It typically looks like this:
(wisent-message "%s: skip %s" $action
                (wisent-token-to-string wisent-input))
(run-hook-with-args
 'wisent-discarding-token-functions wisent-input)
(wisent-clearin)
(wisent-errok)
Function: wisent-skip-block
Safely skip a block in order to resume parsing. Return nil.
Must be used in error recovery semantic actions.
A block is data between an open-delimiter (syntax class () and a matching close-delimiter (syntax class )):
(a parenthesized block)
[a block between brackets]
{a block between braces}
The following example uses wisent-skip-block to safely skip a block delimited by ‘LBRACE’ ({) and ‘RBRACE’ (}) tokens when a syntax error occurs in ‘other-components’:
(block
 ((LBRACE other-components RBRACE))
 ((LBRACE RBRACE))
 ((LBRACE error)
  (wisent-skip-block))
 )