Next: GLR Semantic Actions, Previous: Using GLR on Unambiguous Grammars, Up: Writing GLR Parsers [Contents][Index]
Let’s consider an example, vastly simplified from a C++ grammar.1
%{ #include <stdio.h> int yylex (void); void yyerror (char const *); %} %define api.value.type {char const *} %token TYPENAME ID %right '=' %left '+' %glr-parser %% prog: %empty | prog stmt { printf ("\n"); } ; stmt: expr ';' %dprec 1 | decl %dprec 2 ; expr: ID { printf ("%s ", $$); } | TYPENAME '(' expr ')' { printf ("%s <cast> ", $1); } | expr '+' expr { printf ("+ "); } | expr '=' expr { printf ("= "); } ; decl: TYPENAME declarator ';' { printf ("%s <declare> ", $1); } | TYPENAME declarator '=' expr ';' { printf ("%s <init-declare> ", $1); } ; declarator: ID { printf ("\"%s\" ", $1); } | '(' declarator ')' ;
This models a problematic part of the C++ grammar—the ambiguity between certain declarations and statements. For example,
T (x) = y+z;
parses as either an expr
or a stmt
(assuming that ‘T’ is recognized as a TYPENAME
and
‘x’ as an ID
).
Bison detects this as a reduce/reduce conflict between the rules
expr : ID
and declarator : ID
, which it cannot resolve at the
time it encounters x
in the example above. Since this is a
GLR parser, it therefore splits the problem into two parses, one for
each choice of resolving the reduce/reduce conflict.
Unlike the example from the previous section (see Using GLR on Unambiguous Grammars),
however, neither of these parses “dies,” because the grammar as it stands is
ambiguous. One of the parsers eventually reduces stmt : expr ';'
and
the other reduces stmt : decl
, after which both parsers are in an
identical state: they’ve seen ‘prog stmt’ and have the same unprocessed
input remaining. We say that these parses have merged.
At this point, the GLR parser requires a specification in the
grammar of how to choose between the competing parses.
In the example above, the two %dprec
declarations specify that Bison is to give precedence
to the parse that interprets the example as a
decl
, which implies that x
is a declarator.
The parser therefore prints
"x" y z + T <init-declare>
The %dprec
declarations only come into play when more than one
parse survives. Consider a different input string for this parser:
T (x) + y;
This is another example of using GLR to parse an unambiguous
construct, as shown in the previous section (see Using GLR on Unambiguous Grammars).
Here, there is no ambiguity (this cannot be parsed as a declaration).
However, at the time the Bison parser encounters x
, it does not
have enough information to resolve the reduce/reduce conflict (again,
between x
as an expr
or a declarator
). In this
case, no precedence declaration is used. Again, the parser splits
into two, one assuming that x
is an expr
, and the other
assuming x
is a declarator
. The second of these parsers
then vanishes when it sees +
, and the parser prints
x T <cast> y +
Suppose that instead of resolving the ambiguity, you wanted to see all
the possibilities. For this purpose, you must merge the semantic
actions of the two possible parsers, rather than choosing one over the
other. To do so, you could change the declaration of stmt
as
follows:
stmt: expr ';' %merge <stmt_merge> | decl %merge <stmt_merge> ;
and define the stmt_merge
function as:
static YYSTYPE stmt_merge (YYSTYPE x0, YYSTYPE x1) { printf ("<OR> "); return ""; }
with an accompanying forward declaration in the C declarations at the beginning of the file:
%{ static YYSTYPE stmt_merge (YYSTYPE x0, YYSTYPE x1); %}
With these declarations, the resulting parser parses the first example
as both an expr
and a decl
, and prints
"x" y z + T <init-declare> x T <cast> y z + = <OR>
Bison requires that all of the productions that participate in any particular merge have identical ‘%merge’ clauses. Otherwise, the ambiguity would be unresolvable, and the parser will report an error during any parse that results in the offending merge.
The signature of the merger depends on the type of the symbol. In the
previous example, the merged-to symbol (stmt
) does not have a
specific type, and the merger is
YYSTYPE stmt_merge (YYSTYPE x0, YYSTYPE x1);
However, if stmt
had a declared type, e.g.,
%type <Node *> stmt;
or
%union { Node *node; ... };
%type <node> stmt;
then the prototype of the merger must be:
Node *stmt_merge (YYSTYPE x0, YYSTYPE x1);
(This signature might be a mistake originally, and maybe it should have been ‘Node *stmt_merge (Node *x0, Node *x1)’. If you have an opinion about it, please let us know.)
The sources of an extended version of this example are available in C as examples/c/glr, and in C++ as examples/c++/glr.
Next: GLR Semantic Actions, Previous: Using GLR on Unambiguous Grammars, Up: Writing GLR Parsers [Contents][Index]