Next: , Previous: , Up: Writing GLR Parsers   [Contents][Index]


1.5.2 Using GLR to Resolve Ambiguities

Let’s consider an example, vastly simplified from a C++ grammar.1

%{
  #include <stdio.h>
  int yylex (void);
  void yyerror (char const *);
%}

%define api.value.type {char const *}

%token TYPENAME ID

%right '='
%left '+'

%glr-parser

%%

prog:
  %empty
| prog stmt   { printf ("\n"); }
;

stmt:
  expr ';'  %dprec 1
| decl      %dprec 2
;

expr:
  ID               { printf ("%s ", $$); }
| TYPENAME '(' expr ')'
                   { printf ("%s <cast> ", $1); }
| expr '+' expr    { printf ("+ "); }
| expr '=' expr    { printf ("= "); }
;

decl:
  TYPENAME declarator ';'
                   { printf ("%s <declare> ", $1); }
| TYPENAME declarator '=' expr ';'
                   { printf ("%s <init-declare> ", $1); }
;

declarator:
  ID               { printf ("\"%s\" ", $1); }
| '(' declarator ')'
;

This models a problematic part of the C++ grammar—the ambiguity between certain declarations and statements. For example,

T (x) = y+z;

parses as either an expr or a stmt (assuming that ‘T’ is recognized as a TYPENAME and ‘x’ as an ID). Bison detects this as a reduce/reduce conflict between the rules expr : ID and declarator : ID, which it cannot resolve at the time it encounters x in the example above. Since this is a GLR parser, it therefore splits the problem into two parses, one for each choice of resolving the reduce/reduce conflict. Unlike the example from the previous section (see Using GLR on Unambiguous Grammars), however, neither of these parses “dies,” because the grammar as it stands is ambiguous. One of the parsers eventually reduces stmt : expr ';' and the other reduces stmt : decl, after which both parsers are in an identical state: they’ve seen ‘prog stmt’ and have the same unprocessed input remaining. We say that these parses have merged.

At this point, the GLR parser requires a specification in the grammar of how to choose between the competing parses. In the example above, the two %dprec declarations specify that Bison is to give precedence to the parse that interprets the example as a decl, which implies that x is a declarator. The parser therefore prints

"x" y z + T <init-declare>

The %dprec declarations only come into play when more than one parse survives. Consider a different input string for this parser:

T (x) + y;

This is another example of using GLR to parse an unambiguous construct, as shown in the previous section (see Using GLR on Unambiguous Grammars). Here, there is no ambiguity (this cannot be parsed as a declaration). However, at the time the Bison parser encounters x, it does not have enough information to resolve the reduce/reduce conflict (again, between x as an expr or a declarator). In this case, no precedence declaration is used. Again, the parser splits into two, one assuming that x is an expr, and the other assuming x is a declarator. The second of these parsers then vanishes when it sees +, and the parser prints

x T <cast> y +

Suppose that instead of resolving the ambiguity, you wanted to see all the possibilities. For this purpose, you must merge the semantic actions of the two possible parsers, rather than choosing one over the other. To do so, you could change the declaration of stmt as follows:

stmt:
  expr ';'  %merge <stmt_merge>
| decl      %merge <stmt_merge>
;

and define the stmt_merge function as:

static YYSTYPE
stmt_merge (YYSTYPE x0, YYSTYPE x1)
{
  printf ("<OR> ");
  return "";
}

with an accompanying forward declaration in the C declarations at the beginning of the file:

%{
  static YYSTYPE stmt_merge (YYSTYPE x0, YYSTYPE x1);
%}

With these declarations, the resulting parser parses the first example as both an expr and a decl, and prints

"x" y z + T <init-declare> x T <cast> y z + = <OR>

Bison requires that all of the productions that participate in any particular merge have identical ‘%merge’ clauses. Otherwise, the ambiguity would be unresolvable, and the parser will report an error during any parse that results in the offending merge.


The signature of the merger depends on the type of the symbol. In the previous example, the merged-to symbol (stmt) does not have a specific type, and the merger is

YYSTYPE stmt_merge (YYSTYPE x0, YYSTYPE x1);

However, if stmt had a declared type, e.g.,

%type <Node *> stmt;

or

%union {
  Node *node;
  ...
};
%type <node> stmt;

then the prototype of the merger must be:

Node *stmt_merge (YYSTYPE x0, YYSTYPE x1);

(This signature might be a mistake originally, and maybe it should have been ‘Node *stmt_merge (Node *x0, Node *x1)’. If you have an opinion about it, please let us know.)


Footnotes

(1)

The sources of an extended version of this example are available in C as examples/c/glr, and in C++ as examples/c++/glr.


Next: GLR Semantic Actions, Previous: Using GLR on Unambiguous Grammars, Up: Writing GLR Parsers   [Contents][Index]