Next: Understanding Your Parser, Up: Debugging Your Parser [Contents][Index]
Solving conflicts is probably the most delicate part of the design of an LR parser, as demonstrated by the number of sections devoted to them in this very documentation. To solve a conflict, one must understand it: when does it occur? Is it because of a flaw in the grammar? Is it rather because LR(1) cannot cope with this grammar?
One difficulty is that conflicts occur in the automaton, and it can be tricky to relate them to issues in the grammar itself. With experience and patience, analysis of the detailed description of the automaton (see Understanding Your Parser) allows one to find example strings that reach these conflicts.
That task is made much easier thanks to the generation of counterexamples, initially developed by Chinawat Isradisaikul and Andrew Myers (see Isradisaikul 2015).
As a first example, see the grammar of Shift/Reduce Conflicts, which features one shift/reduce conflict:
$ bison else.y else.y: warning: 1 shift/reduce conflict [-Wconflicts-sr] else.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
Let’s rerun bison
with the option
-Wcex/-Wcounterexamples:
else.y: warning: 1 shift/reduce conflict [-Wconflicts-sr] else.y: warning: shift/reduce conflict on token "else" [-Wcounterexamples]
Example: "if" expr "then" "if" expr "then" stmt • "else" stmt Shift derivation if_stmt ↳ 3: "if" expr "then" stmt ↳ 2: if_stmt ↳ 4: "if" expr "then" stmt • "else" stmt Example: "if" expr "then" "if" expr "then" stmt • "else" stmt Reduce derivation if_stmt ↳ 4: "if" expr "then" stmt "else" stmt ↳ 2: if_stmt ↳ 3: "if" expr "then" stmt •
This shows two different derivations for one single expression, which proves that the grammar is ambiguous.
As a more delicate example, consider the example grammar of Reduce/Reduce Conflicts, which features a reduce/reduce conflict:
%% sequence: %empty | maybeword | sequence "word" ; maybeword: %empty | "word" ;
Bison generates the following counterexamples:
$ bison -Wcex sequence.y sequence.y: warning: 1 shift/reduce conflict [-Wconflicts-sr] sequence.y: warning: 2 reduce/reduce conflicts [-Wconflicts-rr]
sequence.y: warning: shift/reduce conflict on token "word" [-Wcounterexamples] Example: • "word" Shift derivation sequence ↳ 2: maybeword ↳ 5: • "word" Example: • "word" Reduce derivation sequence ↳ 3: sequence "word" ↳ 1: •
sequence.y: warning: reduce/reduce conflict on tokens $end, "word" [-Wcounterexamples] Example: • First reduce derivation sequence ↳ 1: • Example: • Second reduce derivation sequence ↳ 2: maybeword ↳ 4: •
sequence.y: warning: shift/reduce conflict on token "word" [-Wcounterexamples] Example: • "word" Shift derivation sequence ↳ 2: maybeword ↳ 5: • "word" Example: • "word" Reduce derivation sequence ↳ 3: sequence "word" ↳ 2: maybeword ↳ 4: •
sequence.y:8.3-45: warning: rule useless in parser due to conflicts [-Wother] 8 | %empty { printf ("empty maybeword\n"); } | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Each of these three conflicts, again, prove that the grammar is ambiguous. For instance, the second conflict (the reduce/reduce one) shows that the grammar accepts the empty input in two different ways.
Sometimes, the search will not find an example that can be derived in two ways. In these cases, counterexample generation will provide two examples that are the same up until the dot. Most notably, this will happen when your grammar requires a stronger parser (more lookahead, LR instead of LALR). The following example isn’t LR(1):
%token ID %% s: a ID a: expr expr: %empty | expr ID ','
bison
reports:
ids.y: warning: 1 shift/reduce conflict [-Wconflicts-sr] ids.y: warning: shift/reduce conflict on token ID [-Wcounterexamples]
First example: expr • ID ',' ID $end Shift derivation $accept ↳ 0: s $end ↳ 1: a ID ↳ 2: expr ↳ 4: expr • ID ',' Second example: expr • ID $end Reduce derivation $accept ↳ 0: s $end ↳ 1: a ID ↳ 2: expr •
ids.y:4.4-7: warning: rule useless in parser due to conflicts [-Wother] 4 | a: expr | ^~~~
This conflict is caused by the parser not having enough information to know
the difference between these two examples. The parser would need an
additional lookahead token to know whether or not a comma follows the
ID
after expr
. These types of conflicts tend to be more
difficult to fix, and usually need a rework of the grammar. In this case,
it can be fixed by changing around the recursion: expr: ID | ',' expr
ID
.
Alternatively, you might also want to consider using a GLR parser (see Writing GLR Parsers).
On occasions, it is useful to look at counterexamples in situ: with the automaton report (See Understanding Your Parser, in particular State 8).
Next: Understanding Your Parser, Up: Debugging Your Parser [Contents][Index]