Counterexamples (Bison 3.8.1)

8.1 Generation of Counterexamples

Solving conflicts is probably the most delicate part of the design of an LR parser, as demonstrated by the number of sections devoted to them in this very documentation. To solve a conflict, one must understand it: when does it occur? Is it because of a flaw in the grammar? Is it rather because LR(1) cannot cope with this grammar?

One difficulty is that conflicts occur in the automaton, and it can be tricky to relate them to issues in the grammar itself. With experience and patience, analysis of the detailed description of the automaton (see Understanding Your Parser) allows one to find example strings that reach these conflicts.

That task is made much easier thanks to the generation of counterexamples, initially developed by Chinawat Isradisaikul and Andrew Myers (see Isradisaikul 2015).

As a first example, see the grammar of Shift/Reduce Conflicts, which features one shift/reduce conflict:

$ bison else.y
else.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
else.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples

Let’s rerun bison with the option -Wcex/-Wcounterexamples:

else.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
else.y: warning: shift/reduce conflict on token "else" [-Wcounterexamples]

  Example: "if" expr "then" "if" expr "then" stmt • "else" stmt
  Shift derivation
    if_stmt
    ↳ 3: "if" expr "then" stmt
                           ↳ 2: if_stmt
                                 ↳ 4: "if" expr "then" stmt • "else" stmt
  Example: "if" expr "then" "if" expr "then" stmt • "else" stmt
  Reduce derivation
    if_stmt
    ↳ 4: "if" expr "then" stmt                                "else" stmt
                           ↳ 2: if_stmt
                                 ↳ 3: "if" expr "then" stmt •

This shows two different derivations for one single expression, which proves that the grammar is ambiguous.

As a more delicate example, consider the example grammar of Reduce/Reduce Conflicts, which features a reduce/reduce conflict:

%%
sequence:
  %empty
| maybeword
| sequence "word"
;
maybeword:
  %empty
| "word"
;

Bison generates the following counterexamples:

$ bison -Wcex sequence.y
sequence.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
sequence.y: warning: 2 reduce/reduce conflicts [-Wconflicts-rr]

sequence.y: warning: shift/reduce conflict on token "word" [-Wcounterexamples]
  Example: • "word"
  Shift derivation
    sequence
    ↳ 2: maybeword
          ↳ 5: • "word"
  Example: • "word"
  Reduce derivation
    sequence
    ↳ 3: sequence "word"
          ↳ 1: •

sequence.y: warning: reduce/reduce conflict on tokens $end, "word" [-Wcounterexamples]
  Example: •
  First reduce derivation
    sequence
    ↳ 1: •
  Example: •
  Second reduce derivation
    sequence
    ↳ 2: maybeword
          ↳ 4: •

sequence.y: warning: shift/reduce conflict on token "word" [-Wcounterexamples]
  Example: • "word"
  Shift derivation
    sequence
    ↳ 2: maybeword
          ↳ 5: • "word"
  Example: • "word"
  Reduce derivation
    sequence
    ↳ 3: sequence        "word"
          ↳ 2: maybeword
                ↳ 4: •

sequence.y:8.3-45: warning: rule useless in parser due to conflicts [-Wother]
    8 |   %empty    { printf ("empty maybeword\n"); }
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each of these three conflicts, again, prove that the grammar is ambiguous. For instance, the second conflict (the reduce/reduce one) shows that the grammar accepts the empty input in two different ways.

Sometimes, the search will not find an example that can be derived in two ways. In these cases, counterexample generation will provide two examples that are the same up until the dot. Most notably, this will happen when your grammar requires a stronger parser (more lookahead, LR instead of LALR). The following example isn’t LR(1):

%token ID
%%
s: a ID
a: expr
expr: %empty | expr ID ','

bison reports:

ids.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
ids.y: warning: shift/reduce conflict on token ID [-Wcounterexamples]

  First example: expr • ID ',' ID $end
  Shift derivation
    $accept
    ↳ 0: s                                 $end
         ↳ 1: a                        ID
              ↳ 2: expr
                    ↳ 4: expr • ID ','
  Second example: expr • ID $end
  Reduce derivation
    $accept
    ↳ 0: s                   $end
         ↳ 1: a           ID
              ↳ 2: expr •

ids.y:4.4-7: warning: rule useless in parser due to conflicts [-Wother]
    4 | a: expr
      |    ^~~~

This conflict is caused by the parser not having enough information to know the difference between these two examples. The parser would need an additional lookahead token to know whether or not a comma follows the ID after expr. These types of conflicts tend to be more difficult to fix, and usually need a rework of the grammar. In this case, it can be fixed by changing around the recursion: expr: ID | ',' expr ID.

Alternatively, you might also want to consider using a GLR parser (see Writing GLR Parsers).

On occasions, it is useful to look at counterexamples in situ: with the automaton report (See Understanding Your Parser, in particular State 8).