This tutorial about C++ parsers is based on a simple, self-contained example. The following sections are the reference manual for Bison with C++, the last one showing a full-blown example (see A Complete C++ Example).
To keep the code pleasant to read, our example is written in C++14. This is not required: Bison also supports the original C++98 standard.
A Bison file has three parts. In the first part, the prologue, we start by making sure we run a version of Bison which is recent enough, and that we generate C++.
%require "3.2"
%language "c++"
Let’s dive directly into the middle part: the grammar. Our input is a simple list of strings, which we display once parsing is done.
%%
result:
  list  { std::cout << $1 << '\n'; }
;

%nterm <std::vector<std::string>> list;
list:
  %empty     { /* Generates an empty string list */ }
| list item  { $$ = $1; $$.push_back ($2); }
;
We used a vector of strings as a semantic value! To use genuine C++ objects as semantic values (not just PODs), we cannot rely on the union that Bison uses by default to store them; we need variants (see C++ Variants):
%define api.value.type variant
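To see why a plain union is not enough, here is a minimal sketch, independent of Bison, of what storing a std::string in a union entails in C++11 and later: the member has to be constructed with placement new and destroyed by hand. The class Value and its members are hypothetical names chosen for illustration; this is not Bison's variant implementation, only the kind of bookkeeping that variants spare you.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical tagged value, for illustration only (not Bison's variant).
class Value
{
public:
  explicit Value (int n) : kind_ (Kind::number), number_ (n) {}

  explicit Value (std::string s) : kind_ (Kind::text)
  {
    // A union member with a constructor must be built explicitly.
    new (&text_) std::string (std::move (s));
  }

  // Copying would need the same care; it is disabled to keep this short.
  Value (const Value&) = delete;
  Value& operator= (const Value&) = delete;

  ~Value ()
  {
    // ... and it must be destroyed explicitly as well.
    if (kind_ == Kind::text)
      text_.~basic_string ();
  }

  bool is_text () const { return kind_ == Kind::text; }
  int number () const { assert (!is_text ()); return number_; }
  const std::string& text () const { assert (is_text ()); return text_; }

private:
  enum class Kind { number, text };
  Kind kind_;
  union
  {
    int number_;
    std::string text_;
  };
};
```

Forgetting either the placement new or the explicit destructor call is undefined behavior or a leak, which is why hand-written unions of C++ objects are error-prone.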
Obviously, the rule for result needs to print a vector of strings. In the prologue, we add:
%code
{
  // Print a list of strings.
  auto
  operator<< (std::ostream& o, const std::vector<std::string>& ss)
    -> std::ostream&
  {
    o << '{';
    const char *sep = "";
    for (const auto& s: ss)
      {
        o << sep << s;
        sep = ", ";
      }
    return o << '}';
  }
}
You may want to move it into the yy namespace to avoid leaking it into your default namespace. We recommend that you keep the actions simple, and move details into auxiliary functions, as we did with operator<<.
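The separator idiom used by operator<< can be checked in isolation. The sketch below repeats the operator in a plain C++ translation unit, outside any Bison file, and adds a hypothetical helper, to_text, which is not part of the example and merely captures the output in a string.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Print a list of strings, as in the %code block of the grammar file.
auto
operator<< (std::ostream& o, const std::vector<std::string>& ss)
  -> std::ostream&
{
  o << '{';
  // The separator is empty before the first element only.
  const char *sep = "";
  for (const auto& s: ss)
    {
      o << sep << s;
      sep = ", ";
    }
  return o << '}';
}

// Hypothetical helper (not part of the example): format into a string.
auto
to_text (const std::vector<std::string>& ss) -> std::string
{
  std::ostringstream o;
  o << ss;
  return o.str ();
}
```

Keeping the formatting in a free function like this means it can be exercised without running the parser at all.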
Our list of strings will be built from two types of items: numbers and strings:
%nterm <std::string> item;
%token <std::string> TEXT;
%token <int> NUMBER;
item:
  TEXT
| NUMBER  { $$ = std::to_string ($1); }
;
In the case of TEXT, the implicit default action applies: $$ = $1.
Our scanner deserves some attention. The traditional interface of yylex is not type safe: since the token kind and the token value are not correlated, you may return a NUMBER with a string as semantic value. To avoid this, we use token constructors (see Complete Symbols). This directive:
%define api.token.constructor
requests that Bison generate the functions make_TEXT and make_NUMBER, as well as make_YYEOF for the end of input.
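The idea behind token constructors can be sketched without Bison. In the hypothetical Symbol class below, the only way to build a token is through a make_ function whose parameter type matches the token kind, so a call such as make_NUMBER ("three") is rejected at compile time. Every name in this sketch is invented for illustration; the real parser::symbol_type and make_ functions are generated by Bison and differ in detail.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a generated symbol type.
class Symbol
{
public:
  enum class Kind { YYEOF, TEXT, NUMBER };

  Kind kind () const { return kind_; }

  const std::string& text () const
  {
    assert (kind_ == Kind::TEXT);
    return text_;
  }

  int number () const
  {
    assert (kind_ == Kind::NUMBER);
    return number_;
  }

  // The factories below are the only way to create a Symbol.
  friend Symbol make_TEXT (std::string s);
  friend Symbol make_NUMBER (int n);
  friend Symbol make_YYEOF ();

private:
  Symbol (Kind k, std::string s, int n)
    : kind_ (k), text_ (std::move (s)), number_ (n)
  {}

  Kind kind_;
  std::string text_;
  int number_;
};

// Each factory ties one token kind to one value type.
Symbol make_TEXT (std::string s)
{
  return Symbol (Symbol::Kind::TEXT, std::move (s), 0);
}

Symbol make_NUMBER (int n)
{
  return Symbol (Symbol::Kind::NUMBER, "", n);
}

Symbol make_YYEOF ()
{
  return Symbol (Symbol::Kind::YYEOF, "", 0);
}
```

Because the constructor is private, a scanner written against this interface cannot pair a kind with a value of the wrong type by accident.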
Everything is in place for our scanner:
%code
{
  namespace yy
  {
    // Return the next token.
    auto yylex () -> parser::symbol_type
    {
      static int count = 0;
      switch (int stage = count++)
        {
        case 0:
          return parser::make_TEXT ("I have three numbers for you.");
        case 1:
        case 2:
        case 3:
          return parser::make_NUMBER (stage);
        case 4:
          return parser::make_TEXT ("And that's all!");
        default:
          return parser::make_YYEOF ();
        }
    }
  }
}
In the epilogue, the third part of a Bison grammar file, we leave simple details: the error reporting function, and the main function.
%%
namespace yy
{
  // Report an error to the user.
  auto parser::error (const std::string& msg) -> void
  {
    std::cerr << msg << '\n';
  }
}

int main ()
{
  yy::parser parse;
  return parse ();
}
Compile, and run!
$ bison simple.yy -o simple.cc
$ g++ -std=c++14 simple.cc -o simple
$ ./simple
{I have three numbers for you., 1, 2, 3, And that's all!}
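To see where that output line comes from, the following sketch replays the run by hand, without Bison. It walks through the same scanner stages, converts each number as the item rule does, and appends each item to the list as the list rule does. The functions next_item and run are hypothetical stand-ins for yylex and the generated parser, so this only approximates the control flow of the real program.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for yylex: fills `item` and returns true,
// or returns false at the end of input.
bool
next_item (int stage, std::string& item)
{
  switch (stage)
    {
    case 0:
      item = "I have three numbers for you.";  // TEXT
      return true;
    case 1:
    case 2:
    case 3:
      item = std::to_string (stage);           // NUMBER, converted as in `item`
      return true;
    case 4:
      item = "And that's all!";                // TEXT
      return true;
    default:
      return false;                            // YYEOF
    }
}

// Hypothetical stand-in for the parser: build the list, then format it.
std::string
run ()
{
  std::vector<std::string> list;               // list: %empty
  std::string item;
  for (int stage = 0; next_item (stage, item); ++stage)
    list.push_back (item);                     // list: list item

  std::ostringstream o;                        // result: list
  o << '{';
  const char *sep = "";
  for (const auto& s: list)
    {
      o << sep << s;
      sep = ", ";
    }
  o << '}';
  return o.str ();
}
```

Calling run () yields the same text that ./simple prints, minus the trailing newline.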