Symbols in Bison grammars represent the grammatical classifications of the language.
A terminal symbol (also known as a token kind) represents a
class of syntactically equivalent tokens. You use the symbol in grammar
rules to mean that a token in that class is allowed. The symbol is
represented in the Bison parser by a numeric code, and the yylex
function returns a token kind code to indicate what kind of token has been
read. You don’t need to know what the code value is; you can use the symbol
to stand for it.
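Here is a minimal sketch of a lexer that reports token kinds by their numeric codes. It assumes a grammar with one named token kind, NUM, plus character token kinds such as '+'. The value 258 for NUM, the function name demo_lex, and its string-cursor argument are all hypothetical: a real yylex gets NUM from the Bison-generated definitions, and this argument does not follow Bison's calling convention for yylex.

```c
#include <ctype.h>

/* Hypothetical stand-in for the code Bison would assign to a named
   token kind NUM; a real lexer takes it from the generated definitions. */
enum { NUM = 258 };

/* Illustrative lexer: returns the token kind code of the next token at
   *cursor and advances *cursor past it.  Returns 0 at end of input. */
static int demo_lex(const char **cursor)
{
    const char *p = *cursor;
    while (*p == ' ' || *p == '\t')
        ++p;                          /* skip blanks */
    if (*p == '\0') {
        *cursor = p;
        return 0;                     /* zero signals end-of-input */
    }
    if (isdigit((unsigned char) *p)) {
        while (isdigit((unsigned char) *p))
            ++p;                      /* a run of digits is one NUM token */
        *cursor = p;
        return NUM;
    }
    *cursor = p + 1;
    return (unsigned char) *p;        /* e.g. '+' is its own token kind code */
}
```

Called repeatedly on "12 + 3", it yields NUM, '+', NUM, and then 0. The grammar only ever sees these codes; it never sees the characters themselves.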
A nonterminal symbol stands for a class of syntactically equivalent groupings. The symbol name is used in writing grammar rules. By convention, it should be all lower case.
Symbol names can contain letters, underscores, periods, and non-initial digits and dashes. Dashes in symbol names are a GNU extension, incompatible with POSIX Yacc. Periods and dashes make symbol names less convenient to use with named references, which require brackets around such names (see Named References). Terminal symbols that contain periods or dashes make little sense: since they are not valid identifiers in most programming languages, they are not exported as token names.
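The naming rules above can be encoded in a small checker. This is only a sketch: the function names are hypothetical, and it assumes that "letters" means the ASCII letters that isalpha accepts in the C locale.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if NAME follows the symbol-name rule: letters, underscores and
   periods anywhere, digits and dashes anywhere except first. */
static bool is_symbol_name(const char *name)
{
    if (name[0] == '\0')
        return false;
    for (size_t i = 0; name[i] != '\0'; ++i) {
        unsigned char c = (unsigned char) name[i];
        bool ok = isalpha(c) || c == '_' || c == '.';
        if (i > 0)
            ok = ok || isdigit(c) || c == '-';   /* non-initial only */
        if (!ok)
            return false;
    }
    return true;
}

/* A terminal whose name contains a period or a dash is not a valid C
   identifier, so it cannot be exported as a token name. */
static bool is_exportable_token_name(const char *name)
{
    return is_symbol_name(name) && strpbrk(name, ".-") == NULL;
}
```

For instance, it accepts if-stmt as a symbol name but rejects it as an exported token name, and it rejects 2nd outright.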
There are three ways of writing terminal symbols in the grammar:
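A named token kind is written with an identifier, like an identifier in C. By convention, it should be all upper case. Each such name must be defined with a Bison declaration such as %token. See Token Kind Names.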
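A character token kind (or literal character token) is written in the grammar using the same syntax as C character constants; for example, '+' is a character token kind. A character token kind doesn’t need to be declared unless you need to specify its semantic value data type (see Data Types of Semantic Values), associativity, or precedence (see Operator Precedence).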
By convention, a character token kind is used only to represent a token that consists of that particular character. Thus, the token kind '+' is used to represent the character ‘+’ as a token. Nothing enforces this convention, but if you depart from it, your program will confuse other readers.
All the usual escape sequences used in character literals in C can be used in Bison as well, but you must not use the null character as a character literal because its numeric code, zero, signifies end-of-input (see Calling Convention for yylex). Also, unlike standard C, trigraphs have no special meaning in Bison character literals, nor is backslash-newline allowed.
A literal string token is written like a C string constant; for example, "<=" is a literal string token. A literal string token doesn’t need to be declared unless you need to specify its semantic value data type (see Data Types of Semantic Values), associativity, or precedence (see Operator Precedence).
You can associate the literal string token with a symbolic name as an alias, using the %token declaration (see Token Kind Names). If you don’t do that, the lexical analyzer has to retrieve the token code for the literal string token from the yytname table (see Calling Convention for yylex).
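A minimal sketch of that lookup follows. The tables demo_tname and demo_tcode are hypothetical stand-ins loosely modeled on yytname, whose entries for literal string tokens include the surrounding double quotes. The separate index-to-code table is an assumption made to keep the example self-contained; it is not Bison's actual table layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table in the style of yytname: literal string tokens
   appear with their double quotes included. */
static const char *const demo_tname[] = {
    "$end", "error", "$undefined", "NUM", "\"<=\"", "\">=\"", NULL
};
/* Hypothetical token codes for the entries above, in the same order. */
static const int demo_tcode[] = { 0, 256, 257, 258, 259, 260 };

/* Returns the token code of the literal string token TEXT (given
   without quotes), or -1 if there is none.  The caller must not pass
   -1 on to the parser: a negative value means end-of-input. */
static int literal_token_code(const char *text)
{
    size_t len = strlen(text);
    for (size_t i = 0; demo_tname[i] != NULL; ++i) {
        const char *entry = demo_tname[i];
        if (entry[0] == '"'
            && strncmp(entry + 1, text, len) == 0
            && entry[len + 1] == '"'       /* closing quote right after TEXT */
            && entry[len + 2] == '\0')     /* and nothing after it */
            return demo_tcode[i];
    }
    return -1;
}
```

The checks are ordered so that each index is read only after the earlier tests have shown the entry is long enough.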
Warning: literal string tokens do not work in Yacc.
By convention, a literal string token is used only to represent a token that consists of that particular string. Thus, you should use the token kind "<=" to represent the string ‘<=’ as a token. Bison does not enforce this convention, but if you depart from it, people who read your program will be confused.
All the escape sequences used in string literals in C can be used in Bison as well, except that you must not use a null character within a string literal. Also, unlike Standard C, trigraphs have no special meaning in Bison string literals, nor is backslash-newline allowed. A literal string token must contain two or more characters; for a token containing just one character, use a character token (see above).
How you choose to write a terminal symbol has no effect on its grammatical meaning. That depends only on where it appears in rules and on when the lexical analyzer returns that symbol.
The value returned by yylex is always the code of one of the terminal symbols, except that a zero or negative value signifies end-of-input.
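The end-of-input convention can be sketched with a loop that consumes codes until it sees a value that is zero or negative. The token source is a hypothetical array replayed by demo_next, standing in for yylex; a real parser does far more with each code than count it.

```c
#include <stddef.h>

/* Hypothetical token source standing in for yylex: it replays a fixed
   sequence of codes, ending with a negative value. */
static const int demo_codes[] = { 258, '+', 258, -1 };
static size_t demo_pos = 0;

static int demo_next(void)
{
    return demo_codes[demo_pos++];
}

/* Counts tokens until end-of-input, treating any code <= 0 as the end,
   which is how a Bison parser reads the value yylex returns. */
static int count_tokens(int (*next)(void))
{
    int n = 0;
    while (next() > 0)
        ++n;
    return n;
}
```

Here count_tokens(demo_next) returns 3: the final -1 ends the input just as 0 would.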
Whichever way you write the token kind in the grammar rules, you write it the same way in the definition of yylex. The numeric code for a character token kind is simply the positive numeric code of the character, so yylex can use the identical value to generate the requisite code, though you may need to convert it to unsigned char to avoid sign-extension on hosts where char is signed.
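The conversion to unsigned char can be isolated in a helper like the hypothetical char_token_code below, assuming 8-bit bytes. On a host where char is signed, assigning a byte such as 0xE9 straight to an int would give a negative value, which the parser would take as end-of-input; the cast keeps the code positive.

```c
/* Converts a character from the input into the token kind code of the
   corresponding character token.  Going through unsigned char keeps
   codes for bytes above 127 positive even where char is signed. */
static int char_token_code(char c)
{
    return (unsigned char) c;
}
```

Input read with getc or getchar needs no such cast: those functions already return the character as an unsigned char converted to int (or EOF).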
Each named token kind becomes a C macro in the parser implementation file, so yylex can use the name to stand for the code. (This is why periods don’t make sense in terminal symbols.) See Calling Convention for yylex.
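Here is a minimal sketch of a lexer that uses named token kinds alongside character token kinds. The #define lines are hand-written, hypothetical stand-ins for the definitions Bison generates for a grammar that declares a NUM token and a LE token aliased to "<="; their values are assumptions, and the function name lex_from and its FILE argument are illustrative.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the token-kind definitions Bison generates
   (for example from %token NUM and %token LE "<="). */
#define NUM 258
#define LE  259

/* Reads one token from IN and returns its token kind code. */
static int lex_from(FILE *in)
{
    int c = getc(in);
    while (c == ' ' || c == '\t')
        c = getc(in);                 /* skip blanks */
    if (c == EOF)
        return 0;                     /* end-of-input */
    if (c >= '0' && c <= '9') {
        while (c >= '0' && c <= '9')
            c = getc(in);
        if (c != EOF)
            ungetc(c, in);            /* push back the first non-digit */
        return NUM;                   /* named token kind */
    }
    if (c == '<') {
        int d = getc(in);
        if (d == '=')
            return LE;                /* "<=" via its symbolic alias */
        if (d != EOF)
            ungetc(d, in);
        return '<';                   /* character token kind */
    }
    return c;                         /* getc already yields a non-negative code */
}
```

Given the input 1 <= 20 < 3, successive calls return NUM, LE, NUM, '<', NUM, and finally 0.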
If yylex is defined in a separate file, you need to arrange for the token-kind definitions to be available there. Use the -d option when you run Bison, so that it will write these definitions into a separate header file name.tab.h, which you can include in the other source files that need it. See Invoking Bison.
If you want to write a grammar that is portable to any Standard C host, you must use only nonnull character tokens taken from the basic execution character set of Standard C. This set consists of the ten digits, the 52 lower- and upper-case English letters, and the characters in the following C-language string:
"\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
The yylex function and Bison must use a consistent character set and encoding for character tokens. For example, if you run Bison in an ASCII environment, but then compile and run the resulting program in an environment that uses an incompatible character set like EBCDIC, the resulting program may not work because the tables generated by Bison will assume ASCII numeric values for character tokens. It is standard practice for software distributions to contain C source files that were generated by Bison in an ASCII environment, so installers on platforms that are incompatible with ASCII must rebuild those files before compiling them.
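Both character-set requirements above can be sketched as small checks. The hypothetical is_portable_char_token tests membership in the basic execution character set listed earlier, and the hypothetical charset_matches_ascii spot-checks a few character codes against their ASCII values. The second function is a heuristic, not a complete test of the encoding.

```c
#include <stdbool.h>
#include <string.h>

/* The basic execution character set of Standard C: the digits, the
   letters, and the characters in the C string quoted above. */
static const char basic_chars[] =
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* True if C may be used as a character token in a grammar meant to be
   portable to any Standard C host. */
static bool is_portable_char_token(char c)
{
    /* strchr also matches the terminating null, so exclude '\0' first;
       it can never be a character token anyway. */
    return c != '\0' && strchr(basic_chars, c) != NULL;
}

/* Spot-checks that the execution character set agrees with ASCII on a
   few characters, as tables generated in an ASCII environment assume. */
static bool charset_matches_ascii(void)
{
    return 'A' == 65 && 'a' == 97 && '0' == 48
        && '+' == 43 && '{' == 123 && '~' == 126;
}
```

The first function accepts '+' and '\n' but rejects '@', '$', and '`', which are not in the basic execution character set. In C11 or later, the expression in charset_matches_ascii can instead go in a _Static_assert, so that a build in an incompatible environment fails at compile time rather than misbehaving at run time.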
The symbol error is a terminal symbol reserved for error recovery (see Error Recovery); you shouldn’t use it for any other purpose. In particular, yylex should never return this value. The default value of the error token is 256, unless you explicitly assigned 256 to one of your tokens with a %token declaration.