Matt Austern, Robert Bowdidge, Geoff Keating
The stree project is based on three fundamental premises. First:
for an important class of development tasks (roughly: GUI programs
written in a relatively simple subset of C++, compiled at -O0
-g
), compilation time is dominated by the C++ front end. Second:
the performance of the C++ front end is dominated by memory allocation
and management. This includes memory allocation, initializing newly
allocated objects, and bookkeeping for garbage collection. Reducing
front end memory usage should thus improve front end
performance. Third: many programs consist of small source files that
include truly enormous header files. Such header files
include <iostream>
(25,000 lines),
Apple's <Carbon/Carbon.h>
(91,000 lines), and the X11
headers. Any given translation unit only uses a tiny fraction of the
declarations in one of these headers.
The goal of this project is to reduce the time and memory required for handling unused declarations.
The main idea of the stree project is to avoid generating decl trees when possible. Instead the parser will generate a compact flat representation for declarations, called an stree, and expand the stree to a decl tree when necessary. Strees are not a substitute for trees. The middle-end and back end will still understand trees, not strees.
Some immediate implications of this basic idea:
void do_nothing();
(which the middle end and back
end don't necessarily have to know about), but we would generate
a full tree for void >do_nothing() { }
(which the
middle end has to expand to RTL).xyzzy
is declared but nobody ever defines it,
takes its address, or calls a function with that name, then
that declaration isn't needed, and generating a decl tree for
it is wasted time and space.cxx_binding
object. (See name_lookup.h
). So we just need to
make cxx_binding
a little bit more complicated: when
we ask a cxx_binding
for the value of the binding, we
check to see whether we have a tree or an stree. If the
latter, we expand it into a tree and cache the expanded
version.current_binding_level
,
current_class_ptr
,
etc.)Consider the front end data structure for a simple enumeration
declaration, enum foo { a, b };
. We have two enumerators.
For each one we need to know its name, its type, the underlying integer
type used to represent it, and its value. At present we represent
enumerators with CONST_DECL
nodes, so each enumerator takes
128 bytes for the tree_decl
node, plus additional memory
for cp-tree.h
's version of lang_decl
.
Each enumerator has an entry in the hash table, an identifier. Each
identifier has a pointer to a binding of type cxx_binding
(this is the bindings field in lang_identifier
, defined in
name_lookup.h
). The binding for foo
itself
points to a tree_type
, and the bindings for a
and b
point to CONST_DECL
nodes. Each CONST_DECL
node
has pointers to the name and to the ENUMERAL_TYPE
node, and
additionally has a pointer to a node representing the enumerator's
value. In simple examples like this one each enumerator's value is
an INTEGER_CST
, giving us another 36 bytes
each. (An INTEGER_CST
node contains a tree_common
subobject, with all the generality that implies.)
We don't need 200 bytes to represent the fact that the
enumerator a
has the value 0. First: as an stree it's
unnecessary to store a pointer to the name of this enumerator. The
stree will only be accessed via a cxx_binding
, so any code
that accesses the stree already knows the name. Second: it isn't
necessary to use anything so large as an INTEGER_CST
to
represent the value "0". Most of the information stored in
an INTEGER_CST
(chain nodes, type pointers, etc.) is
unnecessary, since we already know we're getting to the value through
an enumerator. We only need to store two pieces of information: the
enumeration that this enumerator is associated with, and its initial
value. This allows us to represent the enumerator in six bytes: a
one-byte code for the type of the stree (specifically:
the TREE_CODE
of the tree that this stree corresponds to),
four bytes (a pointer or the equivalent) for the enumeration, and one
byte for the value. Note that this implies a variable-width encoding
for the integer values; some enumerations will require seven or more
bytes.
Our current implementation is limited to enumerations defined at
namespace scope. First, enumerations defined at class scope require
additional context information. Second, enumerators declared at class
scope might have values that depend on template parameters, meaning
that we can't necessarily represent the values as simple
integers. Neither is a serious problem. Because
a cxx_binding
's value can be either a tree or stree, we can
use strees for the common, simple cases, and default to trees
otherwise. Because strees are a variable-sized representation, we can
add additional values needed for building trees for the complex case
as needed without bloating the simpler cases.
The stree data structure itself is defined
in stree.[ch]
. Strees are tightly-packed, serialized
representations of simple declarations
Strees are stored on the gc heap, but not directly: instead, they are stored in multi-page blocks of virtual memory ("chunks"), where a single chunk may contain multiple strees. Each stree is represented by an index; a separate table maps each index to the appropriate chunk and position within that chunk. We thus don't traffic in pointers to strees, but rather in integer indices referencing a location in memory. Storing strees in this manner avoids creating new objects and additional work for the garbage collector, and simplifies precompiled headers by ensuring that the chunks don't need to be placed at a specific address or when reloaded—only the table pointers need to be swizzled.
Clients
access stree data via an iterator: given an
stree with index s
, the function get_s_tree_iter
(declared in
stree.h
) creates an iterator pointing to the beginning
of s
. Other functions declared in stree.h
access the
iterator to extract each serialized value in turn. This scheme allows
us to store data in the most compressed representation possible, and
in a way such that clients are insulated from the details of the
representation. For enumerators, for example, instead of using a
full INTEGER_CST
for each value, we can use one or two bytes
in the (typical) case where the values are small.
Strees are created with build_s_tree
, a varargs function
defined in stree.c
. Its first argument is the stree code, and
its remaining arguments are the contents of that stree and tags to
identify their types. There is no function for creating an stree by
treating it as a "stream" to which values are written one at a time;
eventually there probably will need to be one. It won't be hard to add
it.
The files stree.h
and stree.c
are
language-independent, since, at bottom, strees are just a way of
packing bytes and integers into chunks. Creation and expansion of
strees are language dependent. The present implementation is focused
on C++.
We change cxx_binding::value
from
type tree
to type s_tree_i_or_tree
(a
tagged union), and we change IDENTIFIER_VALUE
so that it
returns the tree value, expanding
the stree if necessary. A few changes are required in functions that
manipulate cxx_binding
directly, but those changes are
largely mechanical and are localized
to cp/name_lookup.[ch]
.
Strees are expanded by the s_tree_to_tree
function,
defined in cp/decl.c
. There are three points to notice about
it. First, as described above, it uses the stree iterator
interface. Second, the first byte of the stree is the stree
code; s_tree_to_tree
uses that code to determine what kind
of tree to create. Third, at present s_tree_to_tree
doesn't
handle any cases other than enumerators.
The major changes required to use strees for enumerators are
in build_enumerator
. First, we need to separate parsing and
error checking from tree generation, deferring the latter until
later. Second, for simple cases we use build_s_tree
to create
the stree and push_s_decl
to enter the stree into the current
lexical scope. In principle push_s_decl
would need to know
all of the same logic that pushdecl
does; in practice we only
use push_s_decl
for the simplest cases, deferring
to pushdecl
(i.e. using trees instead of strees) for the more
complicated cases.
This design has the virtue that most of the C++ front end doesn't
have to know about strees: code that goes through bindings to get trees
looks exactly as before. It has the defect that, as presently written,
it requires code duplication. The code required to generate an
enumerator node is in both build_enumerator
and s_tree_to_tree
. Additionally, s_tree_to_tree
is manageable only because at the moment it only handles a single
case. If this project succeeds, and we're handling many kinds of
strees, it would become a monstrosity. The right solution will
probably be to replace s_tree_to_tree
with a wrapper function
that examines the stree code and dispatches to a function for the
appropriate code, and, for each code, to write an implementation
function that's shared between the tree and stree versions. Similarly,
we can probably achieve better code sharing between pushdecl
and push_s_decl
.
At present the compiler will not generate debugging information for unexpanded strees. This is potentially a serious issue. In principle, there are two ways of dealing with this issue: either figure out a way to generate debugging information without expanding strees, or else decide that it's acceptable to omit debugging information for "unused" declarations. (Note that by "unused" we mean declarations that are irrelevant to the compilation of the code, rather than the weaker definition of "never executed". As soon as a declaration's name is seen elsewhere in the code, we create a decl tree node for the name.)
We don't believe this will be a serious problem. Consider the effect of missing debug information for unused declarations:
More importantly, some gcc versions already remove unneeded
declarations from debug information. GCC 3.4 does not generate DWARF
debug info for function declarations, and does not generate debug info
for unused types unless -fno-eliminate-unused-debug-types
is
specified. Apple's gcc has stripped "unused" symbols out of STABS
debugging format for the last two and a half years. The debugger team
expected many bugs from users trying to examine unused declarations,
but have been surprised at how few bugs they've received. One of the
few complaints was from a user who had a "debug" version of a struct
that was used only for pretty-printing the real structure, and was
stripped out because it was never actually referenced.
cp/decl.c
to avoid code duplication. Or, to put it
differently: develop a general framework for separating the early
phase (parsing and error checking) from the later stages of tree
generation and transformation.tree_or_stree
struct. Instead just
use an ordinary union, and steal one of the unused bits in
cxx_binding
to identify which member we want.stree-branch
from the CVS repository. This branch
is being maintained by Matt Austern, Robert Bowdidge, Geoff Keating
and Mike Stump. Patches should be marked with the
tag [stree]
in the subject line.
Copyright (C) Free Software Foundation, Inc. Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.
These pages are maintained by the GCC team. Last modified 2022-10-26.