A Compilation object manages the classes, methods, and temporary state generated as a result of compiling a single top-level ModuleExp.
class Compilation
{ ...;
ClassType[] classes;
boolean immediate;
public ClassType addClass
(LambdaExp lexp, String name)
{ ... }
public Compilation(ModuleExp exp, ...)
{ ...; addClass (exp, ...); }
}
Each Compilation may create one or more ClassType objects, each of which generates the bytecodes for one class. Each ClassType is generated from a LambdaExp, including the top ModuleExp. The boolean immediate is true if we are compiling for immediate loading, and false if the target is one or more .class files.
The addClass method does all the work to compile a given LambdaExp. It creates a ClassType, adds it to the Compilation's classes array, and generates Method objects for the constructor and the main apply method. Once the apply Method has been created, addClass emits some bytecodes to set up the incoming parameters, and then invokes the virtual compile method on the body of the LambdaExp, which generates the code that does the actual work of the procedure.
The Compilation constructor gets a ModuleExp, which it passes to addClass. The compile method of LambdaExp (which gets called for all lambdas except the dummy top-level) also calls addClass to generate the class corresponding to the lambda, and then emits instructions to create a new instance of the generated Procedure class and push it on the Java stack.
Most operations in the Java VM leave their result on the VM stack,
where they are available for succeeding operations.
The obvious and general way to compile an expression is therefore to generate bytecode instructions that leave the result (in the form of an Object reference) on the stack. This handles most cases quite well, but we can do better. We specify a Target parameter when invoking the compile method; the Target specifies where to leave the result.
public abstract class Target
{ ...;
public abstract void compileFromStack
(Compilation comp, Type stackType);
public static final Target Ignore
= new IgnoreTarget();
}
An Expression's compile method does not have to handle all the kinds of Targets, as long as it can generate code to leave the result on the VM stack and then invoke compileFromStack, which is responsible for moving the result to the actual target.
The simplest Target is an IgnoreTarget. It is used when the result of an expression will be ignored, but we still need to evaluate it for possible side-effects. The implementation of IgnoreTarget.compileFromStack just emits an instruction to pop a value from the VM stack. Expressions that have no side-effects can check if the target is an IgnoreTarget, and then immediately return. This saves a useless push-pop pair.
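For illustration, a minimal sketch of what IgnoreTarget might look like follows; the getCode and emitPop helpers are assumed names for "fetch the current CodeAttr" and "emit a pop instruction", not necessarily the real Kawa API.
public class IgnoreTarget extends Target
{
  public void compileFromStack (Compilation comp, Type stackType)
  {
    // The expression left its result on the VM stack, but nobody needs it,
    // so simply pop it off again.
    comp.getCode().emitPop(1);   // getCode()/emitPop(1) are assumed helpers
  }
}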
The usual Target is a StackTarget. This specifies that an expression should leave the result on the VM stack. Normally, the type of the result is Object, but a StackTarget can specify some other expected type, when that can be determined. The implementation of StackTarget.compileFromStack is also trivial: if the type of the result on the stack is a sub-type of the expected target type, nothing needs to be done; otherwise, it generates code to do the type conversion.
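A hedged sketch of that logic (again not the actual source; the expectedType field and the getCode, isSubtype, and emitCheckcast calls are assumptions about the API):
public class StackTarget extends Target
{
  Type expectedType;   // the type the consumer expects, often Object

  public void compileFromStack (Compilation comp, Type stackType)
  {
    // If what is already on the stack is acceptable, there is nothing to do.
    if (stackType.isSubtype(expectedType))
      return;
    // Otherwise emit a conversion, for example a checkcast to the expected class.
    comp.getCode().emitCheckcast((ClassType) expectedType);
  }
}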
Things get more interesting when we come to ConditionalTarget.
public class ConditionalTarget
extends Target
{ ...;
public Label ifTrue, ifFalse;
}
A ConditionalTarget is used when compiling the test expression in a conditional. The expression is evaluated as a boolean value; if the result is true, control transfers to ifTrue; otherwise control transfers to ifFalse. Using ConditionalTarget makes it straightforward to generate optimal code for nested conditionals, including the and and or macros, and (when inlining) functions such as not and eq?.
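As a hedged illustration, a conditional expression's compile method might use a ConditionalTarget roughly as follows; the no-argument ConditionalTarget constructor, the Label operations (define, emitGoto), and getCode are assumptions, not necessarily the real API.
// Sketch: compiling (if test then-part else-part) with a ConditionalTarget.
void compileIf (Expression test, Expression thenPart, Expression elsePart,
                Compilation comp, Target target)
{
  CodeAttr code = comp.getCode();            // assumed accessor
  ConditionalTarget ctarget = new ConditionalTarget();
  ctarget.ifTrue = new Label(code);
  ctarget.ifFalse = new Label(code);
  Label done = new Label(code);

  test.compile(comp, ctarget);               // branches to ifTrue or ifFalse

  ctarget.ifTrue.define(code);               // the "true" branch
  thenPart.compile(comp, target);
  code.emitGoto(done);

  ctarget.ifFalse.define(code);              // the "false" branch
  elsePart.compile(comp, target);

  done.define(code);                         // control joins here
}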
The ClassType and Method classes are in a separate gnu.bytecode package, which is an intermediate-level interface to code generation and Java .class files. It is essentially independent of Scheme or the rest of Kawa.
class ClassType extends Type
{ ...;
CpoolEntry[] constant_pool;
Method methods; // List of methods.
Field fields; // List of fields.
public Field addField
(String name, Type type, int flags)
{ ...Create new field... }
public Method addMethod(String name,...)
{ ...Create new method... }
public void writeToStream
(OutputStream stream) { ... }
public void writeToFile(String filename)
{ ... }
public byte[] writeToArray()
{ ... }
}
The ClassType class is the main class of the bytecode package. It manages a list of Fields, a list of Methods, and the constant pool. There are utility methods for adding new fields, methods, and constant pool entries.
When the ClassType has been fully built, the writeToFile method can be used to write out the contents into a file. The result has the format of a .class file [JavaVMSpec]. Alternatively, the class can be written to an internal byte array (with the same layout as a .class file) using the writeToArray method. The resulting byte array may be used by a ClassLoader to define a new class for immediate execution. Both of these methods are implemented on top of the more general writeToStream.
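As a hedged usage sketch of this API (only addField, writeToFile, and writeToArray come from the summary above; the ClassType constructor, Type.int_type, and Access.PUBLIC are assumptions):
import gnu.bytecode.*;

public class GenerateClassDemo
{
  public static void main (String[] args) throws java.io.IOException
  {
    ClassType ctype = new ClassType("HelloGenerated");       // assumed constructor
    ctype.addField("counter", Type.int_type, Access.PUBLIC); // assumed constants
    // ... add methods and fill in their Code attributes here ...
    ctype.writeToFile("HelloGenerated.class");  // emit a .class file
    byte[] classBytes = ctype.writeToArray();   // or keep the same bytes in memory,
    // e.g. to hand to a ClassLoader for immediate execution.
  }
}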
Each method is represented by a Method object.
class Method implements AttrContainer
{ ...;
Type[] arg_types;
Type return_type;
Attribute attributes;
}
An AttrContainer is an object that contains zero or more Attributes. The Java .class file format is quite extensible.
Much of the information is stored in named attributes.
There are standard attributes, but an application can also define
new ones (that are supposed to be ignored by applications that
do not understand them). Each class file may have a set of top-level
attributes. In addition, each field and method may have attributes.
Some standard attributes may have nested sub-attributes.
public abstract class Attribute
{ ...;
AttrContainer container;
String name;
}
An Attribute's container specifies who owns the attribute. The attribute also has a name, plus methods to get its size, write it out, etc.
The most interesting (and large) standard Attribute occurs in a method and has the name "Code". It contains the actual bytecode instructions of a non-native non-abstract method, and we represent it using CodeAttr.
class CodeAttr extends Attribute
{ ...;
Variable addLocal(Type t, String name)
{ ... }
public void emitLoad(Variable var)
{ ... }
public void emitPushInt(int i)
{ ... }
public void putLineNumber(int lineno)
{ ... }
}
As an example of the level of functionality, emitPushInt compiles code to push an integer i on the stack. It selects the right instruction, and if i is too big for one of the instructions that take an inline value, it will create a constant pool entry for i and push that.
The method addLocal creates a new local variable (and makes sure debugging information is emitted for it), while emitLoad pushes the value of the variable on the stack.
Kawa calls putLineNumber to indicate that the current location corresponds to a given line number. These are emitted in the .class file, and most Java interpreters will use them when printing a stack trace.
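To make the level of abstraction concrete, here is a hedged fragment that uses only the CodeAttr operations named above; how the CodeAttr is obtained, and the Type.int_type constant, are assumptions.
void emitSample (CodeAttr code)
{
  code.putLineNumber(42);                    // following code maps to source line 42
  Variable counter = code.addLocal(Type.int_type, "counter");
  code.emitPushInt(100000);  // too big for an inline operand, so this allocates
                             // a constant pool entry and pushes it with ldc
  // ... a store into counter would go here (not covered by the text) ...
  code.emitLoad(counter);    // push the variable's value onto the stack
}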
We use gnu.bytecode mainly for generating .class files, but it also has classes to read .class files, and classes to print a ClassType in readable format. The combination makes for a decent Java disassembler.
There are other toolkits for creating or analyzing .class files, but gnu.bytecode was written to provide a lot of support for code generation while having little overhead. For example, some assemblers represent each instruction using an Instruction instance, whereas CodeAttr just stores all the instructions in a byte array. Using a linked list of Instructions may be more “object-oriented”, and it does make it easier to do peephole optimizations, but the time and space overhead compared to using an array of bytes is huge. (If you do need to do peephole optimizations, then it makes sense to use a doubly-linked list of Instructions, but to use it in conjunction with CodeAttr. You will in any case want a byte-array representation for input and output.)
A Scheme quoted form or self-evaluating form expands to a QuoteExp. Compiling a QuoteExp would seem a trivial exercise, but it is not. There is no way to embed (say) a list literal in Java code. Instead we create a static field in the top-level class for each (different) QuoteExp in the body we are compiling. The code compiled for a QuoteExp then just needs to load the value from the corresponding static field. The tricky part is making sure that the static field gets initialized to the value of the literal when the top-level class is loaded. This is easy when compiling for immediate execution: after the compiled class has been loaded and initialized, we use reflection to set the field to the literal value. When compiling to a class file, things are harder.
The basic idea is that for:
(define (foo) '(3 . 4))
we compile the equivalent of:
class foo extends Procedure0
{
  static Object lit1 = Pair.make(IntNum.make(3), IntNum.make(4));
  public Object apply0() { return lit1; }
}
When the compiled class foo is loaded, we do:
Class fooCl = Class.forName("foo");
Procedure fooPr = (Procedure) fooCl.newInstance();
// Using foo:
Object result = fooPr.apply0();
How does the Kawa compiler generate the appropriate Pair.make expression as shown above? In earlier versions of Kawa, a class whose instances could be literals implemented the Compilable interface. The compiler then just called the methods in Compilable, and these would generate the code needed to re-create the literal. This had the advantage that the compiler was not limited to a few classes of literals it knew about. One problem is that it caused cross-package dependencies, since any class that could be a literal had to implement gnu.expr.Compilable. Another problem was that writing the Compilable methods required knowing how gnu.bytecode works.
The key insight is that saving compile-time literals and then restoring them at run-time is a kind of “object persistence”, similar to that offered by Java serialization. One option considered was to use standard serialization, but that required some way to store the serialized data in a class file. The article Long-Term Persistence for JavaBeans suggested an alternative. I realized that using Externalizable classes provided the more abstract serialization we needed.
Let us start with a summary: an object that can be a literal must implement the java.io.Externalizable interface. When the compiler needs to generate a reference to a literal value, it calls the object's writeExternal method. The writeExternal method expects an argument that implements the java.io.ObjectOutput interface. The actual argument is a LitTable object owned by the compiler. A writeExternal method can call the standard ObjectOutput methods, such as writeObject, writeInt, and so on. The LitTable remembers the values that were passed to it in these writeObject, writeInt, etc. calls, and creates an “argument list” from those values. When writeExternal returns, the LitTable will take the argument list and look for a matching constructor in the literal's class, using reflection. It will also look for methods named make or set. If found, the compiler will generate a call to the matching method. Simple values in the argument list will be pushed on the JVM stack directly; object references will use values generated by previous calls to writeExternal.
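To make the protocol concrete, here is a hedged example using a made-up Interval class (not a Kawa class): its writeExternal records two ints, so the LitTable described above would search Interval for a constructor or static make method taking two int parameters.
import java.io.*;

public class Interval implements Externalizable
{
  private int low, high;

  public Interval () { }                      // needed for readExternal
  private Interval (int low, int high) { this.low = low; this.high = high; }

  // The factory method the compiler would find by reflection and call
  // in the generated class-initialization code.
  public static Interval make (int low, int high)
  { return new Interval(low, high); }

  public void writeExternal (ObjectOutput out) throws IOException
  {
    out.writeInt(low);     // LitTable records these two values
    out.writeInt(high);    // as the "argument list" for Interval
  }

  public void readExternal (ObjectInput in) throws IOException
  {
    low = in.readInt();
    high = in.readInt();
  }
}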
Here are more details on how the compiler selects which method to invoke to re-create an object. If the compiler detects a recursive writeObject call with the object as an argument while writeExternal is being called on the object, then the object cyclically references itself. In that case, the compiler must take care to construct the object so that the cycle gets reproduced at run-time. This is done by first generating a call to the default constructor and saving the result in a static field. The argument list is evaluated, with recursive references using the saved static field. Then we call a method to properly initialize the object using the evaluated argument list. The compiler looks for a method named set with a parameter list matching the argument list. If there is no such method, the compiler gives up. (In the future, we could use JavaBeans introspection, or look for a Java 2 ObjectStreamField, to determine how to set the properties.)
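A hedged sketch of the cyclic case, again with a made-up class: because writeExternal may (indirectly) write the Node itself, the compiler would first emit new Node(), stash it in a static field, and then call set on it once the argument has been evaluated.
import java.io.*;

public class Node implements Externalizable
{
  private Object next;                 // may point back to this Node

  public Node () { }                   // default constructor, called first

  // The initializer LitTable looks for when it detects a cycle.
  public void set (Object next) { this.next = next; }

  public void writeExternal (ObjectOutput out) throws IOException
  {
    out.writeObject(next);             // may recursively reach this Node again
  }

  public void readExternal (ObjectInput in)
    throws IOException, ClassNotFoundException
  {
    next = in.readObject();
  }
}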
If an object does not have a cycle, the compiler first looks for a matching static method named make. If it does not find one, it looks for a matching constructor. In the latter case, the compiler must also check whether there is a zero-argument readResolve method, which Java 2 serialization uses to replace an object by a canonical object. (Kawa does not require Java 2, but LitTable does respect this Java 2 extension to serialization.) As a last resort, the compiler looks for a default constructor followed by set, as in the cycle case.
The writeExternal method of some classes may generate a variable number of values. Therefore, if a constructor or method takes an array parameter, then LitTable will consider it as matching the argument list if the latter contains an int (written using writeInt) followed by that number of arguments of the component type of the array. (One could argue that this feature is an unneeded kludge, since there is the alternative that writeExternal could just call writeObject with a single array argument, instead of calling writeObject a variable number of times. However, that would expose the internal array into the externalization protocol, which is wrong.)
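A hedged example of that convention, with another made-up class: writeExternal writes a count followed by that many elements, which LitTable would match against the Object[] constructor without the internal array ever appearing in the externalized form.
import java.io.*;

public class Tuple implements Externalizable
{
  private Object[] items;

  public Tuple () { }
  public Tuple (Object[] items) { this.items = items; }   // matches int + N objects

  public void writeExternal (ObjectOutput out) throws IOException
  {
    out.writeInt(items.length);        // the count LitTable pairs with the array
    for (int i = 0; i < items.length; i++)
      out.writeObject(items[i]);
  }

  public void readExternal (ObjectInput in)
    throws IOException, ClassNotFoundException
  {
    items = new Object[in.readInt()];
    for (int i = 0; i < items.length; i++)
      items[i] = in.readObject();
  }
}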
Advantages of this implementation include:
No explicit dependency on any class in the gnu.expr package. All a class needs to be used in literals is to implement Externalizable, and provide a matching method for creating object instances.
No need to write special code to handle literals.
Literals are re-created using efficient code at class initialization time.
Automatically handles cycles and duplicate references. (Standard Scheme does not support self-referential constants, but for example Common Lisp does. See section 25.1.4 “Similarity of Constants” in [CommonLisp2].)
Only requires standard JDK 1.1 features.