[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In this chapter, we describe how to use libjit
with a number of
short tutorial exercises. Full source for these tutorials can be found
in the tutorial
directory of the libjit
source tree.
For simplicity, we will ignore errors such as out of memory conditions, but a real program would be expected to handle such errors.
3.1 Tutorial 1 - mul_add | ||
3.2 Tutorial 2 - gcd | ||
3.3 Tutorial 3 - compiling on-demand | ||
3.4 Tutorial 4 - mul_add, C++ version | ||
3.5 Tutorial 5 - gcd, with tail calls | ||
3.6 Dynamic Pascal - A full JIT example |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In the first tutorial, we will build and compile the following function
(the source code can be found in tutorial/t1.c
):
int mul_add(int x, int y, int z) { return x * y + z; } |
To use the JIT, we first include the <jit/jit.h>
file:
#include <jit/jit.h> |
All of the header files are placed into the jit
sub-directory,
to separate them out from regular system headers. When libjit
is installed, you will typically find these headers in
/usr/local/include/jit
or /usr/include/jit
, depending upon
how your system is configured. You should also link with the
-ljit
option.
Every program that uses libjit
needs to call jit_context_create
:
jit_context_t context; ... context = jit_context_create(); |
Almost everything that is done with libjit
is done relative
to a context. In particular, a context holds all of the functions
that you have built and compiled.
You can have multiple contexts at any one time, but normally you will only need one. Multiple contexts may be useful if you wish to run multiple virtual machines side by side in the same process, without them interfering with each other.
Whenever we are constructing a function, we need to lock down the context to prevent multiple threads from using the builder at a time:
jit_context_build_start(context); |
The next step is to construct the function object that will represent
our mul_add
function:
jit_function_t function; ... function = jit_function_create(context, signature); |
The signature
is a jit_type_t
object that describes the
function's parameters and return value. This tells libjit
how
to generate the proper calling conventions for the function:
jit_type_t params[3]; jit_type_t signature; ... params[0] = jit_type_int; params[1] = jit_type_int; params[2] = jit_type_int; signature = jit_type_create_signature (jit_abi_cdecl, jit_type_int, params, 3, 1); |
This declares a function that takes three parameters of type
int
and returns a result of type int
. We've requested
that the function use the cdecl
application binary interface (ABI),
which indicates normal C calling conventions. See section Manipulating system types, for
more information on signature types.
Now that we have a function object, we need to construct the instructions in its body. First, we obtain references to each of the function's parameter values:
jit_value_t x, y, z; ... x = jit_value_get_param(function, 0); y = jit_value_get_param(function, 1); z = jit_value_get_param(function, 2); |
Values are one of the two cornerstones of the libjit
process.
Values represent parameters, local variables, and intermediate
temporary results. Once we have the parameters, we compute
the result of x * y + z
as follows:
jit_value_t temp1, temp2; ... temp1 = jit_insn_mul(function, x, y); temp2 = jit_insn_add(function, temp1, z); |
This demonstrates the other cornerstone of the libjit
process:
instructions. Each of these instructions takes two values as arguments
and returns a new temporary value with the result.
Students of compiler design will notice that the above statements look
very suspiciously like the "three address statements" that are described
in compiler textbooks. And that is indeed what they are internally within
libjit
.
If you don't know what three address statements are, then don't worry.
The library hides most of the details from you. All you need to do is
break your code up into simple operation steps (addition, multiplication,
negation, copy, etc). Then perform the steps one at a time, using
the temporary values in subsequent steps. See section Working with instructions in the JIT, for
a complete list of all instructions that are supported by libjit
.
Now that we have computed the desired result, we return it to the caller
using jit_insn_return
:
jit_insn_return(function, temp2); |
We have completed the process of building the function body. Now we compile it into its executable form:
jit_function_compile(function); jit_context_build_end(context); |
As a side-effect, this will discard all of the memory associated with the values and instructions that we constructed while building the function. They are no longer required, because we now have the executable form that we require.
We also unlock the context, because it is now safe for other threads to access the function building process.
Up until this point, we haven't executed the mul_add
function.
All we have done is build and compile it, ready for execution. To execute it,
we call jit_function_apply
:
jit_int arg1, arg2, arg3; void *args[3]; jit_int result; ... arg1 = 3; arg2 = 5; arg3 = 2; args[0] = &arg1; args[1] = &arg2; args[2] = &arg3; jit_function_apply(function, args, &result); printf("mul_add(3, 5, 2) = %d\n", (int)result); |
We pass an array of pointers to jit_function_apply
, each one
pointing to the corresponding argument value. This gives us a very
general purpose mechanism for calling any function that may be
built and compiled using libjit
. If all went well, the
program should print the following:
mul_add(3, 5, 2) = 17 |
You will notice that we used jit_int
as the type of the arguments,
not int
. The jit_int
type is guaranteed to be 32 bits
in size on all platforms, whereas int
varies in size from platform
to platform. Since we wanted our function to work the same everywhere,
we used a type with a predictable size.
If you really wanted the system int
type, you would use
jit_type_sys_int
instead of jit_type_int
when you
created the function's signature. The jit_type_sys_int
type
is guaranteed to match the local system's int
precision.
Finally, we clean up the context and all of the memory that was used:
jit_context_destroy(context); |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In this second tutorial, we implement the subtracting Euclidean
Greatest Common Divisor (GCD) algorithm over positive integers.
This tutorial demonstrates how to handle conditional branching
and function calls. In C, the code for the gcd
function
is as follows:
unsigned int gcd(unsigned int x, unsigned int y) { if(x == y) { return x; } else if(x < y) { return gcd(x, y - x); } else { return gcd(x - y, y); } } |
The source code for this tutorial can be found in tutorial/t2.c
.
Many of the details are similar to the previous tutorial. We omit
those details here and concentrate on how to build the function body.
See section Tutorial 1 - mul_add, for more information.
We start by checking the condition x == y
:
jit_value_t x, y, temp1; ... x = jit_value_get_param(function, 0); y = jit_value_get_param(function, 1); temp1 = jit_insn_eq(function, x, y); |
This is very similar to our previous tutorial, except that we are using
the eq
operator this time. If the condition is not true, we
want to skip the return
statement. We achieve this with the
jit_insn_branch_if_not
instruction:
jit_label_t label1 = jit_label_undefined; ... jit_insn_branch_if_not(function, temp1, &label1); |
The label must be initialized to jit_label_undefined
. It will be
updated by jit_insn_branch_if_not
to refer to a future position in
the code that we haven't seen yet.
If the condition is true, then execution falls through to the next
instruction where we return x
to the caller:
jit_insn_return(function, x); |
If the condition was not true, then we branched to label1
above.
We fix the location of the label using jit_insn_label
:
jit_insn_label(function, &label1); |
We use similar code to check the condition x < y
, and branch
to label2
if it is not true:
jit_value_t temp2; jit_label_t label2 = jit_label_undefined; ... temp2 = jit_insn_lt(function, x, y); jit_insn_branch_if_not(function, temp2, &label2); |
At this point, we need to call the gcd
function with the
arguments x
and y - x
. The code for this is
fairly straight-forward. The jit_insn_call
instruction calls
the function listed in its third argument. In this case, we are calling
ourselves recursively:
jit_value_t temp_args[2]; jit_value_t temp3; ... temp_args[0] = x; temp_args[1] = jit_insn_sub(function, y, x); temp3 = jit_insn_call (function, "gcd", function, 0, temp_args, 2, 0); jit_insn_return(function, temp3); |
The string "gcd"
in the second argument is for diagnostic purposes
only. It can be helpful when debugging, but the libjit
library
otherwise makes no use of it. You can set it to NULL if you wish.
In general, libjit
does not maintain mappings from names to
jit_function_t
objects. It is assumed that the front end will
take care of that, using whatever naming scheme is appropriate to
its needs.
The final part of the gcd
function is similar to the previous one:
jit_value_t temp4; ... jit_insn_label(function, &label2); temp_args[0] = jit_insn_sub(function, x, y); temp_args[1] = y; temp4 = jit_insn_call (function, "gcd", function, 0, temp_args, 2, 0); jit_insn_return(function, temp4); |
We can now compile the function and execute it in the usual manner.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
In the previous tutorials, we compiled everything that we needed at startup time, and then entered the execution phase. The real power of a JIT becomes apparent when you use it to compile functions only as they are called. You can thus avoid compiling functions that are never called in a given program run, saving memory and startup time.
We demonstrate how to do on-demand compilation by rewriting Tutorial 1.
The source code for the modified version is in tutorial/t3.c
.
When the mul_add
function is created, we don't create its function
body or call jit_function_compile
. We instead provide a
C function called compile_mul_add
that performs on-demand
compilation:
jit_function_t function; ... function = jit_function_create(context, signature); jit_function_set_on_demand_compiler(function, compile_mul_add); |
We can now call this function with jit_function_apply
, and the
system will automatically call compile_mul_add
for us if the
function hasn't been built yet. The contents of compile_mul_add
are fairly obvious:
int compile_mul_add(jit_function_t function) { jit_value_t x, y, z; jit_value_t temp1, temp2; x = jit_value_get_param(function, 0); y = jit_value_get_param(function, 1); z = jit_value_get_param(function, 2); temp1 = jit_insn_mul(function, x, y); temp2 = jit_insn_add(function, temp1, z); jit_insn_return(function, temp2); return 1; } |
When the on-demand compiler returns, libjit
will call
jit_function_compile
and then jump to the newly compiled code.
Upon the second and subsequent calls to the function, libjit
will bypass the on-demand compiler and call the compiled code directly.
Note that in case of on-demand compilation libjit
automatically
locks and unlocks the corresponding context with
jit_context_build_start
and jit_context_build_end
calls.
Sometimes you may wish to force a commonly used function to be recompiled, so that you can apply additional optimization. To do this, you must set the "recompilable" flag just after the function is first created:
jit_function_t function; ... function = jit_function_create(context, signature); jit_function_set_recompilable(function); jit_function_set_on_demand_compiler(function, compile_mul_add); |
Once the function is compiled (either on-demand or up-front) its
intermediate representation built by libjit
is discarded.
To force the function to be recompiled you need to build it again
and call jit_function_compile
after that. As always when
the function is built and compiled manually it is necessary
to take care of context locking:
jit_context_build_start(context); jit_function_get_on_demand_compiler(function)(function); jit_function_compile(function); jit_context_build_end(context); |
After this, any existing references to the function will be redirected to the new version. However, if some thread is currently executing the previous version, then it will keep doing so until the previous version exits. Only after that will subsequent calls go to the new version.
In this tutorial, we use the same on-demand compiler when we
recompile mul_add
. In a real program, you would probably call
jit_function_set_on_demand_compiler
to set a new on-demand
compiler that performs greater levels of optimization.
If you no longer intend to recompile the function, you should call
jit_function_clear_recompilable
so that libjit
can
manage the function more efficiently from then on.
The exact conditions under which a function should be recompiled
are not specified by libjit
. It may be because the function
has been called several times and has reached some threshold.
Or it may be because some other function that it calls has become a
candidate for inlining. It is up to the front end to decide when
recompilation is warranted, usually based on language-specific
heuristics.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
While libjit
can be easily accessed from C++ programs using
the C API's, you may instead wish to use an API that better reflects
the C++ programming paradigm. We demonstrate how to do this by rewriting
Tutorial 3 using the libjitplus
library.
To use the libjitplus
library, we first include
the <jit/jit-plus.h>
file:
#include <jit/jit-plus.h> |
This file incorporates all of the definitions from <jit/jit.h>
,
so you have full access to the underlying C API if you need it.
This time, instead of building the mul_add
function with
jit_function_create
and friends, we define a class to represent it:
class mul_add_function : public jit_function { public: mul_add_function(jit_context& context) : jit_function(context) { create(); set_recompilable(); } virtual void build(); protected: virtual jit_type_t create_signature(); }; |
Where we used jit_function_t
and jit_context_t
before,
we now use the C++ jit_function
and jit_context
classes.
In our constructor, we attach ourselves to the context and then call
the create()
method. This is in turn will call our overridden
virtual method create_signature()
to obtain the signature:
jit_type_t mul_add_function::create_signature() { // Return type, followed by three parameters, // terminated with "end_params". return signature_helper (jit_type_int, jit_type_int, jit_type_int, jit_type_int, end_params); } |
The signature_helper()
method is provided for your convenience,
to help with building function signatures. You can create your own
signature manually using jit_type_create_signature
if you wish.
The final thing we do in the constructor is call set_recompilable()
to mark the mul_add
function as recompilable, just as we did in
Tutorial 3.
The C++ library will create the function as compilable on-demand for
us, so we don't have to do that explicitly. But we do have to override
the virtual build()
method to build the function's body on-demand:
void mul_add_function::build() { jit_value x = get_param(0); jit_value y = get_param(1); jit_value z = get_param(2); insn_return(x * y + z); } |
This is similar to the first version that we wrote in Tutorial 1.
Instructions are created with insn_*
methods that correspond
to their jit_insn_*
counterparts in the C library.
One of the nice things about the C++ API compared to the C API is that we
can use overloaded operators to manipulate jit_value
objects.
This can simplify the function build process considerably when we
have lots of expressions to compile. We could have used insn_mul
and insn_add
instead in this example and the result would have
been the same.
Now that we have our mul_add_function
class, we can create
an instance of the function and apply it as follows:
jit_context context; mul_add_function mul_add(context); jit_int arg1 = 3; jit_int arg2 = 5; jit_int arg3 = 2; jit_int args[3]; args[0] = &arg1; args[1] = &arg2; args[2] = &arg3; mul_add.apply(args, &result); |
See section Using libjit from C++, for more information on the libjitplus
library.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Astute readers would have noticed that Tutorial 2 included two instances
of "tail calls". That is, calls to the same function that are immediately
followed by a return
instruction.
Libjit can optimize tail calls if you provide the JIT_CALL_TAIL
flag to jit_insn_call
. Previously, we used the following code
to call gcd
recursively:
temp3 = jit_insn_call (function, "gcd", function, 0, temp_args, 2, 0); jit_insn_return(function, temp3); |
In Tutorial 5, this is modified to the following:
jit_insn_call(function, "gcd", function, 0, temp_args, 2, JIT_CALL_TAIL); |
There is no need for the jit_insn_return
, because the call
will never return to that point in the code. Behind the scenes,
libjit
will convert the call into a jump back to the head
of the function.
Tail calls can only be used in certain circumstances. The source
and destination of the call must have the same function signatures.
None of the parameters should point to local variables in the current
stack frame. And tail calls cannot be used from any source function
that uses try
or alloca
statements.
Because it can be difficult for libjit
to determine when these
conditions have been met, it relies upon the caller to supply the
JIT_CALL_TAIL
flag when it is appropriate to use a tail call.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This libjit/dpas
directory contains an implementation of
"Dynamic Pascal", or "dpas" as we like to call it. It is provided
as an example of using libjit
in a real working environment.
We also use it to write test programs that exercise the JIT's capabilities.
Other Pascal implementations compile the source to executable form, which is then run separately. Dynamic Pascal loads the source code at runtime, dynamically JIT'ing the program as it goes. It thus has a lot in common with scripting languages like Perl and Python.
If you are writing a bytecode-based virtual machine, you would use a similar approach to Dynamic Pascal. The key difference is that you would build the JIT data structures after loading the bytecode rather than after parsing the source code.
To run a Dynamic Pascal program, use dpas name.pas
. You may also
need to pass the -I
option to specify the location of the system
library if you have used an import
clause in your program. e.g.
dpas -I$HOME/libjit/dpas/library name.pas
.
This Pascal grammar is based on the EBNF description at the following URL:
http://www.cs.qub.ac.uk/~S.Fitzpatrick/Teaching/Pascal/EBNF.html
There are a few differences to "Standard Pascal":
program Name (Input, Output);
. This can
be abbreviated to program Name;
as the program modifiers are ignored.
xor
, shl
, @
, etc
have been added.
Integer
, Cardinal
, LongInt
, etc)
follow those used in GNU Pascal also. The Integer
type is always
32-bits in size, while LongInt
is always 64-bits in size.
SysInt
, SysCard
, SysLong
, SysLongCard
,
SysLongestInt
, and SysLongestCard
are guaranteed to be the
same size as the underlying C system's int
, unsigned int
,
long
, unsigned long
, long long
, and
unsigned long long
types.
Address
is logically equivalent to C's void *
.
Any pointer or array can be implicitly cast to Address
. An explicit
cast is required to cast back to a typed pointer (you cannot cast back
to an array).
String
type is declared as ^Char
. Single-dimensional
arrays of Char
can be implicitly cast to any String
destination. Strings are not bounds-checked, so be careful. Arrays
are bounds-checked.
p[n]
will access the n'th
item of an unbounded array located at p
. Use with care.
file of
types. Data can be written to stdout
using Write
and WriteLn
, but that is the extent of
the I/O facilities.
import Name1, Name2, ...;
can be used at the head of a
program to declare additional files to include. e.g. import stdio
will
import the contents of stdio.pas
. We don't support units.
; ..
can be used at the end of a formal parameter list to
declare that the procedure or function takes a variable number of arguments.
The builtin function va_arg(Type)
is used to extract the arguments.
import("Library")
can be used to declare that a function
or procedure was imported from an external C library. For example, the
following imports the C puts
and printf
functions:
function puts (str : String) : SysInt; import ("libc") function printf (format : String; ..) : SysInt; import ("libc") |
Functions that are imported in this manner have case-sensitive names.
i.e. using Printf
above will fail.
throw
keyword can be used to throw an exception. The argument
must be a pointer. The try
, catch
, and finally
keywords are used to manage such exceptions further up the stack. e.g.
try ... catch Name : Type ... finally ... end |
The catch
block will be invoked with the exception pointer that was
supplied to throw
, after casting it to Type
(which must
be a pointer type). Specifying throw
on its own without an argument
will rethrow the current exception pointer, and can only be used inside a
catch
block.
Dynamic Pascal does not actually check the type of the thrown pointer.
If you have multiple kinds of exceptions, then you must store some kind
of type indicator in the block that is thrown and then inspect ^Name
to see what the indicator says.
exit
keyword can be used to break out of a loop.
XXH
. The first digit
must be between 0 and 9, but the remaining digits can be any hex digit.
(if e1 then e2 else e3)
.
The brackets are required. This is equivalent to C's e1 ? e2 : e3
.
return value;
in C. It isn't necessary to arrange for
execution to flow through to the end of the function as in regular Pascal.
sizeof(Type)
can be used to get the size of a type.
pointer to procedure/function
type.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Klaus Treichel on May, 11 2008 using texi2html 1.78.