Compiling CPS in Guile has three phases: conversion, optimization, and code generation.
CPS conversion is the process of taking a higher-level language and
compiling it to CPS. Source languages can do this directly, or they can
convert to Tree-IL (which is probably easier) and let Tree-IL convert to
CPS later. Going through Tree-IL has the advantage of running Tree-IL
optimization passes, like partial evaluation. Also, the compiler from
Tree-IL to CPS handles assignment conversion, in which assigned local
variables (in Tree-IL, locals that are <lexical-set>) are converted to
being boxed values on the heap. See Variables and the VM.
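As a rough sketch of what assignment conversion applies to (the `make-counter' name is just for illustration, and the printed Tree-IL and CPS forms vary between Guile versions), consider a procedure that mutates a local:

     ;; The local `n' is the target of a `set!', so in Tree-IL it is a
     ;; <lexical-set> variable.  Assignment conversion rewrites it to
     ;; live in a heap-allocated box: the binding allocates the box,
     ;; and each read and write of `n' goes through that box.
     (define (make-counter)
       (let ((n 0))
         (lambda ()
           (set! n (+ n 1))
           n)))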
After CPS conversion, Guile runs some optimization passes over the CPS. Most optimization in Guile is done on the CPS language. The one major exception is partial evaluation, which for historical reasons is done on Tree-IL.
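For instance (a minimal sketch; exactly what gets folded depends on the optimization level), the partial evaluator can reduce constant computations before the program ever reaches CPS:

     ;; Partial evaluation on Tree-IL can fold this body down to the
     ;; constant 13, so the CPS passes only see the residual program.
     (define (answer)
       (+ 3 (* 2 5)))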
The major optimization performed on CPS is contification, in which
functions that are always called with the same continuation are
incorporated directly into the calling function’s body. This opens up
space for more optimizations, and turns procedure calls into gotos. It can
also make loops out of recursive function nests. Guile also does dead
code elimination, common subexpression elimination, loop peeling and
invariant code motion, and range and type inference.
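As an example of contification (a sketch only; whether a given function is contified depends on the Guile version and optimization level), a local recursive function that always returns to the same point is a natural candidate:

     ;; `lp' is only ever called in tail position, so every call shares
     ;; the return continuation of `count-up'.  Contification can turn
     ;; `lp' into a label inside `count-up' and its self-calls into
     ;; gotos, yielding a loop with no procedure-call overhead.
     (define (count-up n)
       (let lp ((i 0) (sum 0))
         (if (< i n)
             (lp (+ i 1) (+ sum i))
             sum)))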
The rest of the optimization passes are really cleanups and canonicalizations. CPS spans the gap between high-level languages and low-level bytecodes, which allows much of the compilation process to be expressed as source-to-source transformations. Such is the case for closure conversion, in which references to variables that are free in a function are converted to closure references, and in which functions are converted to closures. There are a few more passes to ensure that the only primcalls left in the term are those that have a corresponding instruction in the virtual machine, and that their continuations expect the right number of values.
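As a small illustration of what closure conversion handles (the names here are only for the example):

     ;; In the inner lambda, `x' is a free variable: it is bound by
     ;; `make-adder' but referenced inside the returned procedure.
     ;; Closure conversion turns that lambda into a closure that
     ;; captures `x', and rewrites the reference to `x' into a fetch
     ;; from the closure.
     (define (make-adder x)
       (lambda (y) (+ x y)))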
Finally, the backend of the CPS compiler emits bytecode for each function, one by one. To do so, it determines the set of live variables at all points in the function. Using this liveness information, it allocates stack slots to each variable, such that a variable can live in one slot for the duration of its lifetime, without shuffling. (Of course, variables with disjoint lifetimes can share a slot.) The backend then emits code, typically just one VM instruction, for each continuation in the function.
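To inspect the final output of this pipeline, you can disassemble a compiled procedure at the REPL; the exact instruction listing varies between Guile versions:

     scheme@(guile-user)> ,disassemble (lambda (x y) (+ x y))

The listing shows the VM instructions emitted for the procedure, with operands that refer to the stack slots chosen during slot allocation.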