I have implemented a framework for efficient transformation of CPS code. The code is too big to be presented in a blog post; I have set up a github account to put it (https://github.com/mlemerre/l-lang/). It has been working for several months, but I have spent a long time to improve its structure, write commented interfaces, and document it to make it easily readable, as I have done with the previous modules. Do not hesitate to tell me about any comments you may have, on the code or documentation.
The code is based on the paper "Compiling with continuations, continued" by Andrew Kennedy (which is very well written and easy to read), itself inspired by "Shrinking lambda expressions in linear time", by Andrew W. Appel and Trevor Jim.
The CPS representation in Andrew Kennedy's paper provides many interesting features:
- It is efficiently compilable and can use a stack; see the translation of this representation to the SSA form of LLVM.
- The representation separates continuation from normal functions; this ensures that continuations do not require heap allocations and are compiled into jumps. This allows to express control flow, and control flow optimizations, in the representation.
- Appel and Jim, and Kennedy have developed a representation that
allows efficient (in-place) rewrite of terms while maintaining the
links between variables, their occurrences, and their binding sites.
This allows to implement shrinking reductions (and other
transformations, such as closure conversion) in linear time.
Shrinking reductions rules are very easy to understand, and can be used as a basis for expressing guaranteed optimizations. For instance, it should be easy to state that functions used only once are inlined, that tuples used only to pass information locally are not heap-allocated, etc.
The modules I have implemented provide means to access, print, or change terms in the CPS intermediary language. The main modules, are represented on the figure below.
The Base
module is the entry point for accessing the CPS
representation, and the first module if trying to understand my code.
It provides read-only direct access to the CPS representation, and to
the links between variables, occurrences, and their binding sites. It
also gives access to the other modules that implement the CPS
manipulation framework:
Ast
: Really a part ofBase
representation, it provides access to the CPS representation using simple algebraic datatypes of syntax tree.Check
: Allows checking for some invariants of terms in the CPS representation. Some information in the representation is redundant, which allows fast access; this module checks that redundant information is in sync. For instance, if a term contains a variable, it checks that the variable's uplink also points to that term.Build
: Provides functions that allows to create new terms in the CPS representation, without worrying about the complexities of the representation. The API of this module is based on the idea of higher-order abstract syntax, i.e. where binding creations in the destination language (L) language correspond to creation of bindings in the source language (OCaml).Traverse
: Allows folding and iteration on the terms, variables, and occurrences of the CPS representation. Using this module allows, in particular, code to be independent of future changes to the CPS data structures, which will be gradually improved.Change
: Provides high-level functions to change CPS terms. This is how transformation passes modify terms in the CPS representation.Print
: Provides a human-readable representation of the CPS form. I have tried to come up with a representation that is "easy" to read; the representation looks more like SSA and/or assembly than classical lambda-calculus (which make it much easier to read on huge terms). This textual representation should also be easily parsable (although I did not write the parser).Def
: Implements and provides accessors and a first level of abstraction to the CPS data structures. It relies onVar
, which implements the relationships between variables and their occurrences (itself based on theUnion_find
data structure described earlier on this blog).
The other modules on the figure are Union_find
and Unique
, which
are "support" modules; and Closure_conversion
and
Shrinking_reductions
, which are transformation passes on the CPS
representation. These two passes are not yet in the github repository.
I have a working closure conversion that gives me a complete basic working compiler for the L programming language, based on this CPS framework. I plan to document it and upload it to github very soon. I also have implemented some basic shrinking reductions.
Next I will concentrate on the parser and the L abstract syntax tree, and I will be will be covering the syntax and semantics of L in a future blog post. But here is already an excerpt of test L code (that uses first-class functions) that can be compiled to LLVM:
assert( { let true = { (x,y) -> x } let false = { (x,y) -> y } let pair = { (first,second) -> boolean -> boolean( first, second) } let first = { p -> p( true)} let second = { p -> p( false)} let p = pair( 7, 5) second( p) * (first( p) + second( p)) } == 60)
No comments:
Post a Comment