2012/12/30

A framework for CPS transformation (and a Github account)

I have implemented a framework for efficient transformation of CPS code. The code is too big to be presented in a blog post; I have set up a github account to put it (https://github.com/mlemerre/l-lang/). It has been working for several months, but I have spent a long time to improve its structure, write commented interfaces, and document it to make it easily readable, as I have done with the previous modules. Do not hesitate to tell me about any comments you may have, on the code or documentation.

The code is based on the paper "Compiling with continuations, continued" by Andrew Kennedy (which is very well written and easy to read), itself inspired by "Shrinking lambda expressions in linear time", by Andrew W. Appel and Trevor Jim.

The CPS representation in Andrew Kennedy's paper provides many interesting features:

  • It is efficiently compilable and can use a stack; see the translation of this representation to the SSA form of LLVM.
  • The representation separates continuation from normal functions; this ensures that continuations do not require heap allocations and are compiled into jumps. This allows to express control flow, and control flow optimizations, in the representation.
  • Appel and Jim, and Kennedy have developed a representation that allows efficient (in-place) rewrite of terms while maintaining the links between variables, their occurrences, and their binding sites. This allows to implement shrinking reductions (and other transformations, such as closure conversion) in linear time.

    Shrinking reductions rules are very easy to understand, and can be used as a basis for expressing guaranteed optimizations. For instance, it should be easy to state that functions used only once are inlined, that tuples used only to pass information locally are not heap-allocated, etc.

The modules I have implemented provide means to access, print, or change terms in the CPS intermediary language. The main modules, are represented on the figure below.

The Base module is the entry point for accessing the CPS representation, and the first module if trying to understand my code. It provides read-only direct access to the CPS representation, and to the links between variables, occurrences, and their binding sites. It also gives access to the other modules that implement the CPS manipulation framework:

  • Ast: Really a part of Base representation, it provides access to the CPS representation using simple algebraic datatypes of syntax tree.
  • Check: Allows checking for some invariants of terms in the CPS representation. Some information in the representation is redundant, which allows fast access; this module checks that redundant information is in sync. For instance, if a term contains a variable, it checks that the variable's uplink also points to that term.
  • Build: Provides functions that allows to create new terms in the CPS representation, without worrying about the complexities of the representation. The API of this module is based on the idea of higher-order abstract syntax, i.e. where binding creations in the destination language (L) language correspond to creation of bindings in the source language (OCaml).
  • Traverse: Allows folding and iteration on the terms, variables, and occurrences of the CPS representation. Using this module allows, in particular, code to be independent of future changes to the CPS data structures, which will be gradually improved.
  • Change: Provides high-level functions to change CPS terms. This is how transformation passes modify terms in the CPS representation.
  • Print: Provides a human-readable representation of the CPS form. I have tried to come up with a representation that is "easy" to read; the representation looks more like SSA and/or assembly than classical lambda-calculus (which make it much easier to read on huge terms). This textual representation should also be easily parsable (although I did not write the parser).
  • Def: Implements and provides accessors and a first level of abstraction to the CPS data structures. It relies on Var, which implements the relationships between variables and their occurrences (itself based on the Union_find data structure described earlier on this blog).

The other modules on the figure are Union_find and Unique, which are "support" modules; and Closure_conversion and Shrinking_reductions, which are transformation passes on the CPS representation. These two passes are not yet in the github repository.

I have a working closure conversion that gives me a complete basic working compiler for the L programming language, based on this CPS framework. I plan to document it and upload it to github very soon. I also have implemented some basic shrinking reductions.

Next I will concentrate on the parser and the L abstract syntax tree, and I will be will be covering the syntax and semantics of L in a future blog post. But here is already an excerpt of test L code (that uses first-class functions) that can be compiled to LLVM:

assert( { let true = { (x,y) -> x }
          let false = { (x,y) -> y }
          let pair = { (first,second) -> boolean -> boolean( first, second) }
          let first = { p -> p( true)}
          let second = { p -> p( false)}
          let p = pair( 7, 5)
          second( p) * (first( p) + second( p)) } == 60)

2012/12/20

Using OCaml packages with ocamlbuild: a recipe

Using OCaml packages with ocamlbuild: a recipe

Using OCaml packages with ocamlbuild can raise different error messages that are difficult to trace; there is a documentation ( http://brion.inria.fr/gallium/index.php/Ocamlbuild_and_module_packs ) that helps, but does not provide a step-by-step guide with common pitfalls, so here is one.

Compilation of bytecode files

First create a project with the following files:

File: plu/bla.ml

let bla = 3;;

File: plu/bli.ml

let bli = Bla.bla;;

File: plu.mlpack

plu/Bla
plu/Bli

File: use.ml

Plu.Bli.bli

Note: all the files do not have to be in a directory named plu/; this is just my convention.

Try to compile:

ocamlbuild use.byte
/usr/bin/ocamlc -c -o use.cmo use.ml
+ /usr/bin/ocamlc -c -o use.cmo use.ml
File "use.ml", line 1, characters 0-11:
Error: Unbound module Plu
Command exited with code 2.

Compilation will not work, probably because by default Ocamlbuild does not traverse directories. It will work if you use the -r option to ocamlbuild, or create a (even empty) _tags file at the root of the project, that will allow ocamlbuild to traverse the directories:

ocamlbuild use.byte
/usr/bin/ocamldep -modules plu/bla.ml > plu/bla.ml.depends
/usr/bin/ocamldep -modules plu/bli.ml > plu/bli.ml.depends
/usr/bin/ocamlc -c -I plu -o plu/bla.cmo plu/bla.ml
/usr/bin/ocamlc -c -I plu -o plu/bli.cmo plu/bli.ml
/usr/bin/ocamlc -pack plu/bla.cmo plu/bli.cmo -o plu.cmo
/usr/bin/ocamlc -c -o use.cmo use.ml
/usr/bin/ocamlc plu.cmo use.cmo -o use.byte

Notice that ocamlbuild has added -I plu when compiling modules of the Plu package; so for instance, Bli can access Bla.

This odd behaviour can be the cause of many headaches, so I prefer to document it here (and I filed a bug report about this issue here).

Also note that if you make a mistake in the plu.mlpack file, for instance an error in the name of the directory or in that of the module, ocamlbuild will fail without much warning. It will fail with a compile error, as before; it may also fail at the linking step, if you make a mistake in the plu.mlpack file, but you had successfully compiled a previous version. This problem can also be painful to track, so I hope this help someone.

ocamlbuild use.byte
/usr/bin/ocamlc use.cmo -o use.byte
+ /usr/bin/ocamlc use.cmo -o use.byte
File "_none_", line 1, characters 0-1:
Error: Error while linking use.cmo:
Reference to undefined global `Plu'
Command exited with code 2.

Adding native compilation

You cannot pack a set of .cmx files (resulting from compilation of a file) if you did not specified that the .cmx can be packed when you compile the file. Failure to do that result in an error like this:

ocamlbuild use.native
/usr/bin/ocamldep -modules use.ml > use.ml.depends
/usr/bin/ocamldep -modules plu/bla.ml > plu/bla.ml.depends
/usr/bin/ocamldep -modules plu/bli.ml > plu/bli.ml.depends
/usr/bin/ocamlc -c -I plu -o plu/bla.cmo plu/bla.ml
/usr/bin/ocamlc -c -I plu -o plu/bli.cmo plu/bli.ml
/usr/bin/ocamlc -pack plu/bla.cmo plu/bli.cmo -o plu.cmo
/usr/bin/ocamlc -c -o use.cmo use.ml
/usr/bin/ocamlopt -c -I plu -o plu/bla.cmx plu/bla.ml
/usr/bin/ocamlopt -c -I plu -o plu/bli.cmx plu/bli.ml
touch plu.mli  ; if  /usr/bin/ocamlopt -pack -I plu plu/bla.cmx plu/bli.cmx -o plu.cmx  ; then  rm -f plu.mli  ; else  rm -f plu.mli  ; exit 1; fi
+ touch plu.mli  ; if  /usr/bin/ocamlopt -pack -I plu plu/bla.cmx plu/bli.cmx -o plu.cmx  ; then  rm -f plu.mli  ; else  rm -f plu.mli  ; exit 1; fi
File "plu.cmx", line 1, characters 0-1:
Error: File plu/bla.cmx
was not compiled with the `-for-pack Plu' option
Command exited with code 1.

The lines:

/usr/bin/ocamlopt -c -I plu -o plu/bla.cmx plu/bla.ml
/usr/bin/ocamlopt -c -I plu -o plu/bli.cmx plu/bli.ml

Need to be compiled with the -for-pack Plu option. This is achieved by a simple change in the _tags file: just fill it with

<plu/*.cmx>: for-pack(Plu)

And now everything works:

ocamlbuild use.native
/usr/bin/ocamldep -modules use.ml > use.ml.depends
/usr/bin/ocamldep -modules plu/bla.ml > plu/bla.ml.depends
/usr/bin/ocamldep -modules plu/bli.ml > plu/bli.ml.depends
/usr/bin/ocamlc -c -I plu -o plu/bla.cmo plu/bla.ml
/usr/bin/ocamlc -c -I plu -o plu/bli.cmo plu/bli.ml
/usr/bin/ocamlc -pack plu/bla.cmo plu/bli.cmo -o plu.cmo
/usr/bin/ocamlc -c -o use.cmo use.ml
/usr/bin/ocamlopt -c -for-pack Plu -I plu -o plu/bla.cmx plu/bla.ml
/usr/bin/ocamlopt -c -for-pack Plu -I plu -o plu/bli.cmx plu/bli.ml
touch plu.mli  ; if  /usr/bin/ocamlopt -pack -I plu plu/bla.cmx plu/bli.cmx -o plu.cmx  ; then  rm -f plu.mli  ; else  rm -f plu.mli  ; exit 1; fi
/usr/bin/ocamlopt -c -o use.cmx use.ml
/usr/bin/ocamlopt plu.cmx use.cmx -o use.native

I will update this post if/when I find more weird error messages using packages with ocamlbuild; or if you find such error, just tell me!

Update: linking problem when refering to modules outside of the package

I discovered a new problem with ocamlbuild. The problems occur when you put code in a library which is included, and referenced only inside of a package; in that case ocamlbuild forgets to link with the library. The problem is well described in this mail. The author of that mail also provides a ocamlbuild plugin that works around the problem.

This seems to be a classical bug, and there is a pending bug report for it here.

2012/10/07

A literate union-find data structure

I have a working closure conversion done using the purely functional CPS data structure presented in my earlier post, but it is somewhat hackish. Thus I am trying to improve it, following Andrew Kennedy's excellent paper "Compiling with continuations, continued".

Kennedy's CPS structure requires a union-find data structure, used to merge the occurences of a variable, and find the binding site of an occurence. I already had a union-find data structure, used to implement first-order unification in type inference, but as is usual a module becomes good on the second time you write it (when you have more experience about its implementation and usage).

There are many possible variations in the interface of a union-find module. The particularities of this one is explicit support for attaching description to sets, and a "partition" type separate from the "element" type. Also, the interface is functorial, and I provided two versions: a Safe one that checks that usage of the structure is correct, and a Fast one with no check. It is easy to shoot yourself in the foot by using this module incorrectly, so using the Safe one is probably a better bet.

So here they are.

Interface for module Union_find

1.  A union-find data structure maintains a partition of elements into disjoint sets.

It allows to add new elements in new partitions, perform the union of two partitions, and retrieve the partition in which is an element. Moreover it allows to attach a description to a partition, which is generally the point of using such a structure.

This module has side effects: adding an element to a union-find data structure changes that element, and the union operation merges the partitions destructively. This make it easy to use this module incorrectly. To that end, a number of protections (using types and dynamic checks) are set that detect such incorrect uses of the module.

Note on the name: there are other data structures that maintain disjoint sets with other operations, such as partition refinement, so "union-find" is a more accurate name for this data structure than "disjoint set".

module type S =
sig

   The t type represents the whole union-find data structure. A partition always belong to some t; elements belong to a t once there has been a "singleton" operation on them.

All the functions (except create) take a t argument; in their safe version this argument is used to checks that other element and partition arguments indeed belong to the t argument.

   type t
   type partition
   type element
   type description

   create() returns a new empty union find data structure.
   val create : unit → t

   singleton t e d adds a new element e to t, and create and returns a new partition p in t, such that e is the only element of p. It also attach the description d to p.

The safe version checks that e was not previously added to another union-find data structure (with the same link).

   val singleton : t → element → description → partition

   find t e returns the partition p of t that contains e.
   val find : t → element → partition

   union t p1 p2 d creates a new partition p3, with description d, that contains the union of all the elements in p1 and p2. The p1 and p2 arguments are consumed, i.e. must not be used after they were passed to union. p1 and p2 must be different partitions.
   val union : t → partition → partition → description → partition

   description t p returns the description associated to p.
   val description : t → partition → description

   description t p changes the description associated to p.
   val set_description : t → partition → description → unit
end
2.  We defined two types of "union-find makers": Fast and Safe. Both propose a link type, and each element of a union-find structure must be "associated" to one different link (generally the link is a mutable field in the element type). Initially, the link value is empty_link.

The Make functor, once told how access the link of an element, returns a module complying to S. Below we given an exemple of usage.

Note: It is possible for an element to be present in two different union-find data structures; it must just have different links.

If the link in an element must be re-used for another union-find data structure, then it must be set to empty_link, and one must stop using the union-find data structure that contained the element (even with other elements).

module type UNION_FIND = sig
   type (α, β) link
   val empty_link:(α,β) link

   module type LINK =
   sig
     type element
     type description
     val get : element → (description, element) link
     val set : element → (description, element) link → unit
   end

   module Make(Link : LINK):S with type description = Link.description and type element = Link.element
end

The difference between the fast and safe version is that safe performs additional checks. The performance difference is small, so the Safe version should be prefered.
module Fast:UNION_FIND

module Safe:UNION_FIND

3.  Exemple of usage:

type test = { x:intmutable z:(string, test) Union_find.Safe.link };;

module Test = struct  type description = string  type element = test  let get_link t = t.z  let set_link t z = t.z ← z  end

module A = Union_find.Safe.Make(Test)

let uf = A.create() in  let elt1 = {x=1; z=Safe.empty_link} in  let part1 = A.singleton t elt1 "1" in  assert(A.description t (A.find t elt1) = "1")

Module Union_find

1.  We represent each disjoint set by a tree : elements are in the same set than the element that they point to.

The root of the tree is the representative of the set, and corresponds to elements of type partition. It points to a "partition descriptor".

type (α,β) baselink = 
   ∣ Partition_descriptor of α partition_descriptor
   ∣ Parent of β

The partition descriptor contains the user-accessible description, and a rank, used to optimize the union operation.

Note that the partition descriptor is not accessible by the users of the module, and the interface make it so that there can be only one link to the partition descriptor (from the representative). This allows to update the partition descriptor destructively.

and α partition_descriptor = { mutable rank:rank; mutable desc:α }

The rank of a partition is is a majorant of the distance of its elements to the root (path compression makes so that the height of the tree can be lower than the rank). The union operation minimizes the rank, and thus the height of the tree.
and rank = int
2.  The implementation is parametrized by the safety checks that we perform (which differs between the Fast and Safe modules).

The safe module identifies all union-find data structures by a unique id, embed that in the links, and checks for all operation that they are equal. It also checks initialization of the link.

module type SAFETY = sig
   type t
   val create: unit → t
   type (α,β) link

   (∗ Create a safe link from a baselink. ∗)
   val securize: t → (α,β) baselink → (α,β) link

   (∗ Returns the base_link from the safe link. ∗)
   val get_base: (α,β) link → (α,β) baselink

   (∗ Check the safe link withat the element (and the safe link) belong to t. ∗)
   val check_membership: t → (α,β) link → unit

   (∗ Check that the element is not yet part of any union find. ∗)
   val check_unused: (α,β) link → unit

   (∗ Initial link. ∗)
   val empty_link: (α,β) link
end

module No_safety:SAFETY = struct
   type t = unit
   let create() = ()
   type (α,β) link = (α,β) baselink

   let securize () l = l

   let check_membership () l = ()

   let check_unused l = ()

   let get_base l = l

   (∗ Note: This cast can make the execution fail without notice. ∗)
   let empty_link = Obj.magic 0 
end

type unique = int

module Unique = Unique.Make(struct end)

module Safety:SAFETY = struct
   type t = Unique.t
   let create() = Unique.fresh()

   type (α,β) link = t option × (α,β) baselink

   let securize u l = (Some u,l)

   let check_membership t (u,_) = 
     (match u with
       ∣ Some(a) → assert (t ≡ a) (∗ The element is in another union-find structure. ∗)
       ∣ None → assert false); () (∗ The element is in no union-find structure. ∗)

   let check_unused (u,_) = 
     (match u with
       ∣ Some(_) → assert false (∗ The element is already in a union-find structure. ∗)
       ∣ None → ())

   let get_base (_,l) = l

   (∗ Note: The cast is not dangerous, because the left-hand part is checked first. ∗)
   let empty_link = (NoneObj.magic 0) 
end

3.  The goal of the below "double functor" is to produce a module with the following signature. In it, partition and element are actually the same underlying type; the difference is that elements returned with type partition are the root of the tree). Hiding this in the interface provides some guarantee that arguments of type partition are the representative of their partition.

Unfortunately, after calling union on two partitions p1 and p2, one of them will stop being the root; that is why the partition arguments of union must not be re-used. Thus, defining the partition type only guarantees that the argument has been a root in the past, and we ensure that by a dynamic test.

module type S = sig
       type t
       type partition
       type element
       type description
       val create: unit → t
       val singleton : t → element → description → partition
       val find : t → element → partition
       val union: t → partition → partition → description → partition
       val description: t → partition → description
       val set_description : t → partition → description → unit
end

This is a double functor with two arguments; Saf allows to differenciate the "Fast" and "Safe" modules, while Link is used to find and change the link.
module Make(Saf:SAFETY):UNION_FIND = 
struct

   type (α,β) link = (α,β) Saf.link

   let empty_link = Saf.empty_link

   module type LINK = sig
     type element
     type description
     val get: element → (description, element) link 
     val set: element → (description, element) link → unit
   end

   module Make(LinkLINK) =
   struct
     type t = Saf.t
     type element = Link.element
     type description = Link.description
     type partition = Link.element

     let create = Saf.create

4.  singleton is the only way to add new elements to the union-find structure, and is the place where we check that the element is not part of another structure.
     let singleton t elt desc = 
       let l = (Link.get elt) in
       Saf.check_unused l;
       Link.set elt (Saf.securize t (Partition_descriptor {rank=0;desc=desc}));
       elt 
5.  Basically, find just walks the tree until it finds the root.

But performance is increased if the length of the path is diminished: traversed nodes are linked to nodes that are closer to the roof. The possibility we have implemented is path compression: when the root is found, the elements are changed to link to the it, so that subsequent calls are faster. We implemented a tail-recursive version of this algorithm (which still requires two pass).

Note: there are alternatives to path compression, such that halving; but in Tarjan’s structure the root is linked to itself, which is not the case here, so halving would require more checks than in Tarjan’s version. Thus we stick with path compression.

Note: we could perform a lighter check in the safe version by checking only the argument, and not all recursive calls; this is probably not worth implementing it, and the heavy check has its uses.

     let find t x = 
       (∗ Tail-recursive function to find the root of the algorithm. ∗)
       let rec find x = 
         let l = (Link.get x) in
         Saf.check_membership t l;
         match Saf.get_base l with
           ∣ Partition_descriptor(s) → x
           ∣ Parent(y) → find y in
       (∗ This is also tail-recursive, but we do not perform the checks the second time. ∗)
       let rec compress x r = 
         let l = (Link.get x) in
         match Saf.get_base l with
           ∣ Partition_descriptor(s) → ()
           ∣ Parent(y) → Link.set x (Saf.securize t (Parent r)) in
       let root = find x in
       compress x root;
       root
6.  The following functions work only when the given element is the root of a partition, but check that.
     let get_partition_descriptor t p = 
       let l = (Link.get p) in
       Saf.check_membership t l;
       match Saf.get_base l with
         ∣ Partition_descriptor(s) → s
         ∣ _ → assert(false(∗ The element is not a partition. ∗)

     let description t x = (get_partition_descriptor t x).desc

     let set_description t x desc = 
       let pd = get_partition_descriptor t x in
       pd.desc ← desc

7.  This function performs the union of two partitions. We use rank to find which should be the root : we attach the smaller tree to the root of the larger tree, so as not to increase the maximum height (i.e. path length) of the resulting tree.

The last argument allows to update the set descriptor along with this operation.

Note that this function takes partitions as argument; one could have instead taken any element, and performed the find inside the function; in particular some efficient algorithms interleave the find and union operations. The reason why we take partition arguments is that it avoids a find when we know that the argument is a partition (for instance when merging with a just-created singleton), and the user needs to perform a find to retrieve and merge the description in the algorithms we use (such as unification).

     let union t p1 p2 newdesc =

       This function also checks that p1 and p2 are partitions.
       let d1 = get_partition_descriptor t p1 in
       let d2 = get_partition_descriptor t p2 in

       Alternatively, the check that p1 and p2 are different could have been done here.
       assert (p1 ≠ p2);
       if( d1.rank < d2.rank) then 
         begin 
           (∗ Keep d2_repr as root. Height of the merge is max(d1_height +1, d2_height) so does not change. ∗)
           Link.set p1 (Saf.securize t (Parent p2));
           d2.desc ← newdesc;
           p2
         end
       else if (d1.rank > d2.rank) then
         begin 
           (∗ Keep d1_repr as root. Height of the merge is max(d2_height +1, d1_height) so does not change. ∗)
           Link.set p2 (Saf.securize t (Parent p1));
           d1.desc ← newdesc;
           p1
         end
       else 
         begin
           (∗ We choose arbitrarily p1 to be the root. The height may have changed, as all elements in the subset with root p2 are 1 step further to the root. ∗)
           Link.set p2 (Saf.securize t (Parent p1));
           d1.rank ← d1.rank + 1; d1.desc ← newdesc;
           p1
         end
   end

end

8.  The double-functor is not shown in the exposed interface, and we only export the following, simpler modules.
module Fast=Make(No_safety)

module Safe=Make(Safety)

9.  For a survey of the implementations of union-find algorithms, one should read "Worst-Case Analysis of Set Union Algorithms", by Tarjan and Van Leeuwen.

Recent performance comparison of these algorithms (and modern enhancements) can be found in "Experiments on Union-Find Algorithms for the Disjoint-Set Data Structure", by Md. Mostofa Ali Patwary, Jean Blair, Fredrik Manne.

2012/08/25

CPS to LLVM SSA conversion in literate programming

My L compiler's toolchain is now complete, in that every necessary transformation pass is here. The various passes are parsing, macro-expansion, type checking and inference, CPS transformation, closure conversion, and compilation to LLVM instructions.

Most of the passes are still simple, and a lot of work remains to obtain something usable. For instance I do not propagate informations about locations, so typing error does not explain where the error is. All values, including integers, are boxed, allocated with malloc and never freed; and L code cannot call external C functions. The CPS transformations are not very efficient, and do not carry type informations. These are the points I am going to improve next.

However having a complete toolchain is nice: it gives a complete overview so now I know how changing a pass can benefit to both the above and below layers.

The nice thing about the passes being simple is that they are easy to understand, so this is a good opportunity to publish the code. To further improve the comprehension, I have decided for the last pass I wrote, which is the transformation from CPS to LLVM, to give a try at literate programming. It basically consists in writing your code in the manner of a text book.

There is a nice tool to do literate programming in ocaml, named ocamlweb. It allows to write the literate parts in standard comments, so that the Ocaml files can either be compiled or transformed into a document. The default HTML output of ocamlweb (based on HEVEA is not very nice however, but some configuration allows to improve it. Here is is mine, that I put in a file heveaprefix.tex. This file changes the HTML output of the code parts of OcamlWeb to look like the HTML output of source code in Emacs Org-mode (to maintain consistency with this blog).

%% Note: The colors code are those of Emacs org-mode output (which I
%% think just put those of Emacs).

%% This makes \url links as clickables urls.
\input{urlhref.hva}

%% Big code blocks.
\renewcommand{\ocwbegincode}{%
\begin{rawhtml}
<div style="border: solid 1px gray; background:#eeeeee;
            margin: 0.5em 1em 0.5em 1em;
            padding: 0.5em 1em 0.5em 1em;
            font-family:mono"><code>
\end{rawhtml}}

\renewcommand{\ocwendcode}{\begin{rawhtml}</code></div>\end{rawhtml}}

%% Inline code blocks inside comments (given with [])
\renewcommand{\ocwbegindcode}{\begin{rawhtml}<code>\end{rawhtml}}
\renewcommand{\ocwenddcode}{\begin{rawhtml}</code>\end{rawhtml}}

%% Keywords. We distinguish some keywords (those that ``create''
%% something, and begin, in blue). We rely on HEVEA native support for
%% the ifthen package.
\newcommand{\spanpurple}[1]{%
\begin{rawhtml}<span style="color: #a020f0; ">\end{rawhtml}#1%
\begin{rawhtml}</span>\end{rawhtml}}

\newcommand{\spanred}[1]{%
\begin{rawhtml}<span style="color: #a52a2a; ">\end{rawhtml}#1%
\begin{rawhtml}</span>\end{rawhtml}}

\newcommand{\spanboldblue}[1]{%
\begin{rawhtml}<span style="color: #0000ff; font-weight: bold;">\end{rawhtml}#1%
\begin{rawhtml}</span>\end{rawhtml}}

\renewcommand{\ocwkw}[1]{%
\ifthenelse{\equal{#1}{let}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{and}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{rec}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{in}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{type}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{of}}{\spanred{#1}}{%
\ifthenelse{\equal{#1}{open}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{struct}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{sig}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{functor}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{module}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{val}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{begin}}{\spanboldblue{#1}}{%
\ifthenelse{\equal{#1}{end}}{\spanboldblue{#1}}{%
\spanpurple{#1}}}}}}}}}}}}}}}}


%% Ids that begin in lower case. The textrm command (note: Hevea does
%% not know about mathrm) allows non-italic typesetting. We also
%% consider failwith as a keyword (even if it is a function that calls
%% raise).
\renewcommand{\ocwlowerid}[1]{%
\ifthenelse{\equal{#1}{failwith}}{\spanpurple{\textrm{#1}}}{%
\textrm{#1}}}

%% Ids that begin in upper case.
\newcommand{\spangreen}[1]{%
\begin{rawhtml}<span style="color: #228b22; ">\end{rawhtml}#1%
\begin{rawhtml}</span>\end{rawhtml}}

\renewcommand{\ocwupperid}[1]{\spangreen{\textrm{#1}}}

%% Comments are type set in red, with the leading (* and closing *).
\renewcommand{\ocwbc}{\begin{rawhtml}<span style="color: #b22222">(&#X2217; \end{rawhtml}}
\renewcommand{\ocwec}{\begin{rawhtml} &#X2217;)</span>\end{rawhtml}}

%% Strings are typeset in brown.
\newcommand{\spanbrown}[1]{%
\begin{rawhtml}<span style="color: #8b2252; ">\end{rawhtml}#1%
\begin{rawhtml}</span>\end{rawhtml}}
\renewcommand{\ocwstring}[1]{\spanbrown{\textrm{#1}}}

%% Base types and type variables are in green.
\renewcommand{\ocwbt}[1]{\spangreen{\textrm{#1}}}
\renewcommand{\ocwtv}[1]{\spangreen{#1e}}

The compilation command I use to perform the Ocaml to HTML transformation is:

ocamlweb -p "\usepackage{hevea}\usepackage{url}" --no-index heveaprefix.tex cps/cpsbase.ml llvm/cpsllvm.mli llvm/cpsllvm.ml > web/cpsllvm.tex \ 
&& cd web && hevea -I /usr/share/texmf/tex/latex/misc ocamlweb.sty cpsllvm.tex 

Below is the result (no need to explain it since this is literate programming! :))

Update: Apparently editing the post with blogger's editor mixes up the HTML, but this should be fixed now.

Module Cpsbase

1.  These definitions originates from the "compiling with continuations, continued" paper, by Andrew Kennedy (we currently use the simplified, non-graph version).

CPS (for continuation passing style) puts constraints on functional programs so that a function f never returns; instead it is passed a continuation k, which is a function that represents what is executed on f has finished its execution. So instead of returning a value x, f "returns" by calling k(x). CPS style makes returning from functions, and more generally control flow, explicit, at the expense or more verbosity.

This file presents a particular representation of CPS terms that separates continuations, calling a continuations, variables holding continations from respectively normal functions, normal function calls, and normal variables. This distinction allows to compile the CPS program using a stack (see the Cpsllvm module for an implementation of that).

The representation also forces all values (including constants such as integers) to be held in variables, which simplify later transformation algorithms.


2.  We define variables and continuation variables a unique, to avoid any need for alpha conversion.

module UniqueCPSVarId = Unique.Make(struct end)

module UniqueCPSContVarId = Unique.Make(struct end)

type var = Var of UniqueCPSVarId.t
type contvar = ContVar of UniqueCPSContVarId.t

Many algorithms use sets and maps of variables and continuation variables.
module VarMap = Map.Make(struct
   type t = var
   let compare = compare
end)

module VarSet = Set.Make(struct
   type t = var
   let compare = compare
end)

module ContVarMap = Map.Make(struct
   type t = contvar
   let compare = compare
end)

module ContVarSet = Set.Make(struct
   type t = contvar
   let compare = compare
end)

3.  Values are primitive objects, held in continuation variables.
type value = 
   ∣ Void 
   ∣ Constant of Constant.t
   ∣ Tuple of var list 
   ∣ Lambda of contvar × var × term
4.  The representation of CPS terms separates continuations from usual functions. The various terms are:

  • let x = value; body creates a binding to a primitive value, or to the result of a primitive operation (to be used in body)
  • let k(x) = t; body creates a binding to a continuation k. x is bound in t, but not in body. The k continuation variable is bound both in body and t (this allows loops).
  • k(x) calls the continuation k with x. It can be seen as a "jump with argument x"
  • v(k,x) calls the function v, k being the return continuation, and x a parameter. v does not return; instead it will call k with the "return value" as a parameter.
  • halt(x) is used only as a base case, to stop induction. Its semantics is that it returns the value x, which is the result of the computation, to the caller.


and term = 
   ∣ Let_value of var × value × term
   ∣ Let_primop of var × primitive_operation × term
   ∣ Let_cont of contvar × var × term × term
   ∣ Apply_cont of contvar × var
   ∣ Apply of var × contvar × var
   ∣ Halt of var
5.  Primitive operations return a value. The various operations do not take values as parameters (even constants such as int), only variables: the representation forces all values to be bound in a variable. This allows a uniform treatment that helps transformation passes.

The various operations are:

  • x[i] get the ith element out of x. x is a variable bound to a tuple.
  • x1 op x2 applies binary op to two arguments.

Note that there are no primitive that would allow to write let x = y, where y is a variable; thus there cannot be two variables that directly share the same value.


and primitive_operation = 
   ∣ Projection of var × int
   ∣ Integer_binary_op of Constant.integer_binary_op × var × var

Interface for module Cpsllvm

6.  This module translates CPS representation to the LLVM IR. CPS terms must observe that

  • Functions do not have free (unbound) variables or continuation variables (use closure conversion to get rid of free variables in functions)
  • Constants functions (such as +,−) have been η-expanded, and translated to the use of CPS primitive operations.


7.  All translations are done using Llvm.global_context(), and in a single Llvm module named the_module.

val the_module : Llvm.llmodule
8.  build_nodef name expr builds an expr, an expression in CPS form that is not part of a function, (for instance if it was typed in the interactive prompt). It is translated to a Llvm function that take no argument, named name.
val build_nodef : string → Cpsbase.term → Llvm.llvalue

Module Cpsllvm

9.  This module translates a term written in CPS representation to LLVM instructions in SSA form.

The CPS representations stems from the paper "Compiling with continuations, continued" by Andrew Kennedy. In particular this representation separates continuations from standard lambda functions, which allows calling and returning from functions using the normal stack.

This module assumes that functions have no free variables (or continuation variables). Closure conversion removes free variables from functions. Free continuation variables should never happen when translating normal terms to CPS.

The module also assumes that the CPS values do not refer to primitive operations, such as +,-,*,/. Previous passes must transform calls to primitive operations to let x = primitive(args); and η-expand primitive operations passed as functions (e.g. let x = f() must have been transformed).

To keep things simple in this first version, no external functions is called (only lambdas defined in the body of the expression, and primitive operations, can be called).

In addition, all data is boxed, allocated using malloc (and never freed; this could be improved by using libgc). Unboxed data would requires to carry typing information in the CPS terms.
10.  To get an overview of the translation algorithm, the best is to understand how the CPS concepts are mapped to the SSA concepts. In the following, we denote by [x] the translation of x.

  • Lambda are translated to LLVM functions with one argument and one return value.
  • Other values (i.e. int, floats, and tuples) are all translated boxed. Thus they all have a single llvm type, which is i8 *.
  • A CPS variable x is mapped to a SSA variables (of type Llvm.llvalue). CPS variables are introduced as arguments to lambda and continuations, and in the let x = ... form.
  • A CPS continuation variable k introduced by λ k. x. t corresponds to the return from the lambda. A call k(y) to this continuation with a value y is translated to a "ret" instruction returning the translation of y.
  • A CPS continuation variable k introduced by let k(x) = t1; t2 is mapped to the SSA basic block [t1] (of type Llvm.basicblock). The x formal argument of k corresponds to a phi node at the start of [t1]. A call k( y to this continuation with a value y is translated to a "jmp" instruction to the basic block [t1], that binds [y] to the phi node at the start of [t1].
  • A call f( k, x) of a regular (non-continuation) function f with first argument being a continuation variable argument k and second argument being a variable v is translated to a call to [f] with argument [x], followed by the translation of k( r), with r being the value returned by the call to f. This is because after calling a function in the LLVM SA, the control is returned to the following instruction. LLVM optimization passes like simplifycfg can optimize this if needed. Note: this allows tail call optimizations http://llvm.org/docs/CodeGenerator.html#tail-calls to take place.
  • Primitive operations, such as let x = primitive(args) are translated to the corresponding LLVM operations.

Note that the SSA representation are well-formed only if "the definition of a variable %x does not dominate all of its uses" (http://llvm.org/docs/LangRef.html#introduction). The translation from a CPS term (without free variables) ensures that.
11.  Here is a simplified example of how the translation from CPS to SSA works.

The CPS code:

  let v = 3;
  let k(x) = k(2+x);
  k(11)  

Is translated to SSA (ignoring boxing):

  entry: 
    v = 3
    n_ = 11
    jmp k

  k:
    x = phi (entry n_) (k o_)
    m_ = 2 
    o_ = m_ + x
    jmp k 

This shows how k is translated to a separate basic block, and the argument x to a phi node connected to all the uses of k.


12.  If one encounters segmentation faults when changing the LLVM related code, this may be caused by:

  • Calling Llvm.build_call on a value which does not have the function lltype, or Llvm.build_gep with operations that do not correspond to the lltype of the value.
  • Calling build_phi with an empty list of "incoming".
  • Calling ExecutionEngine.create the_module before calling Llvm_executionengine.initialize_native_target() can also segfault.

Using valgrind or gdb allows to quickly locate the problematic Ocaml Llvm binding.


let context = Llvm.global_context()

let the_module = Llvm.create_module context "my jitted module"

let void_type = Llvm.void_type context

let i32_type = Llvm.i32_type context

let i32star_type = Llvm.pointer_type i32_type

let anystar_type = Llvm.pointer_type (Llvm.i8_type context)

open Cpsbase

Creating and accessing memory objects


13.  These helper functions create or read-from memory object. Currently LLVM compiles using a very simple strategy: every value is boxed (including integers and floats). This simplifies compilation a lot: every value we create has type void *, and we cast the type from void * according to how we use it.

LLVM does not (yet?) know how to replace heap allocations with stack allocations, so we should do that (using an escape analysis). But LLVM has passes that allow promotion of stack allocations to register ("mem2reg" and "scalarrepl"), so once this is done (plus passing and returning arguments in registers), many values should be unboxed by the compiler (and this would not be that inefficient). Additional performances could then be obtained by monomorphizing the code.
14.  Store llvalue in heap-allocated memory.

let build_box llvalue name builder = 
   let lltype = Llvm.type_of llvalue in
   let pointer = Llvm.build_malloc lltype name builder in
   ignore(Llvm.build_store llvalue pointer builder);
   Llvm.build_bitcast pointer anystar_type (name ^ "box") builder
15.  Unbox a llvalue of type lltype.
let build_unbox llvalue lltype name builder = 
   let typeptr = Llvm.pointer_type lltype in
   let castedptr = Llvm.build_bitcast llvalue typeptr (name ^ "castedptr") builder in
   Llvm.build_load castedptr (name ^ "unbox") builder
16.  A n-tuple is allocated as an array of n anystar_type. Each element of the array contains the llvalue in l.
let build_tuple l builder = 
   let length = List.length l in
   let array_type = Llvm.array_type anystar_type length in 
   let pointer = Llvm.build_malloc array_type "tuple" builder in

   let f () (int,elem) = 
     (∗ Note: the first 0 is because pointer is not the start of the array, but a pointer to the start of the array, that must thus be dereferenced. ∗)
     let path = [| (Llvm.const_int i32_type 0); (Llvm.const_int i32_type int) |] in
     let gep_ptr = Llvm.build_gep pointer path "gep" builder in
     ignore(Llvm.build_store elem gep_ptr builder) in

   Utils.Int.fold_with_list f () (0,l);
   Llvm.build_bitcast pointer anystar_type ("tuplecast") builder

17.  Retrieve an element from a tuple.
let build_letproj pointer i builder = 
   let stringi = (string_of_int i) in 
   (∗ First we compute an acceptable LLvm type, and cast the pointer to that type (failure to do that makes Llvm.build_gep segfault). As we try to access the ith element, we assume we are accessing an array of size i+1. ∗)
   let array_type = Llvm.array_type anystar_type (i+1) in 
   let arraystar_type = Llvm.pointer_type array_type in
   let cast_pointer = Llvm.build_bitcast pointer arraystar_type ("castptr") builder in
   let gep_ptr = Llvm.build_gep cast_pointer [| (Llvm.const_int i32_type 0);
                                                 (Llvm.const_int i32_type i) |] 
     ("gep" ^ stringi) builder in 
   let result = Llvm.build_load gep_ptr ("builder" ^ stringi) builder in
   result 
18.  Apply primitive operations.
let build_integer_binary_op op a b builder = 
   let build_fn = match op with
     ∣ Constant.IAdd → Llvm.build_add
     ∣ Constant.ISub → Llvm.build_sub
     ∣ Constant.IMul → Llvm.build_mul
     ∣ Constant.IDiv → Llvm.build_udiv in
   let a_unbox = (build_unbox a i32_type "a" builder) in
   let b_unbox = (build_unbox b i32_type "b" builder) in
   let res = build_fn a_unbox b_unbox "bop" builder in
   build_box res "res" builder
19.  Build a call instruction, casting caller to a function pointer.
let build_call caller callee builder =
   let function_type = Llvm.pointer_type (Llvm.function_type anystar_type [| anystar_type |]) in
   let casted_caller = Llvm.build_bitcast caller function_type "function" builder in 
   let retval = Llvm.build_call casted_caller [| callee |] "retval" builder in
   retval

Creating and accessing basic blocks


20.  This special value is used to ensure, via the type checker, that compilation to LLVM never leaves a basic-block halfly built. LLVM basic blocks should all end with a terminator instruction; whenever one is inserted, the function should return End_of_block. When building non-terminator instructions, the code must continue building the basic block.

type termination = End_of_block
21.  This creates a new basic block in the current function.

Note that LLVM basic blocks are associated to a parent function, that we need to retrieve to create a new basic block.

let new_block builder = 
   let current_bb = Llvm.insertion_block builder in
   let the_function = Llvm.block_parent current_bb in
   let new_bb = Llvm.append_block context "k" the_function in
   new_bb
22.  Returns Some(phi) if the block already begins with a phi instruction, or None otherwise.
let begin_with_phi_node basic_block = 
   let pos = Llvm.instr_begin basic_block in
   match pos with
     ∣ Llvm.At_end(_) → None
     ∣ Llvm.Before(inst) → 
       (match Llvm.instr_opcode inst with
         ∣ Llvm.Opcode.PHI → Some(inst)
         ∣ _ → None)
23.  This builds a jmp instruction to destination_block, also passing the v value. This is achieved by setting v as an incoming value for the phi instruction that begins destination_block. If destination_block does not start with a phi node, then it is the first time that destination_block is called, and we create this phi node.
let build_jmp_to_and_add_incoming destination_block v builder =

   let add_incoming_to_block basic_block (value,curblock) = 
     match begin_with_phi_node basic_block with
       ∣ Some(phi) → Llvm.add_incoming (value,curblock) phi
       ∣ None → 
         (∗ Temporarily create a builder to build the phi instruction. ∗)
         let builder = Llvm.builder_at context (Llvm.instr_begin basic_block) in
         ignore(Llvm.build_phi [value,curblock] "phi" builder) in

   let current_basic_block = Llvm.insertion_block builder in
   add_incoming_to_block destination_block (v, current_basic_block);

   ignore(Llvm.build_br destination_block builder);
   End_of_block

24.  We use the following sum type to establish a distinction between:

  • continuation variables bound with lambda: calling them returns from the function, and the parameter x of the call k( x) is returned;
  • and continuation variables bound with letcont: calling them jumps to the corresponding basic block, and the parameter x of the call k( x) is passed to the phi node starting this basic block.

The CPS→LLVM translation maps continuation variables to dest_types.


type dest_type = 
   ∣ Ret 
   ∣ Jmp_to of Llvm.llbasicblock

Build a call to a continuation k x.
let build_applycont k x builder = 
   match k with
     ∣ Ret → ignore(Llvm.build_ret x builder); End_of_block
     ∣ Jmp_to(destination) → build_jmp_to_and_add_incoming destination x builder

Main CPS term translation


It is important for LLVM that function names are unique.

module UniqueFunctionId = Unique.Make(struct end)
25.  This function builds the CPS term cps, in the current block pointed to by builder. varmap maps CPS variables to LLVM llvalues. contvarmap maps CPS continuation variables to values of type contvar_type.

All the free variables or continuation variables in cps must be in contvarmap or in varmap. cps can contain lambda, but they must not contain any free variables or free continuation variables (even the one in varmap and contvarmap). Closure conversion deals with this. Note: previously-defined global variables are not considered free.

let rec build_term cps (contvarmap, varmap) builder =
26.  Helper functions to retrieve/add values from/to maps.
   let lookup_var x = 
     try VarMap.find x varmap 
     with _ → failwith "in lookup" in

   let lookup_contvar k = 
     try ContVarMap.find k contvarmap 
     with _ → failwith "in contvar lookup" in

   let add_to_varmap var value = VarMap.add var value varmap in
   let add_to_contvarmap contvar block = ContVarMap.add contvar (Jmp_to block) contvarmap in

27.  Converting the term is done by inductive decomposition. There are three kind of cases:

  • those that only build new values (letvalue, letproj, letprimop...) in the current basic block
  • those that return a value and end a basic block (apply, applycont, and halt)
  • the one that build a new basic blocks (letcont).

To keep the implementation simple, all values are boxed (i.e. put in the heap and accessed through a pointer), and of llvm type "i8 *". Pointer conversions are done according to the use of the value.

   match cps with
28.  These cases build a new value, then continue building the basic block.
     ∣ Let_value(x, value, body) → 
       let newllvalue = 
         (match value with 
           ∣ Constant(Constant.Int i) →
             let llvalue = Llvm.const_int i32_type i in
             build_box llvalue ("int" ^ string_of_int i) builder

           ∣ Tuple(l) →
             let llvalues = List.map lookup_var l in
             build_tuple llvalues builder

           This build a new function, with private linkage (since that it can be used only by the current term), which allows llvm optimizations.

Note that build_function will use a new builder, so the lambda can be built in parallel with the current function. Also it will use new variables and continuation variable maps (with only the x parameter), so the lambda expression must not contain any free variables.

           ∣ Lambda(k,x,body) → 
             let f = build_function "lambda" k x body in
             Llvm.set_linkage Llvm.Linkage.Private f;
             Llvm.build_bitcast f anystar_type "lambdacast" builder

           Expressions such as let x = primitive] should have been translated into something like let x = (a,b) -> primitiveop( a,b) ] in previous compilation stage, so should fail here.
           ∣ Constant(c) → 
             assertConstant.is_function c);
             failwith "ICE: primitive operations as value in LLVM translation."
         )
       in build_term body (contvarmap, (add_to_varmap x newllvalue)) builder

     Primitive operations are similar to letvalue.
     ∣ Let_primop(x,prim,body) → 
       let result = (match prim with 
         ∣ Integer_binary_op(op,xa,xb) → 
           build_integer_binary_op op (lookup_var xa) (lookup_var xb) builder
         ∣ Projection(x,i) → build_letproj (lookup_var x) i builder
       ) in
       build_term body (contvarmap, (add_to_varmap x result)) builder
29.  Building new basic blocks. The algorithm first creates an empty basic block, bound to [k], then build [body], then build [term] (if [k] is really called), binding [x] to the phi node.

The tricky part is that the llvm bindings do not allow to create an "empty" phi node (even if it would, in future implementations which would not box everything we would still have to know the llvm type of the phi node, and that llvm type is not known until we have processed the jumps to that node). So it is the calls to k that create or change the phi node; no phi node means [k] is never called.

Doing the operations in this order ensures that calls to [k] are processed before [k] is built.

     ∣ Let_cont(k,x,term,body) → 
       let new_bb = new_block builder in
       let newcvm = add_to_contvarmap k new_bb in
       let End_of_block = build_term body (newcvm, varmap) builder in
       Llvm.position_at_end new_bb builder;
       (match begin_with_phi_node new_bb with
         ∣ None → End_of_block
         ∣ Some(phi) → build_term term (newcvm, (add_to_varmap x phi)) builder)
30.  Cases that change or create basic blocks.
     Depending on k, applycont either returns or jumps to k.
     ∣ Apply_cont(k,x) → 
       build_applycont (lookup_contvar k) (lookup_var x) builder

     The CPS semantics state that caller should return to k, but LLVM SSA does not require that calls end basic blocks. So we just build a call instruction, and then a call to [k]. LLVM optimizations will eliminate the superfluous jump if needed.
     ∣ Apply(caller,k,callee) → 
       let retval = build_call (lookup_var caller) (lookup_var callee) builder in
       build_applycont (lookup_contvar k) retval builder

     ∣ Halt(x) → ignore(Llvm.build_ret (lookup_var x) builder); End_of_block

Expression built out of a definition are put in a "void -> void" function.
and build_nodef name cpsbody = 
   prepare_build name cpsbody None

and build_function name contparam param cpsbody =
   prepare_build name cpsbody (Some (contparam,param))

Build the function around the main term cpsbody, possibly taking some parameters k and x.
and prepare_build name cpsbody param = 
   let params_type = match param with None → [| |] ∣ _ → [| anystar_type |] in
   let function_type = Llvm.function_type anystar_type params_type in
   (∗ Note: it is important for LLVM that function names are unique. ∗)
   let funname = name ^ "#" ^ (UniqueFunctionId.to_string (UniqueFunctionId.fresh())) in
   let the_function = Llvm.declare_function funname function_type the_module in
   let bb = Llvm.append_block context "entry" the_function in
   (∗ Note that we use a new builder. We could even build the functions in parallel. ∗)
   let builder = Llvm.builder context in
   Llvm.position_at_end bb builder;
   try 
     let initial_varmaps = 
       match param with 
         ∣ None → (ContVarMap.empty, VarMap.empty)
         ∣ Some(k,x) → (ContVarMap.singleton k Ret,
                         VarMap.singleton x (Llvm.param the_function 0)) in

     ignore(build_term cpsbody initial_varmaps builder);
     (∗ Prints the textual representation of the function to stderr. ∗)
     Llvm.dump_value the_function;
     (∗ Validate the code we just generated. ∗)
     Llvm_analysis.assert_valid_function the_function;
     the_function
   (∗ Normally, no exception should be thrown, be we never know. ∗)
   with e → Llvm.delete_function the_function; raise e