2013/06/30

Structuring the compiler

Simple is difficult

In the implementation of the L compiler, the focus was put on the readability and simplicity of the source code. As is often the case in computer science, it is difficult to make things simple: when I started this blog, I already had a complete, working prototype, with parsing, typing, CPS transformation, and compilation to LLVM; I have spent all the time since cleaning, documenting, and especially simplifying the compiler, publishing the progress along the way.

I believe that the result is worth the work: a simpler compiler is easier to extend, both for myself and for future contributors. The challenge will be to keep things simple as more features are added; this is possible only if we wait for things to be well structured before adding non-essential features¹, and if we do not hesitate to perform a local redesign occasionally.

Implementation of the toplevel of the compiler

The toplevel of the compiler, which links all the passes together, is a good example of the result of such a redesign. A design goal of this toplevel was to make the passes communicate through a minimal interface. I had written a first attempt using a record updated functionally, with each field containing the internal state of one pass. The toplevel consisted of a giant function with nested loops, one loop per pass. This structure was inelegant for a number of reasons:

  • Even if the type of the state could be made abstract, the abstract type was still part of the interface. Moreover, the initial value of this type also had to be provided by each pass.
  • Updating the environment functionally introduced a lot of boilerplate code, although imperative updates might have helped a little.
  • There was also boilerplate code to handle the case where a single input could produce multiple outputs (e.g. a datatype declaration in ML defines both a type and some constructors).
  • The nested loop pattern was not really readable.

After a while, I came up with a much simpler design using streams of definitions. The code for the toplevel is then dead simple:

module Stream = Extensions.Stream

let process_file file =
   let parsetree_stream = Parser.make_stream file in
   let ast_stream = Astfromsexp.from_stream parsetree_stream in
   let cps_stream = Cpstransform.from_stream ast_stream in
   let converted_stream =
     Stream.iter_and_copy Cpsconvertclosures.in_definition cps_stream in
   let llvm_stream = Cpsllvm.from_stream converted_stream in
   Stream.iter (fun elt -> Llvmexec.exec_nodef () elt) llvm_stream

The actual implementation also interposes optional printers to show the contents of each stream, which is very useful when debugging the compiler.

The stream interface makes it easy to add new passes, completely removes the state from the interface, factorizes the pattern where a pass can produce several elements at once, retains the minimal memory consumption of the nested-loop design, and can easily be updated to a parallel pipelining implementation.
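To make the design concrete, here is a minimal sketch of what such a stream abstraction could look like. Only the names Extensions.Stream, iter, and iter_and_copy appear in the code above; everything else (in particular the representation as a function) is an assumption of mine, not the actual implementation:

(* A hypothetical lazy stream: a function that produces the next
   element on each call, or None when the stream is exhausted. *)
type 'a stream = unit -> 'a option

(* Consume the stream, applying [f] to each element for its effects. *)
let iter (f : 'a -> unit) (s : 'a stream) : unit =
  let rec loop () =
    match s () with
    | Some x -> f x; loop ()
    | None -> ()
  in
  loop ()

(* Apply [f] to each element for its effects, passing the element
   along unchanged; useful for passes that update definitions in
   place (like closure conversion) or for debugging printers. *)
let iter_and_copy (f : 'a -> unit) (s : 'a stream) : 'a stream =
  fun () ->
    match s () with
    | Some x -> f x; Some x
    | None -> None

(* Transform each element, one output per input; a pass producing
   several outputs per input would buffer them in a queue instead. *)
let map (f : 'a -> 'b) (s : 'a stream) : 'b stream =
  fun () ->
    match s () with
    | Some x -> Some (f x)
    | None -> None

Since elements are demanded one at a time, only one definition per pass is live at any moment, which is what preserves the low memory consumption of the original nested-loop design.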

Interface to the passes and modular implementation

Interface of each pass

The interface of each pass is made minimal. Each pass only declares:

  • The type of the elements in the stream (e.g. tokens, typed AST definitions, CPS definitions…). The type is not abstract: it is used by the following pass to convert the elements to its own representation (e.g. the CPS pass uses the AST type to perform the CPS transformation, and the LLVM pass uses the CPS type to compile to LLVM).
  • A function mapping an input stream to an output stream.
  • Optionally, a function to print the elements of the stream, used for debugging the compiler.

And that is all. The interface of each pass is clear and minimal, which allows studying or changing each pass independently.
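Concretely, a pass interface could have the following shape. This is a hypothetical sketch, assuming a Stream module exposing a type 'a t; these are not the actual signatures of the L compiler:

module type PASS = sig
  (* Element type consumed, owned by the previous pass. *)
  type input

  (* Element type produced. Deliberately concrete, so that the
     next pass can traverse it. *)
  type definition

  (* Map the stream of the previous pass to the stream of this one. *)
  val from_stream : input Stream.t -> definition Stream.t

  (* Printer for elements of the stream, used when debugging. *)
  val print : Format.formatter -> definition -> unit
end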

Note: this way of structuring things implies that the transformation from one pass to the next is done in the second pass (which therefore depends on the first one).

An alternative is to expose, in the second pass, functions to build its elements, and to move the transformation code from the beginning of the second pass to the end of the first pass. In this case, the first pass depends on the second pass. One advantage is that the first pass does not have to expose the type of its internal format. I am considering using this structure for the parser, which produces a "parsetree" whose type is not very interesting outside of the parsing phase, whereas the building functions for the AST are more interesting.
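A minimal sketch of this inverted structure (the module and function names are illustrative, not those of the L compiler):

(* The Ast module exposes builder functions; the parser calls them
   directly, so no parsetree type appears in any interface. *)
module Ast : sig
  type expr
  val variable : string -> expr
  val apply : expr -> expr -> expr
  val lambda : string -> expr -> expr
end = struct
  type expr =
    | Variable of string
    | Apply of expr * expr
    | Lambda of string * expr
  let variable x = Variable x
  let apply f x = Apply (f, x)
  let lambda x e = Lambda (x, e)
end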

Modules inside each pass

Inside each pass, the code is structured as a hierarchy of modules and interfaces. The idea is that each interface progressively hides more details about the implementation of the pass, until at the top we get the minimal interface presented earlier. Intermediate levels show up whenever we manage to put a complex piece of code behind a simple interface. For instance, the CPS pass contains four intermediate modules:

  • CPStransform, performing the AST to CPS transformation. Its implementation consists of four sub-modules, which include the compilation of pattern matching; its interface consists of a single function.
  • CPSconvertclosure implements a slightly less complex algorithm; its interface also consists of a single function.
  • CPSshrink will perform shrinking reductions and optimizations. I have not yet implemented the cleaned version of this module.
  • CPSbase provides general functions for creating and updating CPS terms; the interface of this module is more complex, but it hides many details about the internal CPS data structure (which is quite complex) and prevents direct writes to this structure.
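To illustrate the idea behind CPSbase, here is a toy sketch of an interface hiding a term representation behind smart constructors; the names and the representation are mine, and much simpler than the real ones:

module Cpsbase : sig
  type var
  type term                        (* abstract: no direct writes *)
  val fresh_var : unit -> var
  val var_term : var -> term
  val let_value : var -> int -> term -> term
end = struct
  type var = int
  type term =
    | Var of var
    | Let_value of var * int * term
  let counter = ref 0
  let fresh_var () = incr counter; !counter
  let var_term v = Var v
  let let_value v n body = Let_value (v, n, body)
end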

Structuring the code around hierarchical, modular interfaces and implementations thus allows intermediate levels of abstraction: when completing a task, one is provided with just the needed information, at the right degree of abstraction, which helps to focus. The hierarchy also decreases the size of the interfaces, and thus simplifies them. OCaml was perfect for this job, but C, to a lesser extent, also allows this kind of modular programming.

Comparison with object orientation

Note that this design is not at all object-oriented. I think that when providing a new abstract type, packing it with all the methods that access or update objects of this type is an excellent idea. However, when using only object-oriented interfaces, the tendency is to present all the methods of an object in a flat manner, and to implement all of them in the same class and file. This is problematic when there are many methods (which should be grouped by theme), or when some methods are lengthy algorithms (which should live in separate files). Modules allow such grouping and separation of methods, while creating new classes in the usual object-oriented way would not work (because we would be operating on an object defined in another class). Of course there are object-oriented workarounds, but I think that modules and interfaces with abstract types solve the problem more elegantly (while still allowing "type + methods"-style interfaces). I hope that the code of the L compiler, written in OCaml, shows my point.

Footnotes:

1 I have a huge and ever-growing list of extensions I would like to experiment with or add to the L language.

2013/05/28

Compiling pattern matching

I have finished implementing the compilation of pattern matching for L. L source code can now use ML-style patterns, as used in the match, let, and fun/function operators of OCaml. See http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora016.html for an example in OCaml if you don't know about pattern matching.

Compilation of pattern matching happens together with the CPS transformation phase of the compiler, which translates the "AST" language (used in particular to perform type inference) into the "CPS" language (in which jumps and the order of computation are made explicit). The CPS transformation fixes the order of evaluation, so it makes sense to perform pattern compilation during this phase.

Like the rest of the compiler, this module has been written in literate programming style (or at least, it is heavily commented, so that it should be understandable by someone with no prior knowledge of the subject). I have extracted the header of the module at the bottom of this post.

The next step is to finish cleaning the rest of the CPS transformation (which is much easier); with that, the entire backend of the compiler will have been published.

Module Cpstransform_rules

The Cpstransform_rules module handles the compilation of a set of rules. A rule is composed of a pattern, matched against a value, and an expression (the body of the rule), to be executed if the pattern succeeds in matching the value.

The match(expression){rules} construct matches an expression against a set of rules; its behaviour is to evaluate the expression once, then try to match the resulting value against the patterns of the rules until one succeeds, in which case the corresponding body is executed.

The order of the rules is thus important: a value can be matched by the patterns of two rules, e.g. (5,6) is matched by both (5,_) and (_,6), but only the body of the first rule is executed. This implies that two successive rules in a match can be reordered if, and only if, they are disjoint, i.e. they cannot match the same value.

But when reordering is allowed, reordering and factorizing the compilation of rules leads to more compact, faster code. Producing the most efficient code for matching against a set of rules is an NP-complete problem (the complexity arises when compiling multiple independent patterns, for instance tuple patterns). Rather than attempting to solve this problem, L specifies how pattern matching is compiled, which allows the developer to visualize the cost of their pattern matching.

Pattern matching rewrites

The compiler maintains, for each rule, the list of patterns that remain to be matched, along with the list of values against which the rules are matched. The first patterns to be matched in each rule form the first column, which is matched against the first value. Several cases can occur (see chapter 5, "Efficient Compilation of Pattern Matching" by Philip Wadler, in the book "The Implementation of Functional Programming Languages" by Simon Peyton Jones):
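The rewrites below can be made concrete with a small OCaml sketch of this state. The types are hypothetical, not the compiler's actual representation; the sketches later in this post reuse them:

(* A pattern is irrefutable if it contains no variant constructor. *)
type pattern =
  | Pvar of string                         (* variable: irrefutable *)
  | Pwildcard                              (* _ : irrefutable *)
  | Ptuple of pattern list                 (* irrefutable if its components are *)
  | Pconstructor of string * pattern list  (* refutable *)

(* Each rule keeps the list of patterns that remain to be matched. *)
type 'body rule = { patterns : pattern list; body : 'body }

let rec irrefutable = function
  | Pvar _ | Pwildcard -> true
  | Ptuple ps -> List.for_all irrefutable ps
  | Pconstructor _ -> false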

There are no more columns

When there are no remaining patterns to match and no remaining values, all the rules match. As pattern matching selects the first rule that matches, we execute the body of the first rule and discard the other rules with a warning.

For instance in

match(){
 () → body1
 () → body2
}

body1 matches and is executed; body2 also matches, but is superseded by body1, and is just discarded.

The column contains only variables

In this case, in each rule the variable is bound to the value, and matching continues. For instance in

match(expr1,...){
 (a,...) → body1
 (b,...) → body2
}

a is bound to v1 in the first rule, and b is bound to v1 in the second, where v1 is a CPS variable holding the result of computing expr1 in the condition of the match (the computation in the condition is thus not repeated). Matching then proceeds, starting from the second column.

This rule can be extended to incorporate wildcard (_) patterns (where nothing is bound), and more generally all irrefutable patterns. A pattern is irrefutable if it does not contain any variant.

For instance, consider

match((expr1,expr2),...){
 (a,...) → body1
 (_,...) → body2
 ((c,d),...) → body3
}

The column contains only irrefutable patterns. Let v1 be a CPS variable containing the evaluation of expr1, and v2 containing the evaluation of expr2. Then a is bound to (v1,v2), c to v1, and d to v2.

The column contains only calls to constructors

A constructor is a specific version of a variant type; for instance 3 is a constructor of Int, True a constructor of Bool, and Cons(1,Nil) a constructor of List<Int>.

Note that if the column contains a variant, then all the constructors it contains are of the same type: this is necessary for the pattern matching to typecheck.

When two contiguous rules have different constructors at the same place, they cannot match the same value: they are thus disjoint, and can be swapped. This allows grouping the rules according to the constructor in their first column (the order of the rules within a group being preserved).

For instance,

match(expr1, expr2){
 (Cons(a,Cons(b,Nil)),Nil) → body1
 (Nil,Nil) → body2
 (Nil,c) → body3
 (Cons(d,e),_) → body4 
}

can be grouped as follows (preserving the order between rules 1 and 4, and between rules 2 and 3):

match(expr1, expr2){
 (Cons(a,Cons(b,Nil)),Nil) → body1
 (Cons(d,e),_) → body4
 (Nil,Nil) → body2
 (Nil,c) → body3
}

Then, the matching of contiguous rules with the same constructor can be factorized out, as follows:

let v1 = expr1 and v2 = expr2 in
match(v1){
 Cons(hd,tl) → match((hd,tl),v2){
     ((a,Cons(b,Nil)), Nil) → body1 
     ((d,e),_) → body4 
 }
 Nil → match(v2){
     Nil → body2 
     c → body3 
 }
}

Note that the L compiler matches the values in the constructor before matching the other columns of the tuple, as was exemplified in the Cons rules.

Matching on only the constructors of a single variant type can be translated directly into the CPS case expression. This is generally compiled very efficiently into a jump table, and dispatching to any rule is done in constant time. (Note that the compiler may not generate a jump table if the list of constructors to check is sparse.)
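The grouping step can be sketched as follows, using the hypothetical rule type introduced above (the real compiler's code is certainly different):

(* Stable grouping of rules by head constructor: rules with different
   constructors are disjoint, so moving a rule next to the other rules
   of its constructor preserves the semantics, as long as the order
   within each group is kept. Assumes every rule's first pattern is a
   constructor. *)
let group_by_constructor (rules : 'b rule list) : (string * 'b rule list) list =
  List.fold_left
    (fun groups r ->
       match r.patterns with
       | Pconstructor (c, _) :: _ ->
           if List.mem_assoc c groups then
             (* Append r to the existing group for c, keeping order. *)
             List.map
               (fun (c', g) -> if c' = c then (c', g @ [r]) else (c', g))
               groups
           else groups @ [ (c, [r]) ]
       | _ -> groups (* precondition violated: ignore the rule *))
    [] rules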

The first column contains both refutable and irrefutable patterns

If the first column contains both kinds of patterns, the list of rules is split into groups such that the ordering between rules is preserved, and within each group the first patterns are either all refutable or all irrefutable.

For instance, the following match:

match((v1,v2),v3){
 (_,1) → 1
 ((a,b),2) → a+b+2
 ((3,_),3) → ...
 ((4,_),_) → ...
 ((_,5),_) → ...
 ((a,b),c) → a+b+c
}

is split into three groups, containing respectively rules 1-2 (_ and (a,b) are both irrefutable patterns), rules 3-5, and rule 6. The groups are then matched successively, each group being tried only if no rule matched in the previous one. This amounts to performing the following transformation:

let c = ((v1,v2),v3) in
match(c){
 (_,1) → 1
 ((a,b),2) → a+b+2
 _ → match(c){
     ((3,_),3) → ...
     ((4,_),_) → ...
     ((_,5),_) → ...
     _ → match(c){
         ((a,b),c) → a+b+c
         }
     }
}

Note that rules 3, 4, and 5 also need to be split and transformed further using this method.
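The splitting step itself can be sketched as follows, reusing the hypothetical rule type and the irrefutable predicate from above:

(* Split rules into maximal contiguous groups whose first patterns
   agree on refutability, preserving the order of the rules. *)
let split_by_refutability (rules : 'b rule list) : 'b rule list list =
  let refutable_head r =
    match r.patterns with
    | p :: _ -> not (irrefutable p)
    | [] -> false
  in
  List.fold_right
    (fun r groups ->
       match groups with
       | ((r' :: _) as g) :: rest
         when refutable_head r = refutable_head r' -> (r :: g) :: rest
       | _ -> [ r ] :: groups)
    rules []

On the example above, this yields the three groups 1-2, 3-5, and 6.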

Compiling pattern matching

Compilation order

The order in which a set of patterns is checked is a choice made by the compiler. L chooses to match tuples from left to right, to match the contents of a constructor as soon as the constructor is matched, and to split rules according to the refutability of the pattern in their first remaining column. This choice may not be optimal in every case (but minimizing the number of matches is an NP-hard problem); in exchange, it allows a simple, visual analysis of the cost of pattern matching. The user is free to rearrange the set of patterns to improve performance (possibly guided by compiler hints).

Element retrieval

At the beginning of a match, all the components that are needed by at least one rule and that can be retrieved (i.e. components of tuples, but not those under a specific constructor) are retrieved. When a constructor is matched, all the retrievable components under this constructor are retrieved in turn. This behaviour produces the most compact code (it avoids duplicating the retrieval of elements in the compilation of several rules), but maybe not the most efficient (sometimes elements are retrieved that are not used). Optimizations, such as shrinking reductions, are allowed to move the code retrieving elements down into the branches of the case, or even to duplicate it.

CPS

Compilation of pattern matching is done during the CPS transformation, which transforms source code from the AST language to the CPS language. There are several reasons for that:

  • The CPS transformation of expressions fixes the order of their evaluation; compiling pattern matching fixes the order in which patterns are matched. So it makes sense to do both at the same time, and have a single pass that fixes all evaluation order.

    As a side note, it makes sense to keep pattern matching in the AST language, because patterns are easy to type, and typing errors can be more easily reported to the user.

  • The CPS language provides continuations, which allow expressing explicit join points in the control flow, something not possible in the AST language (without creating new functions). These join points are necessary notably to factorize the compilation of pattern matching (the problem is similar to the compilation of boolean expressions with the short-circuit operators && and ||). For instance, compiling:

    match(v){ (4,5) → 1; _ → 2 }

    gives:

    let k_not4_5() = { kreturn(2) }
    match(#0(v)){
     4 → match(#1(v)){
       5 → kreturn(1)
       _ → k_not4_5()
     }
     _ → k_not4_5()
    }

    Matching against (4,5) can fail at two different steps, and the action to perform in these two cases is the same, so the two cases should be factorized using the same continuation.

    The L compiler does not yet allow them, but "or-patterns" (as in match(l){ Cons(Nil|Cons(_,Nil)) → 0; _ → 1 }) also need join points. Finally, there is also a join point (in expr_env.context) to which the value of the body of each rule is returned.

All the functions that involve building CPS code are themselves in CPS style; see the Cps_transform_expression module for an explanation.
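As a rough illustration of this idiom, here is a builder written in CPS style, using the toy Cpsbase sketch from the previous post; the actual API of Cps_transform_expression differs:

(* Instead of returning a finished term, the builder takes a [context]
   function describing how to complete the term once the fresh
   variable holding the intermediate value is available. *)
let let_constant (n : int) (context : Cpsbase.var -> Cpsbase.term)
    : Cpsbase.term =
  let v = Cpsbase.fresh_var () in
  Cpsbase.let_value v n (context v)

(* Usage: builds the term [let v = 42 in v]. *)
let example = let_constant 42 (fun v -> Cpsbase.var_term v)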

A complete example

Here is a (contrived) example of a complete pattern matching:

This pattern is compiled as follows. We begin by creating a join continuation, to which the result of the match is returned. This allows factorizing the computation that follows the match (the addition of 17 in our case).

let kfinal(x) = { let x17 = x + 17 in halt(x17) }

Then, the condition of the match, e, is evaluated, and its result is stored in a temporary value.

let v = ... eval e ...

Then, analysis of the patterns shows that v contains a tuple.

let v.0 = #0(v)
let v.1 = #1(v)

Analysis of the patterns also shows that v.0 contains a tuple. v.1 is of a variant type, so its elements cannot be retrieved yet.

let v.0.0 = #0(v.0)
let v.0.1 = #1(v.0)

We begin by analyzing the whole pattern (i.e. column c0). All the rules are refutable except the last one, so we split them into two contiguous blocks bi and bii; bii is executed if matching against all the rules in bi fails.

decl kb_ii

All the rules in bi are tuples, so we inspect them from left to right (i.e. we begin with column c1, then proceed with c4). Analysis of column c1 yields three contiguous blocks: the patterns in column c1 are all irrefutable in block bi.a, refutable in block bi.b, and irrefutable again in block bi.c.

decl kb_i.b, kb_i.c

As the patterns of column c1 in the rules of bi.a are all irrefutable, we just have to bind the variable x to v.0.0 and a to v.0.1 for the translation of the bodies of the rules. (x and a are unused in the rules of the example.)

We can then proceed with the analysis of column c2 (still in block bi.a). It is a variant, so we regroup the rules according to their constructor, and perform a simple case analysis.

decl kcons
match(v.1){
 Nil → { kfinal(2) }
 Cons(x) → { kcons(x) } 
}

For the Nil constructor, we are already done. For Cons, we have to discriminate between the patterns inside the Cons. But first, we analyze these patterns to retrieve all the elements that are needed:

let kcons(x) = {
 let x.0 = #0(x)
 let x.1 = #1(x)
 let x.0.0 = #0(x.0)
 let x.0.1 = #1(x.0)

There are two contiguous blocks: one with rules 1 and 3 (since rule 2 has been regrouped with the Nil rules), and one with rule 4. We begin with the 1-3 block:

 decl knext
 match(x.0.0){
   1 → { kfinal(1)}
   3 → { kfinal(3)}
   _ → { knext()}
 }

If matching against the 1-3 block fails, we match against rule 4. If this also fails, then matching against all the rules in bi.a has failed, and we try to match against the rules in bi.b.

 let knext() = {
   match(x.0.1){
     4 → { kfinal(4)}
     _ → { kb_i.b()}
   }
 }
}

The rest of the matching is very similar. In bi.b.1, the matching of rules 5 and 7 is factorized, because they share a common constructor. (Note that there is no factorization on Cons between rules 4 and 5, because they are in different blocks.) Then blocks bi.b.2, bi.b.3, and bi.c are tried successively. In bi.c, the test for Cons is factorized, but not the test for Nil, because testing Nil is done after testing 10.

Finally, the pattern matching always succeeds, since rule 12 is irrefutable, so there is no need to introduce code that performs a match_failure when nothing matches.

Note that the presentation would have been clearer if the patterns had been grouped differently; in particular, grouping rules that share a constructor matched at the same time (e.g. exchanging rules 1 and 2, and rules 5 and 6) would improve the presentation.