2013/05/28

Compiling pattern matching

I have finished implementing the compilation of pattern matching for L. L source code can now use ML-style patterns, as used in the match, let, and fun/function operators of OCaml. See http://caml.inria.fr/pub/docs/oreilly-book/html/book-ora016.html for an example in OCaml if you don't know about pattern matching.

Compilation of pattern matching happens together with the CPS transformation phase of the compiler, which translates the "AST" language (used in particular to perform type inference) to the "CPS" language (in which jumps and order of computation is made explicit). CPS transformation fixes the order of evaluation, so it makes sense to perform pattern compilation during this phase.

As the rest of the compiler, this module has been been written in literate programming style (or at least, it is heavily commented, so that it should be understandable by someone with no prior knowledge of the subject). I have extracted the header of the module at the bottom of this post.

The next step is to finish cleaning the rest of the CPS transformation (which is much easier); with this the entire backend of the compiler will have been published.

Module Cpstransform_rules

The Cpstransform_rules module handles the compilation of a set of rules. A rule is composed of a pattern, matched against a value; and an expression (the body of the rule), to be executed if the pattern succeeds in matching the value.

The match(expression){rules} expression matches an expression against a set of rules; its behaviour is to evaluate the expression once, and try to match the resulting value against the patterns in rules until one succeeds, in which case the corresponding body is executed.

The order of the rules is thus important: a value can be matched by the patterns of two rules, e.g. (5,6) is matched by (5,_) and (_,6); but only the body of the first rule is executed. This implies that two successive rules in a match can be reordered if, and only if, they are disjoint, i.e. they cannot match the same value.

But when allowed, reordering and factorizing the compilation of rules matching lead to more compact, faster code. Trying to produce the most efficient code for matching against a set of rules is a NP-complete problem (the complexity arise when compiling multiple independent patterns, for instance tuple patterns). Rather than attempting to solve this problem, L specifies how pattern matching is compiled, which allows the developper to visualize the costs of its pattern matching.

Pattern matching rewrites

The compiler maintains a list of patterns that remain to be matched for each rule, and a list of values against which each rule is matched. The list of the first pattern to be matched in each rules is the first column, and is matched against the first value. Several cases can occur (see in the book "The implementation of functional programming languages" by Simon Peyton Jones, the chapter 5: "Efficient compilation of pattern matching", by Philip Wadler):

There is no more column

When there are no remaining patterns to match, and no remaining values, it means that all the rules match. As pattern matching selects the first rule that matches, we execute the body of the first rule, and discard the other rules with a warning.

For instance in

match(){
 () → body1
 () → body2
}

body1 matches and is executed; body2 also matches, but is superseded by body1, and is just discarded.

The column contain only variables

In this case, in each rule the variable is bound to the value, and matching continues. For instance in

match(expr1,...){
 (a,...) → body1
 (b,...) → body2
}

a is bound to v1 in the first rule, and b in the second rule; where v1 is a CPS variable representing the result of computing expr1 in the condition of the match (computation in the condition is thus not repeated). Matching then proceeds, starting from the second column.

This rule can be extended to incorporate wildcard (_) patterns (where nothing is bound), and all irrefutable patterns. A pattern is irrefutable if it does not contain any variant.

For instance, consider

match((expr1,expr2),...){
 (a,...) → body1
 (_,...) → body2
 ((c,d),...) → body3
}

The column contains only irrefutable patterns. Let v1 be a CPS variable containing the evaluation of expr1, and v2 containing the evaluation of expr2. Then a is bound to (v1,v2), c to v1, and d to v2.

The column contains only calls to constructors

A constructor is a specific version of a variant type; for instance 3 is a constructor of Int, True a constructor of Bool, and Cons(1,Nil) a constructor of List<Int>.

Note that if the column contain a variant, then all the constructors that it contains are of the same type: this is necessary for the pattern matching to typecheck.

When two contiguous rules have different constructors at the same place, they cannot match the same value simultaneously: they are thus disjoint, and can be swapped. This allows to group the rules according to the constructor in their first column (the order of the rules within a group being preserved).

For instance,

match(expr1, expr2){
 (Cons(a,Cons(b,Nil)),Nil) → body1
 (Nil,Nil) → body2
 (Nil,c) → body3
 (Cons(d,e),_) → body4 
}

can be grouped as (preserving the order between rules 1 and 4, and 2 and 3) :

match(expr1, expr2){
 (Cons(a,Cons(b,Nil)),Nil) → body1
 (Cons(d,e),_) → body4
 (Nil,Nil) → body2
 (Nil,c) → body3
}

Then, the matching of contiguous rules with the same constructor can be factorized out, as follow:

let v1 = expr1 and v2 = expr2 in
match(v1){
 Cons(hd,tl) → match((hd,tl),v2){
     ((a,Cons(b,Nil)), Nil) → body1 
     ((d,e),_) → body4 
 }
 Nil → match(v2){
     Nil → body2 
     c → body3 
 }
}

Note that the L compiler matches the values in the constructor before matching the other columns of the tuple, as was exemplified in the Cons rules.

The construct of matching only the constructors of a single variant type can be transformed directly into the CPS case expression. It is generally compiled very efficiently into a jump table, and dispatching to any rule is done in constant time. (Note that the compiler may not generate a jump table if the list of constructors to check is sparse).

The first column contains both refutable and irrefutable patterns

If the first column contains both kind of patterns, the list of rules is split into groups such that the ordering between rules is preserved, and either all the rules in the group have their first pattern that is refutable, or they are all irrefutable.

For instance, the following match:

match((v1,v2),v3){
 (_,1) → 1
 ((a,b),2) → a+b+2
 ((3,_),3) → ...
 ((4,_),_) → ...
 ((_,5),_) → ...
 ((a,b),c) → a+b+c
}

is split into three groups, with respectively rules 1-2 (_ and (a,b) are both irrefutable patterns), rules 3-5, and rule 6. Then the groups are matched successively, the next group being matched only if no rule matched in the first one. This amount to performing the following transformation:

let c = ((v1,v2),v3) in
match(c){
 (_,1) → 1
 ((a,b),2) → a+b+2
 _ → match(c){
     ((3,_),3) → ...
     ((4,_),_) → ...
     ((_,5),_) → ...
     _ → match(c){
         ((a,b),c) → a+b+c
         }
     }
}

Note that rules 3,4,5 also need to be split and transformed further using this method.

Compiling pattern matching

Compilation order

The order in which checking is made for a set of patterns is a choice, done by the compiler. L chooses to match the tuples from left to right, and the contents of the constructor as soon as they are matched; and to split rules according to the refutability of the pattern in their first remaining column. This choice may not be optimal in every case (but minimizing the number of matches is a NP-hard problem), but allows for a simple, visual analysis of the cost of pattern matching. The user is free to rearrange the set of patterns to improve performance (possibly guided by compiler hints).

Element retrieval

At the beginning of a match, all the components, needed by at least one rule, that can be retrieved (i.e. components in a tuple etc., but not those that are under a specific constructor) are retrieved. When a constructor is matched, all the components that can be retrieved that were under this constructor are retrieved. This behaviour produces the most compact code (avoid duplicating retrieval of elements in the compilation of several rules), but maybe not the most efficient (sometimes elements are retrieved that are not used). Optimizations, such as shrinking reductions, are allowed to move down or even duplicate code performing retrieval of elements into the case.

CPS

Compilation of pattern matching is done during the CPS transformation, which transforms source code from the AST language to the CPS language. There are several reasons for that:

  • The CPS transformation of expression fixes the order of their evaluation; compiling pattern matching fixes the order in which patterns are matched. So it makes sense to do both at the same time, to have a single pass that fixes all the order of evaluation.

    As a side note, it makes sense to keep pattern matching in the AST language, because patterns are easy to type, and any typing error can be more easily returned to the user.

  • The CPS language provides continuations, which allows to express explicit join points in the control flow, something not possible in the AST language (without creating new functions). These joint points are necessary notably to factorize the compilation of pattern matching (this problem is similar to compilation of boolean expressions with the short-circuit operators && and ||). For instance, compiling:

    match(v){ (4,5) → 1; _ → 2 }

    gives:

    let k_not4_5() = { kreturn(2) }
    match(#0(v)){
     4 → match(#1(v)){
       5 → kreturn(1)
       _ → k_not4_5()
     }
     _ → k_not4_5()
    }

    Matching against (4,5) can fail at two different steps, and the action to perform in these two cases are the same, so they should be factorized using the same continuation.

    The L compiler does not yet allow it, but "or-patterns" (i.e. in match(l){ Cons(NilCons(_,Nil)) → 0 _ → 1 }) also need join points. Finally, there is also a joint point (in expr_env.context) to which the value of the bodies in each rule is returned.

All the functions that involve building CPS code are themselves in CPS style; see the Cps_transform_expression module for an explanation.

A complete example

Here is a (contrived) exemple of a complete pattern matching:

This pattern is compiled as follows. We begin by creating a join continuation, which is where the result of the match is returned. This allows to factorize the following computation (the addition to 17 in our case).

let kfinal(x) = { let x17 = x + 17 in halt(x17) }

Then, the condition of the match e is evaluated, and its result stored in a temporary value.

let v = ... eval e ...

Then, analysis of the patterns show that v contains a tuple.

let v.0 = #0(v)
let v.1 = #1(v)

Analysis of the patterns also show that v.0 contain a tuple. v.1 is a variant type, so its elements cannot be retrieved yet.

let v.0.0 = #0(v.0)
let v.0.1 = #1(v.0)

We begin by analysis the whole pattern (i.e. column c0). All the rules are refutable, except the last one, so we split them into two contiguous blocks bi and bii; bii is executed if matching against all the rules in bi fail.

decl kb_ii

All the rules in bi are tuples, so we inspect them from left to right (i.e. we begin by column c1, then proceed with c4). Analysis of column c1 yields three contiguous blocks: the patterns in column c1 are all irrefutable for block bi.a, refutable for block bi.b, and irrefutable again for block bi.c.

decl kb_i.b, kb_i.c

As the patterns of column c1 in rules in bi.a are all irrefutable, we just have to associate the variable x to v.0.0 and a to v.0.1 for the translation of the body of the rules. (x and a are unused in the rules of the example).

We can then proceed with the analysis of column c2 (still in block bi.a). It is a variant, so we can regroup the rules according to the constructor, and perform a simple case analysis.

decl kcons
match(v.1){
 Nil → { kfinal(2) }
 Cons(x) → { kcons(x) } 
}

For the Nil constructor, we are already done. For Cons, we have to discriminate against the patterns inside the Cons. But first, we analyze these patterns to retrieve all the elements that are needed:

let kcons(x) = {
 let x.0 = #0(x)
 let x.1 = #1(x)
 let x.0.0 = #0(x.0)
 let x.0.1 = #1(x.0)

There are two contiguous blocks: one with rule 1 and 3 (since rule 2 has been regrouped with the Nil), and one with rule 4. We begin with the 1-3 block:

 decl knext
 match(x.0.0){
   1 → { kfinal(1)}
   3 → { kfinal(3)}
   _ → { knext()}
 }

If matching against the 1-3 block fails, we match against rule 4. If this fails, then matching against all the rules in bi.a failed, and we try to match against the rules in bi.b.

 let knext() = {
   match(x.0.1){
     4 → { kfinal(4)}
     _ → { kb_i.b()}
   }
 }
}

The rest of the matching is very similar. In bi.b.1, the matching against rules 5 and 7 is factorized, because there is a common constructor. (Note that there is no factorization on Cons between rules 4 and 5, because they are in different blocks). Then blocks bi.b.2, bi.b.3, bi.c are tried successively. In bi.c, the test of Cons is factorized, but not the test for Nil, because testing Nil is done after testing 10.

Finally, the pattern matching always succeeds since rule 12 is irrefutable, so there is no need to introduce code that perform match_failure in case nothing succeeds in matching.

Note that the presentation would have been clearer if the patterns had been regrouped differently; in particular, grouping rules who share a constructor matched as the same time (e.g. exchanging rules 1 and 2, and rule 5 and 6) would improve the presentation.