2012/05/01

Guaranteed optimizations for system programming languages

System programming language definition

A system programming language is a language suitable for programming system software, like operating system kernels, drivers, or the runtime of a programming language. I think the most general definition for a system programming language is:

Given any maximally-optimized assembly code, a system language allows to write a program which is compiled to be equivalent to that code.

"Equivalent" means that some instructions can be reordered or replaced (e.g. replacing multiplication by shifts and additions), and register allocation may be different. But all memory accesses, and control flow is the same.

So the language can be seen as a "portable assembler". System programming is exactly this: when you write a piece of code, you know exactly how it could and should be compiled. That's why C is appreciated by system programmers.

Here is a non-exhaustive list of specific things a system programming language must allow:

Control of memory allocation
The programmer should be able to choose whether an object is heap, static, or stack-allocated, is explicitly freed, reference-counted, or garbage collected, etc. Note however that control over whether a variable is stack- or register-allocated is not generally useful.
Conformance to a binary interface
For instance, it is easy to write C structs and enums that conform to a network packet header, filesystem directory entry, MMIO maps of a driver, or ELF section header. In other language, you can conform to an interface using a library (e.g. OCaml's bitstring), but you need to write conversion functions from and to the interface, which makes the code more complex and inefficient.
Control over compilation
Abstraction are practical for understandability and factorization, but create inefficiencies. For instance, a function call has an overhead that can be avoided when the function is inlined. A system language allows to write simple, concrete code that gets compiled exactly to the assembly output one wants.

For instance, GCC has a always_inline keyword that forces all calls to a function to be inlined. This allows things such as partial application of some arguments to a function, and in general a way to control the choice between code size and code efficiency, which is very important for embedded systems.

The compiler's freedom of reordering/removing reads and writes can be controlled by the volatile keyword, or insertion of "fences", or the recent C11 standard for control over memory ordering for multithreaded environments.

Note however that C is not a perfect system programming language. For instance:

  • It does not allow to write "special" assembly instructions inline in a standard way (although most compilers allow this as an extension). The only compiler-agnostic way is to write separate functions for special instructions, which is inefficient.
  • It does not allow to choose which registers are clobbered by a function, to force local variables to be allocated in a specific register, or to require that the stack is not used for a function. There are specific pieces of system code which have these kind of requirements (like interrupt routines), and must be written in assembly because of this.

Also note that the fact that a language can do system programming does not mean that it cannot be used for high-level programming. For instance you could have a mean to enforce how memory is allocated, but have a default choice of garbage collection when you do not care.

Finally, note that the above features can be useful for applicatiion programming as well: conformance to a binary interface can be used for processing binary files; and control over compilation and memory allocation may be useful for the performance-critical parts of a program.

Guaranteed optimization

We consider the following piece of high-level code:

def f(x,y) = 
{ let a = g(x,y);
  let b = h(x,y);
  a + b 
}

A system programmer would expect this code to be compiled to something like this (in the following, rx are registers).

f: 
// x and y are in r1 and r2
r3 <- r1
call g 
r4 <- r1
r1 <- r3
call h
r1 <- add r1,r4 
return

But if the language is compiled by a naïve functional compiler, the following could happen instead:

  • Every let is transformed to a lambda abstraction (that create a closure) + function call. I.e. the code is transformed to:

    (\a.[(\b. a + b)( h(x,y) )])( g(x,y) )

    Two functions are created, one is a closure (allocated to heap). This breaks control over compilation and control of memory allocation.

  • The + could be a function that corresponds to a typeclass; an additional "typeclass" argument is passed to the function, and + is replaced by a function call. This breaks control over compilation and control of binary interface.

However, if the compiler ensured that:

  • let-bound variables do not create new functions and are stack- or register- allocated;
  • When the return types of g and h is Int, then the + function is inlined

Then compilation would look like the above assembly code. Thus, guaranteed optimizations allow using high-level constructs for system programming.

The classical example of guaranteed optimization is the tail-call optimization, that appeared in the specification of the Scheme programming language, and allowed removal of loops (I am not aware of other examples). I think it's time to take this idea further.

No comments:

Post a Comment