Inspired by Dan Piponi’s latest post, I have been looking into partial evaluation. In particular, I thought that a language which emphasizes currying really ought be good at partial evaluation. In this post I describe some ideas regarding partial evaluation in functional languages, and later sketch a partial evaluation machine I devised.

Supercombinator reduction machines, like GHC, do a limited form of partial evaluation. I.e. when you compile a program to supercombinators, you are optimizing it for specialization from left to right. So if f is defined in pointful form, “let a = f x in (a z, a z)” might be a better program than “(f x z, f x z)”. This is a nice property of combinator reduction. Unfortunately, it doesn’t generalize: “let a = flip f x in (a y, a y)” will never be better than “(f y x, f y x)”, because functions only specialize from left to right. I conjecture that this missing feature is more important than we realize.

Mogensen gives a very elegant partial evaluator in pure lambda calculus, which optimize as expected with the Futamura projections (see Dan’s post). This partial evaluator works on higher order abstract syntax, taking and returning descriptions of terms rather than the terms themselves. Essentially all it is is (very simple) machinery describing how to evaluate under a lambda.

The system in that paper takes many precautions to avoid unfolding dynamic arguments, because otherwise the partial evaluator might not terminate. Apparently he is not well-versed in our Haskell School of the Infinite, because the evaluator is compositional. So what he means by “not terminate” is “return an infinite program”. But an infinite program is fine if you interpret/compile it lazily!

In fact, I believe (I am intuiting — I have done no formal checking) that the simplest-minded of lazy partial evaluators is perfect: it maximally unfolds its static arguments, there is no need for the type inference machinery in that paper, and it will have the same termination properties as the program. I attribute the ease of this task with the built-in metacircularity of lambda calculus.

Cool as a self-embedded partial evaluator is, to do efficient partial evaluation you need to keep quotations of your programs everywhere, then compile them before you actually use them. Lambda calculus is hinting at something more: that simply by applying one of several arguments to a curried function, you are specializing it. Wouldn’t it be great if every time you did that, the program were maximally specialized automatically?

### A partial evaluation reduction strategy

It turns out that such an ambitious wish is nothing more than an evaluation order for the lambda calculus. Admittedly, it’s not a very common one. You try to reduce the leftmost application, even under lambdas. We would also like to combine this with call-by-need, so when an argument is duplicated it is only reduced once.

Here’s an example program I’ve been working with, with the standard definitions of the combinators Ix = x and Kxy = x.

flip (\x y. y I x) K K

It’s not much, but it gets the point across. Let’s look at it in call-by-name:

[1] (\f a b. f b a) (\x y. y I x) K K
[2] (\a b. (\x y. y I x) b a) K K
[3] (\b. (\x y. y I x) b K) K
[4] (\x y. y I x) K K
[5] (\y. y I K) K
[6] K I K
[6'] (\m n. m) I K
[7] (\n. I) K
[8] I

Notice by the step [4] that we have lost the structure of `flip (\x y. y I x) K`, so any work we do from then on we will have to redo on subsequent applications of that function. Contrast this with the partial evaluation strategy:

[1] (\f a b. f b a) (\x y. y I x) K K
[2] (\a b. (\x y. y I x) b a) K K
[3] (\a b. (\y. y I b) a) K K
[4] (\a b. a I b) K K
[5] (\b. K I b) K
[5'] (\b. (\m n. m) I b) K
[6] (\b. (\n. I) b) K
[7] (\b. I) K
[8] I

We got the function all the way down to a constant function before it was finally applied.

One thing that’s interesting to notice about this strategy is that it *seems* stricter than call-by-name. That is, if you have a nonterminating term N, then reducing the application (\x y. Ny) z will loop, whereas it won’t in CBN. However, recall that in the domain theory, (\x. ⊥) = ⊥. The only thing you can do to a function to observe it is apply it, and whenever you apply this function you will loop. So you are bound to loop anyway if you are evaluating this application.

### The lazy partial evaluation machine (sketch)

Here is a sketch of an efficient low-level machine for this evaluation strategy. It is simple stack machine code (“which I pull out of my bum for now, but I don’t think an algorithm to generate it will be any trouble”). The only tricky bit about it is that pieces are sometimes removed from the middle of the stack, so it can’t necessarily be represented linearly in memory. A singly linked representation should work (I realize this costs performance).

The values in this language look like [v1,v2,…] { instr1 ; instr2 ; … }. They are closures, where the vn are pushed on to the stack in that order before the instructions are executed. Values can also be “indirections”, which point to an absolute position in the stack. Indirections represent a logical reordering of the stack, and are used to model bound variables. When indirections are executed, they remove themselves and execute (and remove) the thing they point to. The instructions are as follows.

pop n -- delete the entry at position n
dup n -- push the entry at position n on the top of the stack
(other standard stack ops here)
abs n -- push an indirection pointing to position n
-- (skipping other indirections) on top of the stack
exec n -- pop and execute the closure at position n
closure [n1,n2,...] { instr1 ; instr2 }
-- remove positions n1,n2,... and add them
-- to the closure with the given code, and push it

`dup` is the only instruction which duplicates a value; this is where laziness will be encoded.

Let’s look at the code for each piece of our example program:

I: [] { exec 0 }
K: [] { pop 1; exec 0 }

These are pretty straightforward. They both receive their arguments on the stack (the first argument in position 0, and downwards from there), reduce and continue execution. Recall the code is what happens when a value is forced, so every value ends with an exec to continue the execution.

(\x y. y I x): [] { push I; exec 2 }

This one might be a little tricker. The stack comes in like this (position 0 on the left): `x y`, After push I, it looks like “I x y”, so y’s incoming stack will be “I x” just as it wants. Finally, the interesting one:

(\f a b. f b a): [] { abs 2; exec 2 }

abs 2 pushes an indirection to the argument b onto the stack before calling f. This is how evaluation is pulled under binders; when an indirection is forced, it reduces the application at the appropriate level, and all below it. I am still not totally sure when to introduce an abs; my guess is you do it whenever a function would otherwise reorder its arguments. An example may demonstrate what I mean (but perhaps not; I haven’t totally wrapped my head around it myself).

Here’s an execution trace for the example above, with the stack growing to the left. I start with the instruction pointer on the first function and the three arguments on the stack. The stack shown is the state *before* the instruction on each row. An identifier followed by a colon marks a position in the stack for indirections:

1 |
`abs 1` |
`(\x y. y I x) K K` |

2 |
`abs 3` |
`a (\x y. y I x) a:K K` |

3 |
`exec 2` |
`b a (\x y. y I x) a:K b:K` |

4 |
`closure I` |
`b a a:K b:K` |

5 |
`exec 2` |
`I b a a:K b:K` |

6 |
`pop 1` |
`I b b:K` |

7 |
`exec 0` |
`I` |

When an indirection is executed, as in step 5, that is evaluation under a lambda.

This machine still doesn’t support laziness (though it didn’t matter in this example). We can achieve laziness by allocating a thunk when we dup. To evaluate the thunk, we put a mark on the stack. Whenever an instruction tries to reach beyond the mark, we capture the current stack and instruction pointer and jam it in a closure, then write that closure to the thunk. Indirections get replaced by their offsets; i.e. the “abs” commands that would create them. After we have done that, remove the mark (point it to where it was previously) and continue where we left off.

There you have it: my nifty partial evaluation machine. I’m reasonably confident that it’s correct, but I’m still not totally happy with the implementation of indirections — mostly the fact that you have to skip other indirections when you are pushing them. I wonder if there is a better way to get the same effect.

Comments/criticisms requested! :-)