Inspired by Dan Piponi’s latest post, I have been looking into partial evaluation. In particular, I thought that a language which emphasizes currying really ought be good at partial evaluation. In this post I describe some ideas regarding partial evaluation in functional languages, and later sketch a partial evaluation machine I devised.

Supercombinator reduction machines, like GHC, do a limited form of partial evaluation. I.e. when you compile a program to supercombinators, you are optimizing it for specialization from left to right. So if f is defined in pointful form, “let a = f x in (a z, a z)” might be a better program than “(f x z, f x z)”. This is a nice property of combinator reduction. Unfortunately, it doesn’t generalize: “let a = flip f x in (a y, a y)” will never be better than “(f y x, f y x)”, because functions only specialize from left to right. I conjecture that this missing feature is more important than we realize.

Mogensen gives a very elegant partial evaluator in pure lambda calculus, which optimize as expected with the Futamura projections (see Dan’s post). This partial evaluator works on higher order abstract syntax, taking and returning descriptions of terms rather than the terms themselves. Essentially all it is is (very simple) machinery describing how to evaluate under a lambda.

The system in that paper takes many precautions to avoid unfolding dynamic arguments, because otherwise the partial evaluator might not terminate. Apparently he is not well-versed in our Haskell School of the Infinite, because the evaluator is compositional. So what he means by “not terminate” is “return an infinite program”. But an infinite program is fine if you interpret/compile it lazily!

In fact, I believe (I am intuiting — I have done no formal checking) that the simplest-minded of lazy partial evaluators is perfect: it maximally unfolds its static arguments, there is no need for the type inference machinery in that paper, and it will have the same termination properties as the program. I attribute the ease of this task with the built-in metacircularity of lambda calculus.

Cool as a self-embedded partial evaluator is, to do efficient partial evaluation you need to keep quotations of your programs everywhere, then compile them before you actually use them. Lambda calculus is hinting at something more: that simply by applying one of several arguments to a curried function, you are specializing it. Wouldn’t it be great if every time you did that, the program were maximally specialized automatically?

### A partial evaluation reduction strategy

It turns out that such an ambitious wish is nothing more than an evaluation order for the lambda calculus. Admittedly, it’s not a very common one. You try to reduce the leftmost application, even under lambdas. We would also like to combine this with call-by-need, so when an argument is duplicated it is only reduced once.

Here’s an example program I’ve been working with, with the standard definitions of the combinators Ix = x and Kxy = x.

flip (\x y. y I x) K K

It’s not much, but it gets the point across. Let’s look at it in call-by-name:

[1] (\f a b. f b a) (\x y. y I x) K K [2] (\a b. (\x y. y I x) b a) K K [3] (\b. (\x y. y I x) b K) K [4] (\x y. y I x) K K [5] (\y. y I K) K [6] K I K [6'] (\m n. m) I K [7] (\n. I) K [8] I

Notice by the step [4] that we have lost the structure of `flip (\x y. y I x) K`, so any work we do from then on we will have to redo on subsequent applications of that function. Contrast this with the partial evaluation strategy:

[1] (\f a b. f b a) (\x y. y I x) K K [2] (\a b. (\x y. y I x) b a) K K [3] (\a b. (\y. y I b) a) K K [4] (\a b. a I b) K K [5] (\b. K I b) K [5'] (\b. (\m n. m) I b) K [6] (\b. (\n. I) b) K [7] (\b. I) K [8] I

We got the function all the way down to a constant function before it was finally applied.

One thing that’s interesting to notice about this strategy is that it *seems* stricter than call-by-name. That is, if you have a nonterminating term N, then reducing the application (\x y. Ny) z will loop, whereas it won’t in CBN. However, recall that in the domain theory, (\x. ⊥) = ⊥. The only thing you can do to a function to observe it is apply it, and whenever you apply this function you will loop. So you are bound to loop anyway if you are evaluating this application.

### The lazy partial evaluation machine (sketch)

Here is a sketch of an efficient low-level machine for this evaluation strategy. It is simple stack machine code (“which I pull out of my bum for now, but I don’t think an algorithm to generate it will be any trouble”). The only tricky bit about it is that pieces are sometimes removed from the middle of the stack, so it can’t necessarily be represented linearly in memory. A singly linked representation should work (I realize this costs performance).

The values in this language look like [v1,v2,…] { instr1 ; instr2 ; … }. They are closures, where the vn are pushed on to the stack in that order before the instructions are executed. Values can also be “indirections”, which point to an absolute position in the stack. Indirections represent a logical reordering of the stack, and are used to model bound variables. When indirections are executed, they remove themselves and execute (and remove) the thing they point to. The instructions are as follows.

pop n -- delete the entry at position n dup n -- push the entry at position n on the top of the stack (other standard stack ops here) abs n -- push an indirection pointing to position n -- (skipping other indirections) on top of the stack exec n -- pop and execute the closure at position n closure [n1,n2,...] { instr1 ; instr2 } -- remove positions n1,n2,... and add them -- to the closure with the given code, and push it

`dup` is the only instruction which duplicates a value; this is where laziness will be encoded.

Let’s look at the code for each piece of our example program:

I: [] { exec 0 } K: [] { pop 1; exec 0 }

These are pretty straightforward. They both receive their arguments on the stack (the first argument in position 0, and downwards from there), reduce and continue execution. Recall the code is what happens when a value is forced, so every value ends with an exec to continue the execution.

(\x y. y I x): [] { push I; exec 2 }

This one might be a little tricker. The stack comes in like this (position 0 on the left): `x y`, After push I, it looks like “I x y”, so y’s incoming stack will be “I x” just as it wants. Finally, the interesting one:

(\f a b. f b a): [] { abs 2; exec 2 }

abs 2 pushes an indirection to the argument b onto the stack before calling f. This is how evaluation is pulled under binders; when an indirection is forced, it reduces the application at the appropriate level, and all below it. I am still not totally sure when to introduce an abs; my guess is you do it whenever a function would otherwise reorder its arguments. An example may demonstrate what I mean (but perhaps not; I haven’t totally wrapped my head around it myself).

Here’s an execution trace for the example above, with the stack growing to the left. I start with the instruction pointer on the first function and the three arguments on the stack. The stack shown is the state *before* the instruction on each row. An identifier followed by a colon marks a position in the stack for indirections:

1 | abs 1 |
(\x y. y I x) K K |

2 | abs 3 |
a (\x y. y I x) a:K K |

3 | exec 2 |
b a (\x y. y I x) a:K b:K |

4 | closure I |
b a a:K b:K |

5 | exec 2 |
I b a a:K b:K |

6 | pop 1 |
I b b:K |

7 | exec 0 |
I |

When an indirection is executed, as in step 5, that is evaluation under a lambda.

This machine still doesn’t support laziness (though it didn’t matter in this example). We can achieve laziness by allocating a thunk when we dup. To evaluate the thunk, we put a mark on the stack. Whenever an instruction tries to reach beyond the mark, we capture the current stack and instruction pointer and jam it in a closure, then write that closure to the thunk. Indirections get replaced by their offsets; i.e. the “abs” commands that would create them. After we have done that, remove the mark (point it to where it was previously) and continue where we left off.

There you have it: my nifty partial evaluation machine. I’m reasonably confident that it’s correct, but I’m still not totally happy with the implementation of indirections — mostly the fact that you have to skip other indirections when you are pushing them. I wonder if there is a better way to get the same effect.

Comments/criticisms requested! :-)

Very cool post. I’m curious what happens when you through recursion into the mix. Does y-combinator-based recursion automatically specialize down into a closure calling itself?

Thank you, and excellent question! Trying to answer actually helped me correct an error in the machine (or rather in my understanding of it). The indirection rules are not completely specified in this post, and I’m still figuring out the details by trial runs on my whiteboard.

Anyway, the answer is: not really. There is a layer of quotation in between. Here is the resulting closure on a test program I did:

[[*] { dup 0 ; exec 0 } F] { exec 2 }

Where:

* = [[#0] { dup 0 ; exec 0 } F] { exec 2 }

I haven’t yet described the #0 notation; it is my best alternative candidate for indirection as described here. Anyway, it refers to the thing on the top of the stack when the (outer) closure is executed.

It looks like it is just (\x. F (x x)) (\x. F (x x)), a fairly lame specialization.

However, I could see how maybe a self-calling closure might be created, if a different alternative to indirection were used. That’s a good case study, thanks :-)

All interesting partial evaluators seem to use “The Trick” (i.e. eta expansion) somewhere to get specialization going. How does that work in your machine?

Well, I’m arrogant and lazy, so I don’t really know how other partial evaluators work :-). But my guess is that it happens during beta reduction.

For example, if you have the expression (.) f g, it can’t do anything until you actually — eventually — apply it an argument x. At that point it reduces

(\f g x. f (g x)) f g x

(\g x. f (g x)) g x

(\x. f (g x)) x

And here, before the x is applied, (.) has been eta expanded before our eyes and is partially evaluating itself (because it must, before letting the x in). And right before the x is pulled in, it writes out its thunk so that others can benefit.

I hate to be so picky, but why mix the usual lambda notation (using ‘.’) and Haskell’s backslash notation?

Getting the y-combinator case right seems fairly interesting, since I don’t think any other variants of the lambda calculus do direct recursive calls. Maybe you’re close to the fastest version of the lambda calculus ever? :)

Cool, but this problem is messier than it appears. Have a look at Mike Thyer’s Lambda Animator:

http://thyer.name/lambda-animator/

and his thesis “Lazy Specialization” can be downloaded from the same site.

Geoffrey: some variants of typical lambda evaluators such as the SECD machine do have “direct” recursion instead of using Y. IIRC this is a special case rather than falling out naturally, though.

Max: Yeah, that comment was poorly worded. Having direct recursion as a primitive is easy; the interesting part is being able to recover it automatically from a Y-combinator call (since then the efficiency would hopefully generalize to other higher order idioms).

It’s also possible that the completely lazy evaluations strategies mentioned on the Lambda Animator page do this, though I haven’t read through them yet. Thanks for the link, Paul!

You really make it seem so easy with your presentation but I find this topic to be really something which I think I would never understand. It seems too complicated and very broad for me. I am looking forward for your next post, I will try to get the hang of it!

Awesome Sharing. Hope to read more from you!