Monthly Archives: February 2009

RTS Research

Over the past week, I have again been exploring low-level functional systems in which to write an RTS (notably a garbage collector) for high-level languages. Progress has been slim, but I’ll report on my findings.

I get the hunch that linear lambda calculus is unsuitable, but I haven’t yet convinced myself of that and it’s the closest I’ve got so far. So I have been trying to implement a minimal LLC runtime in C. Apparently Haskell has fried my brain and I can no longer write in C without segfaulting up the wazoo. Or maybe that’s what writing in C is like and I had just forgotten :-).

Said runtime is more complex than I’d like it to be. It is a fairly involved pattern-matching graph reduction machine. It needs a dedicated memory pool to allocate from and deallocate to, holding the graph currently under consideration (but when things fall off the graph, they are freed immediately, which is its charm). I was hoping that it would run in constant memory, but I don’t think that’s possible, so this is the best I could hope for (in some sense, this memory pool is the “stack”). I would like to find a more direct translation, say into a (parallel) sequence of instructions. But that seems to conflict with the graph reduction properties that I like, namely that code can be “compiled” on the fly.

I wonder if there is an incarnation which can generate short bursts of assembly and run them, thus taking the best of both worlds.

Still, I’m not sure it’s possible to implement an RTS in linear lambda calculus, so I may just be barking up a ditch.

Jeremy Siek pointed out that copying collectors are simple and composable; i.e. all they do is copy a root set into contiguous memory and throw away the rest. A copier for a composition is a composition of copiers. This is very promising, and is what I’m shooting for.
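
To make the "copy the root set into contiguous memory and throw away the rest" idea concrete, here is a minimal sketch in Haskell. The toy heap model (`Cell`, `Heap`) and the name `copyCollect` are assumptions made up for illustration; they are not from Jeremy's description, and the sketch assumes every pointer in the heap refers to an existing cell. It does not attempt to show the compositionality property.

```haskell
import qualified Data.Map as Map

-- A toy heap (hypothetical model): addresses map to cells, and each
-- cell holds a payload plus the addresses of the cells it points to.
type Addr = Int
data Cell = Cell { payload :: String, pointers :: [Addr] }
  deriving (Eq, Show)
type Heap = Map.Map Addr Cell

-- Copy everything reachable from the roots into a fresh heap whose
-- addresses are contiguous from 0. Unreachable cells are never copied.
-- Returns the new heap together with the relocated roots.
copyCollect :: Heap -> [Addr] -> (Heap, [Addr])
copyCollect old roots = (new, map fwd roots)
  where
    -- Reachable addresses in breadth-first order.
    order = go [] roots
    go seen [] = reverse seen
    go seen (a:queue)
      | a `elem` seen = go seen queue
      | otherwise     = go (a : seen) (queue ++ pointers (old Map.! a))
    -- Forwarding table: old address -> new contiguous address.
    forwarding = Map.fromList (zip order [0..])
    fwd a = forwarding Map.! a
    -- Rebuild each live cell at its new address with rewritten pointers.
    new = Map.fromList
      [ (fwd a, c { pointers = map fwd (pointers c) })
      | a <- order, let c = old Map.! a ]
```

Given a heap with cells at 10 and 20 pointing at each other and an unreferenced cell at 30, collecting from root 10 keeps only the first two, relocated to addresses 0 and 1.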

He also pointed me at the Categorical Abstract Machine (sorry I couldn’t find any public links), which I guess is fairly popular, and definitely a clever little thing. But alas, it appears to need garbage collection, so I’m not focusing on that level yet.

The Boulder functional programmers group is doing a two-session focus on CUDA, and in particular at the next meeting we will be talking about functional, high-level (at least in comparison to C for CUDA) specification of parallel algorithms for CUDA. I’m hoping that what I find for the RTS language will also be applicable to this task.

Until next time.

Dependent types are ridiculously easy

After an inquiry about a combinatory calculus for dependent types, somebody in #haskell — unfortunately I don’t remember who — recommended I look at Illative Combinatory Logic. After studying the paper a bit, all I can say is Wow! It implements a (conjecturally) complete logical system in but three simple typing rules, together with the untyped lambda calculus conversion rules. I was able to write a dependent typechecker in 100 lines.

The specific system from the paper that I’m working with is called System IG. The paper also introduces three weaker systems, which are also very nice and worth understanding.

As an example, here is a derivation of the identity function and its type. This is a backward proof, so each step is implied by the step after it. But reading it this way makes the algorithm clear.

    |- G L (\A. G A (\_. A)) (\A x. x)
    L a |- (\A. G A (\_. A)) a ((\A x. x) a)     (Gi)
    L a |- G a (\_. a) ((\A x. x) a)             (beta)
    L a |- G a (\_. a) (\x. x)                   (beta)
    L a, a b |- (\_. a) b ((\x. x) b)            (Gi)
    L a, a b |- a ((\x. x) b)                    (beta)
    L a, a b |- a b                              (beta)

The trick, which isn’t so much a trick as a beautiful deep connection I gather (although I don’t fully grok it), is that typing propositions are just applications. So if A is a type, then A b is the proposition “b has type A”. L is the type of types, and G is essentially Π (the product type constructor). So, for example, the fourth line in the above proof would be written as follows in a conventional system:

    a : Type  |-  (\x. x) : a -> a
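
The backward derivation above can be replayed mechanically. What follows is only a sketch, not the 100-line typechecker: it assumes a locally nameless term representation, implements just the assumption rule, the Gi step as it appears in the derivation (without any side conditions the real system imposes), and beta conversion, and it invents fresh names of the form v0, v1, and so on. All names (`Term`, `inst`, `norm`, `proves`) are made up for illustration.

```haskell
-- Locally nameless terms: bound variables are de Bruijn indices;
-- free variables and the constants G and L are names.
data Term = Free String | Bound Int | App Term Term | Lam Term
  deriving (Eq, Show)

-- Replace bound index k by the (locally closed) term s.
inst :: Int -> Term -> Term -> Term
inst k s t = case t of
  Bound i | i == k -> s
  App f x -> App (inst k s f) (inst k s x)
  Lam b   -> Lam (inst (k + 1) s b)
  _       -> t

-- Beta-normalise the head and the arguments, but never under a lambda,
-- so every substituted term stays locally closed. May loop on terms
-- without a normal form.
norm :: Term -> Term
norm (App f x) = case norm f of
  Lam b -> norm (inst 0 x b)
  f'    -> App f' (norm x)
norm t = t

-- Backward proof search: a goal holds if it is an assumption, or if it
-- has the shape G X Y Z and Y v (Z v) follows after assuming X v.
proves :: [Term] -> Term -> Bool
proves ctx goal = case norm goal of
  g | g `elem` ctx -> True
  App (App (App (Free "G") x) y) z ->
    let v = Free ("v" ++ show (length ctx))  -- assumed not to clash
    in proves (norm (App x v) : ctx) (App (App y v) (App z v))
  _ -> False

-- The example from the post: |- G L (\A. G A (\_. A)) (\A x. x)
idType, idTerm, goal :: Term
idType = Lam (App (App (Free "G") (Bound 0)) (Lam (Bound 1)))
idTerm = Lam (Lam (Bound 0))
goal   = App (App (App (Free "G") (Free "L")) idType) idTerm
```

Evaluating `proves [] goal` walks through the same steps as the derivation, with v0 and v1 playing the roles of a and b. Application of functions would need the elimination rule, which this sketch leaves out.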

I need to extend it with finite types and boxes, but I really don’t want to because it’s so pretty as is! :-)

Dana update: Core execution language, dependent combinators

It’s been a little while since I gave an update about Dana, so here it is.

There has been very little code progress in the last week. Instead, because of the self-embedding issues I talked about last time, I have been exploring the use of a combinator calculus as an intermediate stage for typechecking. I’m on to something, though nothing is terribly concrete yet. The combinator basis I’m using is:

    Sxyz = xz(yz)
    Ktxy = x    -- with the constraint (y:t)
    Bxyz = x(yz)
    Cxyz = xzy
    Itx = x   -- with the constraint (x:t)

Along with primitives for Pi, Arrow (which is always expressible as Pi, but this keeps term size down), and Type.

The point of the constraints is to provide type annotations. I added a type annotation to every combinator which “erases” a lambda (rather than just pushing it downward) during factorization, because when a lambda is erased so is its annotation, so this puts it back.
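
For readers who have not seen factorization into this basis, here is a hedged sketch of the untyped version. It deliberately drops the type annotations on K and I (so `K x y = x` and `I x = x`), and the names `Term`, `compile`, `abstract`, and `reduce` are invented for this example rather than taken from Dana. The comments mark which cases erase a lambda and which merely push it downward.

```haskell
infixl 9 :@

-- Lambda terms extended with the five combinators of the basis.
data Term = V String | Term :@ Term | L String Term
          | S | K | B | C | I
  deriving (Eq, Show)

-- Does the variable occur free in the term?
free :: String -> Term -> Bool
free x (V y)    = x == y
free x (f :@ a) = free x f || free x a
free x (L y b)  = x /= y && free x b
free _ _        = False

-- Factor every lambda out of a term, innermost first.
compile :: Term -> Term
compile (f :@ a) = compile f :@ compile a
compile (L x b)  = abstract x (compile b)
compile t        = t

-- abstract x t builds a lambda-free term behaving like \x. t,
-- assuming t itself is already lambda-free.
abstract :: String -> Term -> Term
abstract x t | not (free x t) = K :@ t            -- lambda erased
abstract _ (V _)              = I                 -- lambda erased
abstract x (f :@ a)
  | not (free x f) = B :@ f :@ abstract x a       -- pushed into argument
  | not (free x a) = C :@ abstract x f :@ a       -- pushed into function
  | otherwise      = S :@ abstract x f :@ abstract x a
abstract _ _ = error "abstract: expected a lambda-free term"

-- Normalise with the rewrite rules of the basis (annotations dropped).
-- May loop on terms without a normal form.
reduce :: Term -> Term
reduce t = case t of
  I :@ x           -> reduce x
  K :@ x :@ _      -> reduce x
  S :@ x :@ y :@ z -> reduce (x :@ z :@ (y :@ z))
  B :@ x :@ y :@ z -> reduce (x :@ (y :@ z))
  C :@ x :@ y :@ z -> reduce (x :@ z :@ y)
  f :@ a           -> let f' = reduce f
                      in if f' == f then f :@ reduce a
                                    else reduce (f' :@ a)
  _                -> t
```

For instance, `compile (L "x" (L "y" (V "x")))` yields `B :@ K :@ I`, and applying that to two variables reduces to the first one. In the annotated basis, the K and I cases are exactly where a type annotation would have to be reinstated.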

My goal is for the typechecker not to use any supply of unique names.

Checking is not totally straightforward. Notice how terms like S have no unique most general type; they would have to take some type arguments to do that. So my current lead is that the type of a combinator term t is either a “real” type, or it’s a function which takes a term x and returns the type of t x (or another such function, etc.). This algorithm will be partial; i.e. there will be cases where I require an argument to have an actual type rather than a function. This is because the “total” problem is undecidable, so any algorithm would be partial anyway.

The other thing I’ve been thinking about is a simple, composable “RTS language”, which does explicit memory management in some form, in which I could implement a garbage collector, etc. This has not been as fruitful as my combinator explorations. I’ve been looking at various forms of linear lambda calculus, which has been neat and enlightening — for example, there is a sense in which all computable functions are differentiable using LLC, which I might blog about soon — but unfortunately it does not look very promising as an RTS language. Any suggestions?

And today I started thinking about what I will need to write my higher-level languages, in particular PureHaskell (Haskell98 sans IO and FFI). It’s pretty annoying that I can’t use any Haskell libraries which use tricky GHC extensions, because those kill my bootstrap (i.e., my PureHaskell compiler needs to be in PureHaskell). There are a lot of libraries that use extensions. For once I wish the flagship compiler weren’t so capable!

Parallel Rewrite System

There was a recent question on Haskell-cafe about how to parallelize a simple “rule engine”. I spent a while investigating it, and the solution is both pretty and fast. My response on Haskell-cafe came out really messy because Gmail is silly, so here it is, cleaned up, in literate Haskell:

> {-# LANGUAGE RankNTypes #-}
> import qualified Data.MemoCombinators as Memo
> import qualified Data.Set as Set
> import Control.Parallel (par)
> import qualified Control.Parallel.Strategies as Par
> import Data.Monoid (Monoid(..))
> import Control.Monad.State
> import qualified Data.DList as DList

First, I want to capture the idea of a generative set like you’re doing. GenSet is like a set, with the constructor “genset x xs” which says “if x is in the set, then so are xs”.

I’ll represent it as a stateful computation of the list of things in the set, threading the set of things we’ve seen so far. It’s redundant information, but sets can’t be consumed lazily, thus the list (the set will follow along lazily :-).

Remember that State s a is just the function (s -> (a,s)). So we’re taking the set of things we’ve seen so far, and returning the new elements added and the set unioned with those elements.

> newtype GenSet a 
>       = GenSet (State (Set.Set a) (DList.DList a))
> genset :: (Ord a) => a -> GenSet a -> GenSet a
> genset x (GenSet f) = GenSet $ do
>     seen <- gets (x `Set.member`)
>     if seen
>         then return mempty
>         else fmap (DList.cons x) $ 
>                    modify (Set.insert x) >> f
> toList :: GenSet a -> [a]
> toList (GenSet f) = DList.toList $ evalState f Set.empty

GenSet is a monoid, where mappend is just union.

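> -- On GHC 8.4 and later, Monoid requires a Semigroup instance,
> -- and mappend defaults to (<>).
> instance (Ord a) => Semigroup (GenSet a) where
>     GenSet a <> GenSet b =
>                  GenSet (mappend <$> a <*> b)
> instance (Ord a) => Monoid (GenSet a) where
>     mempty = GenSet (return mempty)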

Okay, so that’s how we avoid exponential behavior when traversing the tree. We can now just toss around GenSets like they’re sets and everything will be peachy.

Here’s the heart of the algorithm: the reduce function. To avoid recomputation of rules, we could just memoize the rule function. But we’ll do something a little more clever. The function we’ll memoize (“parf”) first sparks a thread computing its *last* child. Because the search is depth-first, it will typically be a while until we get to the last one, so we benefit from the spark (you don’t want to spark a thread computing something you’re about to compute anyway).

> reduce :: (Ord a) => Memo.Memo a -> (a -> [a]) -> a -> [a]
> reduce memo f x = toList (makeSet x)
>     where
>     makeSet x = genset x . mconcat . map makeSet . f' $ x
>     f' = memo parf
>     parf a = let ch = f a in 
>              ch `seq` (f' (last ch) `par` ch)

The ch `seq` is there so that the evaluations of ch and last ch don’t compete with each other.

Your example had a few problems. You said the rule was supposed to be expensive, but yours was cheap. Also, [x-1,x-2,x-3] are all very near each other, so it’s hard to go do unrelated stuff. I made a fake expensive function before computing the neighbors, and tossed around some prime numbers to scatter the space more.

> rule :: Int -> [Int]
> rule n = expensive `seq` 
>            [next 311 4, next 109 577, next 919 353]
>     where
>     next x y = (x * n + y) `mod` 5000
>     expensive = sum [1..50*n]
> main :: IO ()
> main = do
>     let r = reduce Memo.integral rule 1
>     print (length r)

The results are quite promising:

% ghc --make -O2 rules2 -threaded
% time ./rules2
./rules2  13.25s user 0.08s system 99% cpu 13.396 total
% time ./rules2 +RTS -N2
./rules2 +RTS -N2  12.52s user 0.30s system 159% cpu 8.015 total

That’s a 40% decrease in running time! Woot! I’d love to see what it does on a machine with more than 2 cores.

Dana needs a self-embedding of U-Box

My first major goal in Dana is to write a console session interpreter, in which you can construct, name, compose, and observe Dana objects in the console. Since this is a foundational project, I best say what it is I mean by “console session”:

    newtype ConsoleSession = ConsoleSession (String -> (String, ConsoleSession))

In layman’s terms: a console session is a stateful function from input lines to output lines. Then the “hardware” will observe this by allowing you to type things in the console and showing you what the function returned.
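
As a small illustration of that definition, here is a sketch of one possible session together with a way of observing it. The session `counter` and the helpers `runLines` and `observe` are hypothetical names invented for this example; only the `ConsoleSession` type comes from the text above.

```haskell
newtype ConsoleSession = ConsoleSession (String -> (String, ConsoleSession))

-- A hypothetical session that echoes each line with a running number.
-- The "state" is just the argument carried into the next session.
counter :: Int -> ConsoleSession
counter n = ConsoleSession $ \line ->
  (show n ++ ": " ++ line, counter (n + 1))

-- Feed a list of input lines through a session, collecting the outputs.
runLines :: ConsoleSession -> [String] -> [String]
runLines _ [] = []
runLines (ConsoleSession step) (l:ls) =
  let (out, next) = step l
  in out : runLines next ls

-- One way the "hardware" could observe a session on stdin and stdout.
observe :: ConsoleSession -> IO ()
observe s = interact (unlines . runLines s . lines)
```

Here `runLines (counter 1) ["hello", "world"]` gives `["1: hello", "2: world"]`; the session itself stays a pure function, and all of the IO lives in `observe`.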

Let’s say that this console session has a dictionary from strings to values and their types, and the user tries to apply one of those values to another. This might be illegal, since the types might not match up. Everything has to be well-typed, so we can’t just “try it and see”. If the Type of types is exposed (I’m not sure if it should be — it’s a question of whether the class of types is extensible), it looks like this:

    data Type where
        Type :: Type
        Pi :: (x:Type) -> (x -> Type) -> Type

With some other stuff — those are the two most important. Notice that Pi has a function in it, so we cannot compare two Types while maintaining referential transparency and functional extensionality. These properties are very dear to me, so Dana had better have them too!

So how can we tell whether this application is well-typed? It doesn’t suffice to keep the types themselves in our dictionary; we need to keep quotations of them with decidable equality. I.e. we have to have:

    TypeRep : Type
    embed : TypeRep -> Type
    equal : TypeRep -> TypeRep -> Bool

But then, when it is well-typed, we need to actually perform the application, which we can only do if it’s well typed. Bool is not a good enough return type for equal:

    equal : (x:TypeRep) -> (y:TypeRep) -> Eq Type (embed x) (embed y)

Where Eq is the usual:

    Eq : (A:Type) -> A -> A -> Type
    Eq A x y = (P : A -> Type) -> P x -> P y
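
Haskell can approximate this arrangement with a GADT, which may help in seeing why evidence is more useful than a Bool. This is a sketch under several assumptions: the type index of `TypeRep t` stands in for `embed rep`, the propositional equality `:~:` from base stands in for Eq, the universe of quoted types is a tiny closed set, and `Dyn`, `applyDyn`, and `fromDyn` are names invented for the example.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}
import Data.Type.Equality ((:~:)(Refl))

-- Quoted types. The index t plays the role of "embed rep".
data TypeRep t where
  TInt  :: TypeRep Int
  TBool :: TypeRep Bool
  TFun  :: TypeRep a -> TypeRep b -> TypeRep (a -> b)

-- Decidable equality that returns evidence rather than a Bool.
equal :: TypeRep a -> TypeRep b -> Maybe (a :~: b)
equal TInt  TInt  = Just Refl
equal TBool TBool = Just Refl
equal (TFun a b) (TFun c d) = do
  Refl <- equal a c
  Refl <- equal b d
  return Refl
equal _ _ = Nothing

-- A dictionary entry: a value packaged with the quotation of its type.
data Dyn where
  Dyn :: TypeRep t -> t -> Dyn

-- Apply one entry to another. Matching on Refl is what lets the
-- typechecker accept (f x); a Bool would not carry that information.
applyDyn :: Dyn -> Dyn -> Maybe Dyn
applyDyn (Dyn (TFun a b) f) (Dyn a' x) = do
  Refl <- equal a a'
  return (Dyn b (f x))
applyDyn _ _ = Nothing

-- Recover a value at a requested type, if the quotation matches.
fromDyn :: TypeRep t -> Dyn -> Maybe t
fromDyn want (Dyn have x) = do
  Refl <- equal have want
  return x
```

Applying `Dyn (TFun TInt TInt) (+1)` to `Dyn TInt 41` succeeds, while applying it to `Dyn TBool True` returns Nothing instead of attempting an ill-typed application. The hard part discussed in this post, quoting an open and dependent universe of types, is not addressed by this closed sketch.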

Then, if libraries wish to be accessed via the user, they should export TypeReps for every Type they export. For example:

    Export : Type -> Type
    Export T = (rep : TypeRep ; Eq Type (embed rep) T)

    Integer : Type
    IntQuote : Export Integer

I suspect that such a quotation scheme will be quite a challenge to create (it also corresponds to U-Box proving its own consistency, thus U-Box is inconsistent — but we already knew that :-). It could be fun to attempt if I’m in the mood for tricky mathematics. If I’m not in such a mood, I guess I could also cheat and provide TypeRep, embed, and equal as primitives, which might be the way to go just to get things rolling.

The necessity of quotation is the main reason that the core U-Box calculus should be as simple as possible, and most of my effort in this early stage should go toward simplifying it. Is there a combinator calculus for dependent types? That would make things easier.

Dana update: System U-Box compiler is go

Over the past couple days, I have been working on a compiler for something between System U and ΠΣ, as experimentation for Dana’s core language. Here is the repo. I have just finished (I think — seems to work pretty well) the typechecker and it correctly typechecks all the programs I can think of. It is 339 lines: not beautiful and simple, but pretty nice nonetheless (and I haven’t given any thought to refactoring it yet).

The results are both positive and negative. For one, I am very happy with it, since unlike the PiSigma prototype I have only observed it die when there is actually a type error. In particular, unlike PiSigma, it correctly types this fixpoint combinator:

    fix : (A:Type) -> (A -> A) -> A
        = \A:Type. \f:A->A. let x:A = f x in x

I’m very pleased with the “box” dynamic introduced by the PiSigma paper; it seems to model the difference between data and codata in a nice uniform way. Boxes are in some way just explicit laziness.

Using the above fix, we can model equi-recursive types, by introducing a Box function, which “hides” its argument under a box.

    Box : Type -> Type
        = \A:Type. (tag:1) -> !case tag of { [A] }

(Numbers are finite types; e.g. 3 has elements (0:3), (1:3), and (2:3), and case elimination is done by listing 3 cases in order.) Then, for example, a Stream type looks like:

    Stream : Type -> Type = \A:Type. fix Type (\S:Type. Pair A (Box S))

We had to hide the Stream in a box, otherwise we would end up with an infinite type at typechecking time. A high priority on my list is to characterize the semantics of boxes (hopefully they are sensible), to explain more deeply why you must box this infinite type.
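
The "boxes are explicit laziness" reading can be mimicked in Haskell, with the caveat that Haskell is lazy already, so the box here is redundant and only makes the delay visible. This is an analogy under that assumption, not a model of U-Box semantics; `Box`, `force`, `from`, `takeS`, and `ones` are names chosen for the example, and the unit argument stands in for the `(tag:1)` argument.

```haskell
-- A box delays its contents behind a trivial argument, mirroring
-- Box A = (tag:1) -> ... where 1 has a single element.
newtype Box a = Box (() -> a)

-- Opening a box supplies the only possible tag.
force :: Box a -> a
force (Box f) = f ()

-- A stream is a head paired with a boxed tail.
data Stream a = Cons a (Box (Stream a))

-- The fixpoint from the post, written with a recursive let.
fix :: (a -> a) -> a
fix f = let x = f x in x

-- The naturals from n upward; the tail is only built when forced.
from :: Integer -> Stream Integer
from n = Cons n (Box (\() -> from (n + 1)))

-- Observe a finite prefix by forcing one box per element.
takeS :: Int -> Stream a -> [a]
takeS k (Cons x rest)
  | k <= 0    = []
  | otherwise = x : takeS (k - 1) (force rest)

-- A stream defined through fix, tying the boxed tail back to itself.
ones :: Stream Integer
ones = fix (\s -> Cons 1 (Box (\() -> s)))
```

In this rendering, `takeS 3 (from 0)` is `[0,1,2]`. A strict language would loop on an unboxed tail in `from`, which is one informal way to see why the recursive occurrence has to sit under a box.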

Also, a few days ago I had “unsafe compilation” working, which compiled ASTs into a very fast “native Haskell” representation, as described in The Monad.Reader issue 10. It takes advantage of the typechecker, compiling into untyped combinators with unsafeCoerce everywhere, for proper type erasure. I have since added features to the AST, so it’s broken, but it shouldn’t be hard to revive.

So, that’s the good. I’m convinced this “ΠΣ sans Σ” calculus is powerful and expressive, and will work just fine for Dana. I would like it to be simpler — the AST has 12 cases! — but as far as I can tell the only thing that can reasonably be removed is letrec (replaced with a fixpoint primitive).

The bad is that my typechecker is fully and earnestly a typechecker — it doesn’t even feign interest in inference — and it is quite annoying writing all those type annotations freaking everywhere! In particular, case expressions need to be annotated with their eliminator scheme; so the case tag of { [A] } above was a fib; I actually need to write:

    case tag => \tag:1. $Type of { [A] }

So my compiler as-is, unfortunately, can’t reasonably function as a proper bootstrapping language, because I would never write anything in it.

I’m not quite sure what I will do about bootstrapping now. Perhaps I can translate Yhc.Core to, um, let’s call it “System U-Box”, and then bootstrap in Haskell. That would be cool. But I foresee problems when I need to start writing user interface systems, when we need to do deep embeddings of U-Box into itself… which I will write about shortly.