Tag Archives: algorithms

DI Breakdown

I’m having a philosophical breakdown of the software engineering variety. I’m writing a register allocation library for my current project at work, referencing a not-too-complex algorithm which, however, has many degrees of freedom.  Throughout the paper they talk about making various modifications to achieve different effects — tying variables to specific registers, brazenly pretending that a node is colorable when it looks like it isn’t (because it might work out in its favor), heuristics for choosing which nodes to simplify first, categorizing all the move instructions in the program to select from a smart, small set when the time comes to try to eliminate them.  I’m trying to characterize the algorithm so that those different selections can be made easily, and it is a wonderful puzzle.

I also feel aesthetically stuck.   I am feeling too many choices in Haskell — do I take this option as a parameter, or do I stuff it in a reader monad?  Similarly, do I characterize this computation as living in the Cont monad, or do I simply take a continuation as a parameter?  When expressing a piece of a computation, do I return the “simplest” type which provides the necessary data, do I return a functor which informs how the piece is to be used, or do I just go ahead and traverse the final data structure right there?  What if the simplest type that gives the necessary information is vacuous, and all the information is in how it is used?

You might be thinking to yourself, “yes, Luke, you are just designing software.”  But it feels more arbitrary than that — I have everything I want to say and I know how it fits together.  My physics professor always used to say “now we have to make a choice — which is bad, because we’re about to make the wrong one”.  He would manipulate the problem until every decision was forced.  I need a way to constrain my decisions, to find what might be seen as the unique most general type of each piece of this algorithm.  There are too many ways to say everything.

Searchable Data Types

A few years ago, Martín Escardó wrote an article about a seemingly-impossible program that can exhaustively search the uncountably infinite "Cantor space" (infinite streams of bits). He then showed that spaces that can be thus searched form a monad (which I threw onto hackage), and wrote a paper about the mathematical foundations which is seemingly impossible to understand.

Anyway, I thought I would give a different perspective on what is going on here. This is a more operational account, however some of the underlying topology might peek out for a few seconds. I’m no topologist, so it will be brief and possibly incorrect.

Let’s start with a simpler infinite space to search. The lazy naturals, or the one-point compactification of the naturals:

data Nat = Zero | Succ Nat
    deriving (Eq,Ord,Show)
infinity = Succ infinity

Let’s partially instantiate Num just so we have something to play with.

instance Num Nat where
    Zero   + y = y
    Succ x + y = Succ (x + y)
    Zero   * y = Zero
    Succ x * y = y + (x * y)
    fromInteger 0 = Zero
    fromInteger n = Succ (fromInteger (n-1))

We wish to construct this function:

search :: (Nat -> Bool) -> Maybe Nat

Which returns a Nat satisfying the criterion if there is one, otherwise Nothing. We assume that the given criterion is total, that is, it always returns, even if it is given infinity. We’re not trying to solve the halting problem. :-)

Let’s try to write this in direct style:

search f | f Zero    = Just Zero
         | otherwise = Succ <$> search (f . Succ)

That is, if the predicate worked for Zero, then Zero is our guy. Otherwise, see if there is an x such that f (Succ x) matches the predicate, and if so, return Succ x. Make sense?

And it seems to work.

ghci> search (\x -> x + 1 == 2)
Just (Succ Zero)
ghci> search (\x -> x*x == 16)
Just (Succ (Succ (Succ (Succ Zero))))

Er, almost.

ghci> search (\x -> x*x == 15)
(infinite loop)

We want it to return Nothing in this last case. It’s no surprise that it didn’t — there is no condition under which search is capable of returning Nothing, that definition would pass if Maybe were defined data Maybe a = Just a.

It is not at all clear that it is even possible to get what we want. But one of Escardó’s insights showed that it is: make a variant of search that is allowed to lie.

-- lyingSearch f returns a Nat n such that f n, but if there is none, then
-- it returns a Nat anyway.
lyingSearch :: (Nat -> Bool) -> Nat
lyingSearch f | f Zero = Zero
              | otherwise = Succ (lyingSearch (f . Succ))

And then we can define our regular search in terms of it:

search' f | f possibleMatch = Just possibleMatch
          | otherwise       = Nothing
    where
    possibleMatch = lyingSearch f

Let’s try.

ghci> search' (\x -> x*x == 16)   -- as before
Just (Succ (Succ (Succ (Succ Zero))))
ghci> search' (\x -> x*x == 15)
Nothing

Woah! How the heck did it know that? Let’s see what happened.

let f = \x -> x*x == 15
lyingSearch f
0*0 /= 15
Succ (lyingSearch (f . Succ))
1*1 /= 15
Succ (Succ (lyingSearch (f . Succ . Succ)))
2*2 /= 15
...

That condition is never going to pass, so lyingSearch going to keep taking the Succ branch forever, thus returning infinity. Inspection of the definition of * reveals that infinity * infinity = infinity, and infinity differs from 15 once you peel off 15 Succs, thus f infinity = False.

With this example in mind, the correctness of search' is fairly apparent. Exercise for the readers who are smarter than me: prove it formally.

Since a proper Maybe-returning search is trivial to construct given one of these lying functions, the question becomes: for which data types can we implement a lying search function? It is a challenging but approachable exercise to show that it can be done for every recursive polynomial type, and I recommend it to anyone interested in the subject.

Hint: begin with a Search data type:

newtype Search a = S ((a -> Bool) -> a)

Implement its Functor instance, and then implement the following combinators:

searchUnit :: Search ()
searchEither :: Search a -> Search b -> Search (Either a b)
searchPair :: Search a -> Search b -> Search (a,b)
newtype Nu f = Roll { unroll :: f (Nu f) }
searchNu :: (forall a. Search a -> Search (f a)) -> Search (Nu f)

More Hint: is searchPair giving you trouble? To construct (a,b), first find a such that there exists a y that makes (a,y) match the predicate. Then construct b using your newly found a.

The aforementioned Cantor space is a recursive polynomial type, so we already have it’s search function.

type Cantor = Nu ((,) Bool)
searchCantor = searchNu (searchPair searchBool)

ghci> take 30 . show $ searchCantor (not . fst . unroll . snd . unroll)
"(True,(False,(True,(True,(True"

We can’t expect to construct a reasonable Search Integer. We could encode in the bits of an Integer the execution trace of a Turing machine, as in the proof of the undecidability of the Post correspondence problem. We could write a total function validTrace :: Integer -> Bool that returns True if and only if the given integer represents a valid trace that ends in a halting state. And we could also write a function initialState :: Integer -> MachineState that extracts the first state of the machine. Then the function \machine -> searchInteger (\x -> initialState x == machine && validTrace x) would solve the halting problem.

The reason this argument doesn’t work for Nat is because the validTrace function would loop on infinity, thus violating the precondition that our predicates must be total.

I hear the following question begging to be explored next: are there any searchable types that are not recursive polynomials?

Did you like this post? Flattr this

Collision Detection with Enneatrees

Many of my games boil down to, at some level, a large collection of circles (or spheres) interacting with each other. I use circles for collision detection, and otherwise whenever I can for organization, because the results look more natural than using boxes. If you organize using boxes, your simulation will tend to “align” to the axes, which nature surely does not do.

So naturally, detecting when two of these circles are touching each other is one of the larger computational problems that my games face. And bewilderingly, I have not found anywhere a good solution that covers circles in a wide variety of sizes. That is what this article is about.

If your circles are all about the same size, and in particular fairly small in comparison to the needed accuracy of the algorithm, the obvious and de-facto solution is to use a quadtree. That is, start with the whole screen and divide it up into 4 regions, and do so recursively until you have a nice spatial arrangement of your objects, categorizing objects by which region their center lies in.

When you have a circle and want to see what other circles are touching it, you just look for other objects in the smallest region. Well, except you are not a point, you are a circle, so you might penetrate the edge of the region, in which case you need to look at the adjacent region as well. Well, except other circles might penetrate the edges of their regions, so you actually need to look at all adjacent regions. Well, except you might be really big, so you need to look in all the regions that you touch. Well, except other objects might be really big, so… you need to look in all the regions.

A quadtree has not bought us anything if circles can be arbitrary sizes; collision detection is still linear. You could say things like the number of very big circles is usually small (because they would collide away their competition), so you can put an upper bound on the size of circles in the tree then just do a linear check on the big ones. But that is an awful hack with a tuning parameter and doesn’t generalize. Shouldn’t the right solution work without hacks like that (for the fellow game developers in my audience: despite your instincts, the answer is yes :-).

I have worked with quadtrees before, and saw this problem a mile away for my game with circles of many different sizes. The first variation I tried was to put circles into regions only if they fit entirely. Circles on the boundaries would be added to the lowest branch that they could fit into. Then to check collisions on an object, you look at where it is in the tree, check all subregions, and then check all superregions (but not the superregions’ subregions). You will get small things in your vicinity on your way down, and large things on your way up. And it’s still logarithmic time… right?

Wrong. Suppose that every object in the world lay on the horizontal and vertical center lines. Even very small objects would accumulate in the top node, and no region subdividing would ever occur. This degenerate situation causes linear behavior, even though just by shifting the center of the region a little bit you get a sane subdividing. It’s not just a “let’s hope that doesn’t happen” thing: there are a lot of bad situations nearby it, too. Any object lying on those center lines will be checked when any other object is testing for its collisions, because it is on the way up. This did, in fact, become a major issue in my 10,000 particle simulation.

But if we could somehow salvage the original idea of this solution without the degenerate case, then we would get logarithmic behavior (when the field size is large relative to the radii of the particles).

To understand this solution, I would like you to turn your attention to one dimensional simulations. Here the previous solution would be subdividing the real line into a binary tree.

But the new solution divides intervals into not two, not four, but three child intervals. However, not in thirds like you would expect. (0,1) is divided into (0,1/2), (1/4, 3/4), and (1/2,1) — the intervals overlap. Any “circle” (in 1-D, so it’s another interval) with diameter less than 1/4 must fit into at least one of these subregions.

We use that property to organize the circles so that their size corresponds to their depth in the tree. If you know the diameter of the circle, you know exactly how deep it will be. So now when we are looking downward, we really are looking for small circles, and when we are looking upward we are looking for large ones, and there is no possibility of an accumulation of small irrelevant circles above us that are really far away.

However, on your way up, you also have to check back down again. If you are in the left portion of the (1/4,3/4) interval, you might intersect something from the right portion of the (0,1/2) interval, so you have to descend into it. But you can prune your search on the way down, cutting out “most” (all but a logarithm) of it. For example, you don’t need to check the interval (1/16,1/8) and its children even though you are checking its parent (0, 1/2) — it is too far away.

When you are deciding which region to put a circle in and you have a choice, i.e. the circle fits in multiple subregions, always pick the center one. When the circles are in motion, this gets rid of the degenerate case where a circle bounces back and forth across a deep, expensive boundary.

If you generalize to 2D, you end up cutting a region into 9 subregions (thus the name enneatree). The way I did it was to alternate cutting horizontally and vertically, so that I didn’t have to have 9-way if-else chains. That had more subtleties than I had expected. I have yet to find a straightforward, elegant implementation.

The algorithm seems to work well — I am able to support 5,000 densely packed circles in C# without optimizations.

The inspiration for this algorithm comes from the field of computable real numbers (infinite — not arbitrary, but infinite precision numbers). You run into trouble if you try to represent computable reals infinite streams of bits, because some interactions might be “non-local”; i.e. you might have to look infinitely far down a stream to find out whether a bit should be 0 or 1. Those guys solve the problem in the same way I did, by dividing intervals into 3 overlapping subintervals, and using this 3-symbol system as their number representation.

I see a glimmer of a connection between the two problems, but I can’t see it formally. Maybe it would become clearer if I considered infinite collections of infinitesimal objects needing collision detection.

Lazy Partial Evaluation

Inspired by Dan Piponi’s latest post, I have been looking into partial evaluation. In particular, I thought that a language which emphasizes currying really ought be good at partial evaluation. In this post I describe some ideas regarding partial evaluation in functional languages, and later sketch a partial evaluation machine I devised.

Supercombinator reduction machines, like GHC, do a limited form of partial evaluation. I.e. when you compile a program to supercombinators, you are optimizing it for specialization from left to right. So if f is defined in pointful form, “let a = f x in (a z, a z)” might be a better program than “(f x z, f x z)”. This is a nice property of combinator reduction. Unfortunately, it doesn’t generalize: “let a = flip f x in (a y, a y)” will never be better than “(f y x, f y x)”, because functions only specialize from left to right. I conjecture that this missing feature is more important than we realize.

Mogensen gives a very elegant partial evaluator in pure lambda calculus, which optimize as expected with the Futamura projections (see Dan’s post). This partial evaluator works on higher order abstract syntax, taking and returning descriptions of terms rather than the terms themselves. Essentially all it is is (very simple) machinery describing how to evaluate under a lambda.

The system in that paper takes many precautions to avoid unfolding dynamic arguments, because otherwise the partial evaluator might not terminate. Apparently he is not well-versed in our Haskell School of the Infinite, because the evaluator is compositional. So what he means by “not terminate” is “return an infinite program”. But an infinite program is fine if you interpret/compile it lazily!

In fact, I believe (I am intuiting — I have done no formal checking) that the simplest-minded of lazy partial evaluators is perfect: it maximally unfolds its static arguments, there is no need for the type inference machinery in that paper, and it will have the same termination properties as the program. I attribute the ease of this task with the built-in metacircularity of lambda calculus.

Cool as a self-embedded partial evaluator is, to do efficient partial evaluation you need to keep quotations of your programs everywhere, then compile them before you actually use them. Lambda calculus is hinting at something more: that simply by applying one of several arguments to a curried function, you are specializing it. Wouldn’t it be great if every time you did that, the program were maximally specialized automatically?

A partial evaluation reduction strategy

It turns out that such an ambitious wish is nothing more than an evaluation order for the lambda calculus. Admittedly, it’s not a very common one. You try to reduce the leftmost application, even under lambdas. We would also like to combine this with call-by-need, so when an argument is duplicated it is only reduced once.

Here’s an example program I’ve been working with, with the standard definitions of the combinators Ix = x and Kxy = x.

  flip (\x y. y I x) K K

It’s not much, but it gets the point across. Let’s look at it in call-by-name:

[1]  (\f a b. f b a) (\x y. y I x) K K
[2]  (\a b. (\x y. y I x) b a) K K
[3]  (\b. (\x y. y I x) b K) K
[4]  (\x y. y I x) K K
[5]  (\y. y I K) K
[6]  K I K
[6'] (\m n. m) I K
[7] (\n. I) K
[8]  I

Notice by the step [4] that we have lost the structure of flip (\x y. y I x) K, so any work we do from then on we will have to redo on subsequent applications of that function. Contrast this with the partial evaluation strategy:

[1]  (\f a b. f b a) (\x y. y I x) K K
[2]  (\a b. (\x y. y I x) b a) K K
[3]  (\a b. (\y. y I b) a) K K
[4]  (\a b. a I b) K K
[5]  (\b. K I b) K
[5'] (\b. (\m n. m) I b) K
[6]  (\b. (\n. I) b) K
[7]  (\b. I) K
[8]  I

We got the function all the way down to a constant function before it was finally applied.

One thing that’s interesting to notice about this strategy is that it seems stricter than call-by-name. That is, if you have a nonterminating term N, then reducing the application (\x y. Ny) z will loop, whereas it won’t in CBN. However, recall that in the domain theory, (\x. ⊥) = ⊥. The only thing you can do to a function to observe it is apply it, and whenever you apply this function you will loop. So you are bound to loop anyway if you are evaluating this application.

The lazy partial evaluation machine (sketch)

Here is a sketch of an efficient low-level machine for this evaluation strategy. It is simple stack machine code (“which I pull out of my bum for now, but I don’t think an algorithm to generate it will be any trouble”). The only tricky bit about it is that pieces are sometimes removed from the middle of the stack, so it can’t necessarily be represented linearly in memory. A singly linked representation should work (I realize this costs performance).

The values in this language look like [v1,v2,…] { instr1 ; instr2 ; … }. They are closures, where the vn are pushed on to the stack in that order before the instructions are executed. Values can also be “indirections”, which point to an absolute position in the stack. Indirections represent a logical reordering of the stack, and are used to model bound variables. When indirections are executed, they remove themselves and execute (and remove) the thing they point to. The instructions are as follows.

  pop n      -- delete the entry at position n
  dup n      -- push the entry at position n on the top of the stack
  (other standard stack ops here)
  abs n      -- push an indirection pointing to position n 
             -- (skipping other indirections) on top of the stack
  exec n     -- pop and execute the closure at position n
  closure [n1,n2,...] { instr1 ; instr2 } 
          -- remove positions n1,n2,... and add them
          -- to the closure with the given code, and push it

dup is the only instruction which duplicates a value; this is where laziness will be encoded.

Let’s look at the code for each piece of our example program:

    I: [] { exec 0 }
    K: [] { pop 1; exec 0 }

These are pretty straightforward. They both receive their arguments on the stack (the first argument in position 0, and downwards from there), reduce and continue execution. Recall the code is what happens when a value is forced, so every value ends with an exec to continue the execution.

    (\x y. y I x): [] { push I; exec 2 }

This one might be a little tricker. The stack comes in like this (position 0 on the left): x y, After push I, it looks like “I x y”, so y’s incoming stack will be “I x” just as it wants. Finally, the interesting one:

   (\f a b. f b a): [] { abs 2; exec 2 }

abs 2 pushes an indirection to the argument b onto the stack before calling f. This is how evaluation is pulled under binders; when an indirection is forced, it reduces the application at the appropriate level, and all below it. I am still not totally sure when to introduce an abs; my guess is you do it whenever a function would otherwise reorder its arguments. An example may demonstrate what I mean (but perhaps not; I haven’t totally wrapped my head around it myself).

Here’s an execution trace for the example above, with the stack growing to the left. I start with the instruction pointer on the first function and the three arguments on the stack. The stack shown is the state before the instruction on each row. An identifier followed by a colon marks a position in the stack for indirections:

1 abs 1 (\x y. y I x) K K
2 abs 3 a (\x y. y I x) a:K K
3 exec 2 b a (\x y. y I x) a:K b:K
4 closure I b a a:K b:K
5 exec 2 I b a a:K b:K
6 pop 1 I b b:K
7 exec 0 I

When an indirection is executed, as in step 5, that is evaluation under a lambda.

This machine still doesn’t support laziness (though it didn’t matter in this example). We can achieve laziness by allocating a thunk when we dup. To evaluate the thunk, we put a mark on the stack. Whenever an instruction tries to reach beyond the mark, we capture the current stack and instruction pointer and jam it in a closure, then write that closure to the thunk. Indirections get replaced by their offsets; i.e. the “abs” commands that would create them. After we have done that, remove the mark (point it to where it was previously) and continue where we left off.

There you have it: my nifty partial evaluation machine. I’m reasonably confident that it’s correct, but I’m still not totally happy with the implementation of indirections — mostly the fact that you have to skip other indirections when you are pushing them. I wonder if there is a better way to get the same effect.

Comments/criticisms requested! :-)

Certificate Design Pattern

When working the latest incarnation of my System IG compiler, I used a thingy which I now realize ought to be characterized as a design pattern. It substantially changed the way I was thinking about the code, which is what makes it interesting.

Summary: separate an algorithm into certificate constructors and a search algorithm.

A large class of algorithms can be considered, in some way, as search algorithms. It is given a problem and searches for a solution to that problem. For example, typically you wouldn’t phrase the quadratic formula as a search algorithm, but it is—it’s just a very smart, fast one. It is given a,b, and c and searches for a solution to the equation ax2 + bx + c = 0.

The certificate design pattern separates the algorithm into two modules: the certificate module and the algorithm. The certificate module provides constructors for solutions to the problem. For each correct solution, it is possible to construct a certificate, and it is impossible to construct a certificate for an incorrect solution. The certificate module for the quadratic formula algorithm might look like this:

module Certificate (Certificate, certify, solution) where

data Certificate = Certificate Double Double Double Double

certify :: Double -> Double -> Double -> Double -> Maybe Certificate
certify a b c x | a*x^2 + b*x + c == 0 = Just (Certificate a b c x)
                | otherwise            = Nothing

solution :: Certificate -> (Double,Double,Double,Double)
solution (Certificate a b c x) = (a,b,c,x)

There is only one way to construct a Certificate, and that is to pass it a solution to the quadratic equation. If it is not actually a solution, a certificate cannot be constructed for it. This module is very easy to verify. The algorithm module is obvious:

module Algorithm (solve) where
import Certificate
import Data.Maybe (fromJust)

solve :: Double -> Double -> Double -> Certificate
solve a b c = fromJust $ certify a b c ((-b + sqrt (b^2 - 4*a*c)) / (2*a))

Here, we use the quadratic formula and construct a certificate of its correctness. If we made a typo in the formula, then certify would return Nothing and we would get an error when we fromJust it (an error is justified in this case, rather than an exception, because we made a mistake when programming — it’s like an assert).

The client to the algorithm gets a certificate back from solve, and can extract its solution. All the information needed to verify that the certificate is a correct certificate for the given problem should be provided. For example, if Certificate had only contained x instead of a,b,c,x, then we could have implemented solve like:

solve a b c = certify 0 0 0 0

Because that is a valid solution, but we have not solved the problem. The client needs to be able to inspect that a,b,c match the input values. Maximally untrusting client code might look like this:

unsafeSolve a b c = 
  let (a',b',c',x) = solution (solve a b c) in assert (a == a' && b == b' && c == c') x
  where
  assert True x = x
  assert False _ = error "Assertion failed"

Here we can give any function whatsoever for solve, and we will never report an incorrect answer (replacing the incorrectness with a runtime error).

This is certainly overkill for this example, but in the System IG compiler it makes a lot of sense. I have a small set of rules which form well-typed programs, and have put in much effort to prove this set of rules is consistent and complete. But I want to experiment with different interfaces, different inference algorithms, different optimizations, etc.

So my Certificate implements combinators for each of the rules in my system, and all the different algorithms plug into that set of rules. So whenever I write a typechecker algorithm, if it finds a solution, the solution is correct by construction. This gives me a lot of freedom to play with different techniques.

Verification rules can be more involved than this single function that constructs a certificate. In the System IG compiler, there are 12 construction rules, most of them taking other certificates as arguments (which would make them certificate “combinators”). I’ll show an example of more complex certificate constructors later.

What is interesting about this pattern, aside from the added correctness and verification guarantees, is that is changed the way I thought while I was implementing the algorithm. Instead of being master of the computer, and telling it what to do, it was more like a puzzle I had to solve. In some ways it was harder, but I attribute that to redistributing the workload; it’s harder because I am forced to write code that is correct from the get-go, instead of accidentally introducing bugs and thinking I’m done.

The other interesting mental change was that it often guided my solution. I would look at the certificate I’m trying to create, and see which constructors could create it. This gave me an idea of the information I was after. This information is the information necessary to convince the client that my solution is correct; I cannot proceed without it.

Theoretically, the algorithm part could be completely generic. It might just do a generic search algorithm like Dijkstra. If it finds a certificate, then it has solved your problem correctly. Solutions for free! (But this will not be practical in most cases — it might not yield a correct algorithm by other criteria, such as “always halts”).

Here’s an example of a more complex certificate. The domain is SK combinator calculus, and a Conversion is a certificate that holds two terms. If a Conversion can be constructed, then the two terms are convertible.

module Conversion ( Term(..), Conversion
                  , convId, convCompose, convFlip
                  , convS, convK, convApp)
where

infixl 9 :*
data Term = S | K | Term :* Term   deriving (Eq)
data Conversion = Term :<-> Term

convTerms (a :<-> b) = (a,b)

convId t = t :<-> t

convCompose (a :<-> b) (b' :<-> c)
    | b == b' = Just $ a :<-> c
    | otherwise = Nothing

convFlip (a :<-> b) = b :<-> a

convS (S :* x :* y :* z) = Just $ (S :* x :* y :* z)  :<->  (x :* z :* (y :* z))
convS _ = Nothing

convK (K :* x :* y) = Just $ (K :* x :* y)  :<->  x
convK _ = Nothing

convApp (a :<-> b) (c :<-> d) = (a :* c) :<->  (b :* d)

The export list is key. If we had exported the (:<->) constructor, then it would be possible to create invalid conversions. The correctness of a certificate module is all about what it doesn’t export.

I’m wondering what the best way to present this as an object-oriented pattern is, so I can insert it into popular CS folklore (assuming it’s not already there ;-).

Enumerating a context-free language

Here is a familiar context-free grammar for arithmetic expressions:

S      ::= add
add    ::= mul | add + mul
mul    ::= term | mul * term
term   ::= number | ( S )
number ::= digit | digit number
digit  ::= 0 | 1 | ... | 9

I have a challenge for you: write a program (in a language of your choice) which enumerates all strings accepted by this grammar. That is, your program runs forever, but if I have a string this grammar accepts, then your program should output it in a finite amount of time.

(This really is a fun challenge, so I encourage interested readers to try it themselves)

This is a tricky problem. Indeed it is sufficiently tricky that I cannot think of a clean imperative solution to it (which I’m sure is also related to being very out-of-practice in imperative programming). I’m interested to see any such solutions that people came up with. (And I’ll try it myself in the meantime)

But I’m going to present a functional solution using a neat little monad called Omega which I just uploaded to Hackage.

Let’s step back and consider a simpler motivating example. Consider the following list comprehension (and recall that list comprehensions are just the list monad behind a little bit of sugar):

pairs = [ (x,y) | x <- [0..], y <- [0..] ]

This looks like it generates all pairs of naturals, but it does not. It generates the list [(0,0), (0,1), (0,2), ...], so the first element of the pair will never get to 1. If Haskell allowed us to use ordinal numbers as indices then we could: pairs !! ω == (1,0). :-)

Conceptually what we have is a lattice of pairs that we’re “flattening” poorly:

(0,0)  (0,1)  (0,2)  (0,3)  ...
(1,0)  (1,1)  (1,2)  (1,3)  ...
(2,0)  (2,1)  (2,2)  (2,3)  ...
  .      .      .      .

We’re flattening it by taking the first row, then the second row, and so on. That’s what concat does. But anybody who’s had a brief introduction to set theory knows that that’s not how you enumerate lattices! You take the positive diagonals: (0,0), (1,0), (0,1), (2,0), (1,1), (0,2), .... That way you hit every element in a finite amount of time. This is the trick used to show that there are only countably many rational numbers.

Omega is the monad that comes from this concept. We define Omega‘s join to be this “diagonal” function. What we get back is a monad with a nice property:

If x occurs at a finite position in xs and y occurs at a finite position in f x, then y occurs at a finite position in f =<< xs

It’s hard to know what that means without knowing what =<< is supposed to mean. It means if you have a (possibly infinite) list of items, and to each one you apply a function which generates another (possibly infinite) list, and you merge them all together, you’ll be able to reach every result in a finite time.

More intuitively, it means if you write a multiple branching recursive function where each branch recurses infinitely (generating values), you will not get stuck in the first branch, but rather generate values from all branches.

And that is exactly what we need to write the context-free enumerator. Given a simple data structure representing a context-free grammar:

data Symbol a
    = Terminal a
    | Nonterminal [[Symbol a]] -- a disjunction of juxtapositions

Then we can write an enumerate function in a straightforward depth-first way:

enumerate (Terminal a) = return [a]
enumerate (Nonterminal alts) = do
    alt <- each alts          -- for each alternative
      -- (each is the Omega constructor :: [a] -> Omega a)
    rep <- mapM enumerate alt -- enumerate each symbol in the sequence
    return $ concat rep       -- and concatenate the results

But the Omega monad will do some heavy lifting and stagger those generators for us. Defining the grammar above:

arithGrammar = s
    where
    s      = Nonterminal [[add]]
    add    = Nonterminal [[mul], [add, Terminal '+', mul]]
    mul    = Nonterminal [[term], [mul, Terminal '*', term]]
    term   = Nonterminal [[number], [Terminal '(', s, Terminal ')']]
    digit  = Nonterminal $ map (map Terminal . show) [0..9]
    number = Nonterminal [[digit], [digit, number]]

And then running our enumerator:

runOmega $ enumerate arithGrammar

We get a very encouraging list of results:

0
1
0+0
0*0
0+1
(0)
1+0
0*1
0+0*0
00
...

Notice how each type of node is represented in that initial prefix. That’s a good sign that there won’t be degenerate repetitive behavior. But of course we know there won’t be by the guarantee Omega makes (unless I implemented it wrong).

Set Selectors

I am writing a poker game, and I got mildly annoyed when I went to write the hand classification functions. There was a disparity between the specification and implementation of poker hands; I had to come up with an algorithm to match each type. I didn’t like this, I want the code to match the specification more directly.

This is quite a geeky post. The types of poker hands are very unlikely to change, and the amount of time I’ve spent thinking about this problem already is many times that of solving it directly in the first place. I.e. it would be stupid for someone to pay me by the hour to solve it this way. Still, it gives rise to an interesting generalization that could be very useful.

I decided that the way I would like to specify these things is with “set selectors” over logical expressions. That is, given a finite set U, find a subset R of U such that some logical expression holds in R (i.e. all quantifiers are bounded on R).

This has a straightforward exponential time solution. I’m trying to do better.

I started by classifying logical expressions. In the following, let P(…) be quantifier-free.

  • \exists x P(x) is straightforward O(n).
  • More generally, \exists x_1 \exists x_2 \ldots \exists x_k P(x_1, x_2, \ldots x_k) is straightforward O(n^k).
  • \forall x P(x) is also O(n) to find the largest solution (because the empty set would satisfy it, but that’s not very interesting).
  • \exists x \forall y P(x,y) has an obvious solution, same as \exists x. P(x,x). There is no unique largest solution, but there is a unique largest for each x which can be found in O(n^2) time. It’s unclear what the library should do in this scenario.
  • \forall x \forall y P(x,y) is called the Clique problem and is NP-complete! Damn!

But the most interesting one so far is the case: \forall x \exists y P(x,y). It turns out that there is a unique largest solution for this, and here is an algorithm that finds it:

Given a finite set U, find the largest subset R such that \forall x \! \in \! R \, \exists y \! \in \! R \, P(x,y).

Let r_0 = U, r_{n+1} = \{ x \in r_n | \exists y\!\in\!r_n \, P(x,y) \}. That is, iteratively remove x’s from r that don’t have corresponding y’s. Then define the result R = \bigcap_i r_i.

Lemma. There is a natural number n_f such that r_{n_f} = R.

Proof. Follows from finite U and r_{n+1} \subseteq r_n.

Theorem. \forall x \! \in \! R \, \exists y \! \in \! R \, P(x,y).

Proof. Given x \in R = r_{n_f} = r_{n_f + 1}. Thus there exists y \in r_{n_f} = R such that P(x,y), by the definition of r_{n_f+1}.

Theorem. If R^\prime \subseteq U and \forall x \! \in \! R^\prime \, \exists y \! \in \! R^\prime \, P(x,y), then R^\prime \subseteq R.

Proof. Pick the least n such that R^\prime \not\subseteq r_n. There is an x \in R^\prime with x \not\in r_n. The only way that could have happened is if there were no y in rn-1 with P(x,y). But there is a y in R’ with P(x,y), so R^\prime \not\subseteq r_{n-1}, contradicting n’s minimality.

The time complexity of this algorithm can be made at least as good as O(n^2 \log n), maybe better.

While that was interesting, it doesn’t really help in solving the general problem (which, I remind myself, is related to poker, where all quantifiers will be existential anyway!). The above algorithm generalizes to statements of the form \exists w_1 \exists w_2 \ldots \exists w_j \forall x \exists y_1 \exists y_2 \ldots \exists y_k \, P(w_1,w_2,\ldots w_j,x,y_1,y_2, \ldots y_k). Each existential before the universal adds an order of magnitude, and I think each one after does too, but I haven’t thought it through.

In fact, I think that, because of the clique problem, any statement with two universal quantifiers will take exponential time, which means I’m “done” (in a disappointing sort of way).

Back to the real world, I don’t like that finding a flush in a hand will take O(n^5) time (16,807 checks for a 7-card hand, yikes), when my hand-written algorithm could do it in O(n). I’m still open to ideas for specifying poker hands without the use of set selectors. Any ideas?

Call-by-Future

I was reading about evaluation strategies for lambda calculus on Wikipedia, and one caught my eye: “call-by-future”. The idea behind this strategy is that you evaluate the function and its argument in parallel. It’s like lazy evaluation, except you start evaluating earlier. Call by future semantics are non-strict, so I figured I could coerce Haskell to doing it.

Template Haskell to the rescue again! I wrote a module which has this function:

  parApp :: (a -> b) -> (a -> b)
  parApp f x = x `par` f x

par is a special function which says to start evaluating its first argument and then return its second argument. Then it walks the syntax tree, replacing applications with applications of parApp, like:

  fib 0 = 0
  fib 1 = 1
  fib n = fib (n-1) + fib (n-2)

Becomes:

  fib 0 = 0
  fib 1 = 1
  fib n = parApp (parApp (+) (parApp fib (parApp (parApp (-) n) 1))
                                         (parApp (parApp (-) n) 2))

Pretty, ain’t it? :-p

For the above program (test.hs) computing fib 40, when compiled with -threaded -O2 I get the following results on a dual quad-core machine:

  #  command         time      speedup   incr.speedup
  ./test +RTS -N1  # 20.508s   1.00      N/A
  ./test +RTS -N2  # 18.445s   1.11      0.56
  ./test +RTS -N3  # 15.944s   1.29      0.77
  ./test +RTS -N4  # 12.690s   1.58      0.94
  ./test +RTS -N5  # 11.305s   1.81      0.90
  ./test +RTS -N6  #  9.663s   2.12      0.97
  ./test +RTS -N7  #  8.964s   2.29      0.92
  ./test +RTS -N8  #  8.541s   2.40      0.92

The number after -N is the number of hardware threads used, the speedup is the ratio of the original speed against the multicore speed (in an ideal world this would match the number of hardware threads), and the incremental speedup is (tn-1Nn-1)/(tnNn), i.e. the fraction of what we gained over the previous simulation versus what we should have in an ideal world. As long as this is near one, our time is decreasing linearly. As we can see, we pay a lot of overhead mostly at 2 and 3 processors, and after that there is little additional overhead. There is too little data here to see what the large-scale trend is, though.

Suppose that the incremental speedup goes to a constant p < 1. Then tn+1 = n/(n+1) * tn/p. Doing a little algebra, we see that the time levels off at n = p/(1-p). So, for example, if p were 0.92 as it looks like it’s going to be, then n = 11.5. That is, adding a 12th processor is not going to gain you anything. The long term goal for parallelization is to get solutions where p is very close to 1, because processors are going to start being cheap like hard drives, and I actually wouldn’t surprised if in 24 years, 64-cores were common (calculated using Moore’s law).

So, 8 processors, 2.4x speedup. We could certainly ask for better. But it ain’t bad considering you didn’t have to know anything about your program at all to get it there :-).

No Total Function from Infinite Sequences of Booleans onto the Integers

Yesterday I found out about a remarkable algorithm which can take any total predicate (function returing true or false) on infinite sequences of bits and find an infinite sequence which satisfies it (and implemented it). It’s a counterintuitive result, since you’d think that you can’t search the (uncountably) infinite space of bit sequences in a finite amount of time, but you can.

This implies some results about how much total functions can encode. For example, there must not be any total function from infinite sequences of bits onto the integers (such that any integer is reachable using a suitable infinite sequence). If there were, you could use this algorithm to decide any predicate on the integers, which would be a far-too-powerful mathematical tool.

But I couldn’t prove it, not without citing the algorithm. Why must there not be any such function, besides that it would make the algorithm incredibly powerful (so powerful that you could prove its impossibility, but that proof method was not good enough for me, because it could have been that the algorithm had an error).

The proof was actually hiding in the proof on sigfpe’s blog, but I couldn’t see it at first. So I’ll give a more explicit version here.

Let f be a total function from sequences of bits onto the integers (f : (Z -> Bool) -> Z). Let df(xs) denote the greatest index that f reads when given the argument xs. Now, for example, choose an xs such that f(xs) is 0. df(xs) is the greatest index that f reads, so f doesn’t depend on anything greater (for this argument), so every sequence with the prefix xs[0..df(xs)] has the value zero.

Now make a binary tree corresponding to the input sequence, where a node is a leaf iff the sequence leading to the node is the shortest prefix that completely determines f‘s output. Every possible extension of the sequence leading up to the leaf maps to the same value, so we informally say that the leaf itself maps to the value (even though the leaf represents a finite sequence and f takes infinite sequences).

This tree has infinitely many leaves, since there must be (at least) one leaf for each integer. And now, just as sigfpe did, invoke the König lemma, “every finitely branching tree with infinitely many nodes has at least one infinite path”. That infinite path represents an input to f for which it will not halt. That is to say, f was not total after all!

Abstract State Machines for Game Rule Specification

Jude has been working on the design of a general board game programming engine. We brainstormed a bit, and I remembered something I had read about a little while ago: Abstract state machines.

An ASM is really nothing more than a formalization of a simultaneous chain of if-then statements. A common example is Conway’s life: given a lattice and a countNeighbors function you could define the rules for Conway’s life as an ASM as follows:

letDie(cell, n) =
  if status(cell) = Alive and (n  3) then
    status(cell) := Dead
letLive(cell, n) =
  if status(cell) = Dead and n = 3 then
    status(cell) := Alive

gameOfLife =
  forall cell in cells do
    letDie(cell, countNeighbors(cell))
    letLive(cell, countNeighbors(cell))

e only thing this is getting us over any type of program specification is the parallelism. But the idea of a list of rules, each of which has a condition and an action, is the right level of abstraction to make a board game engine.

We have to tweak a few things to get it just right. For example, you could have must and may combinators, which specify which moves are available to the player.

Let’s look at Pente. First, we assume a linked-list style lattice, where each cell has eight pointers indexed by direction (they could be abstract or just numbers or whatever), and then a link(direction, n) function gives us the cell in a particular neighbor. Then we could specify pente something like this (Haskellesque syntax now):

move s player =
  if color s == Empty then may $ do
    color s := player
    mapM_ capture directions
  where
  capture d = do
    let oneAway = link d s
    let twoAway = link d oneAway
    let threeAway = link d twoAway
    if color oneAway == color twoAway == otherPlayer player && color threeAway == player then do $
      color oneAway := Empty
      color twoAway := Empty

There are still a few issues to solve as far as expressing “you must do either A or B, then you must do C”. I actually used very little of the ASM model (just one may combinator) in expressing pente. Okay, idea on the backburner, let’s see what else comes.