Gravmari Released!

After fifteen weeks, Hubris Arts has completed its first game, Gravmari, directed by Max Rebuschatis.

The game is a journey through the depths of space in a strange and alien ship called the Playertoid. It was inspired by Katamari Damacy, Kubrick’s 2001, the efforts of NASA, and, most importantly, dreams of floating through the cosmos. We wanted to convey the feeling of infinity, both in zoom and scope, and wonder at the mysteries of the stellar wasteland.

Best played with the lights off, with headphones (or a nice sound system) and an Xbox 360 controller. Mouse and lights on also supported.

Collision Detection with Enneatrees

Many of my games boil down to, at some level, a large collection of circles (or spheres) interacting with each other. I use circles for collision detection, and otherwise whenever I can for organization, because the results look more natural than using boxes. If you organize using boxes, your simulation will tend to “align” to the axes, which nature surely does not do.

So naturally, detecting when two of these circles are touching each other is one of the larger computational problems that my games face. And bewilderingly, I have not found anywhere a good solution that covers circles in a wide variety of sizes. That is what this article is about.

If your circles are all about the same size, and in particular fairly small in comparison to the needed accuracy of the algorithm, the obvious and de-facto solution is to use a quadtree. That is, start with the whole screen and divide it up into 4 regions, and do so recursively until you have a nice spatial arrangement of your objects, categorizing objects by which region their center lies in.

When you have a circle and want to see what other circles are touching it, you just look for other objects in the smallest region. Well, except you are not a point, you are a circle, so you might penetrate the edge of the region, in which case you need to look at the adjacent region as well. Well, except other circles might penetrate the edges of their regions, so you actually need to look at all adjacent regions. Well, except you might be really big, so you need to look in all the regions that you touch. Well, except other objects might be really big, so… you need to look in all the regions.

A quadtree has not bought us anything if circles can be arbitrary sizes; collision detection is still linear. You could say things like the number of very big circles is usually small (because they would collide away their competition), so you can put an upper bound on the size of circles in the tree then just do a linear check on the big ones. But that is an awful hack with a tuning parameter and doesn’t generalize. Shouldn’t the right solution work without hacks like that (for the fellow game developers in my audience: despite your instincts, the answer is yes :-).

I have worked with quadtrees before, and saw this problem a mile away for my game with circles of many different sizes. The first variation I tried was to put circles into regions only if they fit entirely. Circles on the boundaries would be added to the lowest branch that they could fit into. Then to check collisions on an object, you look at where it is in the tree, check all subregions, and then check all superregions (but not the superregions’ subregions). You will get small things in your vicinity on your way down, and large things on your way up. And it’s still logarithmic time… right?

Wrong. Suppose that every object in the world lay on the horizontal and vertical center lines. Even very small objects would accumulate in the top node, and no region subdividing would ever occur. This degenerate situation causes linear behavior, even though just by shifting the center of the region a little bit you get a sane subdividing. It’s not just a “let’s hope that doesn’t happen” thing: there are a lot of bad situations nearby it, too. Any object lying on those center lines will be checked when any other object is testing for its collisions, because it is on the way up. This did, in fact, become a major issue in my 10,000 particle simulation.

But if we could somehow salvage the original idea of this solution without the degenerate case, then we would get logarithmic behavior (when the field size is large relative to the radii of the particles).

To understand this solution, I would like you to turn your attention to one dimensional simulations. Here the previous solution would be subdividing the real line into a binary tree.

But the new solution divides intervals into not two, not four, but three child intervals. However, not in thirds like you would expect. (0,1) is divided into (0,1/2), (1/4, 3/4), and (1/2,1) — the intervals overlap. Any “circle” (in 1-D, so it’s another interval) with diameter less than 1/4 must fit into at least one of these subregions.

We use that property to organize the circles so that their size corresponds to their depth in the tree. If you know the diameter of the circle, you know exactly how deep it will be. So now when we are looking downward, we really are looking for small circles, and when we are looking upward we are looking for large ones, and there is no possibility of an accumulation of small irrelevant circles above us that are really far away.

However, on your way up, you also have to check back down again. If you are in the left portion of the (1/4,3/4) interval, you might intersect something from the right portion of the (0,1/2) interval, so you have to descend into it. But you can prune your search on the way down, cutting out “most” (all but a logarithm) of it. For example, you don’t need to check the interval (1/16,1/8) and its children even though you are checking its parent (0, 1/2) — it is too far away.

When you are deciding which region to put a circle in and you have a choice, i.e. the circle fits in multiple subregions, always pick the center one. When the circles are in motion, this gets rid of the degenerate case where a circle bounces back and forth across a deep, expensive boundary.

If you generalize to 2D, you end up cutting a region into 9 subregions (thus the name enneatree). The way I did it was to alternate cutting horizontally and vertically, so that I didn’t have to have 9-way if-else chains. That had more subtleties than I had expected. I have yet to find a straightforward, elegant implementation.

The algorithm seems to work well — I am able to support 5,000 densely packed circles in C# without optimizations.

The inspiration for this algorithm comes from the field of computable real numbers (infinite — not arbitrary, but infinite precision numbers). You run into trouble if you try to represent computable reals infinite streams of bits, because some interactions might be “non-local”; i.e. you might have to look infinitely far down a stream to find out whether a bit should be 0 or 1. Those guys solve the problem in the same way I did, by dividing intervals into 3 overlapping subintervals, and using this 3-symbol system as their number representation.

I see a glimmer of a connection between the two problems, but I can’t see it formally. Maybe it would become clearer if I considered infinite collections of infinitesimal objects needing collision detection.

Maximum entropy and negative probability

I have recently become fascinated with the concept of maximum entropy distributions, and went back and read Dan Piponi’s post on negative probabilities, and link surfing from there. Something sparked and I wondered what kind of connection there is between the two. A little experimenting in Mathematica later and I’m on to something curious.

First, a little background. E.T. Jaynes argues (so I have heard, I have not read the original) that if you have a set of constraints on a set of random variables and you would like a probability distribution over those variables, you should choose the distribution that has the most information entropy, as this is the “least biased” distribution.

The entropy of a distribution is defined as: H = -\sum_i{p_i \log{p_i}}.

I am using Dan’s example, and I will quickly recapitulate the situation. You have a machine that produces boxes of ordered pairs of bits. It is possible to look at only one bit of the pair at a time, say each bit is in its own little box. You do an experiment where you look at all the first bits of the boxes, and it always comes out 1. You do a second experiment where you look at the second bit of the boxes, and it, too always comes out 1.

Now, most reasonable people would draw the conclusion that the machine only produces boxes containing “1,1″. However, if we wholeheartedly believe in Jaynes’s principle, we have to look deeper before drawing a conclusion like that.

The 4 probabilities we are interested in correspond to “0,0″, “0,1″, “1,0″, “1,1″. I will write them as 4-vectors in that order. So an equal chance of getting any combination is written as 1/4 <1,1,1,1>.

For the distribution <a,b,c,d>, our constraints are: a+b+c+d = 1 (claiming our basis is complete), c+d = 1 (the first bit is always 1), b+d = 1 (the second bit is always 1).

The “reasonable” distribution is <0,0,0,1>, which indeed satisfies these constraints. The entropy of this distribution 0 (taking x log x = 0 when x = 0) — of course, there is no uncertainty here. But are there more distributions which satisfy the constraints?

Well, if you require all the probabilities to be positive, then no, that is the maximal entropy one, because it is the only one that satisfies the constraints. But let’s be open-minded and lift that requirement.

We have to talk about what the entropy of a negative probability is, because log isn’t defined there. The real part is perfectly well defined, and the imaginary part is multi-valued with period 2π. I’m not experienced enough with this stuff to make the right decision, so I’m blindly taking the real part for now and pretending the imaginary part is 0, since there’s really no reasonable “magnitude” it could be.

Whew, okay, almost to the fun stuff. We have four variables and three constraints, so we have only 1 degree of freedom, which is a lot easier to analyze than 4. We can express the distribution with only that one degree d as:

<d-1, 1-d, 1-d, d>

And here is a plot of the real part of the entropy as a function of d:

entropy

It achieves a maximum at d = 1/2, the distribution <-1/2, 1/2, 1/2, 1/2>, the same one Dan gave. In some sense, after observing that the first box is always 1 and, separately, that the second box is always 1, it is too biased to conclude that the output is always “1,1″.

I would like to patch up the “real part” hack in this argument. But more so, these exotic probability theories aren’t really doing it for me. I would like to understand what kinds of systems give rise to them (and how that means you must interpret probability). My current line of questioning: is the assumption that probabilities are always greater than 0 connected to the assumption that objects have an intensional identity?

I would love to hear comments about this!

Gravmari

Gravmari

Gravmari

Most of my time has been spent preparing a game for submission into the Independent Games Festival at the Game Developer’s conference. It’s called Gravmari, a game about gravity and the universe, roughly described as “Katamari in space”. The submission deadline is Nov 1, so we are working pretty hard to get it in tip-top shape. A demo version will go on the website shortly after that date.

I am very happy with the game so far. It is very polished, but remains simple and elegant. We have a lot of patience with our features, introducing new dynamics, music, and art gradually throughout the whole game. I am occasionally hit with the haunting sense of grand scale we are going for, and I think it will be very effective on people who don’t know what’s coming.

The other developer, Max, and I have informally started a studio together (we will register soon) called Hubris. This is our first game, and we hope to make many more. I am excited about this game company for two reasons:

  1. We are rejecting the typical working of the game industry and moving more toward the film industry, in the sense that our games have a director with a unified vision in mind, and the rest of the crew is there to help the director achieve his vision. For exampe, the idea of a “team” of game designers is pretty absurd in this context. Gravmari doesn’t have a director, but the soul is there: the game is a work of art as its whole — the execution of a unified vision — not an idea and a list of features.
  2. We are both scientifically-minded, principled individuals. A couple of indications of my little joys: Most game developers know that you shouldn’t hard-code numbers, but abstract them out into named tweakable parameters. But you can do better than that: do some math and discover what the right answer is, so tweaking doesn’t make any sense. Eg. you could have a tweak parameter for the speed of an orbiting ring galaxy, but better is to write down an equation and find out at what speed a stable orbit forms. This approach has worked wonders throughout this game. Another joy is that we fully embrace our 2-dimensional environment, so much so even to change the equation of gravitational attraction. In 2 dimensions, gravity ought to follow an inverse law, rather than an inverse square law (I can explain why later if someone asks). This has the side effect of creating much more math for us to do, because everything you find online applies to the inverse square form.

All in all, I think the game and the studio will be great. I’ll make another announcement when you can try it out.

IO-free splittable supply

I know it has been a while since I posted, and I’m sorry to say this isn’t a terribly deep or interesting one.

I like the value-supply package on hackage. It takes (essentially) an infinite list of things and arranges them into an infinite tree randomly… but a particularly useful kind of random. The first one you look at just happens to be the first element of the list, the second one you look at comes from the second element, etc.

This sort of magic isn’t useful for its magic, it’s just useful because you can use something like Int and not run out of bits too soon. All I ever use it for is unique labels in computations that would be made too strict by the state monad. As a contrived example of such a computation:

    allocateLabel :: State Label Label

    labelList :: State Label [Label]
    labelList = liftM2 (:) allocateLabel labelList

    bad :: State Label [(Label,Label)]
    bad = liftM2 zip labelList labelList

The problem is that we don’t know what label to start with on the second labelList in bad… we never finish using them up in the first one. But if all you care about is uniqueness of labels, rather than their order, this computation is easily definable as a function of a Supply from value-supply:

    labelList :: Supply Label -> [Label]
    labelList sup = let (a,b) = split2 sup in supplyValue a : labelList b

    good :: Supply Label -> [(Label,Label)]
    good sup = let (a,b) = split2 sup in zip (labelList a) (labelList b)

Great. Only problem is, we can only make Supplys from the IO monad. As it should be, since the supply uses a nasty form of nondeterminism to assign elements to its trees.

But, if all you care about is uniqueness, which is the case for me more often than not, you can squash IO out of the interface (not the implementation, unfortunately). Like this:

    runSupply :: (forall a. Eq a => Supply a -> b) -> b
    runSupply f = f (unsafePerformIO (newSupply 0 succ))

In my opinion, this is the best possible interface for such a thing. Value-supply is only to be used when all you care about is uniqueness, not ordering or the precise values. Well, this interface encodes that exactly. When you use this function, all you are allowed to care about is uniqueness, any other concern is hidden behind the type system. It is also impossible to confuse indices from different supplies.

And this is a perfectly well-behaved function. You could demonstrate that by giving a pure implementation of Supply [Bool] (which just traces the path to the given leaf from the root). The unsafe business is merely an optimization (and for some use cases, like labelList above, a very potent one).

Hooray. I get my favorite evil module, with all of its evil safely contained in a jar that they call a box for some reason.

Forever Unemployed

I recently interviewed for a job in Boston, in which I would get to work with Shae Erisson (shapr) and Edward Kmett, which would be a wonderful treat, of course. But I turned it down after it was described to me. My reasoning at the time was that it was too formal and there were not enough hard problems for my taste. But after a little reflecting, I realize that that’s not the reason at all:

I believe now that I am deeply and fundamentally an artist. At a job like that, I’m just an engineer building products for customers. I am not cut out for engineering as an industry.

I’m currently making a game with my friend Max about gravity and the universe, which we plan to enter into the independent games festival at GDC. In the process of making this game, we written and scrapped three large-scale gravitational simulation algorithms, and are developing our fourth. We have learned and are simulating a fair amount of cosmology. We even tried to use some general relativity to solve the simulation problems. Our gameplay will probably not be intuitive to the naive player, but that’s because the naive player doesn’t really understand gravity, and we believe that showing the true nature of the universe trumps pandering to ignorance to flatten the learning curve. Clearly, our hearts are in this: this isn’t just a game to take to the bank. We’re trying to show people something about the beauty of the universe.

Dana is also a work of art. That’s why I’m having trouble understanding what it is as a clearly-stated engineering goal. It is evading the usual engineering practices, because it’s not trying to solve a specific problem. Really what Dana is trying to be is advanced technology. I want it to do whatever it does, and when the code pop culture sees it they have the sense that they are looking at alien technology: it is compact, unfamiliar, but obviously very elegant, whatever it is. When you dig into its subproblems and put in the effort understand them, the response will be “Oh! Of course that’s how you do that!” The realization that Dana is art has convinced me to try to reduce it’s scope. It is more important that it do one thing in the most beautiful way, than for it to be a complete foundation for arbitrary software.

And I have always been an artist with my music. My friend Eric Shapiro bought a piano for me to play on the Pearl Street mall, in exchange for 25% of my earnings. It is going great: I often attract crowds of 20 or more watching me improvise and make about $40/hr doing it. Eric helped me see that my music can be valued. So I’ve decided to be more industrious about it — after all, the music industry is about supporting artists (even if it is mostly corrupt); in contrast, I don’t know anyone who would pay for something like Dana. I’m developing a solo album to sell on the mall and approach record companies with. I’m trying to think of a pseudonym (my name is pretty lame). During the winter I’ll make my money with teaching and shows. Essentially, my plan is curiously “I’ll use music to support myself, so I have time for my art.”

It’s harder to make good money with music than with engineering. But I think this plan will lead me to a happier lifestyle than wasting my half my life on an engineering project that I couldn’t give a rats ass about, just so I can have a surplus of money in the other half.

Emphasizing Specialization

I have briefly—or perhaps not so briefly, my interests being unpredictable beasts—shifted the focus of my research to lazy specialization. This topic was discussed extensively, but not comprehensively, in Michael Jonathan Thyer’s dissertation, Lazy Specialization, and seems to have received little attention otherwise. This is unfortunate. I consider it a very important topic.

Lazy specialization is the idea that a functional program be incrementally optimized during its execution, in tandem with partial application. For example, consider the code:

    let g = f x in (g y, g y')

A lazy specializer would evaluate g as far as it could using only information about x, and then apply the resulting optimized function to y and y’.

If most of f’s work could be done using x alone, you could save a lot of time this way. That is the obvious appeal. The drawback on the other side is that the evaluation machinery of a lazy specializer is more involved than a whnf evaluator, so the whole system takes a speed hit as a result.

But so far as I can tell, the only such machines in existence are in a high-level, symbolic, proof-of-concept form, in which the hit is several orders of magnitude. Where’s the research getting down and dirty with this technology, seeing if we can bring it within a factor of 10 or less? Let me argue why we would want to pursue this seriously:

Here is a quote by Heinrich Apfelmus on laziness, at the end of his article on lazy sorting:

Well, it’s highly unlikely that algorithms get faster by introducing laziness. I mean, lazy evaluation means to evaluate only those things that are really needed and any good algorithm will be formulated in a way such that the unnecessary things have already been stripped off. But laziness allows to simplify and compose algorithms. Sometimes, seemingly different algorithms turn out to be two sides of the same coin when formulated with lazy evaluation. Isn’t it great that finding the k-th minimum is not only an adaption of quicksort but can readily be obtained from it by composing it with (!! k)?

I’d say that data structures are the sink for most of the creativity of lazy functional programmers. We spend lots of energy getting a data structure to represent exactly the things we need, sharing what needs to be shared. The key point about data structures is that they can be decomposed; you can peel off the root of a tree and leave yourself with only one of the branches, and the other branch will be garbage collected.

A lazy specializer promotes full-blown functions to the level of data structures. Functions can now be decomposed (by composing!) in the same way, sharing important pieces and forgetting unimportant ones. Observe this stream implementation:

    type Stream a = Nat -> a
    cons x xs z = case z of { Zero -> x; Succ z' -> xs z' }
    tail s z = s (Succ z)

Where Nat is the standard lazy natural type (constructors Zero and Succ). What happens when we evaluate tail (cons 1 (cons 2 R)), for some stream R (I don’t want to bother creating an infinite stream). I’ll use the HNF strategy that I discussed previously. Let’s watch:

   tail (cons 1 (cons 2 R))
   (\s z. s (z+1)) (cons 1 (cons 2 R))
   (\z. cons 1 (cons 2 R) (Succ z))
   (\z. (\z'. case z' of { Zero -> 1; Succ z' -> cons 2 R z' }) (Succ z))
   (\z. case Succ z of { Zero -> 1; Succ z' -> cons 2 R z' })
   (\z. cons 2 R z)
   ...

Where did the 1 go? Gone, collected by the garbage collector, just as if we had written Stream in the more conventional way. GHC would have held on to the 1 forever. Nat -> a behaves exactly like the Stream data structure. The structure of Nat has induced a structure for Nat -> a. That is a beautiful form of data structure composability.

Working in a language that uses a lazy specializer also comes with a very important mental advantage: abstracting a pattern using a higher-order function always has constant-time overhead. I.e. you will never make your program run faster by unfolding abstractions. This encourages the more composable higher-order style, providing a snappy comeback to any speed nazi. It also removes the encouragement to fiddle with the details of a function for more speed: beta-equivalent formulations will all turn into the same code in the end, so clarity is the only remaining variable, and can be maximized.

Ideas like GHC’s {-# RULES #-} are no longer fickle compiler constructions, but can now be made trustworthy optimizations. If the left side of a rule is in a certain form (a restricted sort of normal form), then its evaluation can be baked into the interpreter and will be applied as aggressively as possible. It is invariant to fiddling around with higher-order presentations and whatnot; they will be folded away and the rule can be seen. (But I’m not convinced RULES are necessary anymore; perhaps you can simply express the rules you want as a functional construction?)

The most convincing property of lazy specializers to me was the primary bench test for Thyer’s dissertation: the “tower of interpreters” test. This is the property that, if you write an interpreter for a language L, and in language L write an interpreter for M, and in M write an interpreter for N, etc., all the layers will be squashed away and you will end up with code that runs just as fast as it did in the base language (with a constant time overhead for eliminating the layers). You can design a language with all the bells and whistles you want, give it a semantics, and you may immediately use without a significant penalty. This is a huge abstraction win.

But there are a fair number of open problems in lazy specialization, more than just getting faster specialization machines. The explosion problem is tricky, in which, say, fix gradually becomes specialized to things like \f. f (f (f (f (f (f (f (fix f))))))); not useful, just eats up memory. The leap from complete laziness to optimal evaluation needs to be understood better. But in particular, I think we need a real LS to program with for a while, just to see what its dynamics are and to bring out its real-world problems.

I leave you with an addendum (which I snipped from the main argument), explaining my perceived connection between LS and FRP, my personal holy grail. All of my research links back to FRP somehow:


Finally, as if you didn’t know this was coming, there is a connection between lazy specializers and FRP, the holy grail of my functional programming life. It is a particular manifestation of the first point I made, designing streams as functions. I figure, all I have to do is design a data structure for time, with the right domain structure, perhaps something like:

   data Time = Zero | Next Double Time

And then define our friends:

   type Behavior a = Time -> a
   type Future a = (Time, a)

The implementation types exactly equal to their desired semantics, the most beautiful implementation I could ask for. Evaluation for a behavior b has a structure like:

   sample0 = b Zero
   b1 = b . Next 0.033
   sample1 = b1 Zero
   b2 = b1 . Next 0.033
   sample2 = b2 Zero
   ...

Each successive b is optimized, the unneeded bits at the front thrown off just as in the Stream example. The reason this can’t be encoded in regular Haskell is because the algebra for FRP includes a value of type Future (Behavior a), and we can’t see inside the Future because it is a functor, and thus arbitrary functions separate us from the enclosed Behavior, about which we must selectively forget things. However, the lazy specializer has no trouble looking through those functions, and will modify the behavior anyway, ridding us of the dreaded space leak.

On the By functions

Here’s a quick little note that has been bugging me for a while, and nobody wants to talk about it right now on IRC.

I think the By functions:

sortBy :: (a -> a -> Ordering) -> [a] -> [a]
maximumBy :: (a -> a -> Ordering) -> [a] -> a
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
nubBy :: (a -> a -> Bool) -> [a] -> [a]

Etc. should be replaced by On functions:

sortOn :: (Ord b) => (a -> b) -> [a] -> [a]
maximumOn :: (Ord b) => (a -> b) -> [a] -> a
groupOn :: (Eq b) => (a -> b) -> [a] -> [[a]]
nubOn :: (Eq b) => (a -> b) -> [a] -> [a]

My argument is: the functions provided to sortBy etc. have some preconditions. sortBy is not well-defined (or oughtn’t be) for functions which are not linear ordering functions; nubBy is shouldn’t be well-defined (to the dismay of some 31337s) for functions which do not encode an equivalence relation. But the folklore is that functions are typically “as total as possible”, so if it wants a function of some type, all I have to do is conjure a function of that type and my output will be something reasonable in terms of that function.

On the other hand, the folklore of typeclasses is that they typically come with laws. You need to prove — or at least think you know how to prove — some laws when you make a type an instance of a typeclass. The On functions use this obligation to encode their precondition. They are easier to use in a verified setting, too; there are a bunch of standard instances of Eq and Ord for which the laws are known to hold; map your data on to that in any way you like and the precondition is guaranteed.

Thoughts?

My Premonition

Last summer I got the opportunity to work briefly with Conal Elliott in Belgium. The two weeks I spent there are still making an impact on me; immediately I resonated with his semantic design idea. But the more important ideas were the ones I picked up subconsciously, and I’m just now becoming able to put them in words.

(1) If extreme idealism isn’t working out for you, maybe it is because you are not wholeheartedly following your own ideals.

(2) You must be the change you wish to see in the world (– Mahatma Gandhi). As applied to software: design software as if it were the beautiful paradise you want it to be, then build pieces of the scaffolding back to the status quo.

Both of these were key insights leading me to undertake Dana.

At the time I was desperately trying to create an FRP framework for Haskell, trying in terms of IO and STM and playing messy tricks with the operational semantics of thunks. It felt like cheating. I had the desire to throw away IO altogether in my efforts, just to see where that led me (this forced me to put aside the way I thought an FRP interface should look). This was an instance of (1).

But in Haskell, IO would always be beneath it all. Your functional program always becomes an imperative program at the end of the day. Maybe this dissonance is what is causing the endless FRP difficulties. Maybe we can’t really know what pure FRP is until we have pure turtles all the way down — until there is no impedance mismatch between the functional and the imperative. This set me on researching the idea of a purely functional operating system, which is still the slogan by which I name Dana.

But now Dana is more than that. Now, it is my (2). In this post, I describe how I see the world of software in 30 years. This isn’t about what kind of devices computers will be embedded in or how they are connected; this is about the process of developing software.


Code and program are the same; any fragment of code may be used interactively. Complex programs are just compositions of simpler ones, and composition of programs is a fundamental concept to both users and programmers. Thus power users cannot help but to be programmers.

Computer science has become a strict subfield of applied mathematics. Creating complex software is primarily modeling: the fundamental task in software development is to create a mathematical model of your problem. The model is encoded directly, and then software scaffolding is built around it, verified to the model. Reasonable shops do not dare ship code that hasn’t been verified to correspond to their model.

Development tools — programs that talk about programs — are astoundingly rich and expressive. Simply enter a property about your code. The tool goes off on a model search to try to refute the property. Meanwhile, in a way that my mind cannot yet comprehend, the programmer communicates the “essence” of why the property should be true, and the tool fills in all the details of the proof. The property is added to a database of facts, which everyone can immediately see and use.

Provably correct code is what people do, because the tools have made it so easy that it is an obvious engineering choice (this ease does not come without a resculpting of our brains). Gargantuan test suites are replaced by a few hours of a developer with a proving tool (an exponential increase in productivity came from the fact that theorems compose, but tests don’t).

User interfaces are mostly an artistic task, quite different from the rigorous visualization that modelers do. All objects have a few obvious ways to endow them with primitive user interfaces (see Eros). Then making a nice user interface is a matter of rearranging them, composing inputs and outputs, and drawing/animating things (under the assumption that computers are still monitor, keyboard, and mouse driven — make the appropriate generalizations).

The shops that will be renowned by power users are those who have a great program with a great interface, but they also ship components to the underlying logic. This is trivial for them, because of the ubiquitous separation between interface and logic. Some are still irrationally afraid of giving away their trade secrets by exposing this, so there are still lame, interface-only programs. But these lame programs will not have the chance to participate in the miracles of engineering the decade will bring, a combiatorical build up of the the work of millions of developers, coming together to create the Star Trek computer — or something else, far more advanced.


Computation is an information science, and is subject to the exponential explosion that comes with it. This Utopian vision hinges on correct composition of everything, from programs to facts.

Right now, software is primarily non-compositional. We have libraries and languages, which are built up of smaller libraries and languages. But end products are not designed to be composed; they are mostly designed for the end user only. But the end user is where the money is, thus where a great deal of the effort in the industry is going. Moreover, for advanced programs (say, Maya), the end user is also where a lot of the skill and talent is.

We have to work at this. There is little small-scale economic incentive to make your program composable; you just give your competitors the ability to use your software without you being able to use theirs. And it’s hard for idealistic young CEOs to try such a thing, since the state of software development means you have to put in a lot of work both to make your program composable and to compose other people’s programs. But these things are not fundamentally hard, they are just hard using the languages and methods of today.

Dana is my own first step in this direction. It aims to be an environment which forces everything to be composable and verifiable. This comes at the expense of mental laziness: you will have to seriously rewire your brain to work in it. I don’t believe that the enlightenment can come without this rewiring. I’m desperately trying to rewire my own, to even consider the choices that will be obvious to the generation ahead.

Is your brain going to be ready when the revolution comes?

The role of a core calculus

A lot of software engineering can be described as the art of skillful procrastination. When we’re designing some software, we would like to put off as many decisions as possible as long as possible, while still making progress toward the goal. This procrastination leads to good abstraction, because any code we wrote before we made said decision is independent of that decision, which means the decision can be changed later, leading to “maintainability”.

I would describe the philosophy of the Dana project as: make the right decision or keep thinking. It is a philosophy that only a project without a time limit can afford. This constraint forces reconsideration of the very fundamentals of computer science, since I believe much of our modern infrastructure is built on wrong decisions. I don’t ever expect Dana to be finished; instead, it is a unifying project within which some right decisions may be found.

core calculusWhen Dana becomes something runnable, it will look something like an hourglass: a large body of purely functional code on top, all represented eventually upon an extremely simple core calculus, below which layers of complexity are again added to adapt to existing operating systems and software. The layers above the core are only incidentally related to the layers below it, and I hope by studying whatever the eventual form of the upper layers, we can find ways to rip out the needless complexity below and replace it with something more fitting.

This is all very abstract, and it is hardly clearer in my mind than it is in this picture. However, I’m getting a good sense for what that skinny core is and how it relates to the rest of the system. That is what I describe for the rest of this article.


Despite what we pretend to believe, computer programs are more than their denotational semantics. Hard as we try, there will never be a system in which λx. x and λx. log2(2x) are indistinguishable (over, say, the naturals) due to execution performance. The finiteness of memory and non-instantaneousness of computations are things we will always have to deal with. Using a system which ignores these issues as a fundamental backbone would be a wrong decision.

However, Dana’s mission is to be a purely functional operating system. A key feature of reasoning pure functional programming is that of extensionality: if two functions give the same outputs for the same inputs, they are the same function. So here, λx. x and λx. log2(2x) must be the same function. To throw that away would be to give into just another imprecise, operational notion of computer programs.

At first sight, these two points directly contradict each other. How do we reconcile them?

I rather loosely used the term “same” above. What I mean by that word is that, if two things are the same, then you can safely substitute one for the other in all contexts and not observe a difference. Again there is imprecision in this definition: what do I mean by “observe a difference”? And therein lies our solution.

The core calculus is not something which is executed—not seriously, at least. Instead, it’s a way to define your observations. We come back to our two function “λx. x” and “λx. log2(2x)”—mind the quotes. These are represented in our core calculus not as the functions themselves, but as terms in an abstract syntax tree, in which they are obviously not the same. Call them f and g, respectively. Elsewhere in the core, we might define functions on syntax trees, such as Meaning and Performance. Then we can prove that Meaning(f) = Meaning(g) but Performance(f) ≠ Performance(g). These functions make explicit the varying parameters that cause the two implementations to vary.

This syntax tree might then be passed along to an interpreter, which executes it, respecting the Meaning and Performance functions. This can be proven if the interpreter, too, is implemented on top of the core calculus. But eventually something must leave the system to get to the drivers, and we can’t use the core calculus to prove its correctness anymore. Many methods can be used to convince ourselves of the correctness of the driver, but we can’t do it within Dana—an unfortunate fundamental necessity.

The point is, though, that everything above the core can be proven correct, and it is all coherent: the code we are proving things about is the very same code we’re running, and we don’t have to worry about translation errors. If we can trust the drivers, then we can trust the code.