In 2001, Ray Kurzweil published his inspiring, controversial essay The Law of Accelerating Returns. I acknowledge that the essay sometimes seems far-fetched, and perhaps some details are overly (or insufficiently) idealistic, but in essence I find it very convincing. I will take it as an assumption for this article, and I also assume my readers are familiar with the gist of the essay.
I read Kurzweil’s essay about six weeks ago, and it has set off a chain reaction in my consciousness, causing me to reconsider my perceptions of technology. At first, it seemed to murder my denotational idealism and destroy my hopes for Dana (my purely functional operating system). It filled me with an excited fear, as if I were being carried down the rapids of a great river, trying not to hit the next rock.
Now the ideas have settled and my idealism is born again with a new perspective. The spirit of Dana still has a place, and I will describe what I think that is.
We are at a unique inflection point in the process of technological acceleration. In the last 50-100 years, orders of magnitude of progress began to occur in a single human lifetime. Technology is folding upon itself, each new invention making use of more and more recent knowledge and tools.
Dana was originally envisaged to facilitate this growth, starting from the observation that the vast majority of software out there is not compositional, so we keep re-inventing wheels. Dana’s goal was to provide a framework in which everything must be compositional. But my early execution of the idea was misguided. It may have had a noble goal, but I was pretending that I was at the beginning of an acceleration of software, not in the middle of one. Dana’s implementation did not make use of modern technology, and thus its development was an order of magnitude or two slower than the rate of modern software’s development. Further, its despotic purism would never catch on: it was trying to change the direction of an avalanche by throwing a snowball.
A truly modern compositional environment must embrace the flow of knowledge. Those who try to reverse the paradigm will end up reinventing all the technology that the mainstream is already using, and by the time they are finished the mainstream will be lightyears ahead. The state of the art may be ugly, but it exists, which is more than the purists can say. One person — one team — cannot develop software anymore; to do something new, we must employ the labor of the entire world.
Currently, the most constraining property of software is that for it to be reusable, it must be designed for reuse. It is hard to reuse libraries in a way for which they were not intended. And in fact most of the coding that goes on is not in libraries at all — people aren’t intending their code to be reused by anyone. They just want to get their product out the door and get some cash flowing in. Below is a reified mental picture of this.
The bulb on the bottom is the software designed for reuse. It grows slowly and uniformly, mostly occurring in the open-source world, created by people who like to write good software. The rays at the top are individual, result-oriented projects. This code is not typically designed to be reused. These projects grow quickly in a single direction, but if someone else wants to go in a similar direction, they have to start at the bottom. This is an economic phenomenon, and the economic goals of developers are not likely to change.
To accelerate progress, we must find a way to re-use those chutes, the goal being to transform the picture into something more like this:
In this picture, each results-oriented chute carries with it a net of all the support code that was written for it, upon which anyone can build. Of course, this happens only with the consent of the creator (people will find a way to protect their information), but if the code is useful, then people will buy it.
I have thus become a supporter of Donald Knuth’s mantra:
I also must confess to a strong bias against the fashion for reusable code. To me, “re-editable code” is much, much better than an untouchable black box or toolkit. –Donald Knuth
Forking a project in order to use one of its features in another project is no simple task. Depending on the complexity of the feature, you might gain some time by doing things this way, but many competent software engineers prefer to simply rewrite from scratch. One reason is to understand the code, but another is simply economics. Saying that I rewrite features because it is fun misses half of the equation: I rewrite features because it is more fun than spending a week fighting integration issues (which is very much not fun). I claim that to support this new picture, our primary goal should be to vastly reduce integration issues.
Here is where my purist ideals kick back in, having found a new purpose. Pure functions have no integration issues, at least semantically speaking (performance is always a bit more subtle). A pure function makes explicit all of its inputs and outputs, so to use it, you can just rip it out of its context, provide the inputs, and transform the outputs. Depending on the situation that may be a lot of work, but compare it to finding all the places a singleton was modified in order to recreate its state at some point in the execution of the program. What if the singleton returned a reference to an object that was stored away and modified by an obscure corner of the program, upon which your feature later depended? That sounds like a terribly inflexible design, but most software is terribly designed, and we still want to reuse it.
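To make the contrast concrete, here is a toy sketch in Haskell (the names and the scenario are invented for illustration; this is not from any real codebase). The pure version can be ripped out and reused by supplying its inputs; the singleton-style version drags its hidden state along with it.

    import Data.IORef

    -- Pure: every input and output is explicit, so the function can be lifted
    -- out of its context and reused by supplying the inputs directly.
    applyDiscount :: Double -> Double -> Double
    applyDiscount rate price = price * (1 - rate)

    -- Singleton-style: the rate lives in shared mutable state somewhere else in
    -- the program. To reuse this function elsewhere, you must first recreate
    -- whatever that state happened to be at the right moment.
    type RateSingleton = IORef Double

    applyDiscountFromSingleton :: RateSingleton -> Double -> IO Double
    applyDiscountFromSingleton ref price = do
      rate <- readIORef ref
      pure (price * (1 - rate))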
However, just saying “use pure functions, voila!” is no solution. The trend of the world is gradually shifting toward more functional idioms, but asking everyday programmers (whom we care about, because they are doing our work for us) to switch to purity is asking a lot. Retraining brains is too hard.
So that is my open research question. How do we introduce the immense advantage of purity, from this perspective, into programming pop culture? Perhaps it is a new virtual machine that does dynamic dataflow analysis. Perhaps it is a static analysis tool. Perhaps it is a new language which can interface with many popular existing languages. That would be slow to take off because it requires migration (and thus risks never taking off at all), but because of the interop and the good reuse properties (with tool support), it might reach critical mass faster. Clojure and Scala are doing kind of okay.
Once you have a way to track assumptions, the rest follows easily. It would be possible to assemble a database of every function ever written, with the ability to just drag and drop one into your program (ahem, Haskellers, why don’t we have this?). The natural version control system, another step in the direction of the DVCSes, tracks changes between individual objects, and how those changes induce changes in the objects that reference them (this is the part I have thought through the most, but I will forgo describing it, for it is a small detail and this post is already very long).
A singularity-aware culture should keep its eye out for ways to bring about the next inevitable order of magnitude of progress. I believe the elimination of integration issues is one such way, and Haskell, for the most part, has already eliminated them. The first step is to bring this property to the popular languages in any way we can (this includes making Haskell a popular language… LOL).
Did you enjoy reading this article? Let me know! :-)
I just lost my job. This was unexpected to everyone except the deep knowledge of my subconscious, which, for the last week, had been practicing what to say when it happened. The reason is that my “level of productivity is not sufficiently high to justify [my] contract rate”. I agree with this, and although the whole thing has got me feeling a tad worthless, I am not beating myself up about it. The project was less familiar territory than I had expected, being of primarily object-oriented design (rather than functional — and yes, “functional design” is a real thing). I have done a lot of OO programming in my day, but since I started functional programming, my OO code never seems good enough. I now spend a good proportion of my time searching for cleaner, smaller, more obviously correct solutions, when the code I am criticizing would have been just dandy 5 years ago.
Anyway, I won’t blather on any more justifying why I am nonetheless still a good programmer. This post is about the future.
In any case, I feel like the event woke me up. I was spending my 25 hours a week working on a project I didn’t really care about (despite it being fun and working with good people), a few hours a week on my own projects, and the rest of the time on who-knows-what: chilling out, wasting time on the internet, watching Netflix, getting high and playing the piano. It’s all a fine, relaxing lifestyle, but I am disappointed that I was spending so little time on my projects. So, I resolve to spend no less time working than I was before — at least 30 hours per week, all for my own goals.
For my own clarity, and for the interested, here is what I’m working on:
- Evo (codename), a “serious” real-time strategy game written in Haskell, with my friend Max. The purpose of this game is twofold: (1) to be a ridiculously fun real-time strategy game (nothing good has come out of that genre for a few years) and (2) to prepare and exposit Haskell as a game development platform. Perhaps it is yet another attempt to solve my 6 year old thesis Game Programming is Too Hard.
- Vatican, a lazy specializing interpreter, taking after the work in Michael Jonathan Thyer’s dissertation. This is largely a research project, in which I would be happy to get 1% of the performance of GHCi — just something to start from. Here is the repository.
- An experimental programming language with a unique view of types: types are just properties (eg. “f :: Int -> Int” is another way of saying “for all x, if x is an Int then f x is an Int”). The language is essentially an untyped lambda calculus, but it comes with a proof language in which the usual types, and many more sorts of properties, can be expressed and verified. There is some macro facility (mechanism as yet undecided) that allows traditional type inference to be encoded as a macro in the language. There are a lot more ideas for this language floating around, but I am trying to stick to the principle “everything it does, it does perfectly”, so I’m holding off on new features until I know exactly where in the picture they appear.
- I will consider blogging as officially one of my projects, to continue improving my writing. I am converging on a functional design methodology, my antipattern post hinting at its direction. But the feedback I’ve received on that post makes me realize that the argument and the details are not yet clear. I’d like to keep writing about it and exploring this methodology in order to one day pin it to rigorous principles.
30 hours a week of the above, or projects that the above evolve into. I don’t need to be a pedant: I know when I am and am not being productive.
Get up in the morning, work all day. –Philip Glass
It has been two years, twenty-six days, six hours, twenty-four minutes and thirty seconds since my eyes caught the first glimpse of my mistress, functional reactive programming. Her simplicity, beauty, and power have captivated me since, but she has teased me — all of us lonely fools, really — by slipping through my fingers every time, just as I thought I was getting to know her.
A year ago I swore off the cold bitch. She was transforming my life into a single, unhealthy obsession. I needed time to think. Time — there she is again. Maybe by giving myself space, I thought, I could stumble upon a packet of wisdom and finally capture her heart, without really looking for it. But that sort of distance is a lie, my life continues to be about her. Distance only makes the heart grow fonder.
Last week I bumped into her at work. I am not proud of what I did. Forgive me, for I am only human, ultimately incapable of suppressing the basal instinct that has ruled my kind for millions of years. That’s right, I implemented her, in the back seat of my Scala convertible. It would seem but a few-hour fling on a whiteboard and an editor to the unprivy, but O how it intoxicated me! And it has not, as yet, ended in the heartbreak I have felt over and over again — she is quietly sitting in the back of my codebase, hardly a hundred lines, even doing some good if you can believe it. Maybe I am just fooling myself and this time will be no different. But, at the very least, I feel like I may have learned something.
Perhaps FRP is not the sparkling Holy Grail of interactive functional programming that we have made it out to be. Maybe we cannot find an efficient, general implementation because there is none. Perhaps what we currently know about FRP is what there is to know — a book of heuristics, a collection of experiences and wisdom that tells us which of its features will and will not work together. FRP is not a single feature, you know, it really is a rather large design space, and quite a lot can be done with only pieces of it.
What I did at work was remarkably effective: I sketched on a whiteboard how our server would read in the language of FRP, and then I implemented a framework which supported exactly the operations I needed and no more. It took hardly two hours, and there is no spacetime leak, no GC nondeterminism. The situation in FRP seems to be something like that of modern theoretical physics: we have a few different theories that all work remarkably well for different kinds of problems, and if only they would work with each other, we would have a unified theory. But mysteriously, they just don’t seem to get along.
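To give a flavor of how small such a single-purpose framework can be, here is a toy sketch in Haskell (invented names, not the Scala code from work): an event type together with the handful of combinators an application like that might need, and nothing more.

    import Data.IORef
    import Control.Monad (forM_)

    -- An Event is nothing but "subscribe a handler": no global dependency
    -- graph, no hidden clock.
    newtype Event a = Event { subscribe :: (a -> IO ()) -> IO () }

    instance Functor Event where
      fmap f e = Event $ \h -> subscribe e (h . f)

    -- A few combinators of the sort such an application needs: merging and
    -- filtering occurrences...
    merge :: Event a -> Event a -> Event a
    merge e1 e2 = Event $ \h -> subscribe e1 h >> subscribe e2 h

    filterE :: (a -> Bool) -> Event a -> Event a
    filterE p e = Event $ \h -> subscribe e (\a -> if p a then h a else pure ())

    -- ...a source that outside code (e.g. a network callback) can fire...
    newSource :: IO (Event a, a -> IO ())
    newSource = do
      handlers <- newIORef []
      let ev     = Event $ \h -> modifyIORef handlers (h :)
          fire a = readIORef handlers >>= \hs -> forM_ hs ($ a)
      pure (ev, fire)

    -- ...and stateful accumulation, which subscribes once and rebroadcasts.
    accumE :: b -> (a -> b -> b) -> Event a -> IO (Event b)
    accumE z f e = do
      ref         <- newIORef z
      (out, fire) <- newSource
      subscribe e $ \a -> do
        acc <- atomicModifyIORef' ref (\b -> let b2 = f a b in (b2, b2))
        fire acc
      pure out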
A unified theory is more than just something to please the eyes of pure functionalists — it is important to us because it would mean the fall of the IO monad regime. The book of heuristics that I claim FRP to be still needs a sin bin from which to retrieve the superglue that holds together the pieces of a complete application. So the school of semanticists cannot give a meaning to a complete program, only to pieces of it. This makes us unable to prove things about a whole program, and unsure how to meaningfully compose multiple programs (eg. what kinds of things should and should not be allowed).
There are the quantum field theorists, attempting to bring gravity into the regime of quantum mechanics; there are the string theorists, attempting to bring the rest of reality into the regime of general relativity. But I, as a physics layman, naively believe that neither of these approaches will end up being the workhorse of the eventual unified theory. I believe that said theory will come with a new way of looking at the world, rather than patching up the old ways. Maybe the expert in me can learn from the layman in me and try to solve the problem instead of clinging to the solution I think I desire.
What is the meaning of an interactive program?
Last summer I got the opportunity to work briefly with Conal Elliott in Belgium. The two weeks I spent there are still making an impact on me; immediately I resonated with his semantic design idea. But the more important ideas were the ones I picked up subconsciously, and I’m just now becoming able to put them in words.
(1) If extreme idealism isn’t working out for you, maybe it is because you are not wholeheartedly following your own ideals.
(2) You must be the change you wish to see in the world (– Mahatma Gandhi). As applied to software: design software as if it were the beautiful paradise you want it to be, then build pieces of the scaffolding back to the status quo.
Both of these were key insights leading me to undertake Dana.
At the time I was desperately trying to create an FRP framework for Haskell, writing it in terms of IO and STM and playing messy tricks with the operational semantics of thunks. It felt like cheating. I had the desire to throw away IO altogether in my efforts, just to see where that led me (this forced me to put aside the way I thought an FRP interface should look). This was an instance of (1).
But in Haskell, IO would always be beneath it all. Your functional program always becomes an imperative program at the end of the day. Maybe this dissonance is what is causing the endless FRP difficulties. Maybe we can’t really know what pure FRP is until we have pure turtles all the way down — until there is no impedance mismatch between the functional and the imperative. This set me on researching the idea of a purely functional operating system, which is still the slogan by which I name Dana.
But now Dana is more than that. Now, it is my (2). In this post, I describe how I see the world of software in 30 years. This isn’t about what kind of devices computers will be embedded in or how they are connected; this is about the process of developing software.
Code and program are the same; any fragment of code may be used interactively. Complex programs are just compositions of simpler ones, and composition of programs is a fundamental concept to both users and programmers. Thus power users cannot help but be programmers.
Computer science has become a strict subfield of applied mathematics. Creating complex software is primarily modeling: the fundamental task in software development is to create a mathematical model of your problem. The model is encoded directly, and then software scaffolding is built around it and verified against the model. Reasonable shops do not dare ship code that has not been verified to correspond to their model.
Development tools — programs that talk about programs — are astoundingly rich and expressive. Simply enter a property about your code. The tool goes off on a model search to try to refute the property. Meanwhile, in a way that my mind cannot yet comprehend, the programmer communicates the “essence” of why the property should be true, and the tool fills in all the details of the proof. The property is added to a database of facts, which everyone can immediately see and use.
Provably correct code is what people do, because the tools have made it so easy that it is an obvious engineering choice (this ease does not come without a resculpting of our brains). Gargantuan test suites are replaced by a few hours of a developer with a proving tool (an exponential increase in productivity came from the fact that theorems compose, but tests don’t).
User interfaces are mostly an artistic task, quite different from the rigorous visualization that modelers do. All objects have a few obvious ways to endow them with primitive user interfaces (see Eros). Then making a nice user interface is a matter of rearranging them, composing inputs and outputs, and drawing/animating things (under the assumption that computers are still monitor, keyboard, and mouse driven — make the appropriate generalizations).
The shops that will be renowned among power users are those who ship a great program with a great interface, but who also ship components of the underlying logic. This is trivial for them, because of the ubiquitous separation between interface and logic. Some are still irrationally afraid of giving away their trade secrets by exposing this, so there are still lame, interface-only programs. But those lame programs will not have the chance to participate in the miracles of engineering the decade will bring: a combinatorial build-up of the work of millions of developers, coming together to create the Star Trek computer — or something else, far more advanced.
Computation is an information science, and is subject to the exponential explosion that comes with it. This Utopian vision hinges on correct composition of everything, from programs to facts.
Right now, software is primarily non-compositional. We have libraries and languages, which are built up of smaller libraries and languages. But end products are not designed to be composed; they are mostly designed for the end user only. But the end user is where the money is, thus where a great deal of the effort in the industry is going. Moreover, for advanced programs (say, Maya), the end user is also where a lot of the skill and talent is.
We have to work at this. There is little small-scale economic incentive to make your program composable; you just give your competitors the ability to use your software without you being able to use theirs. And it’s hard for idealistic young CEOs to try such a thing, since the state of software development means you have to put in a lot of work both to make your program composable and to compose other people’s programs. But these things are not fundamentally hard, they are just hard using the languages and methods of today.
Dana is my own first step in this direction. It aims to be an environment which forces everything to be composable and verifiable. This comes at the expense of mental laziness: you will have to seriously rewire your brain to work in it. I don’t believe that the enlightenment can come without this rewiring. I’m desperately trying to rewire my own, to even consider the choices that will be obvious to the generation ahead.
Is your brain going to be ready when the revolution comes?
A lot of software engineering can be described as the art of skillful procrastination. When we’re designing some software, we would like to put off as many decisions as possible, for as long as possible, while still making progress toward the goal. This procrastination leads to good abstraction, because any code we wrote before making a given decision is independent of that decision, which means the decision can be changed later, leading to “maintainability”.
I would describe the philosophy of the Dana project as: make the right decision or keep thinking. It is a philosophy that only a project without a time limit can afford. This constraint forces reconsideration of the very fundamentals of computer science, since I believe much of our modern infrastructure is built on wrong decisions. I don’t ever expect Dana to be finished; instead, it is a unifying project within which some right decisions may be found.
When Dana becomes something runnable, it will look something like an hourglass: a large body of purely functional code on top, all eventually represented in an extremely simple core calculus, below which layers of complexity are again added to adapt to existing operating systems and software. The layers above the core are only incidentally related to the layers below it, and I hope that by studying whatever form the upper layers eventually take, we can find ways to rip out the needless complexity below and replace it with something more fitting.
This is all very abstract, and it is hardly clearer in my mind than it is in this picture. However, I’m getting a good sense for what that skinny core is and how it relates to the rest of the system. That is what I describe for the rest of this article.
Despite what we pretend to believe, computer programs are more than their denotational semantics. Hard as we try, there will never be a system in which λx. x and λx. log2(2^x) are indistinguishable (over, say, the naturals), because execution performance will always tell them apart. The finiteness of memory and the non-instantaneousness of computation are things we will always have to deal with. Using a system which ignores these issues as a fundamental backbone would be a wrong decision.
However, Dana’s mission is to be a purely functional operating system. A key feature of reasoning about pure functional programs is extensionality: if two functions give the same outputs for the same inputs, they are the same function. So here, λx. x and λx. log2(2^x) must be the same function. To throw that away would be to give in to just another imprecise, operational notion of computer programs.
At first sight, these two points directly contradict each other. How do we reconcile them?
I rather loosely used the term “same” above. What I mean by that word is that, if two things are the same, then you can safely substitute one for the other in all contexts and not observe a difference. Again there is imprecision in this definition: what do I mean by “observe a difference”? And therein lies our solution.
The core calculus is not something which is executed—not seriously, at least. Instead, it’s a way to define your observations. We come back to our two functions “λx. x” and “λx. log2(2^x)”—mind the quotes. These are represented in our core calculus not as the functions themselves, but as terms in an abstract syntax tree, in which they are obviously not the same. Call them f and g, respectively. Elsewhere in the core, we might define functions on syntax trees, such as Meaning and Performance. Then we can prove that Meaning(f) = Meaning(g) but Performance(f) ≠ Performance(g). These functions make explicit the varying parameters that cause the two implementations to vary.
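Here is a toy illustration of that idea in Haskell (my own sketch, not Dana’s core calculus): the two implementations are distinct syntax trees, and Meaning and Performance are ordinary functions on those trees.

    -- Syntax trees for a tiny expression language over the naturals.
    data Expr = X | Exp2 Expr | Log2 Expr
      deriving (Eq, Show)

    f, g :: Expr
    f = X               -- the body of λx. x
    g = Log2 (Exp2 X)   -- the body of λx. log2(2^x)

    -- Meaning: the function the expression denotes on a natural-number input.
    meaning :: Expr -> Integer -> Integer
    meaning X        n = n
    meaning (Exp2 e) n = 2 ^ meaning e n
    meaning (Log2 e) n = ilog2 (meaning e n)
      where
        ilog2 1 = 0
        ilog2 m = 1 + ilog2 (m `div` 2)

    -- Performance: a crude cost model, counting evaluation steps.
    performance :: Expr -> Int
    performance X        = 1
    performance (Exp2 e) = 1 + performance e
    performance (Log2 e) = 1 + performance e

    -- meaning f n == meaning g n for every natural n, yet
    -- performance f /= performance g, and f /= g as syntax trees.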
This syntax tree might then be passed along to an interpreter, which executes it, respecting the Meaning and Performance functions. This can be proven if the interpreter, too, is implemented on top of the core calculus. But eventually something must leave the system to get to the drivers, and we can’t use the core calculus to prove its correctness anymore. Many methods can be used to convince ourselves of the correctness of the driver, but we can’t do it within Dana—an unfortunate fundamental necessity.
The point is, though, that everything above the core can be proven correct, and it is all coherent: the code we are proving things about is the very same code we’re running, and we don’t have to worry about translation errors. If we can trust the drivers, then we can trust the code.
In the last IΞ post, I introduced the calculus and sketched the construction of some standard mathematical objects. In this post, I will dive a little deeper and construct all the positive recursive types. The proof will be in an informal style (in particular, omitting the H constraints), but I have little doubt that the proof will go through.
Only a superficial familiarity with IΞ is needed to understand this proof; I adopt a set-theoretic style notation, so the proof should be approachable. Here’s a quick refresher of the primitives of IΞ.
- Membership is application, so when I write x ∈ A, this is translated into IΞ as A x. Thus sets, unary predicates, and types are all the same thing.
- Because membership and application are identified, universal quantification and the subset relation are also. ΞAB means “A is a subset of B”, or “x in A implies x in B”. In particular, the pattern ΞA(\a. P a) can be interpreted as “for every a in A, P a”.
- L is the set of all sets (whose existence does not lead to a contradiction in this system!). I will give its definition at the end of the article.
- Other symbols have their usual interpretation, and I’ll include an appendix which gives them all precise definitions at the end.
Definition: A function F is called monotone if A ⊆ B implies F A ⊆ F B.
Intuitively, the monotone functions correspond roughly to the functors in Haskell; they use their parameter in a positive way (appear to the left of an even number of arrows).
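For Haskell intuition (the connection is mine and only approximate): a parameter that occurs positively admits a covariant Functor instance, while a negative one does not.

    -- Positive occurrence: the parameter sits to the left of an even number of
    -- arrows (here zero), so a Functor instance exists.
    newtype Good a = Good (Int -> a)

    instance Functor Good where
      fmap f (Good g) = Good (f . g)

    -- Negative occurrence: the parameter sits to the left of one arrow (an odd
    -- number); no covariant Functor instance is possible.
    newtype Bad a = Bad (a -> Int)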
Definition: The type recursion combinator μ is defined as: μ F x = ΞL(\p. Ξ(F p)p ⇒ p x).
We are allowed to define things simply by giving a condition for membership like this; formally, the definition is just a lambda term, starting out μ = \F. \x. ΞL(\p. …).
This definition intuitively says, x is in μ F if x is in every type closed under F. We will see that this definition corresponds to a type recursion combinator.
Lemma 1: If F is monotone, then F (μ F) ⊆ μ F.

Proof. Given x ∈ F (μ F); show x ∈ μ F. Expanding the definition of μ above:

Given p with F p ⊆ p; show x ∈ p.

Observe μ F ⊆ p: Suppose y ∈ μ F, show y ∈ p. Since y ∈ μ F, we have (F q ⊆ q) ⇒ y ∈ q for every q, by definition of μ. Letting q = p, we have y ∈ p from the assumption F p ⊆ p above, so μ F ⊆ p.

Therefore, F (μ F) ⊆ F p ⊆ p (by the monotonicity of F and F p ⊆ p), so x ∈ p. QED.
Now for the easy direction:
Lemma 2: If F is monotone, then μ F ⊆ F (μ F).

Proof. Given x ∈ μ F; show x ∈ F (μ F).

By the definition of μ, we have (F p ⊆ p) ⇒ x ∈ p for every p. Let p = F (μ F). We have F (F (μ F)) ⊆ F (μ F) by monotonicity of F and Lemma 1, therefore x ∈ F (μ F). QED.
Which leads us to the recursion equation.
Theorem: If F is monotone, μ F = F (μ F). (Proof trivial by the two lemmas.)
I’m using set equality here, i.e. mutual containment, which is probably a bit weaker than Leibniz equality. It is obvious from the definition that this fixed point is minimal.
Monotonicity is easy to prove for any of the standard positive types (products, coproducts, functions). So we can model a good variety of Haskell data types already; however these are standard inductive (least fixed point) types, no infinite structures allowed. I’m still working on the encoding and analogous proofs for ν (greatest fixed point), which is closer to Haskell.
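For comparison, here is how the two fixed points are commonly rendered in Haskell (a standard encoding, not part of IΞ): the least fixed point only lets you build finite values, while the greatest fixed point is a seed plus a step function, so it admits infinite unfoldings.

    {-# LANGUAGE RankNTypes, ExistentialQuantification #-}

    -- Least fixed point: a value is exactly the ability to fold it, so only
    -- finite structures are representable.
    newtype Mu f = Mu (forall r. (f r -> r) -> r)

    -- Greatest fixed point: a seed plus an observation/step function, so
    -- infinite structures (like Haskell's lazy data) are representable.
    data Nu f = forall s. Nu s (s -> f s)

    data NatF r = Zero | Succ r

    -- A finite natural as a least fixed point...
    three :: Mu NatF
    three = Mu (\alg -> alg (Succ (alg (Succ (alg (Succ (alg Zero)))))))

    -- ...and "infinity", expressible only as a greatest fixed point.
    infinity :: Nu NatF
    infinity = Nu () (\() -> Succ ())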
Hopefully I’ll be able to port many libraries (maybe after a few totality annotations) without having to worry about partiality. But there are more pressing matters, such as finishing my interactive proof assistant for IΞ.
Appendix: precise definitions of the symbols used above.

- True = ΞHH, the True proposition.
- K = \x. \y. x, the constant function.
- U = K True, the set of all objects.
- f ∘ g = \x. f (g x), function composition.
- a → b = \f. Ξa(b ∘ f), the set of functions from a to b.
- L = U → H, the set of all sets.
- X ⇒ Y = Ξ(KX)(KY), implication.
- ∀ f = \x. ΞL(\a. f a x), universal quantification of types (like forall in Haskell).
- a × b = \x. ΞL(\p. Ξa(\m. Ξb(\n. p (\f. f m n))) ⇒ p x), product type.
- a + b = \x. ΞL(\p. Ξa(\m. p (\l. \r. l m)) ⇒ Ξb(\n. p (\l. \r. r n)) ⇒ p x), coproduct type.
Anyone who has spent time trying to implement an FRP library knows the unsafePerformIO story. You may use unsafePerformIO as long as you ensure that the result maintains purely functional semantics. It’s possible to create impure values with unsafePerformIO. It is up to you to “prove” that you have created a pure one. Seems like a decent trade-off.
I put “prove” in quotes for a reason. If you’re doing something nontrivial (i.e. you’re not just using unsafePerformIO . return), you need an operational semantics for IO to prove this. But that’s not all! You are probably depending on some external state inside the unsafePerformIO, which depends on the time and order in which thunks are evaluated. But thunks aren’t part of the operational semantics of IO; they are part of the operational semantics of pure values in Haskell — something we quite explicitly do not have. So you need not only to embrace the ill-definedness of IO, but in fact to tie yourself down to a particular operational interpretation of Haskell!
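As a concrete illustration of the kind of hidden dependence I mean (a standard textbook-style example, not taken from any particular library), the “pure” value below changes depending on when and how often its thunks happen to be forced:

    import Data.IORef
    import System.IO.Unsafe (unsafePerformIO)

    counter :: IORef Int
    counter = unsafePerformIO (newIORef 0)
    {-# NOINLINE counter #-}

    -- "fresh" claims to be a pure function, but its result depends on how many
    -- thunks involving it have already been forced, i.e. on the operational
    -- semantics of pure Haskell evaluation, not just of IO.
    fresh :: () -> Int
    fresh () = unsafePerformIO (atomicModifyIORef' counter (\n -> (n + 1, n)))
    {-# NOINLINE fresh #-}

    -- Depending on evaluation order and sharing, this pair may be (0,1), (1,0),
    -- or even (0,0) if the compiler decides to share the two calls.
    pair :: (Int, Int)
    pair = (fresh (), fresh ())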
Let’s say I write an HNF evaluator for Haskell. Your unsafePerformIO magic will probably not work on this style of evaluator because the meaning of thunks — and the way they are executed — is quite different in this style.
There are more invariants on a Haskell function than purity and referential transparency. We can, of course, only implement computable functions. They have to be monotone and continuous. And they might satisfy other properties as well, which someone will one day come along and prove by leveraging properties of the type system, exposed primitives, etc. (see ST for prior art). But they will not have accounted for your magic, so their analysis will not cover any program using your library.
We are pure functional programmers. We have chosen a language which vastly restricts what we are allowed to do, because we understand the benefits we reap as a result. However, pretending to understand when we are allowed to cheat only buys us the benefits we know about now, and precludes future benefits from work in the field. By using unsafePerformIO — even in a safe way (or so you think) — you avert the exponential growth of our field.
If you find you can’t express something you feel you should be able to, I suggest one of two things: (1) look deeper until you find an incidental limitation of the language, and attempt to solve it at the language level, or (2) look deeper until you understand why you actually shouldn’t have been able to do that, revealing the truth from behind the curtain of zealous ignorance. In my experience, (2) is much more often successful.
On a more practical note, most of said limitations are about performance, which is not in the semantics’ domain of discourse. It makes sense that our languages wouldn’t be good at such things. Instead of introducing a hack, why not push the field forward and think about what a language which can talk about such things would look like? Each time you run into a limitation, you have a new use case, and thus a new perspective on the problem.
Until that problem is solved, though, your library users might have to pay the price of not having as elegant an interface. But by restricting yourself thusly, you are protecting yourself from your own ignorance, at least knowing that what you have made should, in fact, be makeable.
In summary: It is never safe to cheat.
I have some very exciting news! I wrote some actual code in the Dana repository. It is the IΞ Certificate module. That is, it’s an abstract data type Proof, such that only valid proofs in IΞ can be constructed.
The certificate module (IXi.Term and IXi.Proof together) is about 280 lines, which isn’t fantastically small, but isn’t too bad, especially considering that I don’t expect it to grow—it is perfect, modulo bugs.
Right now, Proof is just a proof checker function, but it’s designed so that I could swap it out with some serializable format, or even (crosses fingers) a dependent certified type.
One interesting development in this implementation is the new conversion certificate for lambda calculus with De Bruijn notation. That is, objects of type Conversion represent a valid βη conversion between terms. Previously, I hadn’t been satisfied with my solution: the certificate was implemented as a pair of convertible terms. This led to far too many equality comparisons of terms when combining and using certificates, which is both inefficient and, I suspect, hard to prove things about. Also, it required you to know too much about the goal you were proving, bloating the proof terms and tangling them with the theorems they were trying to prove.
The hardest part is β expansion, for example turning X into (\x.\y.x) X (\x.x). That was the reason for the pair representation: to do beta expansion, you just did a beta reduction and reversed it.
The new implementation instead implements conversions as partial functions. I.e. you give a conversion a source term, and it gives you an equivalent term (or says it couldn’t convert). This means I had to separately model beta reduction and beta expansion, because you can’t easily reverse a conversion. However, the solution is quite clean. I chose a basis of expansion combinators, which can be composed to form any expansion. They are:
Identity : A → (\x. x) A
Constant : A → (\x. A) B   [x not free in A]
Apply    : (\x. A) C ((\x. B) C) → (\x. A B) C
Lambda   : \y. (\x. A) B → (\x. \y. A) B   [y not free in B]
This is in addition to the other combinators, which are needed to make this basis complete. They include β reduction, η expansion/contraction, and ways to focus a conversion on a subexpression. The key is that each combinator is correct by inspection, so we can be confident that the conversion algebra is sound.
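To make the shape of this concrete, here is a rough sketch in Haskell of how such a conversion algebra might look (my own illustration, not the actual IXi.Term/IXi.Proof code): a Conversion is a partial function on de Bruijn terms, the expansion combinators build redexes, and focusing combinators push a conversion into a subterm.

    import Control.Monad ((>=>))

    data Term = Var Int | Lam Term | App Term Term   -- de Bruijn indices
      deriving (Eq, Show)

    -- A conversion either rewrites the given term or reports that it doesn't apply.
    newtype Conversion = Conversion { convert :: Term -> Maybe Term }

    -- Shift free variable indices by d (with cutoff c), needed when a term is
    -- moved under a new binder.
    shift :: Int -> Term -> Term
    shift d = go 0
      where
        go c (Var i) | i >= c    = Var (i + d)
                     | otherwise = Var i
        go c (Lam t)   = Lam (go (c + 1) t)
        go c (App s t) = App (go c s) (go c t)

    -- Identity : A → (\x. x) A
    identityE :: Conversion
    identityE = Conversion $ \a -> Just (App (Lam (Var 0)) a)

    -- Constant : A → (\x. A) B   [x not free in A]; the discarded argument B is
    -- supplied by the caller.
    constantE :: Term -> Conversion
    constantE b = Conversion $ \a -> Just (App (Lam (shift 1 a)) b)

    -- Focus a conversion on the left of an application, or under a lambda.
    inLeft, inLambda :: Conversion -> Conversion
    inLeft c = Conversion $ \t -> case t of
      App f x -> (`App` x) <$> convert c f
      _       -> Nothing
    inLambda c = Conversion $ \t -> case t of
      Lam b -> Lam <$> convert c b
      _     -> Nothing

    -- Conversions compose like partial functions.
    andThen :: Conversion -> Conversion -> Conversion
    andThen c1 c2 = Conversion (convert c1 >=> convert c2)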
I chose these combinators by thinking about what would be needed to construct the inverse conversion from bringing a term to normal form. If you’re familiar with SKI factorization, the process is pretty similar. Whenever you reduce an application (\x. A) B, you look at the structure of A and “push” B in by one level, applying one of these combinators. For example:
(\f. \y. f (f y)) (\x. x)                    Lambda
\y. (\f. f (f y)) (\x. x)                    inLambda Apply
\y. (\f. f) (\x. x) ((\f. f y) (\x. x))      inLambda (inLeft Identity)
\y. (\x. x) ((\f. f y) (\x. x))              inLambda Identity
\y. (\f. f y) (\x. x)                        inLambda Apply
\y. (\f. f) (\x. x) ((\f. y) (\x. x))        inLambda (inLeft Identity)
\y. (\x. x) ((\f. y) (\x. x))                inLambda Identity
\y. (\f. y) (\x. x)                          inLambda (Constant (\x. x))
The reverse composition of the conversions on the right will bring us from \y. y to (\f. \y. f (f y)) (\x. x).
But isn’t it an awful pain to write all those combinators when proving things? Of course not! I make a computer do it for me. I have a little algorithm which takes two terms and computes a conversion between them, by bringing them both to normal form, and using the forward conversions one way and the inverse conversions the other way. Of course, if I give it terms which have no normal form it won’t halt, but the idea is that these terms are static: I use dummy terms to explain the conversion I want, and then apply the conversion I got back to the real terms (which may have subterms without normal forms).
So I say: get me from (\x y. A x) A I to (\x. x x) A, where “A” and “I” are just strings, and then I apply the conversion I got back to, say, (\x. \y. WW x) (WW) (\x. x), where WW has no normal form. The conversion still succeeds.
The certificate pattern shines here: my constructors are easy to verify, and then I have a fairly involved algorithm for constructing certificates that is easy to use and is guaranteed correct (in at least one sense) by construction.
So that’s fun stuff.
Proofs are still pretty tedious, however. My next step is to make some smart “tactic” combinators (which of course generate the underlying certificates) to make proofs easier. It shouldn’t take too long to make it at least tolerable. Then I’ll build up a library of certified infrastructure necessary for typechecking Haskell--, and finally write the compiler to complete the bootstrap. There are plenty of dragons to be slain along the way.
Over the past couple months, I have been attempting to find a language to use as a core calculus for Dana, as anyone who follows this blog knows. I have been slowly converging on IΞ for its elegance and simplicity. It turns out to be very powerful, capable of modeling many structures while avoiding many of the paradoxes typically involved with those structures. In this post, I give an exposition of IΞ and construct some common structures.
The main idea is to use lambda calculus for all of our symbol manipulation needs, adding a constant Ξ for quantification. “Types” are unary predicates, so membership is application. For example, “Nat 0” is a proposition saying that 0 has type Nat (where 0 and Nat are given previous definitions somewhere).
ΞXY means “forall x, if X x, then Y x”, or, if you think of predicates as sets, simply “X is a subset of Y”. So if we have types Nat and Bool, we can say that f has type Nat → Bool with “ΞNat(\x. Bool (f x))”, read “forall x in Nat, f x is in Bool”. Very direct, is it not?
Interpreting these meanings directly, we arrive at Curry’s System I, whose rules follow. A proposition (sequent) has the form “Γ |- X”, which has the interpretation “assuming Γ, X is provable”.
I have stated the rules as “to prove X, you need to prove Y”, because that’s kinda how my brain works. Take a moment to internalize them. They are obvious given the above intuitions, and shouldn’t be hard to read.
On top of this, we can build propositional calculus. Define “K = \x. \y. x”, and write “X ⇒ Y” as shorthand for “Ξ(KX)(KY)”. (This works because Ξ(KX)(KY) reads “for all a, KX a implies KY a”, and KX a reduces to X, so it just says “X implies Y”.) This system has the properties you would expect of a simple propositional calculus.
Sadly, this system is inconsistent. We can prove any proposition X:
Let Y = (\x. x x ⇒ X) (\x. x x ⇒ X). Observe that Y = Y ⇒ X.

(1) Y |- Y
(2) Y |- Y ⇒ X        (by (1), since Y = Y ⇒ X)
(3) Y |- X            -- modus ponens on (1), (2)
(4) |- Y ⇒ X          (abstracting over the assumption in (3))
(5) |- Y              (by (4), since Y = Y ⇒ X)
(6) |- X              -- modus ponens on (4), (5)
Martin Bunder had a brilliant idea to block this paradox, and also many others, which brings us to the object of my infatuation:
The crucial step above was (4), where we abstracted over the infinite proposition “((… ⇒ X) ⇒ X) ⇒ X”. The way we will block this is to only allow abstraction over finite propositions. Introduce a new symbol, H, such that HX means “X is a finite proposition” (or simply “X is a proposition”). We will derive finiteness from the finiteness of proofs: to prove HX, we first have to prove H of each component of X. Our system becomes (the new additions are in bold):
The final rule is an axiom, simply saying that “X is a proposition” is always a proposition.
Constructing the Naturals
Now I will embark on constructing the type of naturals. Since types are predicates, I also need to come up with a representation for naturals. It turns out that it doesn’t matter what representation I use, as long as it has zero and a 1-1 successor function. For the sake of discussion, let’s use the Church encoding.
0 = \f. \x. x
S n = \f. \x. f (n f x)
So a natural is an iteration function. For example, the number 3 iterates a function 3 times on its argument: 3 f x = f (f (f x)).
Coming up with a way to classify all of these, but not any others (such as infinity f x = f (f (f (f (f ...))))), was quite a challenge. You might try to classify these functions by the property that they are iteration functions, but any sensible method of doing that ends up including infinity. I began thinking that IΞ was not strong enough, and looking for ways to enrich it by adding more axioms.
Fortunately, no new axioms are necessary! The encoding is obvious in retrospect. What is the first thing a mathematician thinks when you talk about the naturals: induction! Let’s define a natural as any object which you can do induction (with 0 and S) over.
To make this readable, we need to introduce a few more symbols:
f . g = \x. f (g x)    -- composition
A → B = \f. ΞA(B . f)  -- the type of functions from A to B
True = ΞHH             -- the true proposition
U = K True             -- the type of all objects (mnemonic: universe)
L = U → H              -- the type of predicates/types
Now, for the naturals:
Nat x = ΞL(\p. p 0 ⇒ ΞU(\n. p n ⇒ p (S n)) ⇒ p x)
Reading this in English: “x is a Natural if, for all predicates p, if p 0, and p n implies p (S n), then p x”. In other words, x is a natural if you can do induction up to it.
Exercise: prove |- Nat 0 and |- (Nat → Nat) S. Note that this can be done independent of our definitions of 0 and S.
More Inductive Constructions
Using this scheme, we can construct all sorts of things. For example, equality:
Eq x y = ΞL(\p. p x ⇒ p y)
And lists:

nil = \n. \c. n
cons x xs = \n. \c. c x xs
List a x = ΞL(\p. p nil ⇒ ΞU(\ys. p ys ⇒ Ξa(\y. p (cons y ys))) ⇒ p x)
There is a classic paradox involving the inductive type Set = Set → H, which is definable using this scheme:
Set x = ΞL(\p. Ξ(p → H)p ⇒ p x)
However, as I tried to prove the inconsistency, I was blocked by the H rules. This gives me hope.
It is also possible to construct coinductive types. Here are the “conaturals”, the naturals with infinity. We can’t use the constructor definition anymore; coinductive types are all about projections. So the conaturals have only one projection, onto a disjunction. So to eliminate a conatural n, you pass it what to do if n = 0, and what to do with n’ if n = S n’. For example, to check if a conatural is zero, you can use n True (K False).
CoNat x = ΞL(\p. Ξp(\y. y True p) ⇒ p x)
In English: x is a conatural if for all predicates p, if p y implies y True p, then p x. y True p can be interpreted as y = 0 or y = S n and p n.
Isn’t IΞ cool? Simple, obvious, but powerful enough to model a wide class of data (and codata). The thing I like best about it is that it is untyped at its core. Functions are just pure lambda calculus functions that we are “later” proving properties about. Type erasure comes for free (well, sort of: encoding dependent types into this system will end up passing types as parameters at runtime, even though they are never used).
Bunder proved his version of IΞ consistent and complete. But the rules I gave, and that I have been using, are not his, and in fact are more powerful than his. This makes it possible that my rules are inconsistent. My system can prove LL, while his cannot. This worries me, because LL in the typical interpretation means “Type : Type”, which gives rise to the Burali-Forti paradox. However, even though I was able to encode the infrastructure for Russell’s paradox, I was unable to complete the proof because of the H rules. Maybe the same thing will happen?
I’m spending a bit of time trying to understand the Burali-Forti paradox that gives rise to the inconsistency in Girard’s System U, so that I can try to carry it out in IΞ. If IΞ does turn out to be inconsistent, I am not worried. A few universe restrictions here and there (i.e. leave U unspecified, rather than defining as K T) should do the trick, at the expense of some convenience and beauty.
Also, since I intend to compile Haskell to IΞ, I need to talk about partiality somehow (eg. prove that if a function terminates, it has this type). This has been giving me trouble, but in a good way. I found that I don’t really understand how bottoms behave, and how to talk about them without implying their semantics. I’m confident there is a way, though, maybe after adding a few axioms as a compromise. But I need to think hard and understand bottoms better, so I know what ought to be true, before I try to prove them or add them as axioms.
I’m trying really hard not to become a logician. Like my obsession with FRP, it would be very interesting and educational. But my FRP fancy came from a desire to make games more easily, and I have since lost interest in that endeavor, studying FRP for its own sake. Now I am trying to change the world with Dana, and getting caught up in the beauty and unity of different logical systems.
This happened when trying to choose a core calculus for Dana. I am now furiously interested in Martin Bunder’s work on combinatory logic (btw, if anybody has a copy of his PhD thesis, “a one axiom set theory based on combinatory logic”, please let me know). System IΞ — or rather, systems nearby it — strike me as amazingly beautiful. It is based on an untyped lambda calculus, in which you can prove things about untyped functions (which is a way of endowing them with types). For example, to say that f has type A → B, you say:
Ξ A (B ∘ f)
In English: for all x in A, f x is in B.
However, the core logic really isn’t that important; I’ve only been focusing on it because it’s interesting. In fact, a cool thing about Dana is that there is very little dependency between its parts. But I would really like to start making something rather than researching. How come math is so fascinating?
Anyway, I am not sure that IΞ is strong enough. Assuming a “big enough” universe, I’ve been able to construct an equality predicate (the construction is essentially “the smallest reflexive relation”). But I have had little success in constructing any inductive types, such as the naturals. That’s why I want to read Bunder’s thesis — to get ideas.
Not just system IΞ, but logic in general, is fascinating me. Large cardinals in set theory, universe levels in CIC, and related “stratification” ideas abound and unify to create some intuitive notion of truth. In some sense, truth is the strongest consistent such universe — however there is provably no such thing. In what system should we then work? Is it essential that picking a system of axioms in which to work will always be a part of mathematics? How do you consolidate results which assume different axioms?
That is actually my current goal for Dana’s core. I think the core calculus will be very weak, and you add axioms as you need more (in line with a quote from Dr. Scott himself: “If you want more you have to assume more”). Such axioms will have the same pattern as e.g. the IO monad in Haskell; your assumptions bubble their way to the top. However, it’s a much richer system than “IO or not IO”; you know exactly what you are assuming to run any piece of code. If there is a top level “user OS”, its assumptions will be vast (or maybe there’s some clever way to incrementally add them?).
Anyway, if the itch to make something irritates me so, I can assume I have a strong core logic — whatever it is — and start building things at a higher level. It’s emotionally difficult for me to do so, because I like to feel like I am on a strong foundation (isn’t that the whole point of Dana, after all?).