In the last IΞ post, I introduced the calculus and sketched the construction of some standard mathematical objects. In this post, I will dive a little deeper and construct all of the positive recursive types. The proof will be in an informal style (in particular, omitting the H constraints), but I have little doubt that it will go through formally.
Only a superficial familiarity with IΞ is needed to understand this proof; I adopt a set-theoretic style notation, so the proof should be approachable. Here’s a quick refresher of the primitives of IΞ.
- Membership is application, so when I write x ∈ A, this is translated into IΞ as A x. Thus sets, unary predicates, and types are all the same thing.
- Because membership and application are identified, universal quantification and the subset relation are also: ΞAB means “A is a subset of B”, or “x in A implies x in B”. In particular, the pattern ΞA(\a. P a) can be interpreted as “for every a in A, P a”.
- L is the set of all sets (whose existence does not lead to a contradiction in this system!). I will give its definition at the end of the article.
- Other symbols have their usual interpretation, and I’ll include an appendix which gives them all precise definitions at the end.
Definition: A function F is called monotone if A ⊆ B implies F A ⊆ F B.
Intuitively, the monotone functions correspond roughly to the functors in Haskell; they use their parameter in a positive way (appear to the left of an even number of arrows).
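As a loose Haskell-side illustration (my own analogy, not part of IΞ): a type constructor whose parameter occurs only positively admits a Functor instance, just as a monotone F respects ⊆.

```haskell
-- 'a' occurs only in positive positions in F, so F is covariant and
-- admits fmap, mirroring monotonicity of F with respect to subsets.
data F a = Leaf | Node a (F a) deriving (Eq, Show)

instance Functor F where
  fmap _ Leaf       = Leaf
  fmap f (Node x r) = Node (f x) (fmap f r)

-- By contrast, a parameter to the left of an odd number of arrows is a
-- negative occurrence, and blocks any lawful Functor instance:
newtype Neg a = Neg (a -> Bool)
```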
Definition: The type recursion combinator μ is defined as: x ∈ μ F iff, for every p ∈ L with F p ⊆ p, x ∈ p.
We are allowed to define things simply by giving a condition for membership like this. Formally, this definition starts out: μ = \F. \x. ΞL(\p. Ξ(F p)p ⇒ p x).
This definition intuitively says, x is in μ F if x is in every type closed under F. We will see that this definition corresponds to a type recursion combinator.
Lemma 1: If F is monotone, then F (μ F) ⊆ μ F.
Proof. Given x ∈ F (μ F); show x ∈ μ F. Expanding the definition of μ above:
Given p ∈ L with F p ⊆ p; show x ∈ p.
Observe μ F ⊆ p: Suppose y ∈ μ F; show y ∈ p. Since y ∈ μ F, we have by definition of μ that y ∈ q for every q with F q ⊆ q. Letting q = p, we have F p ⊆ p from above, so y ∈ p.
Therefore, F (μ F) ⊆ F p ⊆ p (by the monotonicity of F and μ F ⊆ p), so x ∈ p. QED.
Now for the easy direction:
Lemma 2: If F is monotone, then μ F ⊆ F (μ F).
Proof. Given x ∈ μ F; show x ∈ F (μ F).
By the definition of μ, we have x ∈ p for every p with F p ⊆ p. Let p = F (μ F). We have F (F (μ F)) ⊆ F (μ F) by the monotonicity of F and Lemma 1, therefore x ∈ F (μ F). QED.
Which leads us to the recursion equation.
Theorem: If F is monotone, μ F = F (μ F). (Proof trivial by the two lemmas.)
I’m using set equality here, i.e. mutual containment, which is probably a bit weaker than Leibniz equality. It is obvious from the definition that this fixed point is minimal.
Monotonicity is easy to prove for any of the standard positive types (products, coproducts, functions). So we can model a good variety of Haskell data types already; however these are standard inductive (least fixed point) types, no infinite structures allowed. I’m still working on the encoding and analogous proofs for ν (greatest fixed point), which is closer to Haskell.
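For intuition, the same "intersection of all F-closed types" idea can be sketched in Haskell itself via the Church encoding of a least fixed point. This is my own Haskell-side analogy, not the IΞ encoding.

```haskell
{-# LANGUAGE RankNTypes #-}

-- An element of Mu f is exactly something that lands in every type r
-- closed under f, i.e. every algebra f r -> r.
newtype Mu f = Mu { fold :: forall r. (f r -> r) -> r }

-- Lemma 1's direction, F (mu F) ⊆ mu F, shows up as the fold constructor:
inMu :: Functor f => f (Mu f) -> Mu f
inMu x = Mu (\alg -> alg (fmap (\m -> fold m alg) x))

-- Example: naturals as the least fixed point of their base functor.
data NatF r = Z | S r
instance Functor NatF where
  fmap _ Z     = Z
  fmap f (S n) = S (f n)

zero :: Mu NatF
zero = inMu Z

suc :: Mu NatF -> Mu NatF
suc = inMu . S

toInt :: Mu NatF -> Int
toInt n = fold n (\c -> case c of Z -> 0; S k -> k + 1)
```

Because Mu quantifies over all algebras, only finite towers of suc applied to zero inhabit it: there is no way to build an infinite natural, matching the least-fixed-point (inductive) reading.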
Hopefully I’ll be able to port many libraries (maybe after a few totality annotations) without having to worry about partiality. But there are more pressing matters, such as finishing my interactive proof assistant for IΞ.
Appendix: Definitions

- True = ΞHH, the True proposition.
- K = \x. \y. x, the constant function.
- U = K True, the set of all objects.
- f . g = \x. f (g x), function composition.
- a → b = \f. Ξa(b . f), the set of functions from a to b.
- L = U → H, the set of all sets.
- X ⇒ Y = Ξ(KX)(KY), implication.
- ∀F = \x. ΞL(\t. F t x), universal quantification over types (like forall in Haskell).
- a × b = ∀(\r. (a → b → r) → r), product type.
- a + b = ∀(\r. (a → r) → (b → r) → r), coproduct type.
I have some very exciting news! I wrote some actual code in the Dana repository. It is the IΞ Certificate module. That is, it’s an abstract data type Proof, such that only valid proofs in IΞ can be constructed.
The certificate module (IXi.Term and IXi.Proof together) is about 280 lines, which isn’t fantastically small, but isn’t too bad, especially considering that I don’t expect it to grow: it is perfect, modulo bugs.
Right now, Proof is just a proof checker function, but it’s designed so that I could swap it out with some serializable format, or even (crosses fingers) a dependent certified type.
One interesting development in this implementation is the new conversion certificate for lambda calculus in De Bruijn notation. That is, objects of type Conversion represent a valid βη conversion between terms. Previously, I hadn’t been satisfied with my solution: the certificate was implemented as a pair of convertible terms. This led to far too many equality comparisons of terms when combining and using certificates, which is inefficient and, I suspect, would be hard to prove things about. It also required you to know too much about the goal you were proving, bloating the proof terms and tangling them with the theorems they were trying to prove.
The hardest part is β expansion, for example turning X into (\x.\y.x) X (\x.x). That was the reason for the pair representation: to do beta expansion, you just did a beta reduction and reversed it.
The new implementation instead implements conversions as partial functions. I.e. you give a conversion a source term, and it gives you an equivalent term (or says it couldn’t convert). This means I had to separately model beta reduction and beta expansion, because you can’t easily reverse a conversion. However, the solution is quite clean. I chose a basis of expansion combinators, which can be composed to form any expansion. They are:
Identity : A → (\x. x) A
Constant : A → (\x. A) B                        [x not free in A]
Apply    : (\x. A) C ((\x. B) C) → (\x. A B) C
Lambda   : \y. (\x. A) B → (\x. \y. A) B        [y not free in B]
This is in addition to the other combinators, which are needed to make this basis complete. They include β reduction, η expansion/contraction, and ways to focus a conversion on a subexpression. The key is that each combinator is correct by inspection, so we can be confident that the conversion algebra is sound.
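Here is a minimal sketch of that design in Haskell. It is my own reconstruction, not the repository code: it uses named variables instead of De Bruijn indices, and it does not check the free-variable side conditions.

```haskell
-- Conversions as partial functions on terms: apply one to a source term
-- and it yields the expanded term, or Nothing if it doesn't apply.
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

type Conversion = Term -> Maybe Term

-- Identity : A -> (\x. x) A
identityE :: Conversion
identityE a = Just (App (Lam "x" (Var "x")) a)

-- Constant : A -> (\x. A) B  (side condition [x not free in A] unchecked)
constantE :: Term -> Conversion
constantE b a = Just (App (Lam "x" a) b)

-- Apply : (\x. A) C ((\x. B) C) -> (\x. A B) C
applyE :: Conversion
applyE (App (App (Lam x a) c) (App (Lam x' b) c'))
  | x == x' && c == c' = Just (App (Lam x (App a b)) c)
applyE _ = Nothing

-- Lambda : \y. (\x. A) B -> (\x. \y. A) B  ([y not free in B] unchecked)
lambdaE :: Conversion
lambdaE (Lam y (App (Lam x a) b)) = Just (App (Lam x (Lam y a)) b)
lambdaE _ = Nothing

-- A focusing combinator: apply a conversion under a lambda binder.
inLambda :: Conversion -> Conversion
inLambda c (Lam y t) = Lam y <$> c t
inLambda _ _         = Nothing

-- Partial functions compose, giving the conversion algebra its sequencing:
andThen :: Conversion -> Conversion -> Conversion
andThen f g t = f t >>= g
```

The point of the pattern: each combinator above is correct by inspection, while arbitrary compositions of them remain valid conversions for free.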
I chose these combinators by thinking about what would be needed to construct the inverse conversion from bringing a term to normal form. If you’re familiar with SKI factorization, the process is pretty similar. Whenever you reduce an application (\x. A) B, you look at the structure of A and “push” B in by one level, applying one of these combinators. For example:
(\f. \y. f (f y)) (\x. x)                    Lambda
\y. (\f. f (f y)) (\x. x)                    inLambda Apply
\y. (\f. f) (\x. x) ((\f. f y) (\x. x))      inLambda (inLeft Identity)
\y. (\x. x) ((\f. f y) (\x. x))              inLambda Identity
\y. (\f. f y) (\x. x)                        inLambda Apply
\y. (\f. f) (\x. x) ((\f. y) (\x. x))        inLambda (inLeft Identity)
\y. (\x. x) ((\f. y) (\x. x))                inLambda Identity
\y. (\f. y) (\x. x)                          inLambda (Constant (\x. x))
The reverse composition of the conversions on the right will bring us from \y. y to (\f. \y. f (f y)) (\x. x).
But isn’t it an awful pain to write all those combinators when proving things? Of course not! I make a computer do it for me. I have a little algorithm which takes two terms and computes a conversion between them, by bringing them both to normal form, and using the forward conversions one way and the inverse conversions the other way. Of course, if I give it terms which have no normal form it won’t halt, but the idea is that these terms are static: I use dummy terms to explain the conversion I want, and then apply the conversion I got back to the real terms (which may have subterms without normal forms).
So I say: get me from (\x y. A x) A I to (\x. x x) A, where “A” and “I” are just strings, and then I apply the conversion I got back to, say, (\x. \y. WW x) (WW) (\x. x), where WW has no normal form. The conversion still succeeds.
The certificate pattern shines here: my constructors are easy to verify, then I have a fairly involved algorithm for constructing certificates that is easy to use, which is guaranteed (at least one sort of) correct by construction.
So that’s fun stuff.
Proofs are still pretty tedious, however. My next step is to make some smart “tactic” combinators (which of course generate the underlying certificates) to make proofs easier. It shouldn’t take too long to make it at least tolerable. Then I’ll build up a library of certified infrastructure necessary for typechecking Haskell--, and finally write the compiler to complete the bootstrap. There are plenty of dragons to be slain along the way.
Over the past couple months, I have been attempting to find a language to use as a core calculus for Dana, as anyone who follows this blog knows. I have been slowly converging on IΞ for its elegance and simplicity. It turns out to be very powerful, capable of modeling many structures while avoiding many of the paradoxes typically involved with those structures. In this post, I give an exposition of IΞ and construct some common structures.
The main idea is to use lambda calculus for all of our symbol manipulation needs, adding a constant Ξ for quantification. “Types” are unary predicates, so membership is application. For example, “Nat 0” is a proposition saying that 0 has type Nat (where 0 and Nat are given previous definitions somewhere).
ΞXY means “forall x, if X x, then Y x”, or, if you think of predicates as sets, simply “X is a subset of Y”. So if we have types Nat and Bool, we can say that f has type Nat → Bool with “ΞNat(\x. Bool (f x))”, read “forall x in Nat, f x is in Bool”. Very direct, is it not?
Interpreting these meanings directly, we arrive at Curry’s System I, whose rules follow. A proposition (sequent) has the form “Γ |- X”, which has the interpretation “assuming Γ, X is provable”.
I have stated the rules as “to prove X, you need to prove Y”, because that’s kinda how my brain works. Take a moment to internalize them. They are obvious given the above intuitions, and shouldn’t be hard to read.
On top of this, we can build propositional calculus. Define “K = \x. \y. x”, and write “X ⇒ Y” as shorthand for “Ξ(KX)(KY)”; since KX a = X and KY a = Y for any a, this says exactly “if X then Y”. This system has the properties you would expect of a simple propositional calculus.
Sadly, this system is inconsistent. We can prove any proposition X:
Let Y = (\x. x x ⇒ X) (\x. x x ⇒ X). Observe that Y = Y ⇒ X.

1. Y |- Y
2. Y |- Y ⇒ X     -- conversion of (1)
3. Y |- X         -- modus ponens on (1), (2)
4. |- Y ⇒ X       -- deduction from (3)
5. |- Y           -- conversion of (4)
6. |- X           -- modus ponens on (4), (5)
Martin Bunder had a brilliant idea to block this paradox, and also many others, which brings us to the object of my infatuation:
The crucial step above was the deduction step, passing from Y |- X to |- Y ⇒ X, where we abstracted over the infinite proposition “((… ⇒ X) ⇒ X) ⇒ X”. The way we will block this is to only allow abstraction over finite propositions. Introduce a new symbol, H, such that HX means “X is a finite proposition” (or simply “X is a proposition”). We will derive finiteness from the finiteness of proofs: to prove HX, we first have to prove H of each component of X. Our system becomes (the new additions are in bold):
The final rule is an axiom, simply saying that “X is a proposition” is always a proposition.
Constructing the Naturals
Now I will embark on constructing the type of naturals. Since types are predicates, I also need to come up with a representation for naturals. It turns out that it doesn’t matter what representation I use, as long as it has zero and a 1-1 successor function. For the sake of discussion, let’s use the Church encoding.
0 = \f. \x. x
S n = \f. \x. f (n f x)
So a natural is an iteration function. For example, the number 3 iterates a function 3 times on its argument: 3 f x = f (f (f x)).
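In Haskell terms, the Church encoding above looks like this (an illustration, using hypothetical names c0/cS):

```haskell
-- Church numerals: the number n is the function that iterates its
-- argument n times.
c0 :: (a -> a) -> a -> a
c0 _ x = x

cS :: ((a -> a) -> a -> a) -> (a -> a) -> a -> a
cS n f x = f (n f x)

-- three f x = f (f (f x))
three :: (a -> a) -> a -> a
three = cS (cS (cS c0))
```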
Coming up with a way to classify all of these, but not any others (such as infinity f x = f (f (f (f (f ...))))), was quite a challenge. You might try to classify these functions on the property that they are iteration functions, but any sensible method of doing that ends up including infinity. I began thinking that IΞ was not strong enough, and looking for ways to enrich it by adding more axioms.
Fortunately, no new axioms are necessary! The encoding is obvious in retrospect. What is the first thing a mathematician thinks when you talk about the naturals: induction! Let’s define a natural as any object which you can do induction (with 0 and S) over.
To make this readable, we need to introduce a few more symbols:
f . g = \x. f (g x)    -- composition
A → B = \f. ΞA(B . f)  -- the type of functions from A to B
True = ΞHH             -- the true proposition
U = K True             -- the type of all objects (mnemonic: universe)
L = U → H              -- the type of predicates/types
Now, for the naturals:
Nat x = ΞL(\p. p 0 ⇒ ΞU(\n. p n ⇒ p (S n)) ⇒ p x)
Reading this in English: “x is a Natural if, for all predicates p, if p 0, and p n implies p (S n), then p x”. In other words, x is a natural if you can do induction up to it.
Exercise: prove |- Nat 0 and |- (Nat → Nat) S. Note that this can be done independent of our definitions of 0 and S.
More Inductive Constructions
Using this scheme, we can construct all sorts of things. For example, equality:
Eq x y = ΞL(\p. p x ⇒ p y)
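This is Leibniz equality: x equals y when every predicate true of x is true of y. The same trick can be transcribed at the type level in Haskell (a sketch of the analogy, not from the post):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A proof that a equals b is a function transporting any property p
-- from a to b -- precisely "for all p, p a implies p b".
newtype Equal a b = Equal (forall p. p a -> p b)

refl :: Equal a a
refl = Equal id

trans :: Equal a b -> Equal b c -> Equal a c
trans (Equal f) (Equal g) = Equal (g . f)

-- Instantiating p at the identity wrapper lets us cast along a proof:
newtype Id a = Id a

cast :: Equal a b -> a -> b
cast (Equal f) x = case f (Id x) of Id y -> y
```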
nil = \n. \c. n
cons x xs = \n. \c. c x xs
List a x = ΞL(\p. p nil ⇒ ΞU(\ys. p ys ⇒ Ξa(\y. p (cons y ys))) ⇒ p x)
There is a classic paradox involving the inductive type Set = Set → H, which is definable using this scheme:
Set x = ΞL(\p. Ξ(p → H)p ⇒ p x)
However, as I tried to prove the inconsistency, I was blocked by the H rules. This gives me hope.
It is also possible to construct coinductive types. Here are the “conaturals”, the naturals with infinity. We can’t use the constructor definition anymore; coinductive types are all about projections. So the conaturals have only one projection, onto a disjunction. So to eliminate a conatural n, you pass it what to do if n = 0, and what to do with n’ if n = S n’. For example, to check if a conatural is zero, you can use n True (K False).
CoNat x = ΞL(\p. Ξp(\y. y True p) ⇒ p x)
In English: x is a conatural if for all predicates p, if p y implies y True p, then p x. y True p can be interpreted as y = 0 or y = S n and p n.
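The projection-based view transcribes into Haskell like so (my sketch, not from the post; laziness stands in for the coinductive reading):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A conatural is known only through its single projection onto a
-- disjunction: what to do if it is 0, and what to do with n' if it is S n'.
newtype CoNat = CoNat (forall r. r -> (CoNat -> r) -> r)

out :: CoNat -> r -> (CoNat -> r) -> r
out (CoNat f) = f

coZero :: CoNat
coZero = CoNat (\z _ -> z)

coSucc :: CoNat -> CoNat
coSucc n = CoNat (\_ s -> s n)

-- infinity = S infinity is a perfectly good conatural here:
infinity :: CoNat
infinity = coSucc infinity

-- "to check if a conatural is zero, you can use n True (K False)":
isZero :: CoNat -> Bool
isZero n = out n True (const False)
```

Note that isZero infinity returns without looping: the projection only unfolds one constructor, which is exactly what distinguishes this from the inductive Nat.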
Isn’t IΞ cool? Simple, obvious, but powerful enough to model a wide class of data (and codata). The thing I like best about it is that it is untyped at its core. Functions are just pure lambda calculus functions that we are “later” proving properties about. Type erasure comes for free (well, sort of: encoding dependent types into this system will end up passing types as parameters at runtime, even though they are never used).
Bunder proved his version of IΞ consistent and complete. But the rules I gave, and that I have been using, are not his, and in fact are more powerful than his. This makes it possible that my rules are inconsistent. My system can prove LL, while his cannot. This worries me, because LL in the typical interpretation means “Type : Type”, which gives rise to the Burali-Forti paradox. However, even though I was able to encode the infrastructure for Russell’s paradox, I was unable to complete the proof because of the H rules. Maybe the same thing will happen?
I’m spending a bit of time trying to understand the Burali-Forti paradox that gives rise to the inconsistency in Girard’s System U, so that I can try to carry it out in IΞ. If IΞ does turn out to be inconsistent, I am not worried. A few universe restrictions here and there (i.e. leaving U unspecified, rather than defining it as K True) should do the trick, at the expense of some convenience and beauty.
Also, since I intend to compile Haskell to IΞ, I need to talk about partiality somehow (eg. prove that if a function terminates, it has this type). This has been giving me trouble, but in a good way. I found that I don’t really understand how bottoms behave, and how to talk about them without implying their semantics. I’m confident there is a way, though, maybe after adding a few axioms as a compromise. But I need to think hard and understand bottoms better, so I know what ought to be true, before I try to prove them or add them as axioms.
I’m trying really hard not to become a logician. Like my obsession with FRP, it would be very interesting and educational. But my FRP fancy came from a desire to make games more easily, and I have since lost interest in that endeavor, studying FRP for its own sake. Now I am trying to change the world with Dana, and getting caught up in the beauty and unity of different logical systems.
This happened when trying to choose a core calculus for Dana. I am now furiously interested in Martin Bunder’s work on combinatory logic (btw, if anybody has a copy of his PhD thesis, “a one axiom set theory based on combinatory logic”, please let me know). System IΞ — or rather, systems nearby it — strike me as amazingly beautiful. It is based on an untyped lambda calculus, in which you can prove things about untyped functions (which is a way of endowing them with types). For example, to say that f has type A → B, you say:
Ξ A (B ∘ f)
In English: for all x in A, f x is in B.
However, the core logic really isn’t that important; I’ve only been focusing on it because it’s interesting. In fact, a cool thing about Dana is that there is very little dependency between its parts. But I would really like to start making something rather than researching. How come math is so fascinating?
Anyway, I am not sure that IΞ is strong enough. Assuming a “big enough” universe, I’ve been able to construct an equality predicate (the construction is essentially “the smallest reflexive relation”). But I have had little success in constructing any inductive types, such as the naturals. That’s why I want to read Bunder’s thesis — to get ideas.
Not just system IΞ, but logic in general, is fascinating me. Large cardinals in set theory, universe levels in CIC, and related “stratification” ideas abound and unify to create some intuitive notion of truth. In some sense, truth is the strongest consistent such universe; however, there is provably no such thing. In what system should we then work? Is it essential that picking a system of axioms in which to work will always be a part of mathematics? How do you consolidate results which assume different axioms?
That is actually my current goal for Dana’s core. I think the core calculus will be very weak, and you add axioms as you need more (in line with a quote from Dr. Scott himself: “If you want more you have to assume more”). Such axioms will have the same pattern as e.g. the IO monad in Haskell; your assumptions bubble their way to the top. However, it’s a much richer system than “IO or not IO”; you know exactly what you are assuming to run any piece of code. If there is a top level “user OS”, its assumptions will be vast (or maybe there’s some clever way to incrementally add them?).
Anyway, if the itch to make something irritates me so, I can assume I have a strong core logic — whatever it is — and start building things at a higher level. It’s emotionally difficult for me to do so, because I like to feel like I am on a strong foundation (isn’t that the whole point of Dana, after all?).
I don’t have much time, so here’s just a quick jot. I don’t think System IG can prove this rather obvious proposition:
|- GL(\a. Ga(Ka)) (\a x. x)
Or translated to more conventional notation: (\a x. x) : (a : Type) -> (a -> a), the type of the polymorphic identity function. In order to prove this, we need the axiom LL (conventionally, Type : Type). But that axiom is inconsistent. So I’ve tweaked the rules a bit to allow the above to be proven, without allowing LL. It’s possible that these tweaks are still consistent; they don’t admit Girard’s paradox, at least.
The rules I’m modifying can be found on page 8 of the above paper, in the IG box. My tweak is, changing rule Gi from:
Γ, Xx |- Yx(Zx) ; Γ |- LX ; x not in FV(Γ,X,Y,Z) => Γ |- GXYZ

to:

Γ, Xx |- Yx(Zx) ; Γ |- L(GXY) ; x not in FV(Γ,X,Y,Z) => Γ |- GXYZ
I’m saying that you need to prove that GXY (conventionally, (x:X) -> Y x) is a type before you can say something has that type. Without more modifications, this is equivalent to the original. However, now I add two new rules:
Γ |- L(GL(KL))
Γ |- GL(KL)p => Γ |- L(GLp)
These say: (Type -> Type) : Type, and if p : Type -> Type then ((x:Type) -> p x) : Type. The latter one is the one I’ve heard is still consistent, but you can’t really do anything with it without the former. So, it needs further investigation.
Trying to prove the consistency of this new system, I need a semantics for terms. If I can model these semantics in CIC (the calculus used by Coq), then if CIC is consistent, so is this system. My idea for semantics is as follows:
Bare lambda terms don’t really have semantics; they are modeled by their syntax. But typing proofs have the semantics of whatever they are proving as well-typed. If I use [-] as the meaning function, then the semantics of [Lx] is a type (because it is a proof that x is a type). Formally:
[Lx] : Type
[Gxyz] : (a:[Lx]) -> [Gx(KL)y] a
Simplified, the latter means that the semantics of a proof that x has type y -> z is a function from y to z. (Rather, it’s the dependent version of that). These semantics motivate the axioms I am adding.
Okay, that’s all the time I have!
After an inquiry about a combinatory calculus for dependent types, somebody in #haskell — unfortunately I don’t remember who — recommended I look at Illative Combinatory Logic. After studying the paper a bit, all I can say is Wow! It implements a (conjecturally) complete logical system in but three simple typing rules, together with the untyped lambda calculus conversion rules. I was able to write a dependent typechecker in 100 lines.
The specific system I’m working with in the paper is called system IG. The paper also introduces three weaker systems, which are also very nice and worth understanding.
As an example, here is a derivation of the identity function and its type. This is a backward proof, so each step is implied by the step after it. But reading it this way makes the algorithm clear.
|- G L (\A. G A (\_. A)) (\A x. x)
L a |- (\A. G A (\_. A)) a ((\A x. x) a)     (Gi)
L a |- G a (\_. a) ((\A x. x) a)             (beta)
L a |- G a (\_. a) (\x. x)                   (beta)
L a, a b |- (\_. a) b ((\x. x) b)            (Gi)
L a, a b |- a ((\x. x) b)                    (beta)
L a, a b |- a b                              (beta)
The trick, which isn’t so much a trick as a beautiful deep connection I gather (although I don’t fully grok it), is that typing propositions are just applications. So if A is a type, then A b is the proposition “b has type A”. L is the type of types, and G is essentially Π (the product type constructor). So, for example, the fourth line in the above proof would be written as follows in a conventional system:
a : Type |- (\x. x) : a -> a
I need to extend it with finite types and boxes, but I really don’t want to because it’s so pretty as is! :-)