Tag Archives: math

The essence of metastrategy

A recent post on Less Wrong, Levels of Action, reminded me of a game I created whose dynamics I wanted to explore. I still have not explored the dynamics to a great level of depth, but I thought it would be interesting to the nerdy community that reads my blog.

The idea came after playing Castle Wars 2. In that game you try to build your castle as tall as possible while keeping your opponents castle as short as possible. The basic game dynamic is an action/meta-action trade off: (oversimplifying) you can play a card to gain 10 bricks, or you can play a card to gain one brick per turn for the rest of the game. I was surprised by the amount of subtlety derived from such a simple dynamic, and I recommend the game to anyone wanting to kill an hour. It’s not the best game ever, but it’s not as trivial as it at first seems.

I wondered what would happen if I removed the cards, the weapons, the defense from that game and replaced them with more levels of this same dynamic. Here’s what I came up with.

You can play it with a chessboard and poker chips (my old game design standby). You don’t need 6 of the rows of the board. Each player plays on a side of the board, and has eight squares which we will label, from left to right, 1 to 8. Each square can have up to eight chips in it. The goal of the game is to get eight chips in the eighth square. Here is how play proceeds:

On your turn, place a single chip in any of your squares. Your opponent does the same. Before each turn, “cancel out” any chips that both players have on corresponding squares. That is, if you have 4 chips on the 5th square, and your opponent has 5 chips on the 5th square, remove 4 chips from both, so that you have none and your opponent has one. Then (still before your turn) duplicate each square to the next higher position and truncate down to 8. So before this action if your eight squares had these values:

0 0 1 4 5 2 0 3

Then after this action, the state of your board will be:

0 0 1 5 8 7 2 3

Another way to think about it is that you slide a copy of your board one position to the right and add (then truncate).

Then place your new chip, and your opponent takes his turn. That’s it. The first player to eight in the eighth square wins.

Despite this game’s simplicity, I have been unable to devise a good strategy for it. The strategy for the game seems to revolve around estimating the length of the game. If you know how many turns the game will last, it is fairly easy to determine how to play optimally. But knowing how long the game will last is not so easy to determine.

Try it out, think about it. Let me know if you discover anything.

Like this post?
Flattr this


I’m a student of the University of Colorado again. I’ve gone back to finish my bachelor’s in mathematics, which essentially involves fulfilling a bunch of core requirements. I’m going to start the discussion by mixing my experience of one class (religions of south Asia) with a concept from another class (connectedness from topology).

Last spring I took my (now ex-)’:girlfriend on a trip to Hawaii. While we were there, we attended a weekend immersive class on Sanskrit. The class was very “new-agey” — we chanted, meditated, in addition to learning Devanagari (the Sanskrit/Hindi alphabet) and something about Indian religion. The ideas combined with the approach fascinated and inspired me. I have never been much of a religious person; the religious ideas I had heard of always sounded a bit naive and silly. But this new approach gave me a glimpse of another way of looking at the world: the words of the Bhagavad Gita played with the gods, using them half as entities, half as concepts. The philosophical ideas, language, and religion we studied were clearly inseparable, all connected and synthesized into a single world view. Further, this world view seemed to incorporate my objections to the naivety of western world views — emphasizing the duality in all things, focusing not so much on right and wrong but on purpose and spirit, using the malleability and metaphor of truth.

My curiosity whetted, I enrolled in a class about Hinduism at the university. So far it has been a disappointment. What drew me to these ideas in the first place was the connectedness and duality — the yin and yang, so to speak — I perceived in the world view. And we have started by drawing thick lines categorizing the different approaches to divinity. An especially potent event in bringing to my attention my disappointment with the class occurred during our discussion of Bhakti. The professor began to describe the philosophy of Bhakti: that connecting with the divine is about love and devotion, that the details of ritual are not as important as a true spiritual devotion to god. Immediately after this description, the professor wrote on the board BHAKTI RITUALS. Um, teacher, did you not feel that just now? How did you build your immunity to cognitive dissonance?

We have been categorizing, deconstructing, analyzing this beautiful philosophy as if engineers. After the class I suspect I will know many facts, but have no understanding. If I were to talk to a yogi, he will consider me no closer to understanding his spirituality than any other American out of the hat. This is disappointing, since I don’t consider myself to have learned something until I understand it. We have a Hindu temple here in Boulder; I hope to find a way to study there and use the class as a supplement.

But why I am really writing this post is to help me to grip a vague sense I felt as I was processing after the BHAKTI RITUALS class. I am in a topology class this semester, and we are learning set-theoretic point-set topology. The constructivist in me winces every few minutes, lamenting the non-computability of everything we are discussing. I think the same cognitive orientation is fueling my dissatisfaction with the Indian religions class and my taste for constructivism. Classical mathematics seeks to separate the world into true and false, existence and nonexistence, equal and inequal. The inclusion of the law of excluded middle as obvious is evidence of this, as is the surprise felt by the mathematical world over Gödel’s incompleteness theorem. “What? We can’t eventually separate everything into two categories?!”

If you ask a set theorist whether ℕ = ℚ, they will probably say they are not equal (although have equal cardinalities). If you ask a type theorist whether ℕ = ℚ they will say “huh?”. The question cannot be answered, for we must consider what it means to treat 1 : ℕ as a ℚ, and we don’t know how to do that — not without a function that shows how. Indeed, in constructivism we have to be careful when talking about real numbers, since the set of observations matters, i.e. it matters how we look at them. And for any reasonable construction of the reals, their connectedness falls out of the constructivism of the theory: we cannot separate them into two categories in any way. A set theorist can, and has to define himself into a more realistic world where he can’t using the mechanism of topology.

Mathematicians are probably getting upset at me or thinking I’m an idiot. This isn’t a mathematical post, it’s philosophical, thus my fuzzy intuitive discussions. If you have the desire to leave an emphatic corrective comment at this point, maybe take a step back and try to make out the landscape with me. I don’t consider any of this true, I’m just trying to get a feel for the philosophically general idea of connectedness, outside of a particular formal system. I have the impression that we can think of the world — the real one or the mathematical one — this way and it might lead to a more accurate, if less “clear-cut”, way of thinking.

The pure untyped lambda calculus is connected in the Scott topology. This fact has fascinated me since I heard of it, trivial though it might be. We are used to adding traditional totally disconnected types to the lambda calculus and pretending bottoms don’t exist. I have been curious about what it would look like if we embraced this connectedness and extended lambda calculus with connected concepts. They may play more nicely in a connected system. I still have not made any concrete progress on this idea, but it appeals to me as potentially beautiful and powerful. Maybe we are computing in an awkward way without realizing it.

Did you like this post? Accelerate the flow of Karma :-) Flattr this

Searchable Data Types

A few years ago, Martín Escardó wrote an article about a seemingly-impossible program that can exhaustively search the uncountably infinite "Cantor space" (infinite streams of bits). He then showed that spaces that can be thus searched form a monad (which I threw onto hackage), and wrote a paper about the mathematical foundations which is seemingly impossible to understand.

Anyway, I thought I would give a different perspective on what is going on here. This is a more operational account, however some of the underlying topology might peek out for a few seconds. I’m no topologist, so it will be brief and possibly incorrect.

Let’s start with a simpler infinite space to search. The lazy naturals, or the one-point compactification of the naturals:

data Nat = Zero | Succ Nat
    deriving (Eq,Ord,Show)
infinity = Succ infinity

Let’s partially instantiate Num just so we have something to play with.

instance Num Nat where
    Zero   + y = y
    Succ x + y = Succ (x + y)
    Zero   * y = Zero
    Succ x * y = y + (x * y)
    fromInteger 0 = Zero
    fromInteger n = Succ (fromInteger (n-1))

We wish to construct this function:

search :: (Nat -> Bool) -> Maybe Nat

Which returns a Nat satisfying the criterion if there is one, otherwise Nothing. We assume that the given criterion is total, that is, it always returns, even if it is given infinity. We’re not trying to solve the halting problem. :-)

Let’s try to write this in direct style:

search f | f Zero    = Just Zero
         | otherwise = Succ <$> search (f . Succ)

That is, if the predicate worked for Zero, then Zero is our guy. Otherwise, see if there is an x such that f (Succ x) matches the predicate, and if so, return Succ x. Make sense?

And it seems to work.

ghci> search (\x -> x + 1 == 2)
Just (Succ Zero)
ghci> search (\x -> x*x == 16)
Just (Succ (Succ (Succ (Succ Zero))))

Er, almost.

ghci> search (\x -> x*x == 15)
(infinite loop)

We want it to return Nothing in this last case. It’s no surprise that it didn’t — there is no condition under which search is capable of returning Nothing, that definition would pass if Maybe were defined data Maybe a = Just a.

It is not at all clear that it is even possible to get what we want. But one of Escardó’s insights showed that it is: make a variant of search that is allowed to lie.

-- lyingSearch f returns a Nat n such that f n, but if there is none, then
-- it returns a Nat anyway.
lyingSearch :: (Nat -> Bool) -> Nat
lyingSearch f | f Zero = Zero
              | otherwise = Succ (lyingSearch (f . Succ))

And then we can define our regular search in terms of it:

search' f | f possibleMatch = Just possibleMatch
          | otherwise       = Nothing
    possibleMatch = lyingSearch f

Let’s try.

ghci> search' (\x -> x*x == 16)   -- as before
Just (Succ (Succ (Succ (Succ Zero))))
ghci> search' (\x -> x*x == 15)

Woah! How the heck did it know that? Let’s see what happened.

let f = \x -> x*x == 15
lyingSearch f
0*0 /= 15
Succ (lyingSearch (f . Succ))
1*1 /= 15
Succ (Succ (lyingSearch (f . Succ . Succ)))
2*2 /= 15

That condition is never going to pass, so lyingSearch going to keep taking the Succ branch forever, thus returning infinity. Inspection of the definition of * reveals that infinity * infinity = infinity, and infinity differs from 15 once you peel off 15 Succs, thus f infinity = False.

With this example in mind, the correctness of search' is fairly apparent. Exercise for the readers who are smarter than me: prove it formally.

Since a proper Maybe-returning search is trivial to construct given one of these lying functions, the question becomes: for which data types can we implement a lying search function? It is a challenging but approachable exercise to show that it can be done for every recursive polynomial type, and I recommend it to anyone interested in the subject.

Hint: begin with a Search data type:

newtype Search a = S ((a -> Bool) -> a)

Implement its Functor instance, and then implement the following combinators:

searchUnit :: Search ()
searchEither :: Search a -> Search b -> Search (Either a b)
searchPair :: Search a -> Search b -> Search (a,b)
newtype Nu f = Roll { unroll :: f (Nu f) }
searchNu :: (forall a. Search a -> Search (f a)) -> Search (Nu f)

More Hint: is searchPair giving you trouble? To construct (a,b), first find a such that there exists a y that makes (a,y) match the predicate. Then construct b using your newly found a.

The aforementioned Cantor space is a recursive polynomial type, so we already have it’s search function.

type Cantor = Nu ((,) Bool)
searchCantor = searchNu (searchPair searchBool)

ghci> take 30 . show $ searchCantor (not . fst . unroll . snd . unroll)

We can’t expect to construct a reasonable Search Integer. We could encode in the bits of an Integer the execution trace of a Turing machine, as in the proof of the undecidability of the Post correspondence problem. We could write a total function validTrace :: Integer -> Bool that returns True if and only if the given integer represents a valid trace that ends in a halting state. And we could also write a function initialState :: Integer -> MachineState that extracts the first state of the machine. Then the function \machine -> searchInteger (\x -> initialState x == machine && validTrace x) would solve the halting problem.

The reason this argument doesn’t work for Nat is because the validTrace function would loop on infinity, thus violating the precondition that our predicates must be total.

I hear the following question begging to be explored next: are there any searchable types that are not recursive polynomials?

Did you like this post? Flattr this

One true way

This post is a follow-up to my recent post programs are made of ideas. I received a lot of interesting comments and criticism on this philosophical exposition. But there is one in particular that caused me to deeply consider whether my argument made sense. It starts with this excerpt from the post:

Why constrain ourselves by forcing a fundamental idea? Must there really be a One True Way which forms a basis for everything else to be defined?

Surely the set of all the ideas we are capable of having forms a fundamental basis in which everything else can be defined. Or more concretely, it is presumptuous to think that our brains are so great as to not be constrained by some underlying system. We may not see the definitions of our ideas, but if our ideas can be said to exist at all, they have to have some expression in terms of something, right?

It is another thing entirely to wish to communicate that basis to a computer. Logicians will be quick to intuit that we would not be able to fully understand any system which forms the basis for our own thoughts (however, this intuition will be difficult to formalize unless our thoughts obey first order logic). And I’m pretty confident in saying that it isn’t first order logic, it isn’t lambda calculus, and it (still) isn’t actions on digital state machine1.

A perhaps clearer way of saying what I was trying to say is this: let the system in which we communicate our ideas be our slave, not our master. Think first, then decide which words will lead to the clearest explanation. We as software engineers, mathematicians, and even speakers of natural language have a tendency to give in to Sapir and Whorf. While we may not be capable of freeing our thoughts from our language2, we may be caving too easily. I felt a breath of fresh air when I noticed and cut the ropes tying me to lambda calculus and functional thinking.

It is a beautiful thing that most of these systems are Turing complete. I have never read the Church-Turing thesis as “C is Turing complete, so you might as well use C for all your programming tasks”. I read it almost the opposite way: “choose whatever language you want, because I can always write an interpreter for your language in my language”. Isn’t it great that we are able to choose between the uneasy handwaving of Perl and the frustrating precision of Agda?

I have always seen a programming language as a single core system for which ideas must be tailored. In other words, as a master. A current project of mine is to devise software development software that embraces the plurality of these systems: a way to tie together all these different methods of expression so that they can benefit each other. Let the language camps remain polarized, but let the computer do the arguing.

1 You might ask, “But what if we consider the brain to be a big digital state machine?” That is beside the point. By the time our consciousness gets hold of our thoughts, the mechanism used to implement them is long abstracted away. (Or perhaps not — maybe these thoughts are the result of a highly abstracted digital state machine, and with a different sort of brain we would have vastly different sorts of thoughts. That even seems likely. Hmm, food for… digital state machine operations.)

2 Nor may it be desirable — Bertrand Russell admired Giuseppe Peano for the clarity in which he could express thoughts using the then-novel idea of formal logic.

Did you like this post? Prove it! Flattr this

Programs are made of ideas

quietfanatic The fundamental unit of a computer program is not a function; it’s an action. CPUs don’t do lambda calculus; they follow procedures.

I saw this on twitter a couple days ago and responded with a jerk of my knee, only to commence a discussion1 in which I failed to accurately express my objection. So here is a full treatment.

First I would like to disclaim that I am not attacking quietfanatic in any way. His quote merely provides a convenient means to discuss a widespread pattern of thinking.

Before we start talking about fundamental units of X, it’s probably a good idea to get clear about what we mean by X (where X = computer program). Here are some of the possible definitions I can think of:

  1. A computer program is a pattern of memory intended to be read by a CPU — machine code.
  2. A computer program is an abstract specification for the behavior of the CPU when fed (to-be-determined) machine code.
  3. A computer program is the idea of its behavior. To illuminate what I mean by this, think of Firefox. What idea did you have just then? Was it of an abstract specification? A series of bits? I doubt it. You probably pictured a window browsing the internet or something like that. I’m talking about that kind of idea.
  4. A computer program is a state of a physical system. If it is not in a block of hard disk or ram somewhere, it cannot be considered to exist.

The difference between these is the level of abstraction at which they are defined. But importantly, they are all defined at some level of abstraction — which means that they are all just ideas. Perhaps (4) is not, depending on how concretely you see the world.

Here are some of the ways to think of how a program is run:

  1. The specification of the program is interpreted by some rules, resulting in some changes to physical things.
  2. The CPU deconstructs memory into a series of discrete instructions, which it then “performs” one after another.
  3. The computer is a system in which the states of some physical devices are coupled to a system of interacting voltages, some of which encode a program.

Again, they differ at the level of abstraction at which they are defined.

The way I read quietfanatic’s quote is talking about level (2) of the program and level (2) of the CPU. To me it says “CPUs perform instructions, so our abstract specifications of computer programs should be (ultimately) in terms of instructions”. With that phrasing there seems to be a wide logical gap between the two clauses. Why does the language we use for abstract specification have to correspond to one particular abstract conception of how a CPU works?

By permuting my list of interpretations, I can construct many plausible quotes that follow the same logic. “The fundamental unit of a computer program is a rule, since the program is ultimately interpreted as rules” or “the fundamental unit of a computer program is a voltage, since voltages are ultimately what drive the devices.”

But my objection is larger than this fallacy. To me, and I presume to most software developers, programs are made of ideas. When thinking about programming languages, we are looking for a way to express our ideas in a way that the computer can approximate them. Our ideas form the central theme, not the mechanisms that approximate them.

What is the fundamental unit of a number? How about of a web page? A shape? A set? A cursor?

All these things are ideas, each of which has many different lights under which it can be viewed, each of which may give a different answer about a “fundamental unit”. But ultimately ideas are not defined in terms of anything — we use definitions to grasp how our ideas connect to each other. Some might say that the fundamental unit of mathematics is the set, but when I think about a triangle, I’m sure not thinking about sets.

I am unable to see how computer programs are any different. Why constrain ourselves by forcing a fundamental idea? Must there really be a one true way which forms a basis for everything else to be defined? In the historical development of mathematics, our ideas have had a shifting fluid foundation, each generation bringing a new way to think about the same things. Currently the evolution of programming languages is serving as our shifting foundation. Let’s use our full vocabulary of ideas to express programs, and try not to limit our concepts’ true breadth and beauty by telescoping in on what they mean in terms of a fixed basis.

This article is reflecting a change in thought that I am going through right now. After a lot of time being committed to the lambda calculus, finding ways to construct the naturals out of some pretty-looking set of primitives, devising ways to abolish imperative thinking and embrace functional thinking, I realized that I was lost in a world of details. Everything I did and thought was coupled to the fundamental system I was creating.

It is dawning on me that the fundamental system is merely a means to express my ideas, and the true beauty is in the ideas themselves. My denotationalism fades as I perceive that most of my mental structures are intrinsic. They are not defined in terms of anything and they have no denotation other than themselves. There are ways to model those ideas, but I am suddenly being careful not to confuse the model with the idea itself.

1 To the extent that a series of tweets can be considered a discussion.

Did you like this post? Show, don’t tell Flattr this

Associative Alpha Blending

I recently revamped my graphics-drawingcombinators module to have a handsome denotational semantics. I may write a post going into full detail later, but the executive summary is that Image a = R2 → (Color, a). Due to TCM, I get semantics for Image’s Functor instance from this, and if Color is a monoid I get the semantics for the Applicative and Monoid instances as well. OpenGL combines colors by alpha blending, so I happily defined my monoid instance as alpha blending:

 mempty = Color 0 0 0 0
 mappend c@(Color _ _ _ a) c' = a*c + (1-a)*c'

It doesn’t take a genius to see that this violates two of the three monoid laws (the only one it satisfies is left unit: mempty `mappend` x = x). This is embarrassing. My new rigorous denotational semantics has a pretty awesome flaw.

To make this right, let’s redefine Color not as something which is drawn, but as a transformation on alpha-free colors. I do this because functions make great monoids: composition is always associative and identity is always a unit. But it’s not just any function, it’s an alpha blending function. So we say that f is a “Color” if there exists constants a and x such that f(c) = a x + (1-a) c, which I will write as f = [a; x]. Here a is a scalar and x is another alpha-free color like c. We would really like it if the composition of two Colors were also a Color. Let’s try:

f(g(c)) = [fa;fx]([ga;gx](c))
        = fa fx + (1 - fa) ([ga;gx](c))
        = fa fx + (1 - fa) (ga gx + (1 - ga) c)
        = fa fx + (1 - fa) ga gx + (1 - fa) (1 - ga) c
        = fa fx + (1 - fa) ga gx + (1 - fa - ga + fa ga) c
        = (fa fx + (1 - fa) ga gx) + (1 - (fa + ga - fa ga)) c

It looks like the “alpha” of our composition is (fa + ga – fa ga). Let’s call this a’. Now we have to get the left side of the addition into the form a’ r for some r. Let’s just hack it: multiply and divide1 by a’.

        = a' (fa fx + (1 - fa) ga gx) / a' + (1 - a') c

And then we can read off the answer:

[fa;fx] . [ga;gx] = [a' ; (fa fx + (1 - fa) ga gx) / a']
   where a' = fa + ga - fa ga

For the identity, we need:

a x + (1 - a) c = c

Which is satisfied for a = 0 with x = anything, so we can use [0,0].

Because we derived this from composition and identity, the laws must hold. The mathematically untrusting may wish to check this.

And there you have it: my new Color monoid which actually satisfies the laws. This is the way to compose alpha-blended colors — it reflects what happens if you draw one after the other, blending as you go. This is the math that pre-blends any segment of that sequence.

I should have known that OpenGL’s colors were transformations all along, since the color of an object that you see can depend on the color you used to clear the screen.

1 But what if (fa + ga – fa ga) = 0? Fortunately, the only place this happens when these variables are between 0 and 1 is fa = ga = 0, which means both f and g are the identity color.

Maximum entropy and negative probability

I have recently become fascinated with the concept of maximum entropy distributions, and went back and read Dan Piponi’s post on negative probabilities, and link surfing from there. Something sparked and I wondered what kind of connection there is between the two. A little experimenting in Mathematica later and I’m on to something curious.

First, a little background. E.T. Jaynes argues (so I have heard, I have not read the original) that if you have a set of constraints on a set of random variables and you would like a probability distribution over those variables, you should choose the distribution that has the most information entropy, as this is the “least biased” distribution.

The entropy of a distribution is defined as: H = -\sum_i{p_i \log{p_i}}.

I am using Dan’s example, and I will quickly recapitulate the situation. You have a machine that produces boxes of ordered pairs of bits. It is possible to look at only one bit of the pair at a time, say each bit is in its own little box. You do an experiment where you look at all the first bits of the boxes, and it always comes out 1. You do a second experiment where you look at the second bit of the boxes, and it, too always comes out 1.

Now, most reasonable people would draw the conclusion that the machine only produces boxes containing “1,1″. However, if we wholeheartedly believe in Jaynes’s principle, we have to look deeper before drawing a conclusion like that.

The 4 probabilities we are interested in correspond to “0,0″, “0,1″, “1,0″, “1,1″. I will write them as 4-vectors in that order. So an equal chance of getting any combination is written as 1/4 <1,1,1,1>.

For the distribution <a,b,c,d>, our constraints are: a+b+c+d = 1 (claiming our basis is complete), c+d = 1 (the first bit is always 1), b+d = 1 (the second bit is always 1).

The “reasonable” distribution is <0,0,0,1>, which indeed satisfies these constraints. The entropy of this distribution 0 (taking x log x = 0 when x = 0) — of course, there is no uncertainty here. But are there more distributions which satisfy the constraints?

Well, if you require all the probabilities to be positive, then no, that is the maximal entropy one, because it is the only one that satisfies the constraints. But let’s be open-minded and lift that requirement.

We have to talk about what the entropy of a negative probability is, because log isn’t defined there. The real part is perfectly well defined, and the imaginary part is multi-valued with period 2π. I’m not experienced enough with this stuff to make the right decision, so I’m blindly taking the real part for now and pretending the imaginary part is 0, since there’s really no reasonable “magnitude” it could be.

Whew, okay, almost to the fun stuff. We have four variables and three constraints, so we have only 1 degree of freedom, which is a lot easier to analyze than 4. We can express the distribution with only that one degree d as:

<d-1, 1-d, 1-d, d>

And here is a plot of the real part of the entropy as a function of d:


It achieves a maximum at d = 1/2, the distribution <-1/2, 1/2, 1/2, 1/2>, the same one Dan gave. In some sense, after observing that the first box is always 1 and, separately, that the second box is always 1, it is too biased to conclude that the output is always “1,1″.

I would like to patch up the “real part” hack in this argument. But more so, these exotic probability theories aren’t really doing it for me. I would like to understand what kinds of systems give rise to them (and how that means you must interpret probability). My current line of questioning: is the assumption that probabilities are always greater than 0 connected to the assumption that objects have an intensional identity?

I would love to hear comments about this!

Recursive Types in IΞ

In the last IΞ post, I introduced the calculus and sketched the construction of some standard mathematical objects. In this post, I will dive a little deeper and construct of all the positive recursive types. The proof will be in an informal style (in particular, omitting the H constraints), but I have little doubt that the proof will go through.

Only a superficial familiarity with IΞ is needed to understand this proof; I adopt a set-theoretic style notation, so the proof should be approachable. Here’s a quick refresher of the primitives of IΞ.

  • Membership is application, so when I write x \in A, this is translated into IΞ as A x. Thus sets, unary predicates, and types are all the same thing.
  • Because membership and application are identified, universal quantification and the subset relation are also. \Xi A B means “A is a subset of B”, or “x in A implies x in B”. In particular, the pattern \Xi A (\lambda a. P a) can be interpreted as “for every a in A, P a”.
  • L is the set of all sets (whose existence does not lead to a contradiction in this system!). I will give its definition at the end of the article.
  • Other symbols have their usual interpretation, and I’ll include an appendix which gives them all precise definitions at the end.

Definition: A function F : L \mapsto L is called monotone if A \subseteq B \Rightarrow F A \subseteq F B.

Intuitively, the monotone functions correspond roughly to the functors in Haskell; they use their parameter in a positive way (appear to the left of an even number of arrows).

Definition: The type recursion combinator μ is defined as: x \in \mu F = \forall P\!\in\!L.\, F P \subseteq P \Rightarrow x \in P.

We are allowed to define things simply by giving a condition for membership like this. Formally, this definition starts out: \mu = \lambda F. \lambda x. \Xi L (\lambda P. \dots)

This definition intuitively says, x is in μ F if x is in every type closed under F. We will see that this definition corresponds to a type recursion combinator.

Lemma 1: If F is monotone, then F (\mu F) \subseteq \mu F.

Proof. Given x \in F (\mu F); show x \in \mu F. Expanding the definition of μ above:

Given P \in L, F P \subseteq P; show x \in P.

Observe \mu F \subseteq P: Suppose y \in \mu F, show y \in P. Since y \in \mu F, we have \forall P'\!\in\!L.\, F P' \subseteq P' \Rightarrow y \in P' by definition of μ. Letting P' = P, we have F P \subseteq P from above, so y \in P.

Therefore, x \in F (\mu F) \subseteq F P \subseteq P (by the monotonicity of F and \mu F \subseteq P). QED.

Now for the easy direction:

Lemma 2: If F is monotone, then \mu F \subseteq F (\mu F).

Proof. Given x \in \mu F; show x \in F (\mu F).

By the definition of μ, we have \forall P\!\in\!L.\, F P \subseteq P \Rightarrow x \in P. Let P = F (\mu F). We have F P = F (F (\mu F)) \subseteq F (\mu F) =  P by monotonicity of F and Lemma 1, therefore x \in P = F (\mu F). QED.

Which leads us to the recursion equation.

Theorem: If F is monotone, \mu F = F (\mu F). (Proof trivial by the two lemmas)

I’m using set equality here, i.e. mutual containment, which is probably a bit weaker than Leibniz equality. It is obvious from the definition that this fixed point is minimal.

Monotonicity is easy to prove for any of the standard positive types (products, coproducts, functions). So we can model a good variety of Haskell data types already; however these are standard inductive (least fixed point) types, no infinite structures allowed. I’m still working on the encoding and analogous proofs for ν (greatest fixed point), which is closer to Haskell.

Hopefully I’ll be able to port many libraries (maybe after a few totality annotations) without having to worry about partiality. But there are more pressing matters, such as finishing my interactive proof assistant for IΞ.


  • \text{True} = \Xi H H, the True proposition.
  • K x y = x, the constant function.
  • U = K \text{True}, the set of all objects.
  • (f \circ g) x = f (g x), function composition.
  • (a \mapsto b) f = \Xi a (b \circ f), the set of functions from a to b.
  • L = U \mapsto H, the set of all sets.
  • a \Rightarrow b = \Xi (Ka) (Kb), implication.
  • \Lambda f x = \Xi L (\lambda t. f t x), universal quantification of types (like forall in Haskell)
  • a \times b = \Lambda (\lambda r. (a \mapsto b \mapsto r) \mapsto r), product type.
  • a \oplus b = \Lambda (\lambda r. (a \mapsto r) \mapsto (b \mapsto r) \mapsto r), coproduct type.

Some Constructions in IΞ

Over the past couple months, I have been attempting to find a language to use as a core calculus for Dana, as anyone who follows this blog knows. I have been slowly converging on IΞ for its elegance and simplicity. It turns out to be very powerful, capable of modeling many structures while avoiding many of the paradoxes typically involved with those structures. In this post, I give an exposition of IΞ and construct some common structures.

The main idea is to use lambda calculus for all of our symbol manipulation needs, adding a constant Ξ for quantification. “Types” are unary predicates, so membership is application. For example, “Nat 0″ is a proposition saying that 0 has type Nat (where 0 and Nat are given previous definitions somewhere).

ΞXY means “forall x, if X x, then Y x”, or, if you think of predicates as sets, simply “X is a subset of Y”. So if we have types Nat and Bool, we can say that f has type Nat → Bool with “ΞNat(\x. Bool (f x))”, read “forall x in Nat, f x is in Bool”. Very direct, is it not?

System I

Interpreting these meanings directly, we arrive at Curry’s System I, whose rules follow. A proposition (sequent) has the form “Γ |- X”, which has the interpretation “assuming Γ, X is provable”.

Γ |- X   ⇐  X ∈ Γ
Γ |- Y   ⇐  Γ |- X  ;  X is βη convertible with Y
Γ |- YZ  ⇐  Γ |- XZ  ;  Γ |- ΞXY
Γ |- ΞXY ⇐  Γ,Xa |- Ya  ;  a does not occur free in X,Y,Γ 

I have stated the rules as “to prove X, you need to prove Y”, because that’s kinda how my brain works. Take a moment to internalize them. They are obvious given the above intuitions, and shouldn’t be hard to read.

On top of this, we can build propositional calculus. Define “K = \x. \y. x”, and write “X ⇒ Y” as shorthand for “Ξ(KX)(KY)”. This system has the properties you would expect of a simple propositional calculus.

Sadly, this system is inconsistent. We can prove any proposition X:

  Let Y = (\x. x x ⇒ X) (\x. x x ⇒ X)
  Observe that Y = Y ⇒ X.
  [1] Y |- Y
  [2] Y |- Y ⇒ X
  [3] Y |- X      -- modus ponens on [1],[2]
  [4] |- Y ⇒ X
  [5] |- Y
  [6] |- X        -- modus ponens on [4],[5]

Martin Bunder had a brilliant idea to block this paradox, and also many others, which brings us to the object of my infatuation:

System IΞ

The crucial step above was [3]-[4], where we abstracted over the infinite proposition “((… ⇒ X) ⇒ X) ⇒ X”. The way we will block this is to only allow abstraction over finite propositions. Introduce a new symbol, H, such that HX means “X is a finite proposition” (or simply “X is a proposition”). We will derive finiteness from the finiteness of proofs: to prove HX, we first have to prove H of each component of X. Our system becomes (the new additions are in bold):

Γ |- X      ⇐  X ∈ Γ
Γ |- Y      ⇐  Γ |- X  ;  X is βη convertible with Y
Γ |- YZ     ⇐  Γ |- XZ  ;  Γ |- ΞXY
Γ |- ΞXY    ⇐  Γ |- H(Xa)  ;  Γ,Xa |- Ya     ;  a does not occur free in X,Y,Γ 
Γ |- H(ΞXY) ⇐  Γ |- H(Xa)  ;  Γ,Xa |- H(Ya)  ;  a does not occur free in X,Y,Γ
Γ |- H(HX)

The final rule is an axiom, simply saying that “X is a proposition” is always a proposition.

Constructing the Naturals

Now I will embark on constructing the type of naturals. Since types are predicates, I also need to come up with a representation for naturals. It turns out that it doesn’t matter what representation I use, as long as it has zero and a 1-1 successor function. For the sake of discussion, let’s use the Church encoding.

  0 = \f. \x. x
  S n = \f. \x. f (n f x)

So a natural is an iteration function. For example, the number 3 iterates a function 3 times on its argument: 3 f x = f (f (f x)).

Coming up with a way to classify all of these, but not any others (such as infinity f x = f (f (f (f (f ...))))), was quite a challenge. You might try to classify these function on the property that they are an iteration function, but any sensible method of doing that ends up including infinity. I began thinking that IΞ was not strong enough, and looking for ways to enrich it by adding more axioms.

Fortunately, no new axioms are necessary! The encoding is obvious in retrospect. What is the first thing a mathematician thinks when you talk about the naturals: induction! Let’s define a natural as any object which you can do induction (with 0 and S) over.

To make this readable, we need to introduce a few more symbols:

    f . g = \x. f (g x)   -- composition
    A → B = \f. ΞA(B . f)   -- the type of functions from A to B
    True = ΞHH    -- the true proposition
    U = K True    -- the type of all objects (mnemonic: universe)
    L = U → H  -- the type of predicates/types

Now, for the naturals:

    Nat x = ΞL(\p. p 0  ⇒  ΞU(\n. p n ⇒ p (S n))  ⇒  p x)

Reading this in English: “x is a Natural if, for all predicates p, if p 0, and p n implies p (S n), then p x”. In other words, x is a natural if you can do induction up to it.

Exercise: prove |- Nat 0 and |- (Nat → Nat) S. Note that this can be done independent of our definitions of 0 and S.

More Inductive Constructions

Using this scheme, we can construct all sorts of things. For example, equality:

    Eq x y = ΞL(\p. p x ⇒ p y)


    nil = \n. \c. n
    cons x xs = \n. \c. c x xs 
    List a x = ΞL(\p. p nil  ⇒  ΞU(\ys. p ys ⇒ Ξa(\y. p (cons y ys)))  ⇒  p x)

There is a classic paradox involving the inductive type Set = Set → H, which is definable using this scheme:

    Set x = ΞL(\p. Ξ(p → H)p ⇒ p x)

However, as I tried to prove the inconsistency, I was blocked by the H rules. This gives me hope.

Coinductive Constructions

It is also possible to construct coinductive types. Here are the “conaturals”, the naturals with infinity. We can’t use the constructor definition anymore; coinductive types are all about projections. So the conaturals have only one projection, onto a disjunction. So to eliminate a conatural n, you pass it what to do if n = 0, and what to do with n’ if n = S n’. For example, to check if a conatural is zero, you can use n True (K False).

   CoNat x = ΞL(\p. Ξp(\y. y True p) ⇒ p x)

In English: x is a conatural if for all predicates p, if p y implies y True p, then p x. y True p can be interpreted as y = 0 or y = S n and p n.

Isn’t IΞ cool? Simple, obvious, but powerful enough to model a wide class of data (and codata). The thing I like best about it is that it is untyped at its core. Functions are just pure lambda calculus functions that we are “later” proving properties about. Type erasure comes for free (well, sortof: encoding dependent types into this system will end up passing types as parameters at runtime, even though they are never used).

Future Concerns

Bunder proved his version of IΞ consistent and complete. But the rules I gave, and that I have been using, are not his, and in fact are more powerful than his. This makes it possible that my rules are inconsistent. My system can prove LL, while his cannot. This worries me, because LL in the typical interpretation means “Type : Type”, which gives rise to the Burali-Forti paradox. However, even though I was able to encode the infrastructure for Russell’s paradox, I was unable to complete the proof because of the H rules. Maybe the same thing will happen?

I’m spending a bit of time trying to understand the Burali-Forti paradox that gives rise to the inconsistency in Girard’s System U, so that I can try to carry it out in IΞ. If IΞ does turn out to be inconsistent, I am not worried. A few universe restrictions here and there (i.e. leave U unspecified, rather than defining as K T) should do the trick, at the expense of some convenience and beauty.

Also, since I intend to compile Haskell to IΞ, I need to talk about partiality somehow (eg. prove that if a function terminates, it has this type). This has been giving me trouble, but in a good way. I found that I don’t really understand how bottoms behave, and how to talk about them without implying their semantics. I’m confident there is a way, though, maybe after adding a few axioms as a compromise. But I need to think hard and understand bottoms better, so I know what ought to be true, before I try to prove them or add them as axioms.

Might as well face it, I’m addicted to logic

I’m trying really hard not to become a logician. Like my obsession with FRP, it would be very interesting and educational. But my FRP fancy came from a desire to make games more easily, and I have since lost interest in that endeavor, studying FRP for its own sake. Now I am trying to change the world with Dana, and getting caught up in the beauty and unity of different logical systems.

This happened when trying to choose a core calculus for Dana. I am now furiously interested in Martin Bunder’s work on combinatory logic (btw, if anybody has a copy of his PhD thesis, “a one axiom set theory based on combinatory logic”, please let me know). System IΞ — or rather, systems nearby it — strike me as amazingly beautiful. It is based on an untyped lambda calculus, in which you can prove things about untyped functions (which is a way of endowing them with types). For example, to say that f has type A → B, you say:

Ξ A (B ∘ f)

In English: for all x in A, f x is in B.

However, the core logic really isn’t that important; I’ve only been focusing on it because it’s interesting. In fact, a cool thing about Dana is that there is very little dependency between its parts. But I would really like to start making something rather than researching. How come math is so fascinating?

Anyway, I am not sure that IΞ is strong enough. Assuming a “big enough” universe, I’ve been able to construct an equality predicate (the construction is essentially “the smallest reflexive relation”). But I have had little success in constructing any inductive types, such as the naturals. That’s why I want to read Bunder’s thesis — to get ideas.

Not just system IΞ, but logic in general, is fascinating me. Large cardinals in set theory, universe levels in CIC, and related “stratification” ideas abound and unify to create some intuitive notion of truth. In some sense, truth is the strongest consistent such unverse — however there is provably no such thing. In what system should we then work? Is it essential that picking a system of axioms in which to work will always be a part of mathematics? How do you consolidate results which assume different axioms?

That is actually my current goal for Dana’s core. I think the core calculus will be very weak, and you add axioms as you need more (in line with a quote from Dr. Scott himself: “If you want more you have to assume more”). Such axioms will have the same pattern as e.g. the IO monad in Haskell; your assumptions bubble their way to the top. However, it’s a much richer system than “IO or not IO”; you know exactly what you are assuming to run any piece of code. If there is a top level “user OS”, its assumptions will be vast (or maybe there’s some clever way to incrementally add them?).

Anyway, if the itch to make something irritates me so, I can assume I have a strong core logic — whatever it is — and start building things at a higher level. It’s emotionally difficult for me to do so, because I like to feel like I am on a strong foundation (isn’t that the whole point of Dana, after all?).