# Computably Uncountable

We are all familiar with Cantor’s diagonal argument that proves there exist infinite sets which are “larger” than the set of natural numbers. In this post I will express this argument in the form of a program, showing that there are countable sets which are “computably uncountable”.

I begin with the program itself:

type Cantor = Nat -> Bool

diagonal :: (Nat -> Cantor) -> Cantor
diagonal cs n = not (cs n n)


Cantor is “the Cantor space”, the type of infinite sequences of booleans. We will call such an infinite sequence “a Cantor”. There are clearly infinitely many Cantors; e.g. take the range of this function, which gives False at every position except the one specified:

unit :: Nat -> Cantor
unit m n = m == n


diagonal is (Georg) Cantor’s diagonal argument written as a program — it takes an alleged sequence of all Cantors, and returns a Cantor which does not occur in the sequence, by construction. This function shows by contradiction that we cannot put Cantors in 1 to 1 correspondence with naturals, and thus that there are more Cantors than there are naturals.

So how many Cantors are there? Since Nat -> Bool is a Haskell type — the type of computable functions from Nat to Bool — Cantors must be representable by programs. We can encode programs as numbers by treating their source code as base-128 numerals. Hence, there are no more Cantors than naturals, and so Cantors can be put into 1 to 1 correspondence with naturals.
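To make the encoding concrete, here is one way it might go (a sketch; encode and decode are names of my own, assuming ASCII source text):

```haskell
import Data.Char (chr, ord)

-- Read a program's source text as the digits of a base-128 numeral.
encode :: String -> Integer
encode = foldl (\n c -> 128 * n + fromIntegral (ord c)) 0

-- Recover the source text from its number (up to leading NUL characters).
decode :: Integer -> String
decode 0 = ""
decode n = decode (n `div` 128) ++ [chr (fromIntegral (n `mod` 128))]
```

Every program’s source maps to a distinct natural, which is all the argument needs.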

Wait — what? There are more Cantors than Nats, but they both have the same size? Something is wrong. Indeed, in the process of this argument we have asserted both

1. “We cannot put Cantors in 1 to 1 correspondence with naturals”
2. “Cantors can be put into 1 to 1 correspondence with naturals”

We clearly can’t have both.

### I

The erroneous statement is (2). It is undecidable whether a given program represents a Cantor. If the nth Cantor is ⊥ at n, then diagonal will fail: diagonal cs n = not (cs n n) = not ⊥ = ⊥. Because ⊥ is a fixed point of not, diagonal cannot return an element different from the one it was given. Thus for diagonal to work, we must require that Cantors be fully-defined — no infinite loops!

With this requirement, we can no longer put Cantors in 1 to 1 correspondence with the naturals, because we would have to solve the halting problem. It is not enough that the type of the term is Cantor; it must now be fully defined for all inputs, and determining that from arbitrary source code is an undecidable problem.
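To see the failure concretely, here is a self-contained sketch (using Integer in place of Nat; bad is a deliberately partial sequence of my own invention):

```haskell
type Cantor = Integer -> Bool

diagonal :: (Integer -> Cantor) -> Cantor
diagonal cs n = not (cs n n)

-- A sequence whose entries past the first are undefined (⊥).
bad :: Integer -> Cantor
bad 0 = const True
bad _ = undefined

-- diagonal bad 0 = not (bad 0 0) = False, but
-- diagonal bad 1 = not (bad 1 1) = not ⊥ = ⊥: it diverges.
```

So diagonal bad is not fully defined, and hence is not a Cantor in the strengthened sense.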

### II

The erroneous statement is (1). Cantors are computable functions, so, as we have argued, they have the same cardinality as the naturals. There are no more programs than numbers, so by the definition of equal cardinality there is a function putting them in 1 to 1 correspondence.

The problem with (1) occurs because diagonal takes as its first argument not an arbitrary sequence of Cantors, but a computable sequence of Cantors. If cs is not computable, then neither is diagonal cs (for we no longer have cs’s source code with which to construct it), and Cantors are defined to be computable sequences. So diagonal fails to contradict our bijection.

### III

The erroneous statement is (2). Section II claims to put Cantors and naturals in 1 to 1 correspondence, but it is lying. Suppose Section II is formulated with respect to some axiom system A. If it were “telling the truth”, we would expect there to be some term f in the language of A such that for every fully defined Cantor program c, there is some natural number n such that we have $A \vdash f(\bar{n}) = \bar{c}$ (i.e. it is a theorem of A that f(1 + 1 + … + 1) = (source code of c)).

Let’s suppose we have written down the axioms of A in a Haskell program, and we have a (partial) function proofSearch :: Nat -> Cantor which, given a number n, searches for theorems of the form $f(\bar{n}) = \bar{c}$ and compiles and returns the first such c it finds. If there is no such theorem, it just runs forever; similarly if c fails to compile. Although cumbersome, I’m sure we agree that this is possible to write. If section II is not lying, then we expect that for every natural n, proofSearch n does in fact return a valid Cantor.

evidence :: Cantor
evidence = diagonal proofSearch


Oh my! If section II is the truth, then proofSearch is a total, computable function of type Nat -> Cantor, which we can pass to diagonal to find a Cantor that it missed! So it must have been lying, either (1) about its function f finding every possible Cantor or (2) about it actually possessing such a function (i.e. it “proved” that there is such a function, but it couldn’t actually represent it). In either case, it did not actually create a 1 to 1 correspondence between the naturals and Cantors.

### IV

Left as an exercise for the reader.

Which one is it really?

# What was the probability of that?

A group of college students bask in their crude weekly haven of Texas Hold’em poker. Amid the usual banter, they watch as one deals out the flop: a three of spades, a three of diamonds, and a three of clubs. The room falls silent and tensions are suspended as if the ground had fallen out from beneath the house. The highest card may win this hand, a pocket pair is almost golden, and everyone wonders if someone has the single remaining three. A round of timid betting ensues — a wrong step here could be very expensive. After the round completes, the fourth card is dealt onto the table: the three of hearts. The tensions are dropped as everyone laughs incredulously, and out of the laughter emerges “what’s the probability of that?”

One of the more mathematically adept players pulls out his phone and does a quick calculation: $\frac{4}{52}\frac{3}{51}\frac{2}{50}\frac{1}{49}$ — about 1 in 270,000. Everyone is wowed by the rarity of the event they just witnessed.
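A quick check of that arithmetic, along with the order-sensitive 1-in-6.5-million figure mentioned later:

```haskell
-- Probability that the four threes come up as the first four cards,
-- in any order: 4/52 · 3/51 · 2/50 · 1/49.
fourThrees :: Rational
fourThrees = (4/52) * (3/51) * (2/50) * (1/49)    -- = 1/270725

-- Probability of four named cards in one specific order: 1/(52·51·50·49).
specificOrder :: Rational
specificOrder = (1/52) * (1/51) * (1/50) * (1/49) -- = 1/6497400
```

About 1 in 270,000 and 1 in 6.5 million, respectively.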

That is indeed the correct probability to get four consecutive threes from a deck of cards. But is that really the question here? Surely nearly the same response would have occurred if it had been four fours, or four nines. If it were four aces people would be even more astonished. The same response would have occurred if the cards had been four to a straight flush; e.g. the five through eight of spades. There are many such situations. “Four threes” is the most immediate pattern that we recognize as anomalous, but it is not the only anomalous pattern.

So what event is really being referred to by that? Those specific four cards had a 1 in 6.5 million chance of coming up in the order they did from a player’s perspective before the hand, and a 100% chance of coming up in the order they did from a perspective after the hand [some will note that I am using a Bayesian interpretation of probability at this point]. The probability of the specific real world event that occurred (including the orientations and temperatures of the cards, the reactions of the players, and the taste of Jake Danser’s bite of Chinese dinner 500 miles away), from the perspective of any of the players, is unbelievably small.

In situations where this question is asked, I always jokingly answer the question with “1”. Most of the time people laugh and let the question rest, but sometimes the conversation turns more serious. In this case I try (in all cases so far, in vain) to explain the plurality of perspective at play here. The “serious” answer I have for this question, doing my best to interpret the intended meaning of the statement while incorporating these issues, is a number I cannot calculate but that I can describe: it is the probability that the speaker would have been astonished by the event that occurred; essentially the probability that he would remark “what’s the probability of that?”. I think that is quite a bit higher than one in 270,000, so the event we witnessed was not as rare as the simple calculation would have us believe.

The dissonance of such situations points to a common simplistic view of probability: that events (in the colloquial sense) have probabilities. A distinction that is commonly understood (and commonly misunderstood) is that between frequentism, which talks about running an experiment a bunch of times and calculating the proportion of experiments that satisfied some criterion, and Bayesianism, which talks about the confidence one has in some property being true in terms of a subjective state of knowledge. This is a fascinatingly subtle distinction: they coincide on most but not all questions, and there is much argument about which one is “right” (I think that is a rather silly argument, as if a probability were something bestowed to us from nature). However, both of these views are calculating the probability of a property (a set of events) being true (happening), and both of these views rely on an assumption of prior information: the frequentists on the set-up of the experiment and the Bayesians on their prior state of knowledge about the world. The idea that an event has a single, objective probability says something very strong about the universe: that there is some essential, natural source of randomness (to dispel any quantum objections, I point to E.T. Jaynes’s critique of the view that the randomness in quantum theory is an essential property of nature). But even if there were some essential source of randomness, surely the universe is more determined than our probabilistic calculations assume: from the view of this universe, the deck cannot be uniformly shuffled after only seven shuffles because it knows how your hands and your brain work. So we were never talking about an essential, natural randomness.

In our normal use of probabilities, we don’t run into this problem because we are predicting the future: e.g. “what is the probability that Obama will be re-elected?”. We conceive of this as a single event, but it is actually a wide collection of events. Given some prior knowledge, this set of events has a definite probability. But we are inconsistent about how we treat future events and past events: past events are not collections — they happened, and they happened in exactly one way. From the perspective of now, the probability that it would happen any other way is zero. From the perspective of before the event, we are asking whether “it” would happen or not, but we are entirely unclear about what measurable property we mean by “it”. We mean something different when we refer to events in the future and events in the past.

In summary, probabilities are functions of (1) prior information and (2) a criterion for judging when an outcome is to be considered true. Probability is meaningless to apply to a single event.

N.B. in probability theory, the word event is already defined to mean a set of outcomes. If you read this article with that definition in mind, you will have been very confused :-).


# Programming Paradigms and Human Language

As our understanding progresses, what once were rigid distinctions tend to become blurred. Hence, I am fascinated by the pervasiveness and stability of the three programming language paradigms: imperative, functional (here I mean that word in the pure sense, not the HOFs sense — Conal Elliott would probably have me call it denotational), and logical. Whole languages are beginning to blur their orientation, but specific solutions to problems are still typically classifiable along these lines nonetheless. We will see how the three are linked to forms of human communication and the style of our discourse — and thus why these distinctions are so stable.

Each paradigm corresponds to a syntactic unit of language. The imperative, as the name suggests, corresponds to sentences in the imperative mood:

def bfs(predicate, root):
    queue = Queue()
    queue.enqueue(root)           # Start by putting the root on the queue.
    while not queue.empty():      # While the queue is not empty, do the following:
        node = queue.dequeue()    #   Take a node off the queue.
        if predicate(node):       #   If it satisfies the predicate,
            return node           #     return it.
        for n in node.children(): #   Otherwise,
            queue.enqueue(n)      #     put each of its children on the queue.


The program is written as a recipe, telling the computer what to do as if it were a servant. Note that, after qualifying phrases, each sentence begins with an action word in the imperative form, just as this sentence begins with “note”. The object-oriented aspect adds a sort of directorial role to the program, wherein the program is read not as the user telling the computer what to do, but the program telling objects in the program what to do. Sentences are still written in the imperative mood, but they can now be directed: “queue, give me an element”, “handle, write down this line.”

But not every sentence in human discourse is imperative, for example this one. The logical captures sentences of relationship, such as:

captures(logical, relationships).


But perhaps we should see a more practical logical program as an example:

mirrors([], []).                  % The empty list mirrors itself.
mirrors([X|XS], L) :-             % [X|XS] mirrors L if
    append(LS, [X], L),           %   LS and [X] compose L, and
    mirrors(XS, LS).              %   XS mirrors LS.


The program is written as a set of assertions, building a model of its world up from nothing. To run this program, you ask it questions about the world:

?- mirrors(X, [1,2,3]).           % What mirrors [1,2,3]?
X = [3,2,1]


A great deal of human discourse falls into this category: propositional sentences. Most of the sentences in this post fall into that category. However, those familiar with Prolog will know that this is a poor implementation of mirrors (it is quadratic time), and would write it this way instead:

mirror([], R, R).         % The mirror of [] prepended to R is R.
mirror([X|XS], Y, R) :-   % The mirror of [X|XS] prepended to Y is
    mirror(XS, [X|Y], R). %   the mirror of XS prepended to [X|Y].


In proper logical style, mirror expresses itself as propositional relationships, with the caveat that the only relationship is “is”. Code written this way, relating things by identity rather than propositionally, is actually characteristic of functional style:

mirror [] r = r           -- The mirror of [] prepended to r is r
mirror (x:xs) y =         -- The mirror of (x:xs) prepended to y is
    mirror xs (x:y)       --   the mirror of xs prepended to (x:y)


The R in the Prolog code is only connecting together propositions so that we can express this idea in a functional way. The original mirrors predicate is quite unlike this; expressing it in Haskell requires more than a mindless transliteration (however there are still hints of a structural similarity, if a bit elusive).
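For comparison, here is my sketch of a direct (still quadratic) Haskell rendering of the original mirrors. Note that the data flow of append had to be turned around: the Prolog version uses append relationally, with its output L as a given and LS unknown.

```haskell
mirrors :: [a] -> [a]
mirrors []     = []                 -- The empty list mirrors itself.
mirrors (x:xs) = mirrors xs ++ [x]  -- The mirror of (x:xs) is the mirror
                                    --   of xs with [x] appended.
```

This inversion is the structural similarity made explicit, but it is a choice the transliterator has to make, not something mechanical.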

But I claim that “is” is not the defining linguistic characteristic of functional programs. We could write a functional program with a single equals sign if we were so obfuscatedly-minded; the analogous claim is invalid for logical and imperative programs. The characteristic device of functional programming is the noun: functional programs do not give instructions or express relationships, they are interested in defining objects.

bfs _ []        = []            -- the BF traversal of nothing is nothing
bfs children xs =               -- the BF traversal of xs is
  xs ++                         -- xs itself appended to
  bfs children (concatMap children xs) -- the BF traversal of all of xs's children


The monad is typically used to express instructions as nouns:

main =                            -- the main program is the recipe which
  getLine >>= \name ->            -- gets a line, then
  putStrLn $ "Hello, " ++ name    -- puts "Hello, " prepended to that line


Haskell elites will object to me using IO as a prototypical example of a monad, but the claim is still valid; look at the words used to define monad actions: tell, ask, get, put, call. These are imperative words. This is not a sweeping generalization, however; for example, the constructors of Escardó’s search monad are nouns.

The following table summarizes the linguistic analogies:

| Paradigm | Mechanism | Example |
| --- | --- | --- |
| Imperative | Imperative mood | Put a piece of bread; put meat on top; put bread on top. |
| Logical | Propositional relationships | Bread sandwiches meat. |
| Functional | Noun phrases | Meat sandwiched by bread. |

Each of these paradigms is pretty good in its area. When we’re issuing commands in a functional language, we pretend we are using an imperative language; when we want to treat complex nouns (such as lists) in an imperative language, we fall back on functional concepts to operate on them. When we are documenting our code or talking about types, we use logical ideas (typeclasses, generics).

The thing that makes me hubristically curious is the realization that these three categories hardly cover the mechanisms available in language. What else can we learn from human languages when expressing ourselves to computers?

This post has focused on the “content code” of the various paradigms. There is another kind of code (mostly in statically typed languages) that is interspersed with this, namely declarations:

class Player {...}
data Player = ...


I have not yet been able to find a good analogy for declarations in language; perhaps they introduce something like a proper noun. I will save this topic for another post.

# Relativism and Language

It is hard for me to imagine that so many people are so wrong. Sure, core beliefs go unexamined. Yes, we often unconsciously repeat taglines we have heard from those we respect instead of attempting to translate our true views. But I must admit I think of all people as essentially wanting to figure it out. Life, the universe, their meaning, how to make the world a better place. Some, who see the world as a competitive, dog-eat-dog place, want to figure it out because it will help them survive. Others, like me, who see the modern (Western — that is all I have direct experience with) world as an essentially benign place, just want to figure it out because of an innate curiosity (no doubt a result of past generations with the former motivation).

So when someone says something which strikes me as wrong, when I have the kneejerking impulse to correct them, this belief of mine kicks in and stops me. Oh my, it didn’t use to; I would happily correct the abundant wrongness in the world. After all, if people think the right way, they will do better for themselves and others. I can’t remember a time when I didn’t have this belief, but it has taken a while to trickle its way into my choice of actions.

All through my youth, I was told that I was smart (a pedagogically questionable practice). I didn’t buy it (I’ve always had a rebellious streak). What makes me so special? I wasn’t just born with smartness, I thought. At first this manifested as an individualistic self-righteousness: I must be smart because of the intelligent ways I chose to spend my youth (what? Trampoline, video games, and Power Rangers?). More recently it has manifested as a skepticism of the views of those who tell me I am smart: you only say that because I am articulating things you agree with, so the compliment is a way of affirming your own worldview. Those both seem naive to me now. I don’t know what I currently think about it, I will probably only be able to articulate that once I move to some other view.

I am still skeptical of any innate superiority (however not enough so to avoid writing this post in a way that comes across as advice). So when I stop myself from correcting a wrongness, what do I do? This is the relativism I’ve been talking about.

Words don’t have meaning; in a conversation, the speaker translates meaning into words, and then the listener translates the words into meaning. We have a soft social agreement about how words are used, and that gives rise to trends in our patterns of thought. But the possibility remains — and I use the word possibility only because of a timidness, I really think of it more as a high probability — that the meanings that I have assigned to the words when I hear them are different from the meanings that were used to form them. Indeed, it is unclear what is even meant by two people having the same thought. My brain is not likely to have the ability to represent the thought that produced the words, especially if I disagree with them.

The exercise, then, is this: try to represent those thoughts anyway. How can I think of these words so that the sentence becomes true? Not just slightly less false, but really true. I might have to temporarily reorient my value system; I might have to imagine I grew up in a religious family; I might have to picture the scary possible worlds that might result if the statement were false (that is, beyond the proportion of the consequences I actually predict, already thinking the statement is false). When I remember to do this, I am brought to a calm, understanding level, with few fiery arguments in sight. My contributions to these conversations are transformed into questions instead of assertions — not Socratic “let me lead you to the right answer” questions, but genuine “I want to understand you” questions.

And that is the essence of relativism to me. What you mean by your words is not what I mean by your words. Sentences are uttered with the concept of their truth in mind, and before blasting forth a correction, I first have to understand how they are true. And more often than not, my planned correction is dismantled and replaced by a connected friendship.

# Perspectives on Truth and Realism

Lately I have been considering myself a relativist. To cast away the kneejerks, I don’t consider all belief systems equally valid (with caveats[1]). Wikipedia sums it up nicely:

… that truth is always relative to some particular frame of reference, such as a language or a culture.

I have noticed an increase in my opposition to what I am currently calling “scientific realism” — the belief that discoveries made by science are true, and other things are false (basically just an incarnation of absolutism). Yesterday I had an impassioned argument (still in good fun, though) with my roommate about our differences in perception. I noticed my emotions firing up around this subject, a symptom begging me to analyze its cause. Humans get very emotional when their thoughts approach a shattering of a core belief, so I am curious if one is near.

This time, instead of a philosophical persuasive essay, I’m just going to write down some of my observations.

In the conversation with my roommate Monty (who I consider quite intelligent), mostly a battle over semantics, I found the following ensemble of his ideas to leave an impression on me:

1. Newtonian gravity is false, and General Relativity is true.
2. If he lived 200 years ago, Newtonian physics would be true.
3. One thing cannot be more true than another (except in the trivial case of one thing being true and the other false, of course).
4. General Relativity and The Standard Model, which are mathematically incompatible, can both be true at the same time.
5. He hasn’t yet seen any evidence that would suggest there are things that can’t eventually be explained by our current scientific ideas.

Taken together, these ideas are fascinating to me. They indicate a different definition of truth than the one I use, and I’m fascinated because I don’t have a concept that I could substitute for it. On surface interpretation, these statements seem inconsistent to me, so I am really curious about the concept from which they arise. (I am pretty sure (5) is just a fallacy though: what would such evidence look like?)

I have met others who claim that they do not have beliefs. I find this to be common among scientific realists. I wonder what definition of “belief” they use to be able to consider themselves devoid of it; so far when I have pried I am just evaded. There are two reasons I evade inquiries: (1) I am not taking the conversation seriously, which may be because it is threatening my beliefs, or for other reasons; and (2) the inquiries are using words in ways that don’t have meaning to me, so I answer in riddles that bring out the dissonance[2]. I usually assume they are doing it because their beliefs are being threatened[3]; what makes me curious is the possibility that they are evading because of (2)[4]. Perhaps I am using “belief” incorrectly when asking that question.

Among Skeptics, there is another possible reason to avoid the word “belief”: because it is very close to “faith”, the buzzword of the enemy. Maybe they use the word “truth” to mean what I call “belief”… but then the idea that someone’s beliefs can be false would be nonsense.

I think most of my anti-realism comes from a desire to (at least give due diligence to) respect the belief systems of others. I think I may start considering “true” to be a value judgement (which, as an experiment, I am trying to avoid). I had a debate with a young earth creationist, a belief system I typically have a hard time respecting. After a long time, I think I heard an essential difference, when he said (paraphrasing): “I believe there is a God because I don’t want to live in a world without a God.” That indicates to me a different relationship to truth — that truth and desirability are related concepts — and opened to me the possibility of respecting his belief system a little more.

Dan Piponi made a brilliant comment on twitter during a conversation about realism: “I don’t think ‘reality’ means much. It’s just a placeholder to make the sentence grammatical.”

Notes

[1] What exactly does a belief system being “valid” mean?

[2] This will happen, for example, if you ask me whether I believe extra-terrestrial life exists, because I get hung up on the definition of “life”. People seem to acknowledge the subtlety of that word, but then keep using the word anyway as if the inability to define it is no big thing: “you know what I mean.” No, I actually don’t.

[3] Probably because it confirms my superiority.

[4] Possibly because it threatens my superiority.