Tag Archives: math

Algebraic and Analytic Programming

The professor began my undergrad number theory class by drawing a distinction between algebra and analysis, two major themes in mathematics. This distinction has been discussed elsewhere, and seems to be rather slippery (to mathematicians at least, because it evades precise definition).  My professor seemed to approach it from a synesthetic perspective — it’s about the feel of it.  Algebra is rigid, geometric (think polyhedra) , perfect.  The results are beautiful, compact, and eternal.  By contrast, analysis is messy and malleable.  Theorems have lots of assumptions which aren’t always satisfied, but analysts use them anyway and hope (and check later) that the assumptions really do hold up.  Perelman’s famous proof of Poincare’s conjecture, as I understand, is essentially an example of going back and checking analytic assumptions.  Analysis often makes precise and works with the notion of “good enough” — two things don’t have to be equal, they only need to converge toward each other with a sufficiently small error term.

I have been thinking about this distinction in the realm of programming.  As a Haskell programmer, most of my focus is in an algebraic-feeling programming.  I like to perfect my modules, making them beautiful and eternal, built up from definitions that are compact and each obviously correct.  I take care with my modules when I first write them, and then rarely touch them again (except to update them with dependency patches that the community helpfully provides).  This is in harmony with the current practice of denotative programming, which strives to give mathematical meaning to programs and thus make them easy to reason about. This meaning has, so far, always been of an algebraic nature.

What a jolt I felt when I began work at Google.  The programming that happens here feels quite different — much more like the analytic feeling (I presume — I mostly studied algebraic areas of math in school, so I have less experience).  Here the codebase and its dependencies are constantly in motion, gaining requirements, changing direction.  “Good enough” is good enough; we don’t need beautiful, eternal results.  It’s messy, it’s malleable. We use automated tests to keep things within appropriate error bounds — proofs and obviously-correct code would be intractable.  We don’t need perfect abstraction boundaries — we can go dig into a dependency and change its assumptions to fit our needs.

Much of the ideological disagreement within the Haskell community and between nearby communities happens across this line.  Unit tests are not good enough for algebraists; proofs are crazy to an analyst.  QuickCheck strikes a nice balance; it’s fuzzy unit tests for the algebraist.  It gives compact, simple, meaningful specifications for the fuzzy business of testing.  I wonder, can we find a dual middle-ground?  I have never seen an analytic proof of software correctness.  Can we say with mathematical certainty that our software is good enough, and what would such a proof look like?

UPDATE: Here’s a lovely related post via Lu Zeng. Algebra vs. Analysis predicts eating corn?

Follow Your Nose Proofs

We just had the first Categories for the Boulderite meetup, in which a bunch of people who don’t know category theory tried to teach it to each other. Some of the people there had not had very much experience with proofs, so getting “a proof” was hard even though the concepts weren’t very deep. I got the impression that those who had trouble mainly did because they did not yet know the “follow your nose” proof tactic which I learned in my first upper division math class in college. That tactic is so often used that most proofs completely omit it (i.e. assume that the reader is doing it) and skip to when it gets interesting. Having it spelled out for me in that class was very helpful. So here I shall repeat it, mostly for my fellow Categories members.

Decide what to do based on a top-down analysis of the sentence you are trying to prove:

Shape of Sentence Shape of Proof
If P, then Q. (aka. P implies Q) Suppose P. <proof of Q>
P if and only if Q (→) <proof of if P implies Q>. (←) <proof of Q implies P>
For all x such that C(x), Q Given x. Suppose C(x). <proof of Q>
There exists x such that Q. Let x = <something> (requires imagination). <proof of Q>
P or Q Either <proof of P> or <proof of Q> (or sometimes something tricksier like assume not P, <proof of Q>)
P and Q (1) <proof of P>. (2) <proof of Q>.
not P Assume P. <find contradiction> (requires imagination)
X = Y Reduce X and Y by known equalities one step at a time (whichever side is easier first). Or sometimes there are definitions / lemmas that reduce equality to something else.
Something really obvious (like X = X, or 0 ≤ n where n is a natural, etc.) Say “obvious” or “trivial” and you’re done.
Something else Find definition or lemma, substitute it in, continue.

Along the way, you will find that you need to use the things you have supposed. So there is another table for how you can use assumptions.

Shape of assumption Standard usage
If P, then Q (aka P implies Q) Prove P. Then you get to use Q.
P if and only if Q P and Q are equivalent. Prove one, you get the other.
For all x such that C(x), P(x) Prove C(y) for some y that you have, then you get to use P(y).
There exists x such that C(x) Use x and the fact that C(x) somehow (helpful, right? ;-).
P and Q Therefore P / Therefore Q.
P or Q If P then <goal>. If Q then <same goal>. (Or sometimes prove not P, then you know Q)
not P Prove P. Then you’re done! (You have inconsistent assumptions, from which anything follows)
X = Y If you are stuck and have an X somewhere in your goal, try substituting Y. And vice versa.
Something obvious from your other assumptions. Throw it away, it doesn’t help you.
Something else Find definition, substitute it in, continue.

Let’s try some examples. First some definitions/lemmas to work with:

Definition (extensionality): If X and Y are sets, then X = Y if and only if for all x, x \in X if and only if x \in Y.
Definition: X \subseteq Y if and only if for every a, a \in X implies a \in Y.

Theorem: X = Y if and only if X \subseteq Y and Y \subseteq X.

Follow your nose proof.

  • (→) Show X = Y implies X \subseteq Y and Y \subseteq X.
    • Assume X = Y. Show X \subseteq Y and Y \subseteq X.
    • Substitute: Show X \subseteq X and X \subseteq X.
    • We’re done.
  • (←) Show X \subseteq Y and Y \subseteq X implies X = Y.
    • Assume X \subseteq Y and Y \subseteq X. Show X = Y.
    • (expand definition of = by extensionality)
    • Show forall x, x \in X if and only if x \in Y.
    • Given x.
    • (→) Show x \in X implies x \in Y.
      • Follows from the definition of our assumption X \subseteq Y.
    • (←) Show x \in Y implies x \in X.
      • Follows from the definition of our assumption Y \subseteq X.

See how we are mechanically disassembling the statement we have to prove? Most proofs like this don’t take any deep insight, you just execute this algorithm. Such a process is assumed when reading and writing proofs, so in the real world you will see something more like the following proof:

Proof. (→) trivial. (←) By extensionality, x \in X implies x \in Y since X \subseteq Y, and x \in Y implies x \in X since Y \subseteq X.

We have left out saying that we are assuming things that you would naturally assume from the follow your nose proof. We have also left out the unfolding of definitions, except perhaps saying the name of the definition. But when just getting started proving things, it’s good to write out these steps in detail, because then you can see what you have to work with and where you are going. Then begin leaving out obvious steps as you become comfortable.

We have also just justified a typical way to show that two sets are equal: show that they are subsets of each other.

Let’s see one more example:

Definition: Given sets A and B, a function f : A → B is a surjection if for every y \in B, there exists an x \in A such that f(x) = y.

Definition: Two functions f,g : A → B are equal if and only if for all x \in A, f(x) = g(x).

Definition: (g \circ f)(x) = g(f(x)).

Definition: For any set A, the identity \mathit{Id}_A is defined by \mathit{Id}_A(x) = x.

Theorem: Given f : A → B. If there exists f-1 : B → A such that f \circ f^{-1} = \mathit{Id}_B, then f is a surjection.

Follow your nose proof.

  • Given f : A → B.
  • Suppose there exists f-1 : B → A and f \circ f^{-1} = \mathit{Id}_B. Show f is a surjection.
  • By definition, show that for all y \in B, there exists x \in A such that f(x) = y.
  • Given y \in B. Show there exists x \in A such that f(x) = y.
  • Now we have to find an x in A. Well, we have y \in B and a function from B to A, let’s try that:
  • Let x = f^{-1}(y). Show f(x) = y.
  • Substitute: Show f(f^{-1}(y)) = y.
  • We know f \circ f^{-1} = \mathit{Id}_B, so by the definition of two functions being equal, we know f(f^{-1}(y)) = \mathit{Id}_B(y) = y, and we’re done.

Again, notice how we are breaking up the task based on the structure of what we are trying to prove. The only non-mechanical things we did were to find x and apply the assumption that f \circ f^{-1} = \mathit{Id}_B. In fact, usually the interesting parts of a proof are giving values to “there exists” statements and using assumptions (in particular, saying what values you use “for all” assumptions with). Since those are the interesting parts, those are the only parts that an idiomatic proof would say:

Proof. Given y \in B. Let x = f^{-1}(y). f(x) = f(f^{-1}(y)) = y since f \circ f^{-1} = \mathit{Id}_A.

Remember to take it step-by-step; at each step, write down what you learned and what you are trying to prove, and try to make a little progress. These proofs are easy if you follow your nose.

A Gambler In Heaven

A gambler has just lost all but one $1 in Vegas and decides to go for a walk.  Unfortunately he gets hit by a bus but, having lived mostly a good life aside from the gambling, is shown God’s mercy and lands in heaven.  They only have one type of gambling in heaven, it is a simple choice-free game with the following rules:

A coin is tossed.  If it comes up tails, you lose $1.  If it comes up heads, your entire bankroll is tripled.

The gambler only has the $1 he had on him when he died (turns out you keep your money when you go to heaven).  Here is a possible outcome of his playing this game:

  • $1 – H -> $3
  • $3 – T -> $2
  • $2 – H -> $6
  • $6 – T -> $5
  • $5 – T -> $4
  • $4 – T -> $3
  • $3 – T -> $2
  • $2 – T -> $1
  • $1 – T -> $0

And thus he is broke.

The question is this: starting with his $1, what is the probability he will live the rest of eternity broke in heaven? The alternative, presumably, is that he spends eternity doing what he loves most: gambling.  Do all paths eventually lead to bankruptcy a la Gambler’s ruin, or is there a nonzero probability of playing forever?

You may leave your ideas in the comments, and I will post a solution in a few days.

Constructive Reality

I would like to address a statement by Paul Snively during a Twitter conversation.

The notion that math has meaning apart from being computable is perverse.

I admit that all involved in the discussion are computer scientists, and so we would be predisposed toward constructivism and, in particular, this kind of value system. Indeed, I consider myself “a constructivist” — in the right setting, I will fiercely argue with a classicist that “there exists” does not mean what they think it means — but I will not go this far.

The underlying idea of constructivism is that the form of the theorem describes some evidence that must be used to support it. Evidence for (P or Q) needs either evidence for P or evidence for Q; evidence for (exists x such that P(x)) needs a concrete value (e.g. a number) x and evidence for P(x) for that x; evidence for (P implies Q) is a function that maps evidence for P into evidence for Q; and so on. The two theorems (1) “given any integer, there is a prime number larger than it” and (2) “there are not finitely many prime numbers” are in fact different statements. The evidence for (1) must be a computable function which maps integers to prime numbers which are larger; the evidence for (2) is a function which takes evidence that there are finitely many prime numbers (essentially an exhaustive list) and produces a contradiction. (2) is the form of Euclid’s famous proof, but it is not as strong as (1), which gives a computable process that generates primes. Idiomatically we would call (1) constructive and (2) non-constructive, but the finer distinction is that constructive mathematics distinguishes these two statements while classical mathematics considers them equivalent.

In practice, this means that you cannot use proof by contradiction, the identity that “¬∀x. P(x)” implies “∃x. ¬P(x)”, or the axiom of choice (which claims the existence of a function without giving a way to compute it). The necessary evidence can be extracted from proofs constructed using the remaining axioms.

If you alter the laws of evidence, you can recover the law of excluded middle (proof by contradiction), which says that for any proposition P, (P or not P) is true. Classical mathematicians consider it true that “there are either infinitely many or finitely many twin primes”. Constructively, however, this says if you have a proof of this statement, then you either have a proof of P or a proof of (not P). At the time of writing, this is not true of whether there are infinitely many twin primes; we do not yet have a proof either way. But if you allow into your language of evidence the ability to invoke continuations, then we do have such evidence: the one we have is a proof of (not P), which is a function that takes evidence for P and produces a contradiction. So you pass this function evidence for P because you need the contradiction, but instead of giving you the contradiction you wanted it goes back in time to change its mind, now saying that the evidence for (P or not P) is the evidence for P (which it was just given). Yes, it’s ridiculous, but could be considered constructive if you have a different slant on the meaning of evidence.

But don’t be so hasty in using this ridiculous interpretation against the law of excluded middle. The Ackermann function — a function which grows extremely fast — is constructively definable. However, A(4,4) is far, far greater than the number of elementary particles in the known universe. Using the numeral for A(4,4) as evidence is physically impossible, and yet it is considered valid constructive evidence. This puts constructivism on less sound scientific footing: a constructive theorem need not have actual evidence, it need only have evidence in principle. But what principle? How can one justify that A(4,4) is more real than a well-ordering of the real numbers? — we can give concrete evidence for neither. The proof that A(4,4) is a well-defined number relies on abstract reasoning founded in the same logical ideas that gave rise to the law of excluded middle.

This style of argument is associated with ultrafinitism — the idea that even very large finite numbers may not exist (pay attention to the word may — the idea that a finite number does not exist is intentionally outside the realm of ultrafinitism’s ability to answer). Classical mathematics says there exist arbitrary choice functions, constructive mathematics says those may not exist but A(4,4) does, ultrafinitism says that A(4,4) (and sometimes even numbers as small as 2100) may not exist. These distinctions seem all to be rooted in a sort of fight over which Platonic abstract concepts exist. Perhaps some, such as my friend Paul, would say “are meaningful” instead, but it’s the same idea. It is not as if only one of these philosophies has information to extract. Ultrafinite arguments construct observable evidence, constructive arguments construct idealized evidence, classical arguments discuss idealized existence. If you were to rephrase a classical existence argument “there exists a non-recursive set of integers” to “not every set of integers is recursive” then it becomes constructively valid (it is a diagonal argument). In fact, every sentence provable in classical logic has a corresponding sentence provable in constructive logic, by a simple syntactic transformation. I find it, then, irrational to consider that the former be meaningless and the latter meaningful. So we are merely arguing over the semantics of the word “exists” (and in general, “or”, “implies”, etc. as well). We are arguing about what the sentences mean, not whether they are meaningful. Classical existence is different than constructive existence, and neither corresponds to physical existence.

Paul says, “you can’t actually cut a sphere up and reassemble it into two spheres identical in volume to the first.” I respond by saying that you can’t actually separate an arbitrarily small piece of a box either (first of all, a box made of what?), which constructively is allowed. Mathematics is about mentally idealized objects — if we can “in principle” separate an arbitrary piece of a box, then we can also “in principle” cut a sphere up and reassemble it into two spheres of identical volume, they are merely operating by different principles. Fortunately, we can examine the proofs of these theorems to find out by which principles they are operating. But if you are going to bless one “in principle” as meaningful and banish the others — which I beg you not to — I can see no other way than to resort to the physical reality we are given, to ultrafinitism.

Flattr this

Computably Uncountable

We are all familiar with Cantor’s diagonal argument that proves there exist infinite sets which are “larger” than the set of natural numbers. In this post I will show that we can express this argument in the form of a program, thus showing that there are countable sets which are “computably uncountable”.

I begin with the program itself:

type Cantor = Nat -> Bool

diagonal :: (Nat -> Cantor) -> Cantor
diagonal cs n = not (cs n n)

Cantor is “the cantor space”, the type of infinite sequences of booleans. We will call such an infinite sequence “a Cantor“. There are clearly infinitely many Cantors; e.g. take the range of this function which gives False at every position except the one specified:

unit :: Nat -> Cantor
unit m n = m == n

diagonal is (Georg) Cantor’s diagonal argument written as a program — it takes an alleged sequence of all Cantors, and returns a Cantor which does not occur in the sequence, by construction. This function shows by contradiction that we cannot put Cantors in 1 to 1 correspondence with naturals, and thus that there are more Cantors than there are naturals.

So how many Cantors are there? Since Nat -> Bool is a Haskell type — the type of computable functions from Nat to BoolCantors must be representable by programs. We can encode programs as numbers by treating their source code as base-128 numbers. Hence, there are no more Cantors than naturals, and so Cantors can be put into 1 to 1 correspondence with naturals.

Wait — what? There are more Cantors than Nats, but they both have the same size? Something is wrong. Indeed, in the process of this argument we have asserted both

  1. “We cannot put Cantors in 1 to 1 correspondence with naturals”
  2. Cantors can be put into 1 to 1 correspondence with naturals”

We clearly can’t have both.


The erroneous statement is (2). It is undecidable whether a given program represents a Cantor. If the nth Cantor is ⊥ at n, then diagonal will fail: diagonal cs n = not (cs n n) = not ⊥ = ⊥. Because ⊥ is a fixed point of not, diagonal cannot return an element different from the one it was given. Thus for diagonal to work, we must require that Cantors be fully-defined — no infinite loops!

With this requirement, we can no longer put Cantors in 1 to 1 correspondence with the naturals, because we would have to solve the halting problem. It is not enough that the type of the term is a Cantor, it now must be fully defined for all inputs, and determining that given arbitrary source code is an undecidable problem.


The erroneous statement is (1). Cantors are computable functions, so as we have argued, they have the same cardinality as the naturals. There are no more programs than numbers, so by the definition of equal cardinality we can put them in 1 to 1 correspondence with a function.

The problem with (1) occurs because diagonal takes as its first argument not an arbitrary sequence of Cantors, but a computable sequence of Cantors. If cs is not computable, then neither is diagonal cs (for we no longer have cs‘s source code with which to construct it), and Cantors are defined to be computable sequences. So diagonal fails to contradict our bijection.


The erroneous statement is (2). Section II claims to put Cantors and naturals in 1 to 1 correspondence, but it is lying. Suppose Section II is formulated with respect to some axiom system A. If it were “telling the truth”, we would expect there to be some term f in the language of A such that for every fully defined Cantor program c, there is some natural number n such that we have A \vdash f(\bar{n}) = \bar{c} (i.e. it is a theorem of A that f(1 + 1 + … + 1) = (source code of c)).

Let’s suppose we have written down the axioms of A into a Haskell program, and we have a (partial) function proofSearch :: Nat -> Cantor, which, given a number n, searches for theorems of the form f(\bar{n}) = \bar{c} and compiles and returns the first such c it finds. In the case that there is no such statement, it just runs forever; similarly for the case that c fails to compile. Although cumbersome to write, I’m sure we agree that this is possible to write. If section II is not lying, then we expect that for every natural n, proofSearch n does in fact return a valid Cantor.

Now, let us return to familiar lands with a new program:

evidence :: Cantor
evidence = diagonal proofSearch

Oh my! If section II is the truth, then proofSearch is a total, computable function of type Nat -> Cantor, which we can pass to diagonal to find a Cantor that it missed! So it must have been lying, either (1) about its function f finding every possible Cantor or (2) about it actually possessing such a function (i.e. it “proved” that there is such a function, but it couldn’t actually represent it). In either case, it did not actually create a 1 to 1 correspondence between the naturals and Cantors.


Left as an exercise for the reader.

Which one is it really?

Flattr this

What was the probability of that?

A group of college students bask in their crude weekly haven of Texas Hold’em poker. Amid the usual banter, they watch as one deals out the flop: a three of spades, a three of diamonds, and a three of clubs. The room falls silent and tensions are suspended as if the ground had fallen out from beneath the house. The highest card may win this hand, a pocket pair is almost golden, and everyone wonders if someone has the single remaining three. A round of timid betting ensues — a wrong step here could be very expensive. After the round completes, the fourth card is dealt onto the table: the three of hearts. The tensions are dropped as everyone laughs incredulously, and out of the laughter emerges “what’s the probability of that?”

One of the more mathematically adept players pulls out his phone and does a quick calculation: \frac{4}{52}\frac{3}{51}\frac{2}{50}\frac{1}{49} — about 1 in 270,000. Everyone is wowed by the rarity of the event they just witnessed.

That is indeed the correct probability to get four consecutive threes from a deck of cards. But is that really the question here? Surely nearly the same response would have occurred if it had been four fours, or four nines. If it were four aces people would be even more astonished. The same response would have occurred if the cards had been four to a straight flush; e.g. the five through eight of spades. There are many such situations. “Four threes” is the most immediate pattern that we recognize as anomalous, but it is not the only anomalous pattern.

So what event is really being referred to by that? Those specific four cards had a 1 in 6.5 million chance of coming up in the order they did from a player’s perspective before the hand, and a 100% chance of coming up in the order they did from a perspective after the hand [some will note that I am using a Bayesian interpretation of probability at this point]. The probability of the specific real world event that occurred (including the orientations and temperatures of the cards, the reactions of the players, and the taste of Jake Danser’s bite of Chinese dinner 500 miles away), from the perspective of any of the players, is unbelievably small.

In situations where this question is asked, I always jokingly answer the question with “1″. Most of the time people laugh and let the question rest, but sometimes the conversation turns more serious. In this case I try (in all cases so far, in vain) to explain the plurality of perspective at play here. The “serious” answer I have for this question, doing my best to interpret the intended meaning of the statement while incorporating these issues, is a number I cannot calculate but that I can describe: it is the probability that the speaker would have been astonished by the event that occurred; essentially the probability that he would remark “what’s the probability of that?”. I think that is quite a bit higher than one in 270,000, so the event we witnessed was not as rare as the simple calculation would have us believe.

The dissonance of such situations points to a common simplistic view of probability: that events (in the colloquial sense) have probabilities. A distinction that is commonly understood (and commonly misunderstood) is that between frequentism, which talks about running an experiment a bunch of times and calculating the proportion of experiments that satisfied some criterion, and Bayesianism, which talks about the confidence one has in some property being true in terms of a subjective state of knowledge. This is a fascinatingly subtle distinction: they coincide on most but not all questions, and there is much argument about which one is “right” (I think that is a rather silly argument, as if a probability were something bestowed to us from nature). However, both of these views are calculating the probability of a property (a set of events) being true (happening), and both of these views rely on an assumption of prior information: the frequentists on the set-up of the experiment and the Bayesians on their prior state of knowledge about the world. The idea that an event has a single, objective probability says something very strong about the universe: that there is some essential, natural source of randomness (to dispel any quantum objections, I point to E.T. Jaynes’s critique of the view that the randomness in quantum theory is an essential property of nature). But even if there were some essential source of randomness, surely the universe is more determined than our probabilistic calculations assume: from the view of this universe, the deck cannot be uniformly shuffled after only seven shuffles because it knows how your hands and your brain work. So we were never talking about an essential, natural randomness.

In our normal use of probabilities, we don’t run into this problem because we are predicting the future: e.g. “what is the probability that Obama will be re-elected?”. We conceive of this as a single event, but it is actually a wide collection of events. Given some prior knowledge, this set of events has a definite probability. But we are inconsistent about how we treat future events and past events: past events are not collections — they happened, and they happened in exactly one way. From the perspective of now, the probability that it would happen any other way is zero. From the perspective of before the event, we are asking whether “it” would happen or not, but we are entirely unclear about what measurable property we mean by “it”. We mean something different when we refer to events in the future and events in the past.

In summary, probabilities are functions of (1) prior information and (2) a criterion for judging when it is to be considered true. Probability is meaningless to apply to a single event.

N.B. in probability theory, the word event is already defined to mean a set of outcomes. If you read this article with that definition in mind, you will have been very confused :-).

If this post helped you think more clearly about probability, show your appreciation and Flattr this.

So many philosophical pseudo-debates focus on the existence or non-existence of this or that “thing”. Pop-skepticism is at odds with most sects of Christianity about the existence of a God; many skeptics, somehow oblivious of the hypocrisy in which they engage, argue simultaneously that claims must be supported with evidence and that there must be no God. I engaged fleetingly with the university’s skeptics society in a debate about the existence of the electron, in which I argued that the electron was a mathematical tool that enjoyed the same level of existence as a number, and not so much existence as…

Fortunately for me, I did not have to complete the above thought, as the debate was on facebook so I was permitted not to respond once the level of abstraction exceeded me. Rather than the inevitable fate of a face-to-face debate on the subject — in which I would make a fool of myself for failing to possess a well-collected, self-consistent argument, my opponents permitting me to exit the arena now having failed to disrupt their conceptual status quo — the debate fizzled out, and they will probably not remember of their own volition that they had even engaged in it. It is all for the better that my medium has changed, since after some time spent meditating on the question, I have come across something I have not been able to distill into a snappy epigram.

To a “standard model” logical mind, and even to the working mathematician who has not studied logic, existence is a straightforward concept. One can ask whether a mathematical object exists with some property, and assume without argument that one is asking a reasonable question with a yes-or-no answer. However, in the world of mathematical logic — the only logical world whose paradoxes I can comfortably resolve — the notion of existence is rather more slippery. There are the standard objects which one can prove to exist from the axioms, and there are — or perhaps I should say, there are not — objects whose existence is contradictory. But there is a neglected middle class. These objects _____ whether or not you choose to exclude the middle.

The Twin Prime Conjecture (TPC), a famous question still open today in 2011, conjectures that there are infinitely many numbers p such that both p and p+2 are prime. One of these pairs is called a “twin prime”, for example 5 and 7, or 179 and 181. There are many who believe TPC is true, some who believe TPC is false, but among logicians (who crave this sort of result), many believe TPC is “independent of the axioms.” Let us explore the consequences of this latter belief. To be concrete (insofar as such a word can mean anything in such matters), let us suppose that TPC is independent of “ZFC”, the Zermelo Frankel axioms with the Axiom of Choice, the axioms of choice (no pun intended) for popular set theory.

It would be helpful to be reminded of what exactly ZFC is. Aside from the deep fantastic worlds of intuition inhabiting many mathematicians’ minds, it is merely a set of 9 statements about the world of sets. For example, “if two sets have the same members, then they are the same set”, and “given any set, you may form the subset of elements satisfying a particular property”. These are stated in rigorous, precise logical language, so by formal manipulation we can exclude the subtleties of meaning that would abound in any English presentation of these axioms. Logicians like to say that a proof is nothing more than a chain of formal logical sentences arranged according to some simple rules; this view has spread since the advent of programming languages and computerized mathematical assistants.

If TPC were true, then given any number, you could count up from that number and eventually reach a twin prime. If TPC were false, then there would be some number, call it L, above which it would not be possible to find any twin primes. However, since TPC is independent (because we have supposed it), then we know we cannot prove it either way. It may be true, or it may be false; whether there is a third option is too deep a philosophical question to explore here. We may be able to count up from any number and find a twin prime, but we will never be sure that we will not arrive at a point after which there are no more. Or there may in fact be an L above which there are no more, but we shall never be able to write L as a sequence of digits. Again, whether these two comprise all possibilities is not a matter capable of absolute resolution.

There can be no proof that L exists, so, like God to the skeptics, it must not exist. By their own standard, this conclusion is not justified, for, by our assumption, there is no evidence in favor of its nonexistence either. Indeed, we may safely believe in L; if a contradiction would arise from its use, then we could leverage that contradiction to provide a proof that there are infinitely many twin primes, thus TPC would have been provable. After centuries of cautious hypothesis of what would happen if L did exist, we may begin to treat L as any other number. As the ancient Greeks’ unease about the existence of irrational numbers has faded, so too would ours. The naturals would become: 1, 2, 3, 4, 5, … L, L+1, …. We will have answered questions about L, for example it is greater than one million, because have found twin primes greater than one million.

This all happens consistently with the proof that the set of natural numbers is made up of only the numbers 1, 2, 3, 4, 5, …, for that proof does not mean what we think it means. We cannot enumerate all the natural numbers in a theorem; that proof only states that the set of natural numbers is the smallest set made up of zero and successors of elements in that set. If we can actually find a twin prime above any number, but merely not know it, then we might claim L cannot be the successor of any element in this set. But this claim is false, because L is clearly the successor of L-1! L, whether or not or ___ it is one of the familiar numbers, manages to sneak its way into the smallest set containing zero and successors. It is not the set of numbers, but the language about numbers that can be extended by this independence of TPC, and L is not logically distinguishable from “regular” numbers. It is a symbolic phenomenon. But so, too, are the familiar numbers. The only difference is we have chosen to say that zero exists.

Flattr this

Quantum Crackpot in Training

I have been working my way through Volume III of Feynman’s lectures, the one on quantum mechanics. A few months ago I watched his Quantum Electrodynamics lectures for the lay public and I was fascinated by the beauty and simplicity of the presentation. Now I want to dig deeper.

The basic idea is summarized in the quote (can’t find its source, probably Feynman though :-): “Everything that can happen, does. Physics is then reduced to the problem of finding out what can happen.” This is not philosophical many-worlds garbage postulating the existence of infinitely many alternative universes (I will get to that), but instead the interpretation of the Lagrangian form: if you want to find the probability amplitude of some event, you just add up the amplitudes for all the different ways it could happen. The generality of the principle is astounding, and making only very weak additional assumptions it is possible to completely derive the workings of electrons and photons (except for the mass of the electron, which is still a mystery). The rule is not just for electrons and photons though; those are just the easiest kinds of particles to get at. The entire universe works this way: the amplitude of an event is the sum of all the ways (including classically absurd ones) it could happen.

In the beginning of my studies, I was constantly tripped up by my conception of time. In the double slit experiment, a photon interferes with a version of itself leaving the excited atom at a different time. It was very hard to picture this when I was still attached to my idea of time and causality. This is the logic of the universe, not the dynamics. That is, we aren’t really computing the amplitude of an event to happen so much as the amplitude that, given some assumptions are true, some other thing about the universe will be true. We phrase the double slit experiment like this: given that this atom is excited at t0, what is the amplitude that this other atom is exited at t1? There is no notion of happening or the flowing of time, it’s just a connection between statements about the universe. Realizing this was an important step in my understanding. Of course, the way that these two atoms are connected does involve time — that manifests itself in the different “ways it could happen” and thus affects the amplitude.

Ok, so we have this logic which connects facts about the universe together as amplitudes, which are complex numbers. How do we take these amplitudes and get some information we can use? The rule is: the probability of an event, er I mean, a fact, is proportional to the absolute square of the amplitude. Simple enough. So you set up an experiment and calculate the amplitudes for all the different ways it could come out (you have to calculate all the ways, because the probability is only proportional, so you need to normalize them so they sum to one — I find this unsatisfying). Then you do the experiment, and what actually happens at the end of the experiment is one of those ways, proportional to the absolute square of the amplitude for that way.

This is extremely unsatisfying to me. Almost all of the resources I have used for learning QM have described it this way and left it at that. I’m pretty sure it’s because nobody really knows the answer to the next question: when, exactly, do you take the absolute square? If you take it too soon, e.g. before “the experiment” is over, then you will lose the interference effects and do not get an accurate answer. But you can’t just delay taking it forever, because then you only ever have amplitudes, not probabilities. There is this arbitrary barrier between the “quantum” world and the “real” world, and that’s when you take the absolute square. This is intentionally ignoring the idea that your experiment apparatus, your measuring devices, etc. are all governed by the quantum logic above as well, because that is too hard to think about. This is the piece I am determined to understand; I am interested in QM philosophically, not practically, so it is not at all satisfying to me to say “it works in practice, get used to it.”

The theory of quantum decoherence provides half of the answer. It shows how this interpretation of the barrier is equivalent to the state of the experimental apparatus (including the state of you, the scientist performing the experiment) becoming entangled with what happened in the experiment. Eventually the whole universe gets entangled with the result of the experiment and that’s what “really happened”. God got a bunch of amplitudes for the way the universe could be; he took their absolute squares, rolled the dice, and picked one. Now the arbitrary boundary has been pushed out as far as it can go — to the edges of spacetime — instead of being between experiment and apparatus. Quantum decoherence shows a sort of compositionality of this quantum logic. This is getting more satisfying.

I love it because it is right on the edge of my ability to conceptualize. All the “decisions” in the entire universe could go this way or that, and if they both lead to the same thing and have opposite amplitudes, they could interfere with each other and make that thing impossible. It is because the universe is a chaotic system, that small changes give rise to large changes, that we can’t observe quantum interference on large scales. These little decisions are very unlikely to lead to the same state. Entropy gives rise to the classical world.

When I get really deep into philosophizing, I explode into the annoying considerations of consciousness. Perhaps God did not pick a universe at random, but our consciousness did. Our memory must conceive of time linearly, it would violate entanglement not to, and that’s why we think there is a single “chosen” universe instead of the explosion of all possibilities. But whether all possibilities exist or there is a single universe chosen at random is likely not an observable distinction, so it is merely fodder for pipe dreams.

If there were some device that could measure some things about the universe, without disturbance, set up in such a way as to negatively interfere with itself when its measurements were “undesirable”, it could potentially control the way the universe would go. Now you see where the title of this post comes from. I have not been able to sketch this device as a black box, nor fully understand why it should be impossible. I suspect it has something to do with the uncertainty principle, the derivation of which I have yet to completely understand.

Quantum Mechanics is fascinating to me, and I am trying to keep my mind open to the deep, philosophical, passionate curiosity it invokes without descending into the insanity of a quantum crackpot. It is a challenge.

Inspiring you to be a quantum crackpot too? Flattr this


I’m in a postmodernism class at university this semester. Despite the eyerolling of my peers when I mention the topic, I find it very interesting so far. Today’s class got me thinking about some math.

In the past, I have worked a lot with frameworks for doing programming and math. I wanted to find a fundamental system in which all my ideas could be expressed; I latched on to Martin Bunder’s IΞ and IG as being particularly simple and beautiful. I used to be a compulsive programming language designer, searching for simple, beautiful ideas with which to express all programs. I still have the urge now and then, but never follow through anymore.

In today’s pomo class, we were talking about the postmodern view of modernism (as I suspect we will be for quite some time): having some framework in which to conceive of the world, of all of history, and viewing that framework as fundamental. If that framework begins to crack, then we throw it out and make a new framework (calling the new one “modern” this time) in which to conceive of everything. Postmodernism views this cycle as one that continues indefinitely in this stage of history.

Some of the culture of mathematics follows this trend (ZFC, NBG, Löf type theory, intuitionism), but especially programming languages follow this trend. We are always in seek of new frameworks in which all programs can be expressed: structured programming (Dijkstra), object-oriented, functional. There is leakage, but programmers tend to throw out old models with cracks and adopt new ones as if there will be no more cracks. There seems to be a swing toward the functional recently — dare we imagine that it has any cracks?

At this point I suspect some of you are already developing your indignant responses. As programmers, we are entrenched in modernist ideals. I seem to be criticizing this cycle, and surely if I am criticizing then something must replace it. Whatever replaces it is going to be just another cycle. Indeed, as I indulge in my postmodernistic ramblings (which I assure you I have no confidence in — it is only the second week), in the back of my mind I keep imagining a language which transcends these problems. But of course it does not, because it is itself just another fundamental system. The key for me, and all of us trapped in this mode of thought, is merely to observe without trying to replace it with something better, something new.

Another possible response is that this cycle of reinvention is the only way progress can be made. That is a very good point, but in it we are taking for granted the assumption that progress exists and is desirable.

Programming is a difficult thing to talk about in this context, because it is a medium that we use to create things, and we can (at least roughly) measure how good we are at creating things in various frameworks. Being good at creating things is an implicit goal for a programming system, which comes out of the idea that creating things is progress and progress is good. Mathematics may have a nature that will make this exploration clearer, and then we can perhaps take our realizations about mathematics and apply them back to programming.

You might say that mathematics is about progress. After all, the ultimate pursuit of mathematics is to prove as many theorems as possible; or so those who study mathematical frameworks would have you believe. I have never been enamored with math as a competition to prove theorems; I like it because it alters (not “improves”) the way I think. I replace constructs with which I used to represent ideas with new ones. I used to think of real numbers as “fractions with extra stuff”, now I think of them as programs incrementally refining information. It has led me to think more topologically and less algebraically. Is more topologically better? I doubt it; it is just more recent. It permits seeing analogies that we could not previously see, but at the same time it hides analogies that we could previously see (though we do not notice that half, because we have already developed our intuitions about those using the previous framework). Mathematicians of tomorrow may only think topologically, and they will find more analogies by “discovering” algebraic thinking.

Programming is not so different. I do love functional programming, but that is because I have taken a lot of time to develop those ideas. There are object oriented programmers who are very talented at expressing ideas in that language, and I cannot understand their designs (or, perhaps more accurately, why their designs should be considered better than many nearby designs). Good ol’ procedural structured programming is still a good way to communicate to a computer how to do something fast. As the future drives forward, the past is erased; when it is forgotten its niche will re-emerge and it will be re-invented.

On why Goodstein sequences should terminate

In this post, I will give an intuition behind the ridiculous theorem that all Goodstein sequences eventually reach zero. I call it ridiculous because the construction of a Goodstein sequence is so strange that it’s surprising anything can be said about it at all, and that it accelerates upwards so quickly it is hard to imagine it ever coming back down. But after a little exploration, we will see what is happening with the structure of the numbers, and be able to picture an algorithm for computing how long it should take to come back down. Of course, the really interesting thing about Goodstein’s theorem is that it is equivalent to the consistency of Peano Arithmetic (and thus PA cannot prove it). I won’t give an intuition for this part, because I don’t fully understand it yet.

To define a Goodstein sequence, we need to define hereditary base-n notation, which I will denote H-n. To write a number is H-n, first write it in base n, and then (recursively) write all the exponents in H-n. So for example, to write 18 in hereditary base 2, we do:

24 + 21
222 + 21
2221 + 21

To generalize to larger bases, we rewrite any exponent that is greater than or equal to the base.

Now we can define a Goodstein sequence starting at n. First, write n in H-2. Now, change all the 2′s to 3′s in that representation, compute the number, subtract 1, and rewrite in H-3. Change all the 3′s to 4′s, compute the number, subtract 1, and rewrite in H-4. Continue forever or until you reach zero.

Let’s do an example. Let G be the Goodstein sequence starting at 4. I will start the sequence at index 2, so that the index in the sequence is the same as the base.

G(2) = 4 = 22
G(3) = 33 - 1 = 27 - 1 = 26 = 2 * 32 + 2 * 31 + 2
G(4) = 2 * 42 + 2 * 41 + 2 - 1 = 32 + 8 + 2 - 1 = 41 = 2 * 42 + 2 * 41 + 1
G(5) = 2 * 52 + 2 * 51 + 1 - 1 = 50 + 10 + 1 - 1 = 60 = 2 * 52 + 2 * 51
G(6) = 2 * 62 + 2 * 61 - 1 = 72 + 12 - 1 = 83 = 2 * 62 + 61 + 5
G(7) = 2 * 72 + 71 + 5 - 1 = 98 + 7 + 5 - 1 = 109 = 2 * 72 + 71 + 4
G(11) = 2 * 112 + 111 - 1 = 242 + 11 - 1 = 252 = 2 * 112 + 10
G(21) = 2 * 212 - 1 = 882 - 1 = 881 = 212 + 20 * 211 + 20
G(41) = 412 + 20 * 411 - 1 = 1681 + 820 - 1 = 2500 = 412 + 19 * 411 + 40

And so on. So the sequence goes 4, 26, 41, 60, 83, 109, … (A056193). Those who have followed this example closely may already be seeing why this sequence will eventually terminate.

Let’s introduce a device to understand what is happening to the structure of these numbers. At each step, most of the H-n structure stays the same, the base is just increasing. So let’s write each step with a symbol, say ω, instead of the increasing variable. We will see what to do with the fringes in a bit. I will be essentially copying the last representation from above. This example would begin:

G(2) = ωω
G(3) = 2 * ω2 + 2 * ω1 + 2
G(4) = 2 * ω2 + 2 * ω1 + 1
G(5) = 2 * ω2 + 2 * ω1
G(6) = 2 * ω2 + ω + 5
G(7) = 2 * ω2 + ω + 4
G(10) = 2 * ω2 + ω
G(11) = 2 * ω2 + 10

Ah! Now the sequence looks much more regular! At each step we are simply subtracting 1, and if there is an ω at the right, we are replacing it with a finite number. The number we replace it with keeps growing with the index of the sequence, but each time it is finite, which “deconstructs” the structure a little bit. So even though the numbers are growing, the structure of the H-n representation will never be more “complex” than it was in the first step. I haven’t defined complex; for the sophisticates, it is the ordering on Cantor normal forms of ordinal notations.

We haven’t yet seen what happens with limits that are more complex than ω. For example, at some point, the G sequence will reach:

G(n) = n2
G(n+1) = (n+1)2 - 1 = n * (n+1) + n

In our ω-notation:

G(n) = ω2
G(n+1) = n * ω + n

And indeed the structure has become a little simpler. I’m struggling to describe it in words, but I hope these examples have demonstrated how this sequence is always “decreasing in structure”, which is really all there is to Goodstein’s theorem.

A few technical notes: The H-2 representation of all possible starting numbers correspond to the Cantor normal forms, which are ordinal notations involving finite exponentiations of ordinals involving ω. So for Goodstein’s theorem to work, we need to know that the Cantor normal forms are indeed well-ordered, which is equivalent to ε0 (the supremum of all the Cantor normal forms; the limit of ω, ωω, ωωω, …) being well-ordered. There is a hint of the deep connection to ε0, I wish I could say more about it.