# System IG Semantics

I don’t have much time, so here’s just a quick jot. I don’t think System IG can prove this rather obvious proposition:

    |- GL(\a. Ga(Ka)) (\a x. x)


Or translated to more conventional notation: (\a x. x) : (a : Type) -> (a -> a), the type of the polymorphic identity function. In order to prove this, we need the axiom LL (conventionally, Type : Type). But that axiom is inconsistent. So I’ve tweaked the rules a bit to allow the above to be proven, without allowing LL. It’s possible that these tweaks are still consistent; they don’t admit Girard’s paradox, at least.

The rules I’m modifying can be found on page 8 of the above paper, in the IG box. My tweak is, changing rule Gi from:

    Γ, Xx |- Yx(Zx)  ;  Γ |- LX  ;  x not in FV(Γ,X,Y,Z)   =>   Γ |- GXYZ


To:

    Γ, Xx |- Yx(Zx)  ;  Γ |- L(GXY)  ;  x not in FV(Γ,X,Y,Z)  =>  Γ |- GXYZ


I’m saying that you need to prove that GXY (conventionally, (x:X) -> Y x) is a type before you can say something has that type. Without more modifications, this is equivalent to the original. However, now I add two new rules:

    Γ |- L(GL(KL))
    Γ |- GL(KL)p  =>  Γ |- L(GLp)


These say: (Type -> Type) : Type, and if p : Type -> Type then ((x:Type) -> p x) : Type. The latter one is the one I’ve heard is still consistent, but you can’t really do anything with it without the former. So, it needs further investigation.

Trying to prove the consistency of this new system, I need a semantics for terms. If I can model these semantics in CIC (the calculus used by Coq), then if CIC is consistent, so is this system. My idea for semantics is as follows:

Bare lambda terms don’t really have semantics; they are modeled by their syntax. But typing proofs have the semantics of whatever they are proving as well-typed. If I use [-] as the meaning function, then the semantics of [Lx] is a type (because it is a proof that x is a type). Formally:

    [Lx] : Type
    [Gxyz] : (a:[Lx]) -> [Gx(KL)y] a


Simplified, the latter means that the semantics of a proof that x has type y -> z is a function from y to z. (Rather, it’s the dependent version of that). These semantics motivate the axioms I am adding.

Okay, that’s all the time I have!

# Domain Convergence

Okay, how is this for a definition of convergence:

An infinite sequence xs :: [()] converges if there exists an n such that for every m > n, xs !! m = xs !! n. In other words, this is a bunch of programs which either halt or don’t, and after some point, either they all halt or they all don’t.

Then an infinite sequence xs :: [a] converges if map f xs converges for every f :: a -> ().
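Convergence in this sense isn’t decidable (we can never rule out a later flip), but as a sketch we can model each element of the sequence by a Bool saying whether that program halts, and check the “eventually constant” condition up to an explicit cutoff. The names `convergesBy`, `n`, and `bound` are all invented here:

```haskell
-- Sketch only: convergence itself is not decidable, so we check the
-- "eventually constant from index n" condition up to an explicit cutoff.
-- Each element models a program by whether it halts (True) or not (False).
convergesBy :: Int -> Int -> [Bool] -> Bool
convergesBy n bound xs = all (== xs !! n) (take (bound - n) (drop n xs))
```

So `convergesBy 2 100` accepts a sequence that settles down by index 2, and rejects one that keeps alternating within the cutoff.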

Sound good?

# Continuous Stream Fusion

I’ve been able to sketch-implement a version of FRP with a strange distinction: Env vs. Behavior. I’ll give you the intuition, then the rest of the article will develop the semantics more fully. I thought “backwards” to discover this (the word “discover” used under the assumption that it has some merit: questionable), from implementation, to concept, to semantics.

The semantics of Env a are plain ol’ Time -> a, but with a caveat: there is no reactivity allowed. I.e. an Env a yields an a only by looking at stuff “now”. In the implementation, this is just a subtype of IO. The semantics of Behavior are even less defined. I guess its semantics are Time -> Time -> a, so it is given a “start time” and returns a time function. It’s sort of like a relative behavior, but shifting it around does not exactly preserve behavior. The implementation is much like Reactive (Env a), where Reactive is from the popular reactive library. But through this exploration I am led to a very different implementation.

To formalize this silliness, first we need to add some composability and define these as transformations. Let’s just fix an e for the whole discussion, and then say the meaning of Env a is (a subtype of) (Time -> e) -> (Time -> a). Now it’s easy to say what we mean by not allowing reactivity: an Env a is a function f of the form f r = g . r for some g. So to determine the value of f at time t, you look at its environment at time t and do something to the result. No peeking at other points in its environment.

For Behavior, I want a scanl, but on continuous functions. You can think of this as a generalization of integration. For a continuous scan to be meaningful, I require its argument to be a continuous function f of type Real -> a -> a, such that f 0 = id. By continuity I mean Scott-continuity: treat Real as an infinite-precision computable real, and then any definable total function is continuous.

Then I can find the integral function ∫f on the interval [0,1] by computing the limit:

    f 0.5   .  f 0.5
    f 0.25  .  f 0.25  .  f 0.25  .  f 0.25
    ...


There is no guarantee that this will converge, or uniformly converge, or any of that good stuff. But we will cover our ears and pretend that it always does.
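As a sketch of this limiting process: at resolution 2^-n, approximate the scan over [0,1] by composing f (2^-n) with itself 2^n times. `integral` is a name invented here, and Double stands in for an infinite-precision real, so this is just a finite approximant, not the limit itself:

```haskell
-- Approximate the continuous scan of f over [0,1] at resolution 2^-n:
-- compose (f dt) with itself (1/dt) times, mirroring one row of the
-- limit display above.  Double stands in for an infinite-precision real.
integral :: Int -> (Double -> a -> a) -> a -> a
integral n f = foldr (.) id (replicate steps (f dt))
  where
    steps = 2 ^ n
    dt    = 1 / fromIntegral steps
```

With f dt x = x + dt (which satisfies f 0 = id), every row of the approximation already equals 1, so here the “limit” is reached immediately.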

If f :: Real -> Bool -> Bool, once our precision goes past the modulus of uniform continuity of f (the precision of the information that f will never look at to determine what to do), then ∫f will be constant. So there is some notion of continuity preserved.[1]

Okay, now we can say what a Behavior is! The meaning of Behavior a is a transformation f :: (Time -> e) -> Time -> Time -> a of the form — hold your breath (I’ll explain) — f r t0 = i . ∫ (j r) (k (r t0)), for functions i,j,k, where j is an Env function with the scan restriction j 0 = id.

Let’s dissect: First, k (r t0) is the initial value when the Behavior is started, which depends on the environment exactly when the Behavior is started and no other time. j r is the integrand, which takes into account the “previous” value and the current environment (only current because j must be an Env function). Finally, the i . out front is so we can compute things more complex than their final type; i.e. a Behavior Bool could have more state under the hood than just a boolean.

This definition is very much like Haskell stream fusion; we take a recursive definition and “flatten it out” into a state and an iterator function. By imposing a simple constraint on the iterator function, we have retrieved continuity of the generated stream, and can also perform continuous stream fusion in the implementation!

I still haven’t figured out how to work events back into this model.

[1] I’m committing a blatant foul here. Bool -> Bool is a discrete space, so any continuous function from Real to it has to be constant to begin with. So that paragraph was pretty much meaningless. I still like it as a way to get a computational handle on things.

# Reactive spaces

My recent days have been spent staring at the ceiling, drawing abstract doodles on a whiteboard, or closing my eyes and watching higher-dimensional images fly through my consciousness. No, I haven’t been on drugs. I’m after a very specific piece of mathematics, to solve a specific problem. But I have no idea what that mathematics is.

I’m after the sameness between functions of time and functions that react to events. Here is my best attempt at putting my images into words:

I will call them (generalized) Reactives, to confuse classical FRP people :-). A Reactive is defined over an environment space, which is a set of environments (with some structure). Time is an environment space; it is uniform and boring, so all of its environments are identical or isomorphic. There is also an Event environment space, whose inhabitants are roughly Events from reactive (a set of separated occurrences).

A Reactive takes an environment and outputs values in terms of it somehow. Reactives have a notion of translation. Say you have a reactive over an event space which is “False until the next mouse click and then True”. Translating this switches which mouse click it is talking about, but not when the mouse clicks occur; so the transition point will always be exactly on an external mouse click. However, since time is a uniform space, translation of a reactive over time does correspond to simple translation, since there is no interesting structure to query.

I don’t know yet what exactly an environment is. I am trying to capture the fact that reactives over an event space can only switch on occurrences of events, whereas reactives over time correspond to continuous functions. If an event environment looks like an FRP Event, what does the time environment look like?

# Compact data types

Join me, fellow readers, as I learn some topology by pretending I know something about it.

I recently uploaded Martin Escardó’s infinite search monad to hackage, under the name infinite-search. It is a delightful little thing to play with, and has inspired some deeper thoughts about the connection between computation and topology.

Quick refresher on topology: a topological space is a set X together with a collection of subsets of X, called the open subsets. The open subsets must include the empty set and the entirety of X itself, and must be closed under arbitrary unions and finite intersections.

For example, you can generate the standard topology on the real line by taking all the open intervals (a,b) and closing them under arbitrary unions. So, for example, the set $\bigcup \{ (a-\frac{1}{4},a+\frac{1}{4}) \mid a \in \mathbb{Z} \}$, which is a ball of radius 1/4 around every integer, is open. No solitary points are open; every point must have an arbitrarily small cloud of nearby points.

As another example, there is a topology on the natural numbers called the discrete topology, in which you consider every singleton set to be an open set and take arbitrary unions over those. In this topology, every subset of the naturals is open.

A space is compact if “every open cover has a finite subcover”. That means for any collection S of open sets whose union is the entire space, there is a finite collection $S' \subseteq S$ whose union is also the entire space. Obviously all finite topologies are compact, because there are no infinite collections to begin with.

So the discrete topology on the naturals I talked about is not compact. Consider this cover: $\{ \{n\} | n \in \mathbb{N} \}$; just all the singleton open sets. This obviously has no finite subcollection which also unions to the entire set of naturals (if you leave out even one of these sets it fails to be a cover).

Okay, on to computation. Roughly what Martin Escardó’s work says is that compact spaces can be exhaustively searched. That is, if S is a compact space, then given a total function p :: S -> Bool, we can either compute an element of S satisfying p or compute that there is no such element. If we could figure out a way to form topological spaces on our data types, then we can know whether they are exhaustively searchable (and construct algorithms to search them).

A domain is the set of all elements of a data type, including partial ones (with ⊥s in them). So the domain for Bool is { ⊥, True, False }, and the domain for Either () () is { ⊥, Left ⊥, Left (), Right ⊥, Right () }. To construct a topology (the Scott topology?) for the type, generate it from the sets of all elements compatible with each finite element of the domain. So the open sets for our Either () () would be { { Left (), Right () }, { Left () }, { Right () } }. We also toss in the empty set, since it is the union of zero of these.

Earlier we saw that the naturals were not compact. This is a good sign, because if we could exhaustively search the naturals, then we could write an algorithm to, e.g., decide the twin-prime conjecture. However, I will now show that the lazy naturals are compact.

The lazy naturals are the data type:

data Nat = Zero | Succ Nat


What is the set of elements of this domain, and what is the topology? The set of elements is just the regular naturals plus the special one let infinity = Succ infinity. The topology has all the regular finite naturals as singleton sets, just like the discrete topology, but it also has the sets generated by ⊥, Succ ⊥, Succ (Succ ⊥), …: for each of these, the set of every element at least that defined, which always includes infinity. The key is that the singleton { infinity } itself is not open; infinity only occurs inside one of these infinite sets.

Now let’s say you have a cover for Nat. It includes infinity, so it must include a set { n | n > k } for some k. So we pick that one, and then we pick however many sets we need to cover the naturals up to k; there will be at most k+1 of them. So we have constructed a finite subcover.

So regular naturals are not compact, but Nat is. So regular naturals are not searchable, but Nat is. But we got Nat only by adding to the regular naturals; how did we make it searchable in the process?

There was a little snag in the definition of searchable above: we needed a total predicate. Many predicates which we could not search over the regular naturals would get into an infinite loop if we passed them infinity, and that would mean the predicate is not total. Intuitively, that means we can only search on predicates which are bounded in the size of the natural; which stop looking at some point.
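To get a computational handle on why this works, here is a self-contained sketch (no packages) of a searcher for the lazy naturals, in the spirit of Escardó’s searchable sets. The names searchNat, existsNat, foreveryNat, and atLeast are all invented for this sketch:

```haskell
data Nat = Zero | Succ Nat

infinity :: Nat
infinity = Succ infinity

-- Search the lazy naturals: given a total predicate p, return a Nat
-- satisfying p if any does.  Finite witnesses are found directly;
-- when no finite natural satisfies p, infinity is produced lazily.
searchNat :: (Nat -> Bool) -> Nat
searchNat p = if p Zero then Zero else Succ (searchNat (p . Succ))

existsNat :: (Nat -> Bool) -> Bool
existsNat p = p (searchNat p)

foreveryNat :: (Nat -> Bool) -> Bool
foreveryNat p = not (existsNat (not . p))

-- A total predicate: "n is at least k".  It inspects at most k
-- constructors, so it halts even when given infinity.
atLeast :: Int -> Nat -> Bool
atLeast k _ | k <= 0 = True
atLeast _ Zero       = False
atLeast k (Succ n)   = atLeast (k - 1) n
```

Note that totality of p is exactly what makes existsNat terminate: p may only peel finitely many Succs off the lazily-built infinity before deciding.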

I conjecture that every ADT in Haskell (without strict fields) is compact. It is pretty easy to write combinators which construct Data.Searchable Sets for any ADT; whether they are proper compact Sets (i.e. the search function is in fact total) is another issue.

So that’s it. We can make a compact set of Naturals and exhaustively search them. For fun, here’s a little Haskell implementation demonstrating this:

import Data.Searchable
data Nat = Zero | Succ Nat

nat :: Set Nat
nat = singleton Zero `union` fmap Succ nat

-- decidable equality on functions from Nat!
eqnatf :: (Eq a) => (Nat -> a) -> (Nat -> a) -> Bool
eqnatf f g = forevery nat $ \n -> f n == g n


Go ahead, write a Num instance for Nat and play around. Just remember that your predicates need to halt even when given infinity, otherwise they are not total. In particular, this means that you can’t use eqnatf on Nat -> Nat, because equality on Nat itself is not total (infinity == infinity loops).

# The problem with Coq

I’ve written about an idea for a graphical interface to dependent types before. I just played with Coq a little bit, and it’s really fun. I spent 45 minutes proving that every number can be written as either 2n or 2n+1. A feeling of accomplishment washed over me as I finished proving the most substantial result in the history of mathematics!

Still, the fun of using this system only makes me want to explore this idea more! I like the presentation in the right panel, but I hate the interaction with it. I have to memorize the syntax for a bunch of tactics, typing sequential commands for what are essentially non-sequential transformations. It’s hard to go back and forth between programming and proving, and it’s hard to divert your attention elsewhere for a while. An interface where I could just right-click on a hypothesis and see all the ways I could productively transform it would be great in itself.

Damnit! I want a good flexible gui library! I want reactive to be working!

# All functions are continuous, always

Dan Piponi and Andrej Bauer have written about computable reals and their relationship to continuity. Those articles enlightened me, but only by way of example. Each of them constructed a representation for real numbers, and then showed that all computable functions are continuous on that representation.

Today, I will show that all functions are continuous on every representation of real numbers.

I’m going to use the following definition of analytic continuity:

f is continuous if for any chain of open sets x1 ⊇ x2 ⊇ …, ∩i f[xi] = f[∩i xi]

Where ∩ denotes the intersection of a set of sets, and f[x] denotes the image of x under f (the set {f(z) | z in x}).

This means that for a continuous function, the intersection of a bunch of images of that function is the same as the image of the intersection of the sets used to produce those images (whew!). It might take you a little while to convince yourself that this really means continuous in the same way as it is normally presented. The proof is left as an exercise to the reader (yeah, cop-out, I know).

I chose this definition because it is awfully similar to the definition of Scott continuity from domain theory, which all computable functions must have.

A monotone function f is Scott-continuous if for any chain of values x1 ⊑ x2 ⊑ …, supi f(xi) = f(supi xi).

Where ⊑ is the “information” partial ordering, and sup is the supremum, or least upper bound, of a chain. Analogous to the last definition, this means that the supremum of the outputs of a function is the same as the function applied to the supremum of the inputs used to create those outputs.

What I will do to show that every computable function is continuous, no matter the representation of reals, is to show that there is a homomorphism (a straightforward mapping) from Scott continuity to analytic continuity.

But first I have to say what it means to be a real number. It turns out this is all we need:

fromConvergents :: [Rational] -> Real
toConvergents :: Real -> [Rational]


These functions convert to and from infinite convergent streams of rationals. They don’t need to be inverses. The requirement we make is that the rational at position n needs to be within $2^{-n}$ of the actual number represented (the same as Dan Piponi’s). But if a representation cannot do this, then I would say its approximation abilities are inadequate. To make sure it is well-behaved, toConvergents(fromConvergents(x)) must converge to the same thing as x.

Now we will make a homomorphism H from these Reals (lifted, so they can have bottoms in them) to sets (precisely, open intervals) of actual real numbers.

Range will be a function that maps lists to a center point and an error radius.

Range(⊥) = ⟨0,∞⟩
Range(r:⊥) = ⟨r,1⟩
Range(r:rs) = ⟨r’,e/2⟩ where ⟨r’,e⟩ = Range(rs)
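As a sketch in Haskell: model a partially evaluated stream of convergents by its known finite prefix, so the empty prefix plays the role of ⊥, and Nothing stands for an infinite radius. The name `range` is mine:

```haskell
-- Model a partially evaluated convergent stream by its known finite
-- prefix; the empty prefix plays the role of bottom, and Nothing is
-- an infinite error radius.  Note the head r is discarded in the
-- recursive case: the center comes from the later, more precise
-- convergents, with the error halving at each level.
range :: [Rational] -> (Rational, Maybe Rational)
range []     = (0, Nothing)          -- Range(⊥)    = ⟨0, ∞⟩
range [r]    = (r, Just 1)           -- Range(r:⊥)  = ⟨r, 1⟩
range (r:rs) = (r', fmap (/ 2) e)    -- Range(r:rs) = ⟨r', e/2⟩
  where (r', e) = range rs
```

For instance, range [1/2, 3/4] centers on 3/4 with radius 1/2, consistent with the position-n rationals being within $2^{-n}$ of the represented number.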

And now H:

H(x) = (r-e,r+e) where ⟨r,e⟩ = Range(toConvergents(x))
H(⊑) = ⊇
H(sup) = ∩
H(f) = H ∘ f ∘ fromConvergents ∘ S, where S(x) gives a cofinal sequence of rational numbers less than x that satisfies the error bound requirement above.

The hard part of the proof is H ∘ f ∘ fromConvergents = H(f) ∘ H ∘ fromConvergents for fully defined inputs; in other words, that the homomorphism does actually preserve the meaning of the function. It boils down to the fact that H ∘ fromConvergents ∘ S is the identity for fully defined inputs, since Range(x) is just {lim x} when x is fully defined. I expect some of the details to get a bit nasty though. Left as an exercise for a reader less lazy than the author.

And that’s it. It means when you interpret f as a function on real numbers (namely, H), it will always be continuous, so long as the computable real type you’re using has well-behaved toConvergents and fromConvergents functions.

Intuitively, H maps the set of convergents to the set of all possible numbers it could represent. So ⊥ gets mapped to the whole real line, 0:⊥ gets mapped to the interval (-1,1) (all the points within 1 of 0), etc. The analytical notion of continuity above can be generalized to any function on sets, rather than just f[] (the image function of f). This means we can define continuity (which is equivalent to computability) on, for example, functions from Real to Bool.

This was a fairly technical explanation, where I substituted mathematical reasoning for intuition. This is partially because I’m still trying to truly understand this idea myself. Soon I may post a more intuitive / visual explanation of the idea.

If you want to experiment more, answer: what does it mean for a function from Real -> Bool to be continuous?

# The Curry-Howard isomorphism and the duality of → and ×

To people who are familiar with the Curry-Howard isomorphism, this post will probably be trivial and obvious. However, it just clicked for me, so I’m going to write about it. I had passively heard on #haskell that → and × (functions and tuples) were dual, so I played with types a bit, finding dual types, and it never really came together.

I’m not actually sure what I’m talking about here is exactly the CH isomorphism. I’ve been thinking about dependent types, and converting propositional formulae into that calculus. But if it’s not, it’s close.

Let’s use some sort of type bounded quantification in our logic. I was thinking about the following formula:

$\forall n \in \mathbb{N} \, \exists p \in \mathbb{N} \, (p > n \wedge \text{Prime}(p) \wedge \text{Prime}(p+2))$

The corresponding type is:

$(n : \mathbb{N}) \rightarrow [ (p : \mathbb{N}) \times (p > n) \times \text{Prime}\, p \times \text{Prime} (p+2) ]$

In other words, the proposition “for every natural, there is a twin prime greater than it” corresponds to the type “a function which takes a natural and returns a twin prime greater than it”. × corresponds to existentials, because it is the proof: it tells you which thing exists. The first element of the tuple is the thing, the second element is the proof that it has the desired property.

It’s interesting that ∀ corresponds to →, since logical implication also corresponds to →. The difference is that the former is a dependent arrow, the latter an independent one. Beautifully, ∃ corresponds to × in the same way, and so does ∧: the dependent pair and the independent one.

Oh right, the duality. Knowing this, let’s take the above type and negate it to find its dual.

$(n : \mathbb{N}) \times [ (p : \mathbb{N}) \rightarrow (\neg (p > n) + \neg\text{Prime}\,p + \neg\text{Prime}(p + 2)) ]$
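Step by step, the negation just pushes inward by De Morgan (a sketch; writing $Q$ for $p > n \wedge \text{Prime}\,p \wedge \text{Prime}(p+2)$, and using $\neg\forall \equiv \exists\neg$, $\neg\exists \equiv \forall\neg$, and $\neg(A \wedge B) \equiv \neg A \vee \neg B$, with $\vee$ encoded as $+$):

$\neg \forall n \, \exists p \, Q \;\equiv\; \exists n \, \neg \exists p \, Q \;\equiv\; \exists n \, \forall p \, \neg Q \;\equiv\; \exists n \, \forall p \, (\neg(p > n) \vee \neg\text{Prime}\,p \vee \neg\text{Prime}(p+2))$

Reading $\exists$ as $\times$, $\forall$ as $\rightarrow$, and $\vee$ as $+$ gives exactly the dual type above.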

I didn’t bother expanding the encoding of not on the inner levels, because it doesn’t really make anything clearer.

The dual is a pair: a number n and a function which takes any number and returns one of three things: a proof that it is no greater than n, a proof that it is not prime, or a proof that the number two greater than it is not prime. Intuitively, n is a number above which there are no twin primes. If this pair exists, the twin prime conjecture is indeed false.

So yeah, that’s how → and × are dual. It’s pretty obvious now, actually, but it took a while to make sense.

# Set Selectors

I am writing a poker game, and I got mildly annoyed when I went to write the hand classification functions. There was a disparity between the specification and implementation of poker hands; I had to come up with an algorithm to match each type. I didn’t like this, I want the code to match the specification more directly.

This is quite a geeky post. The types of poker hands are very unlikely to change, and the amount of time I’ve spent thinking about this problem already is many times that of solving it directly in the first place. I.e. it would be stupid for someone to pay me by the hour to solve it this way. Still, it gives rise to an interesting generalization that could be very useful.

I decided that the way I would like to specify these things is with “set selectors” over logical expressions. That is, given a finite set U, find a subset R of U such that some logical expression holds in R (i.e. all quantifiers are bounded on R).

This has a straightforward exponential time solution. I’m trying to do better.

I started by classifying logical expressions. In the following, let P(…) be quantifier-free.

• $\exists x P(x)$ is straightforward $O(n)$.
• More generally, $\exists x_1 \exists x_2 \ldots \exists x_k P(x_1, x_2, \ldots x_k)$ is straightforward $O(n^k)$.
• $\forall x P(x)$ is also $O(n)$ to find the largest solution (because the empty set would satisfy it, but that’s not very interesting).
• $\exists x \forall y P(x,y)$ has an obvious solution, same as $\exists x. P(x,x)$. There is no unique largest solution, but there is a unique largest for each x which can be found in $O(n^2)$ time. It’s unclear what the library should do in this scenario.
• $\forall x \forall y P(x,y)$ is called the Clique problem and is NP-complete! Damn!

But the most interesting one so far is the case: $\forall x \exists y P(x,y)$. It turns out that there is a unique largest solution for this, and here is an algorithm that finds it:

Given a finite set U, find the largest subset R such that $\forall x \! \in \! R \, \exists y \! \in \! R \, P(x,y)$.

Let $r_0 = U, r_{n+1} = \{ x \in r_n | \exists y\!\in\!r_n \, P(x,y) \}$. That is, iteratively remove x’s from r that don’t have corresponding y’s. Then define the result $R = \bigcap_i r_i$.

Lemma. There is a natural number $n_f$ such that $r_{n_f} = R$.

Proof. Follows from finite U and $r_{n+1} \subseteq r_n$.

Theorem. $\forall x \! \in \! R \, \exists y \! \in \! R \, P(x,y)$.

Proof. Given $x \in R = r_{n_f} = r_{n_f + 1}.$ Thus there exists $y \in r_{n_f} = R$ such that P(x,y), by the definition of $r_{n_f+1}$.

Theorem. If $R^\prime \subseteq U$ and $\forall x \! \in \! R^\prime \, \exists y \! \in \! R^\prime \, P(x,y)$, then $R^\prime \subseteq R$.

Proof. Pick the least n such that $R^\prime \not\subseteq r_n$. There is an $x \in R^\prime$ with $x \not\in r_n$. The only way that could have happened is if there were no y in $r_{n-1}$ with P(x,y). But there is a y in $R^\prime$ with P(x,y), so $R^\prime \not\subseteq r_{n-1}$, contradicting n’s minimality.

The time complexity of this algorithm can be made at least as good as $O(n^2 \log n)$, maybe better.
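The iteration transcribes almost directly into code. A minimal sketch, with lists standing in for sets and `largestSubset` as my own name:

```haskell
-- Iteratively remove x's that have no witness y in the current set,
-- until a fixed point is reached.  The result is the largest R with
-- forall x in R, exists y in R, p x y.
largestSubset :: Eq a => (a -> a -> Bool) -> [a] -> [a]
largestSubset p = go
  where
    go r
      | r' == r   = r
      | otherwise = go r'
      where r' = [x | x <- r, any (p x) r]
```

For example, with p x y = (y == x+1) on {1,…,5} the set empties out (5 has no successor, then 4 doesn’t, and so on), while a successor-mod-3 relation on {0,1,2} is already a fixed point.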

While that was interesting, it doesn’t really help in solving the general problem (which, I remind myself, is related to poker, where all quantifiers will be existential anyway!). The above algorithm generalizes to statements of the form $\exists w_1 \exists w_2 \ldots \exists w_j \forall x \exists y_1 \exists y_2 \ldots \exists y_k \, P(w_1,w_2,\ldots w_j,x,y_1,y_2, \ldots y_k)$. Each existential before the universal adds an order of magnitude, and I think each one after does too, but I haven’t thought it through.

In fact, I think that, because of the clique problem, any statement with two universal quantifiers will take exponential time, which means I’m “done” (in a disappointing sort of way).

Back to the real world, I don’t like that finding a flush in a hand will take $O(n^5)$ time (16,807 checks for a 7-card hand, yikes), when my hand-written algorithm could do it in $O(n)$. I’m still open to ideas for specifying poker hands without the use of set selectors. Any ideas?

# What’s a natural transformation?

For some reason I’ve had a great amount of trouble learning category theory. I think a lot of that is that most of the literature sucks. But I don’t blame it, in fact I find it encouraging, because it indicates that there is a brain-rewiring involved: people who know category theory cannot teach it, because as a consequence of learning it, they are thinking in a fundamentally different way.

However, thanks to the Stanford Encyclopedia of Philosophy, a resource which consistently provides good intuition for mathematics when the literature is otherwise impenetrable, I think I finally get natural transformations.

I always heard that a natural transformation is “just like a polymorphic function”. I never really got that, especially since wikipedia said a natural transformation acted on functors but a function is a morphism (in Hask).

Let’s work in the context of an example. I’ll look at concat :: [[a]] -> [a] because it has only one argument and only one type variable, so that will simplify things for us. Let’s also recognize that there are two functors appearing here: [] on the right and [] ∘ [] on the left.

Here’s the formal definition, which I wish to decode: A natural transformation η relates two functors (F and G) with identical source and target. It maps objects in the source category to morphisms in the target category. For every morphism $f: X \mapsto Y$ in the source category, $\eta_Y \circ F(f) = G(f) \circ \eta_X$.

So here F is [] ∘ [] and G is []. Our natural transformation concat must associate objects (types) to morphisms (functions) and satisfy the above equation. To expand that equation, let’s realize that [](f) (where [] is the functor) is map f and ([] ∘ [])(f) is map (map f). These are part of the definitions of these functors, and in Haskell correspond to the definition of fmap.

The equation is thus: for all functions f :: x -> y, concat_y . map (map f) = map f . concat_x. Holy smokes! That’s the free theorem for concat! (note: that paper was another that I never really grokked; maybe I’m grokking it now)

If I could go back to last week and help myself understand, I would have said: a polymorphic function in Haskell is a collection of morphisms in Hask, not a single one, and the equation above is what guarantees that it’s actually parametrically polymorphic (doesn’t care what type it is).
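Concretely, we can watch the naturality square commute on a sample, with show playing the role of f (a sketch; lhs and rhs are names invented here):

```haskell
-- The naturality square for concat, instantiated at f = show :: Int -> String:
--   concat . map (map f)  ==  map f . concat
lhs, rhs :: [[Int]] -> [String]
lhs = concat . map (map show)   -- via [[Int]] -> [[String]] -> [String]
rhs = map show . concat         -- via [[Int]] -> [Int]      -> [String]
```

Both paths around the square produce the same list, no matter which sample input you feed them; that is what the free theorem guarantees.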

So the following pseudohaskell would not define a natural transformation:

concat' :: forall a. [[a]] -> [a]
concat'
| a ~ Int   = const []
| otherwise = concat


Because the equation fails to hold for, say, show :: Int -> String: concat'_String . map (map f) /= map f . concat'_Int, because concat'_Int = const []. That makes sense, because concat' is not parametrically polymorphic.

A question I have is what a polymorphic function is when it’s acting on type constructors that aren’t functors. Is that just a “polymorphic function” and doesn’t necessarily have a categorical parallel?

I guess the weirdest thing about category theory is how natural transformations were defined. I think I have an intuition for what they are now, but I would never fathom defining them that way. Thinking of objects only as their relationships to other objects is a mind bender.