Compact data types

Join me, fellow readers, as I learn some topology by pretending I know something about it.

I recently uploaded Martin Escardó’s infinite search monad to hackage, under the name infinite-search. It is a delightful little thing to play with, and has inspired some deeper thoughts about the connection between computation and topology.

Quick refresher on topology: a topological space is a set X together with a collection of subsets of X, called the open subsets. The open subsets must include the empty set and the entirety of X itself, and must be closed under arbitrary unions and finite intersections.

So for example, you can generate the standard topology on the real line by taking all the open intervals (a,b) and closing them under arbitrary unions. So for example, the set \cup \{ (a-\frac{1}{4},a+\frac{1}{4}) | a \in \mathbb{Z} \}, which is ball of radius 1/2 around every integer, is open. No solitary points are open; every point must have an arbitrarily small cloud of nearby points.

As another example, there is a topology on the natural numbers called the discrete topology, in which you consider every singleton set to be an open set and take arbitrary unions over those. In this topology, every subset of the naturals is open.

A space is compact if “every open cover has a finite subcover”. That means for any collection S of open sets whose union is the entire space, there is a finite collection S' \subseteq S whose union is also the entire space. Obviously all finite topologies are compact, because there are no infinite collections to begin with.

So the discrete topology on the naturals I talked about is not compact. Consider this cover: \{ \{n\} | n \in \mathbb{N} \}; just all the singleton open sets. This obviously has no finite subcollection which also unions to the entire set of naturals (if you leave out even one of these sets it fails to be a cover).

Okay, on to computation. Roughly what Martin Escardó’s work says is that compact spaces can be exhaustively searched. That is, if S is a compact space, then given a total function p :: S -> Bool, we can either compute an element of S satisfying p or compute that there is no such element. If we could figure out a way to form topological spaces on our data types, then we can know whether they are exhaustively searchable (and construct algorithms to search them).

A domain is the set of all elements of a data type, including partial ones (with ⊥s in them). So the domain for Bool is { ⊥, True, False }, and the domain for Either () is { ⊥, Left ⊥, Left (), Right ⊥, Right () }. To construct a topology (the Scott topology?) for the type, generate it by the sets of all compatible elements with some finite element of the domain. So the open sets for our Either () would be { { Left (), Right () }, { Left () }, { Right () } }. We also toss in the empty set, since it is the union of zero of these.

Earlier we saw that the naturals were not compact. This is a good sign, because if we could exhaustively search the naturals, then we could write an algorithm to eg. decide the twin-prime conjecture. However, I will now show that the lazy naturals are compact.

The lazy naturals are the data type:

data Nat = Zero | Succ Nat

What is the set of elements of this domain, and what is the topology? The set of elements is just the regular naturals plus the special one let infinity = Succ infinity. The topology has all the regular finite naturals as singleton sets, just like the discrete topology, but it also has subsets generated by { ⊥, Succ ⊥, Succ (Succ ⊥), … }. That is, sets of all of them greater than some natural including infinity. The key is that the singleton { infinity } itself is not open, it only comes from one of these infinite sets.

Now let’s say you have a cover for Nat. It includes infinity, so it must include the a set { n | n > k for some k }. So we pick that one, and then we pick however many sets we need to cover all the naturals less than k; there will be at most k of them. So we have constructed a finite subcover.

So regular naturals are not compact, but Nat is. So regular naturals are not searchable, but Nat is. But we got Nat only by adding to the regular naturals; how did we make it searchable in the process?

There was a little snag in the definition of searchable above: we needed a total predicate. Many functions which we would not be able to search on the regular naturals would get into an infinite loop if we passed the function infinity, and that would mean the predicate is not total. Intuitively, that means we can only search on predicates which are bounded in the size of the natural; which stop looking at some point.

I conjecture that every ADT in Haskell (without strict fields) is compact. It is pretty easy to write combinators which construct Data.Searchable Sets for any ADT; whether they are proper compact Sets (i.e. the search function is in fact total) is another issue.

So that’s it. We can make a compact set of Naturals and exhaustively search them. For fun, here’s a little Haskell implementation demonstrating this:

import Data.Searchable
data Nat = Zero | Succ Nat

nat :: Set Nat
nat = singleton Zero `union` fmap Succ nat

-- decidable equality on functions from Nat!
eqnatf :: (Eq a) => (Nat -> a) -> (Nat -> a) -> Bool
eqnatf f g = forevery nat $ \n -> f n == g n

Go ahead, write a Num instance for Nat and play around. Just remember that your predicates need to halt even when given infinity, otherwise they are not total. In particular, this means that you can’t use eqnatf on Nat -> Nat, because equality on Nat itself is not total (infinity == infinity loops).

6 thoughts on “Compact data types

  1. First a small correction: when you define the Scott topology you say “To construct a topology (the Scott topology?) for the type, generate it by the sets of all compatible elements with some finite element of the domain.” That should read: “generate it by the sets which are above all some finite element of the domain”. The word “compatible” means “have common upper bound”, which would give you more than you want.

    As far as your conjecture is concerned, all Scott domains are compact, quite obviously: if you have an open cover, then some element of the cover already covers the bottom element, but then it covers the whole space (because open sets are upper sets).

    But this is not quite what Martin was talking about. He considers compactness with respect to clopen sets (simultaneously open and closed), not open ones. I think in the case of Kleene-Kreisel types, as he calls them, it does not make a difference. Also, “compact” is not the same as searchable. “Compact and overt” is the same as searchable.

  2. Thanks for your clarifications, Andrej.

    For my conjecture, I’m thinking in the modeled type, with possibly infinite elements and no bottoms. The domain structure induces the topology, but bottoms are not included in that topology. In that way, the strict naturals:

    data Nat = Zero | Succ !Nat

    have a discrete topology because it is a flat domain.

    I’m just barely starting to learn about ASD, but the literature is a bit over my head (I’m trying though!). Maybe I’ll know what overt means in a few weeks :-)

  3. Just a few remarks.

    (1) Here is another way of seeing that the total elements of the lazy natural numbers (including infty) is searchable. (Topologically, this set (with the relative Scott topology), is the so-called one-point compactification of the discrete natural numbers.

    surjection :: [Bool] -> Nat
    surjection(False:xs) = Succ(surjection xs)
    surjection(True:_) = Zero

    nat’ :: Set Nat
    nat’ = image surjection cantor

    where cantor is defined as in the cited post.

    (2) For your conjecture about searchability of (total) elements of domain equations (data declarations in Haskell) to be true, you will have to impose restrictions, for two reasons:

    (i) For data declarations such as

    data D = Function(D -> Bool)

    involving so called contravariant occurrences of the defined type, Gordon Plotkin showed that there is no notion of totality (I heard this in a talk in 1997 in a Domains Workshop in Munich, but I don’t think he published that).

    (ii) So assume that you only have covariant occurrences (i.e. only on the righthand side of an arrow). You will still have problems. Consider e.g.

    data Tree = Branch(Integer -> Tree)

    The total elements are the well founded omega-branching trees. But this cannot be searchable as it is not compact.

    So you have to restrict further. One good candidate are the so-called polynomial functors (defined using finite products and sums). Now if you look for the initial solution of the domain equation then you get a minimal notion of totality, and if you look at the final solution (which includes the previous elements plus infinite objects), you get a maximal notion of totality. For the second one, your conjecture ought to be true.

    (3) Regarding searchable = compact + overt, claimed by Andrej, we have to be careful. I think he has in mind his joint work with Paul Taylor. But now there are a few differences, which certainly need to be clarified. Firstly their work is in the framework of ASD (I understand a bit about this, but I am far from an expert, and Andrej knows much more about that), which at the moment doesn’t apply to Haskell (but this may change in the near future). On the other hand, the work here is in the classical framework of higher-type computability theory established by Kleene and Kreisel, which applies to (a large subset of) Haskell. In ASD you use a Sierpinski type (roughly corresponding to the unit type in Haskell) to define compact (universal quantifier) and overt (existential quantifier), rather than the booleans.
    This makes a big difference, because, as Andrej said, booleans classify clopen sets (=open-and-closed sets), whereas the Sierpinski space classifies open sets. If you want to avoid topology, clopen corresponds to decidable in computability theory, and open corresponds to semi-decidable. If you have Sierpinski valued universal and existential quantifiers for the same subset of a type, and moreover you have weak-parallel or, you can certainly construct a boolean valued quantifier. The converse is much trickier. The inputs of the boolean valued quantifier are decidable predicates. How do you construct from this quantifier two quantifiers that accept semi-decidable predicates as inputs? Well, this actually turns out to be possible by a results proved in previous work. You first show that every searchable subset turns out to be a computable image of the Cantor space (rather tricky, and useless, brute force algorithm, using the so-called Kleene-Kreisel density theorem). Now, the cantor space is both (effectively) Sierpinski-compact and Sierpinski-overt, and this property is preserved by computable images. So Andrej is right. (I was doubtful when he said that, and actually wrote to him saying that, postponing a reply to think a bit about that.) But notice that one direction of the proof depends on weak-parallel or, which cannot be defined in Haskell (unless you allow unsafePerformIO and concurrency).

    (4) Andrej said that I define compactness wrt clopens. I don’t. But, it turns out, a subset of a higher-type over the integers, is compact wrt opens iff it is compact wrt clopens. This is a crucial lemma in the development of the theory of searchability, but is not my definition of compactness (I take the usual definition).

    (5) Here “totality” is rather important. I have little clue what arbitrary searchable subsets (including partial elements) look like. It is not that I didn’t bother because partial elements tend to be uninteresting. It is rather than total elements, sets of total elements, and total functions turn out to have very good properties for the subject of searchability (and very bad properties for other considerations).

  4. Luke,

    One further comment. I like very much your concise way of introducing compactness into the picture, in particular the sentence “If we could figure out a way to form topological spaces on our data types, then we can know whether they are exhaustively searchable (and construct algorithms to search them).”.


  5. Martin, thanks for your comments. These topological ideas are getting clearer with every minute. :-)

    When I said “haskell ADTs”, I was implicitly assuming no arrows, thus polynomial functors, and “no strict fields” is the maximal notion of totality you referred to. Cool, maybe I’ll try to prove it using the surjection approach you outlined.

    The “unamb” package implements said parallel or.

    I get the impression that topology could be pretty important to software engineering once we have the tools to talk about it. That’s why I’m interested in this stuff.

  6. Here is is another way of defining the Cantor space.
    (I use the notation of my original post.)

    cantor’ :: S [Bool]
    cantor’ = image (False:) cantor’ `union` image (True:) cantor’

    If you unfold the definitions, you conclude that this is the same as Berger’s algorithm in the post “seemimgly impossible functional programs”.

    And here is yet another natural way of defining search of the lazy natural numbers:

    mu :: (Nat -> Bool) -> Nat
    mu p = if p Zero then Zero else Succ(mu(p.Succ))

    This is the minimization operator from recursion theory, except that you get infty rather than bottom.
    Then nat as you defined it is equal to (S mu).

    (Thus, mu is to nat as berger’s algorithm is to cantor’.)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s