# Fun wth ΠΣ

I have somewhat irrationally chosen ΠΣ as the core language for Dana. I’ve been struggling to create a small, beautiful implementation. But I am getting closer and closer — it’s just about finding the right combination of representations so they fit together nicely — and my hope is that when I am finished the implementation will be under 300 lines or so (only for the bootstrap, once I get to unsafe compilation it will of course grow).

Meanwhile, I’ve been playing with the prototype implementation to get a feel for the language. It is very pleasant to work with, even without the “necessary” programmer niceties such as inference. The typechecker is partial, and it is occasionally annoying when it loops. But it is predictable, and as I gain more experience I am seeing how to avoid them, just as one avoids infinite loops when writing programs. The convenience gained by a parital typechecker more than makes up for this annoyance.

One of the things I played with is encoding sum types as product types with a church encoding. In my experiment it worked, so it’s possible that sigma types are not necessary at all. To warm up, here is a non-dependent encoding for pairs in Haskell.

type Pair a b = forall c. (a -> b -> c) -> c

pair :: a -> b -> Pair a b
pair x y cc = cc x y

-- Here's how we destruct
unpair :: Pair a b -> (a,b)
unpair p = p (\x y -> (x,y))


So the encoding I used was just the dependent analog of the same thing, which I will reveal shortly. For those familiar with Agda, you will feel right at home in ΠΣ’s syntax. For those not, the trick is the notation (x : A) -> B, which is just like the type A -> B in Haskell, but first names the argument of type A, so that B may depend on it.

Sigma : (A : Type) -> (A -> Type) -> Type.
Sigma = \A -> \P -> (C : Type) -> ((x:A) -> P x -> C) -> C.

Pair : (A : Type) -> (P : A -> Type)
-> (x : A) -> P x -> Sigma A P.


Look for the resemblance of the “Sigma =” line with the “type Pair a b” line above.

The encoding of Nat is straightforward (exactly the same as the encoding with ΠΣ’s built in sigma type):

Nat : Type.
Nat = Sigma {'Z,'S} (\tag ->
Case tag of
{ 'Z -> {'Unit}
| 'S -> Nat
}).


And using it is almost as convenient; there are a few extra necessary type annotations.

iterate : (A : Type) -> Nat -> (A -> A) -> (A -> A).
iterate = \A -> \n -> \f -> \x0 ->
n A (\tag -> \ns ->
Case tag of
{ 'Z -> x0
| 'S -> f (iterate A ns f x0)
}).


Notably, the “A” after “n” is absent when using the built-in syntax.

Meanwhile, Sigma types make a great encoding for modules. I will switch to the built-in syntax now, because it is a bit cleaner. It looks like this: (x : A ; B), which is just like the Haskell pair (A,B), except that it names the element of type A so that B may depend on it. Also, ; associates to the right, so we can make multi-field syntax.

So here is a first-class Nat module:

NatModule : Nat : Type
; zero : Nat
; succ : Nat -> Nat
; iterate : (A : Type) -> Nat -> (A -> A) -> (A -> A)
; {'Unit}.
NatModule = Nat, zero, succ, iterate, 'Unit.


So the core calculus already supports first-class modules (this is true of any dependently typed language). Throw a little syntax sugar on that and I might start to prefer this to Haskell.

Which is good, because the plan is to bootstrap in Haskell and then switch to all-PiSigma, until I can build a rudimentary Haskellish interpreter that compiles down to PiSigma.

I’m doing all this work because everything has to be first-class in Dana, because there is no compilation phase. It’s all tied together in one big jumble.

# A world without orphans

An orphan instance is where you have a situation like this:

module Foo where
class Foo a where
foo :: a

module Bar where
newtype Bar = Bar Int

module Baz where
instance Foo Bar where
foo = Bar 0


I.e., a class is instantiated with a type in a module where neither the class nor the type was declared. GHC issues a warning in this case, because it has to put that instance in a global table that is scanned for every compilation unit. In non-orphan cases, it knows that the instance will be in scope, because at the point of usage, both the class an the type are in scope, so the instance came with one of them.

What happens if we change that warning into an error? No orphan instances allowed!

Wonderful things, that’s what. The most notable, in my opinion, being that the very common desire for superclassing is now perfectly acceptable (modulo the many implementation concerns). For example:

class Additive a where
(^+^) :: a -> a -> a
instance (Num a) => Additive a where
(^+^) = (+)


The problem used to be that, let’s say someone made a type Foo and instantiated it as Num and as Additive. Which Additive instance do we use, the one here or the one they defined? This problem vanishes if we don’t allow orphan instances, because Additive Foo and Num Foo would have to be instantiated in the same place, when Additive is in scope, and thus this superclass instance is in scope. So the collision would be caught modularly at compile time (that is, it’s not when you just happen to use these two modules together that the universe explodes, it’s precisely where the problem occurs). The module that defined Foo would be informed that he is not allowed to define an Additive instance, because one was already defined for him.

# Existential Memoization

FRP has this essential trade-off of computation vs. garbage collection. I can come up with implementations which are lazy in their computation, but never forget anything; and I can come up with implementations which forget but are too eager in their implementation (in which garbage collection has too much effect on runtime performance, because we need to kill threads with weak references and stuff). But I can’t come up with an implementation which has both. And my current theory is — I never thought I’d say this — Haskell isn’t good enough!

This language + RTS extension is still in its early stages, and I’ll have to think and prototype more to hash out the details and verify its feasibility. But since I have decided to embark on the Dana project, I am now comfortable with relying on new language features and runtime to get the job done, which previously has not been an option.

Here’s the motivating example. You have a bunch of behaviors — functions from Time to a — which are defined progressively, i.e. they use earlier results to compute later ones. In the best case, we could model these as plain functions of Time, but of course that is not practical in Haskell. In that case, it is lazy computation that never forgets. My idea is to have enough support so that it can forget.

The reason it is allowed to forget is that Time itself is also progressive; i.e. if you have a reference to a some time t, you cannot get to any earlier times. So when the last reference to a time before t disappears, all behaviors can forget everything before time t.

But that’s not a very good solution by itself (unfortunate, because it can be accomplished in Haskell with weak references). The reason is that a single naughty value that holds on to an early time will ruin it for the rest of the program — no behaviors can forget anything after that time. How do we keep the effects more local. Well, when the garbage collector does its run, instead of marking objects as “alive” or “dead”, it can mark them as “needed after t”. So you might mark an object multiple times, with decreasing times each time you see it.

And of course there is nothing special about time. The extension ought to be general enough to handle “stuff like this”. I haven’t even thought about how to make a garbage collector that does this efficiently.

Alright, enough motivation. Onto the language design. I want to add support for ‘progressive folds’, and memoize them with garbage collection. Say you have this data type:

  data U s where
Succ :: U s -> U s


This looks inane, right? It is unpopulated. Or is it? We can fold a progressive function over this:

  foldU :: (b -> b) -> b -> exists s. (U s, (U s -> b))


I.e. we can define functions over differences of nodes, i.e. in terms of a starting point (which is kind of what the s type variable denotes).

Also, when a particular value U s disappears, we can purge that entry from the memo table and everything that goes along with it. We also know that references inside the scope of the existential eliminator can’t escape, which will end up giving us the time-like relationships I described.

I’m still thinking about the mathematical foundations of this. I.e. what does such an “unrooted” data type mean? This fold also feels less general than it could be; is it possible to introduce two unrooted functions at the same time, or use different recursion schemes, … etc.

# The Third Virtue

Ever since working with the Anygma team, the idea of a purely functional operating system has been stirring in my head: An operating system that stays denotational and composable all the way down to the device drivers. In this world, we have to make no sacrifices like IO to interface with the impure ideas around us. We can still communicate with the outside world by modeling them in a pure way.

Just to be clear: I will not be accepting comments about the feasibility of the idea itself (eg. don’t say “but the idea of an operating system is necessarily stateful” or any of that bull). Comments and criticisms about the approach I describe are more than welcome.

So this has been stirring away. Then at Thursday’s Boulder functional programmers group, somebody mentioned Squeak. I had seen Squeak a long time ago, and was amazed and impressed by the concept. There is no distinction between compile time and runtime, between source code and program. Everything is just objects floating around in the world. And of course this matches perfectly with the purely functional operating system idea.

It sucks to program with text. Programs are not text. Ultimately, programs are their denotation, but we need a representation. I say that representation is abstract syntax. Our program editors output abstract syntax, which the compiler turns into real objects. Text doesn’t have to enter the equation anywhere.

I don’t think this idea fits Haskell’s type system very well. We need something closer to dependent types, because types have to be objects floating around in the world too. So, while I’m at it, I think I’m going to make a pure language — pretty close to, but not exactly Haskell — which is the core abstraction language. Maybe something both expressive and barebones like PiSigma. But I’ll need a real-world language to go on top of that.

Hubris indeed.

So! What’s the large-scale plan? I am going to assume nobody will help me (other than to offer suggestions) in these early stages. This is a Grand Vision (TM), and I doubt anybody but me is going to get excited enough to contribute until it starts coming together at least a little. But I am completely excited, as this is the culmination of an idea that has been developing for years. Oh, and that was at the beginning of the large-scale plan because excitement is the top point in my book — engineering is secondary to motivation.

I don’t really know anything about drivers or hardware or any of that, and this undertaking is already way too huge for me. Therefore, I’m going to go the route of Squeak and implement an image-based sandbox — an OS inside an OS. I’ll start by choosing a core language and making a really slow proof-of-concept interactive text-based interpreter for it. At that point, many options open up, depending on what I’m inspired to do at any moment: improve the language frontend, work on graphical hooks, start work on a faster interpreter or JIT, start designing the FRP essentials, ….

As for the FRP essentials, why do I think they will be more possible in this new concept than in Haskell? Well, I have a sophisticated RTS extension that I think will handle that, which I will blog about riiiight… now! (And there are a few ways to compromise in case that idea doesn’t pan out)

Oh, btw, I’m calling this project Dana, after da man Dana Scott.

For quite a while, a package called MonadRandom by Cale Gibbard has been available on Hackage. At first it seems like just a little silly convenience module. But I argue that it should be used instead of System.Random whenever possible, not just when it’s more convenient. Here’s why:

What’s a RandomGen?

    class RandomGen g where
next :: g -> (Int, g)
split :: g -> (g,g)
genRange :: g -> (Int,Int)


It’s this thing… this thingamajigger, if you prefer, which has a next, a split, and a genRange. Cool, that’s great. Um… so, what are those things? What do they mean? And what do they have to do with randomness? next. NEXT? That doesn’t seem very random!

MonadRandom is very humble in its self-description:

Computation type: Computations which consume random values.

Binding strategy: The computation proceeds in the same fashion as the identity monad, but it carries a random number generator that may be queried to generate random values.

But it means much more than it says. Rand a is not merely an “identity monad carrying a random number generator” (a random number generator being one of those what-the-heck-is-thats above). Rand a is a probability distribution of a’s! Which is simple, meaningful and easy to reason about!

We can model it like so:

    Rand a = a -> Real
fmap f p x = ∑ p[f-1[{x}]]
return x = δ(x)
(p >>= f) x = ∑y p(y) f(y)(x)


With getRandom etc. being standard distributions on various data types.

As long as your computations are in Rand (I don’t give a hoot about RandT), you can use this semantics for precise, mathematical reasoning. What did you have when you were using RandomGen?

# Ha! I can’t even get Events right

No wonder FRP has been so difficult. In going back to basics, forgetting about behaviors and just doing Events, I realize that my belief that Events are “the easy part” has been flawed. Neither of the semantics I’ve had in mind are both implementable and appropriate. Or rather, they both are, but neither of them can do as much as I thought they could.

I will leave my previously preferred semantics of Event a = Waiter a = Time -> (Time,a), for annoying practical reasons (it also violates intuition about what an event is).

The reactive semantics of Event, Event a = [(Time,a)], is no good because there is always a least occurrence. This violates relativity and intuition (relativity being an intuitive concept I’ve developed regarding the lack of a “global start time”).

So I’m left with Event a = Set (Time, a), or Set (Time, Set a) to handle multiple occurrences at the same time. I am actually inclined to resort to the former, and require a Monoid instance on a to merge. Anyway: what’s the problem with this one? No temporal operations are allowed on it, because it must be observant. Consider mouseClick :: Event Point, the event of mouse clicks. There may be many mouse clicks before the program started, before we were listening for them. If we then sequence mouseClick with some other event, then we have to know about those earlier mouse clicks to detect it. Also consider everySecond :: Time -> Event (), which occurs every second before and after the given time. There are infinitely many occurrences of this event, and so there are infinitely many simultaneous occurrences of everySecond t >> everySecond t. We can not possibly handle all of them.

I know I didn’t motivate that very well, but hopefully you get the idea. If an Event is a set of occurrences, then we must be able to see all of them, including ones that depend on events that happened when we weren’t watching.

I conclude that Event is not a Monad or even Applicative, and all we have is the following interface:

    data Event a
instance Functor Event
instance (Monoid a) => Monoid (Event a)
filter :: (a -> Bool) -> Event a -> Event a
withTime :: Event a -> Event (Time,a)


Actually, as far as primitives go, I prefer this to filter:

    partition :: Event (Either a b) -> (Event a, Event b)


But they are definable in terms of each other, so it really doesn’t matter except for efficiency concerns. Look how partition is dual to zip. Isn’t that cool?

But in exchange for meaning and implementability, we have given up much of Event’s usefulness. Note that we can no longer even construct the event of double clicks. That makes sense to me, though, because what if the first of the two occurred before we were watching?

I can think of a bunch of ways to get back bits and pieces of the usefulness — EventParsers and the like — and I’m still experimenting with which will be the most effective.

# Ridding ourselves of IO before there is a good replacement

I have been researching FRP for about a year now without finding or discovering an implementation I like. I don’t know if there is one — maybe my standards are just too high. However, it has initiated me into the more general interest of purely functional foundations.

Since I am tired of thinking and want to start hacking again, I am abandoning FRP proper for the time being. But I am continuing in spirit, by continuing with foundations. I will be developing this project in the Frag repository, where I keep my FRP stuff.

The idea of the Frag project is simple: get back to making the software I want to make — mostly games — but shun IO. I.e. the content of these will only be built on semantically meaningful abstractions (which are in turn implemented on IO, of course), but without concern for making it a “perfect” FRP implementation. Said abstractions will go in the Frag project. The main point is to get away from the world of fuzzy, complex operational semantics, replacing it lazily with simple denotational ones.

But this is a foundational project, meaning that once the foundations are replaced, I allow myself to do arbitrary software engineering on top of it. I think this is an important point.

Currently in the repository is a simple Event abstraction. I’m seeing how much I can write with just that, without worrying about modeling conceptually continuous things (like mouse position). Secretly my plan is to accidentally come across a good interaction model which boils down to Event, and is thus inherently meaningful (simplicity can be refined out later).

I’m having trouble foreseeing how I will model eg. network interaction using Events. Graphical games are pretty easy, since conceptually they always exist, and you’re just choosing to view them. I can’t see how network fits into a model like this, since a network program can have side-effects on another computer. Can anyone think of a way that network interactions are about being instead of doing?

# Domain Convergence

Okay, how is this for a definition of convergence:

An infinite sequence xs :: [()] converges if there exists an n such that for every m > n, xs !! m = xs !! n. In other words, this is a bunch of programs which either halt or don’t, and after some point, either they all halt or they all don’t.

Then an infinite sequence xs :: [a] converges if map f xs converges for every f :: a -> ().

Sound good?

# Continuous Stream Fusion

I’ve been able to sketch-implement a version of FRP with a strange distinction: Env vs. Behavior. I’ll give you the intuition, then the rest of the article will develop the semantics more fully. I thought “backwards” to discover this (the word “discover” used under the assumption that it has some merit: questionable), from implementation, to concept, to semantics.

The semantics of Env a are plain ol’ Time -> a, but with a caveat: there is no reactivity allowed. I.e. an Env a yields an a only by looking at stuff “now”. In the implementation, this is just a subtype of IO. The semantics of Behavior are even less defined. I guess it’s semantics are Time -> Time -> a, so it is given a “start time” and returns a time function. It’s sortof like a relative behavior, but shifting it around does not exactly preserve behavior. The implementation is much like Reactive (Env a), where Reactive is from the popular reactive library. But through this exploration I am led to a very different implementation.

To formalize this silliness, first we need to add some composability and define these as transformations. Let’s just fix an e for the whole discussion, and then say the meaning of Env a is (a subtype of) (Time -> e) -> (Time -> a). Now it’s easy to say what we mean by not allowing reactivity: an Env a is a function f of the form f r = g . r for some g. So to determine the value of f at time t, you look at its environment at time t and do something to the result. No peeking at other points in its environment.

For Behavior, I want a scanl, but on continuous functions. You can think of this as a generalization of integration. For a continuous scan to be meaningful, I require its argument to be a continuous function f of type Real -> a -> a, such that f 0 = id. By continuity I mean Scott-continuity: treat Real as an infinite-precision computable real, and then any definable total function is continuous.

Then I can find the integral function ∫f on the interval [0,1] by computing the limit:

  f 0.5  .  f 0.5
f 0.25  .  f 0.25  .  f 0.25  .  f 0.25
...


There is no guarantee that this will converge, or uniformly converge, or any of that good stuff. But we will cover our ears and pretend that it always does.

If f :: Real -> Bool -> Bool, once our precision goes past the modulus of uniform continuity of f (the precision of the information that f will never look at to determine what to do), then ∫f will be constant. So there is some notion of continuity preserved.1

Okay, now we can say what a Behavior is! The meaning of Behavior a is a transformation f :: (Time -> e) -> Time -> Time -> a of the form — hold your breath (I’ll explain) — f r t0 = i . ∫ (j r) (k (r t0)), for functions i,j,k, where j is an Env function with the scan restriction j 0 = id.

Let’s dissect: First, k (r t0) is the initial value when the Behavior is started, which depends on the environment exactly when the Behavior is started and no other time. j r is the integrand, which takes into account the “previous” value and the current environment (only current because j must be an Env function). Finally, the i . out front is so we can compute things more complex than their final type; i.e. a Behavior Bool could have more state under the hood than just a boolean.

This definition is very much like Haskell stream fusion; we take a recursive definition and “flatten it out” into a state and an iterator function. By imposing a simple constraint on the iterator function, we have retrieved continuity of the generated stream, and can also perform continuous stream fusion in the implementation!

I still haven’t figured out how to work events back into this model.

1 I’m committing a blatant foul here. Bool -> Bool is a discrete space, so any continuous function from Real to it has to be constant to begin with. So that paragraph was pretty much meaningless. I still liked it to get a computational handle on things.

# Compact data types

Join me, fellow readers, as I learn some topology by pretending I know something about it.

I recently uploaded Martin Escardó’s infinite search monad to hackage, under the name infinite-search. It is a delightful little thing to play with, and has inspired some deeper thoughts about the connection between computation and topology.

Quick refresher on topology: a topological space is a set X together with a collection of subsets of X, called the open subsets. The open subsets must include the empty set and the entirety of X itself, and must be closed under arbitrary unions and finite intersections.

So for example, you can generate the standard topology on the real line by taking all the open intervals (a,b) and closing them under arbitrary unions. So for example, the set $\cup \{ (a-\frac{1}{4},a+\frac{1}{4}) | a \in \mathbb{Z} \}$, which is ball of radius 1/2 around every integer, is open. No solitary points are open; every point must have an arbitrarily small cloud of nearby points.

As another example, there is a topology on the natural numbers called the discrete topology, in which you consider every singleton set to be an open set and take arbitrary unions over those. In this topology, every subset of the naturals is open.

A space is compact if “every open cover has a finite subcover”. That means for any collection S of open sets whose union is the entire space, there is a finite collection $S' \subseteq S$ whose union is also the entire space. Obviously all finite topologies are compact, because there are no infinite collections to begin with.

So the discrete topology on the naturals I talked about is not compact. Consider this cover: $\{ \{n\} | n \in \mathbb{N} \}$; just all the singleton open sets. This obviously has no finite subcollection which also unions to the entire set of naturals (if you leave out even one of these sets it fails to be a cover).

Okay, on to computation. Roughly what Martin Escardó’s work says is that compact spaces can be exhaustively searched. That is, if S is a compact space, then given a total function p :: S -> Bool, we can either compute an element of S satisfying p or compute that there is no such element. If we could figure out a way to form topological spaces on our data types, then we can know whether they are exhaustively searchable (and construct algorithms to search them).

A domain is the set of all elements of a data type, including partial ones (with ⊥s in them). So the domain for Bool is { ⊥, True, False }, and the domain for Either () is { ⊥, Left ⊥, Left (), Right ⊥, Right () }. To construct a topology (the Scott topology?) for the type, generate it by the sets of all compatible elements with some finite element of the domain. So the open sets for our Either () would be { { Left (), Right () }, { Left () }, { Right () } }. We also toss in the empty set, since it is the union of zero of these.

Earlier we saw that the naturals were not compact. This is a good sign, because if we could exhaustively search the naturals, then we could write an algorithm to eg. decide the twin-prime conjecture. However, I will now show that the lazy naturals are compact.

The lazy naturals are the data type:

data Nat = Zero | Succ Nat


What is the set of elements of this domain, and what is the topology? The set of elements is just the regular naturals plus the special one let infinity = Succ infinity. The topology has all the regular finite naturals as singleton sets, just like the discrete topology, but it also has subsets generated by { ⊥, Succ ⊥, Succ (Succ ⊥), … }. That is, sets of all of them greater than some natural including infinity. The key is that the singleton { infinity } itself is not open, it only comes from one of these infinite sets.

Now let’s say you have a cover for Nat. It includes infinity, so it must include the a set { n | n > k for some k }. So we pick that one, and then we pick however many sets we need to cover all the naturals less than k; there will be at most k of them. So we have constructed a finite subcover.

So regular naturals are not compact, but Nat is. So regular naturals are not searchable, but Nat is. But we got Nat only by adding to the regular naturals; how did we make it searchable in the process?

There was a little snag in the definition of searchable above: we needed a total predicate. Many functions which we would not be able to search on the regular naturals would get into an infinite loop if we passed the function infinity, and that would mean the predicate is not total. Intuitively, that means we can only search on predicates which are bounded in the size of the natural; which stop looking at some point.

I conjecture that every ADT in Haskell (without strict fields) is compact. It is pretty easy to write combinators which construct Data.Searchable Sets for any ADT; whether they are proper compact Sets (i.e. the search function is in fact total) is another issue.

So that’s it. We can make a compact set of Naturals and exhaustively search them. For fun, here’s a little Haskell implementation demonstrating this:

import Data.Searchable
data Nat = Zero | Succ Nat

nat :: Set Nat
nat = singleton Zero union fmap Succ nat

-- decidable equality on functions from Nat!
eqnatf :: (Eq a) => (Nat -> a) -> (Nat -> a) -> Bool
eqnatf f g = forevery nat \$ \n -> f n == g n


Go ahead, write a Num instance for Nat and play around. Just remember that your predicates need to halt even when given infinity, otherwise they are not total. In particular, this means that you can’t use eqnatf on Nat -> Nat, because equality on Nat itself is not total (infinity == infinity loops).