The Plan
Last September, I decided that it was time to get a programming job again. After two months of trying to find paid work (of any kind, $10 would have been great!) as a composer, I realized that it’s really hard. There are a lot of people willing to work for free, and without much of a scoring portfolio (as opposed to the “pure music” I do) I have no way to distinguish myself to the studios that have a budget. Also, a lot of games want orchestral scores, and I don’t have the hardware and software I need to make convincing-sounding synthetic orchestral scores. Also, I’m sure once I get the necessary hardware and software, I will need time to practice with it. In short, I needed money and time. I am extremely fortunate to have, in my free-flowing way, stumbled onto a skill that is valued by the economy, and so I decided it was once again time to utilize that skill to achieve my other goals. I planned to live reasonably cheaply, save up money so that I can buy equipment and support myself for enough time to build up a portfolio by doing free projects.
Now I have been programming for Clozure for almost six months. As far as jobs go, it’s great. I get to work in my favorite language, Haskell, and they give me enough freedom to experiment with designs and come up with solutions that not only work, but that I would even consider good. My fear of programming jobs was based on having jobs where I constantly have to compromise my values, either by working in crappy languages or on startup-style timelines where there is no time to lose. With this job, I feel reunited with my love of software, and my inspirations for developer support tools have been once again ignited.
And so I have amended the plan: after I have saved enough money to support myself for several years, I will not only attempt to bootstrap a career composing, but dedicate my current work week to making a reality the software ideas which have been floating around in my head for half a decade. This prospect really excites me — the reason I have not been able to make my ideas is mostly the time pressure: there’s was always something else I should be doing, and so I always felt guilty working on my pet projects. I wonder, what am I capable of if my pet projects are the main thing?
I want to revive CodeCatalog. Max and I lost steam on that project for a number of reasons.
- Due to family pressure, I returned to school.
- I fell in love with a girl and got my heart all broken. That can be kind of a downer.
- The priorities of the project compromised my vision. We were attempting to use modern wisdom to make the project successful: first impressions and intuitive usability came first. Our focus was on making it pretty and satisfying to use (which took a long time since neither of us were experienced web front-end developers), and that required me to strip off the most interesting parts of the project because noobs wouldn’t immediately understand it.
So I want to re-orient (3) to make it more satisfying for me. I want to allow myself to make the large strides that I envisage rather than baby-stepping toward success — to encourage myself to use my own talents in design and abstraction rather than trying to be a front-end person, to emphasize the exciting parts (what Audrey Tang calles -Ofun). By funding myself, I will not feel the guilt that comes with working on a project at the same time as (1). I can do no more than hope that something like (2) doesn’t happen. (I have a wonderful, stable and supportive relationship right now, so if that continues, that’d cover it :-)
I have many ideas; the reason I want to return to CodeCatalog in particular is mainly because I have identified most of my ideas as aspects of this project. My specific fancies change frequently (usually to things I have thought about before but never implemented), and so by focusing on this project in a researchy rather than producty way, I can entertain them while still working toward a larger goal and eventually benefitting the community.
Here is a summary of some ideas that fit in the CodeCatalog umbrella (just because I’m excited and want to remember):
- Inter-project version control — I have always been frustrated by the inability of git and hg to merge two projects while still allowing interoperation with where they came from. The “project” quantum seems arbitrary, and I want to globalize it.
- Package adapters — evolving the interface of a package without breaking users of the old interface by rewriting the old package in terms of the new one. There is a great deal that can be done automatically in this area with sufficient knowledge about the meaning of changes. I talked with Michael Sloan about this some, and some of the resulting ideas are contained in this writeup.
- Informal checked documentation — documenting the assumptions of code in a machine-readable semi-formal language, to get the computer to pair-program with you (e.g. you write a division x/y and you have no y /= 0 assumption in scope, you’d get a “documentation obligation” to explain in english why y can’t be 0).
- Structural editing — coding by transforming valid syntax trees. Yes it’d be cool, but the main reason it’s compelling to me is in its synergy with other features. Once you have the notion of focusing on expressions, holes with contextual information (a la Agda), semi-automatic creation of package and data-type adapters, smarter version control (e.g. a change might rename all references to an identifier, even the ones that weren’t there when the change was made) all come as natural extensions to the idea.
I think the challenge for me will be to focus on one of these for long enough to make it cool before getting distracted by another. My plan for that is to set short-term goals here on my blog and use it to keep myself in check. I am considering involving other people in my project as a way to keep myself focused (i.e. maybe I can make a little mini-kickstarter in which my devotees can pledge small amounts in exchange for me completing a specific goal on time).
This is all 12-18 months away, which feels like a long time, but in the grand scheme is not that long in exchange for what I see as the potential of this endeavor. I’m just excited and couldn’t help but to think about it and get pumped up. Thanks for reading!
Oh, despite the date, this is totally not an April Fools joke (as far as I know ;-).
Constructions on Typeclasses, Part 1: F-Algebras
This post is rendered from literate Haskell. I recommend doing the exercises inline, so use the source.
> {-# LANGUAGE DeriveFunctor
> , DeriveFoldable
> , DeriveTraversable
> , TypeOperators #-}
>
> import Control.Applicative
> import Data.Foldable
> import Data.Traversable
Certain kinds of typeclasses have some very regular instances. For example, it is obvious how to implement (Num a, Num b) => Num (a,b) and (Monoid a, Monoid b) => Monoid (a,b), and similarly if F is some applicative functor, (Num a) => Num (F a) and (Monoid a) => (Monoid F a) are obvious. Furthermore, these instances (and many others) seem to be obvious in the same way.
(+) a b = (+) <$> a <*> b
mappend a b = mappend <$> a <*> b
fromInteger n = pure (fromInteger n)
mempty = pure mempty
And take them on pairs:
(x,x') + (y,y') = (x + y, x' + y')
(x,x') `mappend` (y,y') = (x `mappend` y, x' `mappend` y')
fromInteger n = (fromInteger n, fromInteger n)
mempty = (mempty , mempty)
It would be straightforward for these cases to derive the necessary implementations from the type signature. However, it would be nice if there were a more abstract perspective, such that we didn’t have to inspect the type signature to find the operations – that they could arise from some other standard construction. Further, it is not quite as obvious from the the type signature how to automatically instantiate methods such as
mconcat :: (Monoid m) => [m] -> m
without making a special case for [], whereas hopefully a more abstract perspective would inform us what kinds of type constructors would be supported.
In this post, we will see such an abstract perspective. It comes from (surprise!) category theory. I disclaim that I’m still a novice with category theory (but in the past few weeks I have gained competence by studying). So we will not get very deep into the theory, just enough to steal the useful concept and leave the rest behind. I welcome relevant insights from the more categorically educated in the comments.
F-Algebras
The unifying concept we will steal is the F-algebra. An F-algebra is a Functor f and a type a together with a function f a -> a. We can make this precise in Haskell:
> type Algebra f a = f a -> a
I claim that Num and Monoid instances are F-algebras over suitable functors. Look at the methods of Monoid:
mempty :: m
mappend :: m -> m -> m
We need to find a functor f such that we can recover these two methods from a function of type f m -> m. With some squinting, we arrive at:
> data MonoidF m > = MEmpty > | MAppend m m > > memptyF :: Algebra MonoidF m -> m > memptyF alg = alg MEmpty > > mappendF :: Algebra MonoidF m -> (m -> m -> m) > mappendF alg x y = alg (MAppend x y)
Exercise 1: work out the functor NumF over which Num instances are F-algebras, and write the methods of Num in terms of it.
Exercise 2: for each of the standard classes Eq, Read, Show, Bounded, and Integral, work out whether they are expressible as F-algebras. If so, give the functor; if not, explain or prove why not.
Exercise 3: write a function toMonoidAlg which finds the MonoidF-algebra for a given instance m of the Monoid class.
Combining Instances
Motivated by the examples in the introduction, we can find the “instance” for pairs given instances for each of the components.
> pairAlg :: (Functor t) => Algebra t a -> Algebra t b -> Algebra t (a,b) > pairAlg alga algb tab = (alga (fmap fst tab), algb (fmap snd tab))
Also, we hope we can find the instance for an applicative functor given an instance for its argument
applicativeAlg :: (Functor t, Applicative f)
=> Algebra t a -> Algebra t (f a)
but there turns out to be trouble:
applicativeAlg alg tfa = ...
We need to get our hands on an t a somehow, and all we have is a t (f a). This hints at something from the standard library:
sequenceA :: (Traversible t, Applicative f) => t (f a) -> f (t a)
which indicates that our functor needs more structure to implement applicativeAlg.
> applicativeAlg :: (Traversable t, Applicative f) > => Algebra t a -> Algebra t (f a) > applicativeAlg alg tfa = fmap alg (sequenceA tfa)
Now we should be able to answer the query from the beginning:
Exercise 4: For what kinds of type constructors c is it possible to automatically derive instances for (a) pairs and (b) Applicatives for a typeclass with a method of type c a -> a. (e.g. mconcat :: [a] -> a). Demonstrate this with an implementation.
Combining Classes
Intuitively, joining the methods of two classes which are both expressible as F-algebras should give us another class expressible as an F-algebra. This is demonstrated by the following construction:
> data (f :+: g) a = InL (f a) | InR (g a) > deriving (Functor, Foldable, Traversable) > > coproductAlg :: (Functor f, Functor g) > => Algebra f a -> Algebra g a -> Algebra (f :+: g) a > coproductAlg falg _ (InL fa) = falg fa > coproductAlg _ galg (InR ga) = galg ga
So now we can model a subclass of both Num and Monoid by type NumMonoidF = NumF :+: MonoidF.
Exercise 5: We hope to be able to recover Algebra NumF a from Algebra NumMonoidF a, demonstrating that the latter is in fact a subclass. Implement the necessary function(s).
Exercise 6: Given the functor product definition
> data (f :*: g) a = Pair (f a) (g a) > deriving (Functor, Foldable, Traversable)
find a suitable combinator for forming algebras over a product functor. It may not have the same form as coproduct’s combinator! What would a typeclass formed by a product of two typeclasses interpreted as F-algebras look like?
Free Constructions
One of the neat things we can do with typeclasses expressed as F-algebras is form free monads over them – i.e. form the data type of a “syntax tree” over the methods of a class (with a given set of free variables). Begin with the free monad over a functor:
> data Free f a > = Pure a > | Effect (f (Free f a)) > deriving (Functor, Foldable, Traversable) > > instance (Functor f) => Monad (Free f) where > return = Pure > Pure a >>= t = t a > Effect f >>= t = Effect (fmap (>>= t) f)
(Church-encoding this gives better performance, but I’m using this version for expository purposes)
Free f a can be interpreted as a syntax tree over the typeclass formed by f with free variables in a. This is also called an “initial algebra”, a term you may have heard thrown around in the Haskell community from time to time. We demonstrate that a free construction over a functor is a valid F-algebra for that functor:
> initialAlgebra :: (Functor f) => Algebra f (Free f a) > initialAlgebra = Effect
And that it is possible to “interpret” an initial algebra using any other F-algebra over that functor.
> initiality :: (Functor f) => Algebra f a -> Free f a -> a > initiality alg (Pure a) = a > initiality alg (Effect f) = alg (fmap (initiality alg) f)
Exercise 7: Give a monoid isomorphism (a bijection that preserves the monoid operations) between Free MonoidF and lists [], ignoring that Haskell allows infinitely large terms. Then, using an infinite term, show how this isomorphism fails.
Next time: F-Coalgebras
How GADTs inhibit abstraction
Today I want to talk about this snippet:
This program ought to be well-behaved — it has no recursion (or recursion-encoding tricks), no undefined or error, no incomplete pattern matches, so we should expect our types to be theorems. And yet we can get inconsistent. What is going on here?
Exercise: Identify the culprit before continuing.
The problem lies in the interaction between GADTs and generalized newtype deriving. Generalized newtype deriving seems to be broken here — we created a type B which claims to be just like A including instances, but one of A‘s instances relied on it being exactly equal to A. And so we get a program which claims to have non-exhaustive patterns (in unSwitchB), even though the pattern we omitted should have been impossible. And this is not the worst that generalized newtype deriving can do. When combined with type families, it is possible to write unsafeCoerce. This has been known since GHC 6.7.
In this post I intend to explore generalized newtype deriving and GADTs more deeply, from a more philosophical perspective, as opposed to just trying to plug this inconsistency. There are a few different forces at play, and by looking at them closely we will see some fundamental ideas about the meaning of types and type constructors.
Generalized newtype deriving seems reasonable to us by appealing to an intuition: if I have a type with some structure, I can clone that structure into a new type — basically making a type synonym that is a bit stricter about the boundaries of the abstraction. But the trouble is that you can clone parts of the structure without other parts; e.g. if X is an applicative and a monad, and I declare newtype Y a = Y (X a) deriving (Monad), then go on to define a different Applicative instance, I have done something wrong. Monad and applicative are related, so you can’t just change them willy nilly as though they were independent variables. But at the very least it seems reasonable that you should be able to copy all the structure, essentially defining a type synonym but giving it a more rigorous abstraction boundary. But in Haskell, this is not possible, and that is because, with extensions such as GADTs and type families, not all of a type’s structure is clonable.
I’m going to be talking a lot about abstraction. Although the kind of abstraction I mean here is simple, it is one of the fundamental things we do when designing software. To abstract a type is to take away some of its structure. We can abstract Integer to Nat by taking away the ability to make negatives — we still represent as Integer, but because the new type has strictly fewer operations (it must be fewer — after all we had to implement the operations somehow!) we know more about its elements, and finely tuning that knowledge is where good software engineering comes from.
When implementing an abstraction, we must define its operations. An operation takes some stuff in terms of that abstraction and gives back some stuff in terms of that abstraction. Its implementation must usually use some of the structure of the underlying representation — we define addition on Nat by addition on Integer. We may take it for granted that we can do this; for example, we do not have trouble defining:
sum :: [Nat] -> Nat
even though we are not given any Nats directly, but instead under some type constructor ([]).
One of the properties of type constructors that causes us to take this ability to abstract for granted is that if A and B are isomorphic (in a sense that will become clear in a moment), then F A and F B should also be isomorphic. Since we, the implementers of the abstraction, are in possession of an bijection between Nats and the Integers that represent them, we can use this property to implement whatever operations we need — if they could be implemented on Integer, they can be implemented on Nat.
This isomorphism property looks like a weak version of saying that F is a Functor. Indeed, F is properly a functor from a category of isomorphisms in which A and B are objects. Every type constructor F is a functor from some category; which category specifically depends on the structure of F. F's flexibility to work with abstractions in its argument is determined by that category, so the more you can do to that category, the more you can do with F. Positive and negative data types have all of Hask as their source category, so any abstractions you make will continue to work nicely under them. Invariant functors like Endo require bijections, but fortunately when we use newtype to create abstractions, we have a bijection. This is where generalized newtype deriving gets its motivation -- we can just use that bijection to substitute the abstraction for its representation anywhere we like.
But GADTs (and type families) are different. A functor like Switch b has an even smaller category as its domain: a discrete category. The only thing which is isomorphic to A in this category is A itself -- whether there is a bijection is irrelevant. This violates generalized newtype deriving's assumption that you can always use bijections to get from an abstraction to its representation and back. GADTs that rely on exact equality of types are completely inflexible in their argument, they do not permit abstractions. This, I claim, is bad -- you want to permit the user of your functor to make abstractions.
(Aside: If you have a nice boundary around the constructors of the GADT so they cannot be observed directly, one way to do this when using GADTs is to simply insert a constructor that endows it with the necessary operation. E.g. if you want it to be a functor from Hask, just insert
Fmap :: (a -> b) -> F a -> F b
If you want it to be a functor from Mon (category of monoids), insert:
Fmap :: (Monoid n) => MonoidHom m n -> F m -> F n
(presumably F m already came with a `Monoid` dictionary). These, I believe, are free constructions -- giving your type the structure you want in the stupidest possible way, essentially saying "yeah it can do that" and leaving it to the consumers of the type to figure out how.)
In any case, we are seeing something about GADTs specifically that simple data types do not have -- they can give a lot of different kinds of structure to their domain, and in particular they can distinguish specific types as fundamentally different from anything else, no matter how similarly they may behave. There is another way to see this: defining a GADT which mentions a particular type gives the mentioned type unclonable structure, such that generalized newtype deriving and other abstraction techniques which clone some of a type's structure no longer succeed.
DI Breakdown
I’m having a philosophical breakdown of the software engineering variety. I’m writing a register allocation library for my current project at work, referencing a not-too-complex algorithm which, however, has many degrees of freedom. Throughout the paper they talk about making various modifications to achieve different effects — tying variables to specific registers, brazenly pretending that a node is colorable when it looks like it isn’t (because it might work out in its favor), heuristics for choosing which nodes to simplify first, categorizing all the move instructions in the program to select from a smart, small set when the time comes to try to eliminate them. I’m trying to characterize the algorithm so that those different selections can be made easily, and it is a wonderful puzzle.
I also feel aesthetically stuck. I am feeling too many choices in Haskell — do I take this option as a parameter, or do I stuff it in a reader monad? Similarly, do I characterize this computation as living in the Cont monad, or do I simply take a continuation as a parameter? When expressing a piece of a computation, do I return the “simplest” type which provides the necessary data, do I return a functor which informs how the piece is to be used, or do I just go ahead and traverse the final data structure right there? What if the simplest type that gives the necessary information is vacuous, and all the information is in how it is used?
You might be thinking to yourself, “yes, Luke, you are just designing software.” But it feels more arbitrary than that — I have everything I want to say and I know how it fits together. My physics professor always used to say “now we have to make a choice — which is bad, because we’re about to make the wrong one”. He would manipulate the problem until every decision was forced. I need a way to constrain my decisions, to find what might be seen as the unique most general type of each piece of this algorithm. There are too many ways to say everything.
Computably Uncountable
We are all familiar with Cantor’s diagonal argument that proves there exist infinite sets which are “larger” than the set of natural numbers. In this post I will show that we can express this argument in the form of a program, thus showing that there are countable sets which are “computably uncountable”.
I begin with the program itself:
type Cantor = Nat -> Bool diagonal :: (Nat -> Cantor) -> Cantor diagonal cs n = not (cs n n)
Cantor is “the cantor space”, the type of infinite sequences of booleans. We will call such an infinite sequence “a Cantor“. There are clearly infinitely many Cantors; e.g. take the range of this function which gives False at every position except the one specified:
unit :: Nat -> Cantor unit m n = m == n
diagonal is (Georg) Cantor’s diagonal argument written as a program — it takes an alleged sequence of all Cantors, and returns a Cantor which does not occur in the sequence, by construction. This function shows by contradiction that we cannot put Cantors in 1 to 1 correspondence with naturals, and thus that there are more Cantors than there are naturals.
So how many Cantors are there? Since Nat -> Bool is a Haskell type — the type of computable functions from Nat to Bool — Cantors must be representable by programs. We can encode programs as numbers by treating their source code as base-128 numbers. Hence, there are no more Cantors than naturals, and so Cantors can be put into 1 to 1 correspondence with naturals.
Wait — what? There are more Cantors than Nats, but they both have the same size? Something is wrong. Indeed, in the process of this argument we have asserted both
- “We cannot put Cantors in 1 to 1 correspondence with naturals”
- “Cantors can be put into 1 to 1 correspondence with naturals”
We clearly can’t have both.
I
The erroneous statement is (2). It is undecidable whether a given program represents a Cantor. If the nth Cantor is ⊥ at n, then diagonal will fail: diagonal cs n = not (cs n n) = not ⊥ = ⊥. Because ⊥ is a fixed point of not, diagonal cannot return an element different from the one it was given. Thus for diagonal to work, we must require that Cantors be fully-defined — no infinite loops!
With this requirement, we can no longer put Cantors in 1 to 1 correspondence with the naturals, because we would have to solve the halting problem. It is not enough that the type of the term is a Cantor, it now must be fully defined for all inputs, and determining that given arbitrary source code is an undecidable problem.
II
The erroneous statement is (1). Cantors are computable functions, so as we have argued, they have the same cardinality as the naturals. There are no more programs than numbers, so by the definition of equal cardinality we can put them in 1 to 1 correspondence with a function.
The problem with (1) occurs because diagonal takes as its first argument not an arbitrary sequence of Cantors, but a computable sequence of Cantors. If cs is not computable, then neither is diagonal cs (for we no longer have cs‘s source code with which to construct it), and Cantors are defined to be computable sequences. So diagonal fails to contradict our bijection.
III
The erroneous statement is (2). Section II claims to put Cantors and naturals in 1 to 1 correspondence, but it is lying. Suppose Section II is formulated with respect to some axiom system A. If it were “telling the truth”, we would expect there to be some term f in the language of A such that for every fully defined Cantor program c, there is some natural number n such that we have (i.e. it is a theorem of A that f(1 + 1 + … + 1) = (source code of c)).
Let’s suppose we have written down the axioms of A into a Haskell program, and we have a (partial) function proofSearch :: Nat -> Cantor, which, given a number n, searches for theorems of the form and compiles and returns the first such c it finds. In the case that there is no such statement, it just runs forever; similarly for the case that c fails to compile. Although cumbersome to write, I’m sure we agree that this is possible to write. If section II is not lying, then we expect that for every natural n, proofSearch n does in fact return a valid Cantor.
Now, let us return to familiar lands with a new program:
evidence :: Cantor evidence = diagonal proofSearch
Oh my! If section II is the truth, then proofSearch is a total, computable function of type Nat -> Cantor, which we can pass to diagonal to find a Cantor that it missed! So it must have been lying, either (1) about its function f finding every possible Cantor or (2) about it actually possessing such a function (i.e. it “proved” that there is such a function, but it couldn’t actually represent it). In either case, it did not actually create a 1 to 1 correspondence between the naturals and Cantors.
IV
Left as an exercise for the reader.
Which one is it really?
Sentences and Paradigms
As our understanding progresses, what once were rigid distinctions tend to become blurred. Hence, I am fascinated by the pervasiveness and stability of the three programming language paradigms: imperative, functional (here I mean that word in the pure sense, not HOFs sense — Conal Elliott would probably have me call it denotational), and logical. Whole languages are beginning to blur their orientation, but specific solutions to problems are still typically classifiable along these lines nonetheless. We will see how the three are linked to forms human communication and the style of our discourse — and thus why these distinctions are so stable.
Each paradigm corresponds to a syntactic unit of language. The imperative, as the name suggests, corresponds to sentences in the imperative mood:
def bfs(predicate, root):
queue = Queue()
queue.add(root) # First, add the root to the queue.
while not queue.empty(): # While the queue is not empty, do the following:
node = queue.dequeue() # Take a node off the queue.
if predicate(node): # If it satisfies the predictate,
return node # return it.
for n in node.children(): # Otherwise,
queue.add(n) # add each of its children to the queue.
The program is written as a recipe, telling the computer what to do as if a servant. Note that, after qualifying phrases, each sentence begins with an action word in the imperative form, just as this sentence begins with “note”. The object-oriented aspect adds a sort of directorial role to the program, wherein the program is read not as the user telling the computer what to do, but the program telling objects in the program what to do. Sentences are still written in the imperative mood, they can now be directed: “queue, give me an element”, “handle, write down this line.”
But not every sentence in human discourse is imperative, for example this one. The logical captures sentences of relationship, such as:
captures(logical, relationships).
But perhaps we should see a more practical logical program as an example:
mirrors([], []). % The empty list mirrors itself.
mirrors([X|XS], L) :- % [X|XS] mirrors L if
append(LS,[X],L), % LS and [X] compose L, and
mirrors(XS, LS). % XS mirrors LS.
The program is written as a set of assertions, building a model of its world up from nothing. To run this program, you ask it questions about the world:
?- mirrors(X, [1,2,3]). % What mirrors [1,2,3]? X = [3,2,1]
A great deal of human discourse falls into this category: propositional sentences. Most of the sentences in this post fall into that category. However, those familiar with Prolog will know that this is a poor implementation of mirrors (it is quadratic time), and would write it this way instead:
mirror([], R, R). % The mirror of [] prepended to R is R
mirror([X|XS], Y, R) :- % The mirror of [X|XS] prepended to Y is
mirror(XS, [X|Y], R). % the mirror of XS prepended to [X|Y|.
In proper logical style, mirror expresses itself as propositional relationships, with the caveat that the only relationship is "is". Code written this way, relating things by identity rather than propositionally, is actually characteristic of functional style:
mirror [] r = r -- The mirror of [] prepended to r is r
mirror (x:xs) y = -- The mirror of (x:xs) prepended to y is
mirror xs (x:y) -- the mirror of xs prepended to (x:y)
The R in the Prolog code is only connecting together propositions so that we can express this idea in a functional way. The original mirrors predicate is quite unlike this; expressing it in Haskell requires more than a mindless transliteration (however there are still hints of a structural similarity, if a bit elusive).
But I claim that “is” is not the defining linguistic characteristic of functional programs. We could write a functional program with a single equals sign if we were so obfuscatedly-minded; the analogous claim is invalid for logical and imperative programs. The characteristic device of functional programming is the noun: functional programs do not give instructions or express relationships, they are interested in defining objects.
bfs children xs = -- the BF traversal of xs is
xs ++ -- xs itself appended to
bfs (concatMap children xs) -- the BF traversal of each of xs's children
The monad is typically used to express instructions as nouns:
main = -- the main program is the recipe which
getLine >>= \name -> -- gets a line then
putStrLn $ "Hello, " ++ name -- puts "Hello, " prepended to that line
Haskell elites will object to me using IO as a prototypical example of a monad, but the claim is still valid; look at the words used to define monad actions: tell, ask, get, put, call. These are imperative words. This is not a sweeping generalization, however; for example, the constructors of Escardó’s search monad are nouns.
The following table summarizes the linguistic analogies:
| Paradigm | Mechanism | Example |
| Imperative | Imperative mood | Put a piece of bread; put meat on top; put bread on top. |
| Logical | Propositional relationships | Bread sandwiches meat. |
| Functional | Noun phrases | Meat sandwiched by bread |
Each of these paradigms is pretty good in its area. When we’re issuing commands in a functional language, we pretend we are using an imperative language; when we want to treat complex nouns (such as lists) in an imperative language, we are learning fall back on functional concepts to operate on them. When we are documenting our code or talking about types, we use logical ideas (typeclasses, generics).
The thing that makes me hubristically curious is the realization that these three categories hardly cover the mechanisms available in language. What else can we learn from human languages when expressing ourselves to computers?
This post has focused on the “content code” of the various paradigms. There is another kind of code (mostly in statically typed languages) that is interspersed with this, namely declarations:
class Player {...}
data Player = ...
I have not yet been able to find a good analogy for declarations in language; perhaps they introduce something like a proper noun. I will save this topic for another post.
New Languages
It has been some time since I learned a new programming language. Perhaps Haskell has me so firmly entrenched in “desirable properties” that any new language I look at either looks like another Javascript (a decent language sullied by terrible syntax) or Smalltalk (a decent language sullied by terrible engineering). I completely admit that I have had opportunities to learn new languages that I have turned down because I didn’t find the language’s ideas aesthetically pleasing after reading about them for ten minutes. But all in all I have been unimpressed by new languages and haven’t felt inspired to learn them to the degree I would need to use them.
It has also been some time since I have been excited about programming. My readers are surely aware of this as the topics of this blog meander this way and that away from my expertise. I used to love seeking the elegant design, simmering 200 lines down into 40, hypothesizing about language features and module systems, imagining worlds in which all modern software was built upon my paradigm.
I think these two things are related. I am not sure which way the causation goes, or even if there is a causal relationship. But thinking back on my time as a programmer, the times when I was most productive I was learning: working in a new language or working in a new domain. I still think CodeCatalog is a great idea, in total a few hard weeks’ work, and I can’t convince myself to write another line of the damned thing. That’s because I know how it’s gonna go; I know what I have to do; there is no mystery to it.
What if, instead of twisting my arm and trying to force myself into “good behavior”, I embraced this aspect of myself? There has to be some beautiful experimental kernel to that project; there has to be some beautiful way to express it. And it is certainly possible, even likely, that the result won’t end up looking and feeling like StackOverflow or Google Whatever (beta).
So what?
Have I been so brainwashed by the business of software that I will abandon a project because I cannot inspire myself to bring it to business standards? I think it’s because we wanted to make money. It would be nice not to have to worry about paying rent, I admit, but not worth exchanging for the beauty of an inspired work of code.
Someone invent a beautiful web/database language. I mean beautiful and unique — not practical or productive or real-world or familiar or interoperable or scalable. I don’t care about those adjectives, those are for engineers.
My paradigm is better than your paradigm
On twitter I participated in the short snarky exchange:
@stevedekorte – Threads sharing state by default is like variables being global by default.
@luqui – state is like globals :-)
@stevedekorte – @luqui only shared state – which is why FP ultimately fails – it trades comprehensibility for shared state optimizations
@luqui – @stevedekorte, wow, sneaking “FP ultimately fails” as if an obvious truth in a reply to a Haskell programmer
@stevedekorte – @luqui, a bit like sneaking in “[all] state is like globals” to an OO programmer? :-)
@stevedekorte – @psnively @luqui my only issue with FP is the decision to trade expressivity and reusability for less state while calling it progress
The conversation goes on (and on) between many twitterites, having a fun but serious argument about this and that benefit of this and that style. Dynamic/static types come up, OO vs. functional, usefulness, mathematical foundation, learning curves; none of the standard artillery is spared. What irritates me about this style of argument is all the sweeping adjectives (1) used with no definition, thus impossible to support with evidence, and (2) synonymized with better.
In this post, I will draw attention to this irritating vocabulary, so that the next time you use it you can realize how little you are saying.
(Disclaimer: this post is not intended to criticize stevedekorte specifically, even though I identify two of the terms he used below. It was a long, typical programming zealot argument, and all parties involved were being equally dumb :-)
Expressive
A person is expressive if he expresses himself — he has an idea and wants to write it down. So I would say a language is expressive if it allows or enables the programmer to be expressive. Languages that restrict expression are not expressive. So we have the following facts:
- Dynamically typed languages are more expressive than corresponding statically typed ones, because statically typed languages forbid you from expressing some ideas.
- Multiparadigmatic languages are more expressive than languages which are paradigmatically pure, because the former allow you to express yourself if you are not thinking within the framework.
- A language which you are fluent in is more expressive than a language you do not know very well.
By these observations, we might guess that Perl is the most expressive language, Haskell is the least.
Do you notice yourself already equating expressive with good, and thus perhaps feeling offended? Everyone wants an expressive language, right? Here are some reasons some programmers might not want an expressive language:
- Most of my ideas are made of bullshit. Have you ever had a great idea, gone to write it in your blog, and realized it was nonsense because you were unable to write it? So writing is less expressive than thinking. Is thinking better than writing?
- Every programmer has a different way of structuring their thoughts. An expressive language will bring out the differences in thought structure between programmers, and introduce impedance mismatches between programmers on a shared project.
I’m not arguing that expressiveness is bad. I’m just arguing that it doesn’t mean good, it means expressive.
Reusable
A language “is reusable” (to abuse language a bit) if code written in that language can be easily reused.
This “obvious” statement is hiding something very important; namely, reused how? For what? We are in an unfortunate situation in programming: code is designed to be reused in a particular way, and if you want to reuse it in a different way you are pretty much out of luck. An OO widget library is designed for the addition of new types of widgets, but if you want to reuse a program written in the library on a new platform you are in for a lot of work. A functional drawing library is designed so that you can transform and export your drawings in an open-ended variety of ways, composing new ways out of old ones; but if you need to draw a circle you have to build it out of lines, even if there is a much better way to draw a circle on your target. (This is essentially the expression problem).
An abstraction will always expose some things and conceal others. Different languages enable abstraction in different ways, which makes exposing certain things easier and others harder. The zealot will reply, “but in my experience, real-world programs are naturally organized around <insert preferred paradigm>, and <insert uncomfortable paradigm> doesn’t support that style as easily.” I would suggest to this zealot to look deeper into the definition of “real-world” to discover its many facets of subjectivity. (What domain do you work in? Who architected the real-world software you have worked on, and what was their background? What kinds of programs do you consider not to exist in the real world, and what values are you using to minimize them?)
Easy to learn
A language is easier to learn than another language if it takes less time to become competent/fluent programming in that language.
I don’t think this one is as automatically synonymized with “good”. Haskell programmers are aware how much relative effort was required to learn Haskell, and are usually grateful that they put in the effort. But all other things being equal, a language easier to learn ought to be better than one harder to learn.
The deadly omission in the above definition is that people are doing the learning. A language is easier or harder to learn to a single person, and that is entangled with their background, their approach, and their ambitions. When arguing “X is easier to learn than Y”, I encourage you to add one of the following phrases to the end:
- for programmers who already know Z.
- for people with a mathematical background.
- for people with a non-mathematical background.
- for children.
- for me.
Or something similar. The following phrases do not count.
- for almost everyone.
- for people with a typical background.
- for people who want to do useful things.
I’ll close this section with this remark: Haskell is the easiest language to learn, because I already know it.
Conclusion
I know I am frequently irritated by many of these kinds of words, and I’ve only mentioned three here. But you see where I am going. Respect the values of your fellow engineers. If you are a polyglot and like a paradigm, it probably comes from a set of values you have — a set of things you consider important, and that differ from the values of others. Concentrate on that; communicate your values, and try to understand the values of others. If you have toyed with a paradigm and quickly lost interest because of some surface feature (I have — e.g. I will throw out a language without support for closures) or impression, consider the possibility that you like what you like because simply because it is familiar (other equivalent notions: easy, natural). Which is fine, some people like to tinker with their thinking patterns more than others, but remember that you have trained yourself to think in a particular way, so consider that your “objective” judgements about these ultimately human languages could be biased.
(For the record, that last phrase originally read: “all of your ‘objective’ judgements about one of these ultimately human languages are biased”, which is quite a bit stronger than what I had intended)
The “Whole Program” Fallacy
This StackOverflow question has generated a buzz of zealous reactions in the Haskell community. Here are the important bits of the question:
I often find this pattern in Haskell code:
options :: MVar OptionRecord options = unsafePerformIO $ newEmptyMVar doSomething :: Foo -> Bar doSomething = unsafePerformIO $ do opt <- readMVar options doSomething' where ...Basically, one has a record of options or something similar, that is initially set at the programs beginning. As the programmer is lazy, he don’t wants to carry the options record all over the program. … Now each part of the program has to use unsafePerformIO again, just to extract the options.
In my opinion, such a variable is considered pragmatically pure (don’t beat me). …
In this post I will give my own zealous reaction.
To ask a question like this assumes something about the nature of software. The assumption is hiding in these phrases: all over the program, each part of the program. Here, the poster assumes that a program is a large, monolithic beast such that every part of it will need access to this variable, and yet the definition of this variable is not known to the programmer. That means that the program depends on this value. If we purely model the structure of the above example program, we see that every function depends on OptionRecord. So we have (taking the context of a compiler):
parse :: OptionRecord -> String -> AST compile :: OptionRecord -> AST -> LinkerObject simplify :: OptionRecord -> AST -> AST freeVars :: OptionRecord -> AST -> Set Variable safeName :: OptionRecord -> Set Variable -> Variable -> Variable
These are perhaps not the cleanest signatures for parse, compile, and simplify, but they are conceivable in the real world. There is some junk — surely not all three of those functions depend on every option of OptionRecord. It would be cleaner to declare that they depend on exactly the things they actually depend on.
But the problem becomes much more unsettling at freeVars. freeVars takes an OptionRecord — staying true to the original problem description, it must, because it or a function it calls may end up depending options. But what on earth could a global OptionRecord determine about a free variables function? Perhaps there are multiple ways to find free variables — do we count type variables, what scoping mechanism to use — but those are not global options. Different functions will require different behaviors out of that function depending on what they are doing.
We even get such pathologies as shortestPath :: OptionRecord -> Graph -> Node -> Node -> [Node] — a plain, simple, reusable graph algorithm which somehow depends on this global options record. We have no way of telling the compiler — or, more importantly, ourselves — that this algorithm really has nothing to do with the specific compiler we are implementing. Somewhere deep in shortestPath‘s call chain, there is a call to some function which calls an error function which depends on one of the options. Suddenly this beautiful, well-defined function is not reusable. To take it out and use it in another project means to include OptionsRecord in that project, and OptionsRecord has things about compiler and type system extensions, configuration files, who-knows-what, but certainly having nothing to do with graphs. Sure, we can go and dig out the OptionsRecord, replace it with a record that is more suited to the program we are reusing the code in. But you have to go read, understand, mutate code that you just want to work please so you can get on with your project. We have all suffered the head-throbbing pain of integration problems. This is their source.
When I think of software as thousands of lines of specification for something, my mind jumps to problems like the original question. How am I going to write something so huge purely without it being really inconvenient? I see the need for global options, often global state, things ending with Manager (often a global trying to convince you it is a good abstraction), big systems talking about big ideas which are only applicable to my project.
But I have begun to think about software another way. Consider 100 lines. That is potentially a lot of information. The only reason 100 lines is not very much in the code world is because we lack the vocabulary to say what we mean. We are caught up in the details of manipulating lists of identifiers, building sorting trees, defining what we mean by “first” in this or that context. Could you describe your project in 100 lines of English? Perhaps not, but you could get a lot closer than with 100 lines of code.
I’m beginning to think that my latest greatest software project should be as small as possible. I need to build up vocabulary so I can describe it in a small space, but that vocabulary is not a part of my project. That vocabulary belongs to everyone in the same-ish domain of software. And nobody else cares about the meaning of OptionsRecord.
When I think of software this way, the idea that I need to pass around an OptionsRecord as a parameter to every function in my project starts to make sense. Every part of my project depends on its options, but my project is just a little thing that is standing on the shoulders of the giants of vocabulary used to get it there. I don’t mind passing options around between a few functions after I load them.
This is an ideal. I hope to move software closer to this ideal with my CodeCatalog project. Your current project probably cannot be phrased in a few hundred lines right now. But think about what it would look like if you structured it as a small main program + a lot of supporting vocabulary. Think of the supporting vocabulary as something any project could use. What does that mean for the modularity, reusability, and adaptability of your code? What does it mean for the clarity of your specification? What does it mean about the concept of “architecture”?
Searchable Data Types
A few years ago, Martín Escardó wrote an article about a seemingly-impossible program that can exhaustively search the uncountably infinite "Cantor space" (infinite streams of bits). He then showed that spaces that can be thus searched form a monad (which I threw onto hackage), and wrote a paper about the mathematical foundations which is seemingly impossible to understand.
Anyway, I thought I would give a different perspective on what is going on here. This is a more operational account, however some of the underlying topology might peek out for a few seconds. I’m no topologist, so it will be brief and possibly incorrect.
Let’s start with a simpler infinite space to search. The lazy naturals, or the one-point compactification of the naturals:
data Nat = Zero | Succ Nat
deriving (Eq,Ord,Show)
infinity = Succ infinity
Let’s partially instantiate Num just so we have something to play with.
instance Num Nat where
Zero + y = y
Succ x + y = Succ (x + y)
Zero * y = Zero
Succ x * y = y + (x * y)
fromInteger 0 = Zero
fromInteger n = Succ (fromInteger (n-1))
We wish to construct this function:
search :: (Nat -> Bool) -> Maybe Nat
Which returns a Nat satisfying the criterion if there is one, otherwise Nothing. We assume that the given criterion is total, that is, it always returns, even if it is given infinity. We’re not trying to solve the halting problem. :-)
Let’s try to write this in direct style:
search f | f Zero = Just Zero
| otherwise = Succ <$> search (f . Succ)
That is, if the predicate worked for Zero, then Zero is our guy. Otherwise, see if there is an x such that f (Succ x) matches the predicate, and if so, return Succ x. Make sense?
And it seems to work.
ghci> search (\x -> x + 1 == 2)
Just (Succ Zero)
ghci> search (\x -> x*x == 16)
Just (Succ (Succ (Succ (Succ Zero))))
Er, almost.
ghci> search (\x -> x*x == 15)
(infinite loop)
We want it to return Nothing in this last case. It’s no surprise that it didn’t — there is no condition under which search is capable of returning Nothing, that definition would pass if Maybe were defined data Maybe a = Just a.
It is not at all clear that it is even possible to get what we want. But one of Escardó’s insights showed that it is: make a variant of search that is allowed to lie.
-- lyingSearch f returns a Nat n such that f n, but if there is none, then
-- it returns a Nat anyway.
lyingSearch :: (Nat -> Bool) -> Nat
lyingSearch f | f Zero = Zero
| otherwise = Succ (lyingSearch (f . Succ))
And then we can define our regular search in terms of it:
search' f | f possibleMatch = Just possibleMatch
| otherwise = Nothing
where
possibleMatch = lyingSearch f
Let’s try.
ghci> search' (\x -> x*x == 16) -- as before
Just (Succ (Succ (Succ (Succ Zero))))
ghci> search' (\x -> x*x == 15)
Nothing
Woah! How the heck did it know that? Let’s see what happened.
let f = \x -> x*x == 15
lyingSearch f
0*0 /= 15
Succ (lyingSearch (f . Succ))
1*1 /= 15
Succ (Succ (lyingSearch (f . Succ . Succ)))
2*2 /= 15
...
That condition is never going to pass, so lyingSearch going to keep taking the Succ branch forever, thus returning infinity. Inspection of the definition of * reveals that infinity * infinity = infinity, and infinity differs from 15 once you peel off 15 Succs, thus f infinity = False.
With this example in mind, the correctness of search' is fairly apparent. Exercise for the readers who are smarter than me: prove it formally.
Since a proper Maybe-returning search is trivial to construct given one of these lying functions, the question becomes: for which data types can we implement a lying search function? It is a challenging but approachable exercise to show that it can be done for every recursive polynomial type, and I recommend it to anyone interested in the subject.
Hint: begin with a Search data type:
newtype Search a = S ((a -> Bool) -> a)
Implement its Functor instance, and then implement the following combinators:
searchUnit :: Search ()
searchEither :: Search a -> Search b -> Search (Either a b)
searchPair :: Search a -> Search b -> Search (a,b)
newtype Nu f = Roll { unroll :: f (Nu f) }
searchNu :: (forall a. Search a -> Search (f a)) -> Search (Nu f)
More Hint: is searchPair giving you trouble? To construct (a,b), first find a such that there exists a y that makes (a,y) match the predicate. Then construct b using your newly found a.
The aforementioned Cantor space is a recursive polynomial type, so we already have it’s search function.
type Cantor = Nu ((,) Bool)
searchCantor = searchNu (searchPair searchBool)
ghci> take 30 . show $ searchCantor (not . fst . unroll . snd . unroll)
"(True,(False,(True,(True,(True"
We can’t expect to construct a reasonable Search Integer. We could encode in the bits of an Integer the execution trace of a Turing machine, as in the proof of the undecidability of the Post correspondence problem. We could write a total function validTrace :: Integer -> Bool that returns True if and only if the given integer represents a valid trace that ends in a halting state. And we could also write a function initialState :: Integer -> MachineState that extracts the first state of the machine. Then the function \machine -> searchInteger (\x -> initialState x == machine && validTrace x) would solve the halting problem.
The reason this argument doesn’t work for Nat is because the validTrace function would loop on infinity, thus violating the precondition that our predicates must be total.
I hear the following question begging to be explored next: are there any searchable types that are not recursive polynomials?


Recent Comments