Monthly Archives: March 2013

Follow Your Nose Proofs

We just had the first Categories for the Boulderite meetup, in which a bunch of people who don’t know category theory tried to teach it to each other. Some of the people there had not had much experience with proofs, so getting “a proof” was hard even though the concepts weren’t very deep. My impression is that those who had trouble mostly did not yet know the “follow your nose” proof tactic, which I learned in my first upper division math class in college. That tactic is used so often that most proofs omit it completely (i.e. they assume that the reader is doing it) and skip to the point where things get interesting. Having it spelled out for me in that class was very helpful. So here I shall repeat it, mostly for my fellow Categories members.

Decide what to do based on a top-down analysis of the sentence you are trying to prove:

| Shape of Sentence | Shape of Proof |
|---|---|
| If P, then Q. (aka P implies Q) | Suppose P. <proof of Q> |
| P if and only if Q | (→) <proof of P implies Q>. (←) <proof of Q implies P> |
| For all x such that C(x), Q | Given x. Suppose C(x). <proof of Q> |
| There exists x such that Q. | Let x = <something> (requires imagination). <proof of Q> |
| P or Q | Either <proof of P> or <proof of Q> (or sometimes something tricksier like assume not P, <proof of Q>) |
| P and Q | (1) <proof of P>. (2) <proof of Q>. |
| not P | Assume P. <find contradiction> (requires imagination) |
| X = Y | Reduce X and Y by known equalities one step at a time (whichever side is easier first). Or sometimes there are definitions / lemmas that reduce equality to something else. |
| Something really obvious (like X = X, or 0 ≤ n where n is a natural, etc.) | Say “obvious” or “trivial” and you’re done. |
| Something else | Find definition or lemma, substitute it in, continue. |

Along the way, you will find that you need to use the things you have supposed. So there is another table for how you can use assumptions.

| Shape of assumption | Standard usage |
|---|---|
| If P, then Q (aka P implies Q) | Prove P. Then you get to use Q. |
| P if and only if Q | P and Q are equivalent. Prove one, you get the other. |
| For all x such that C(x), P(x) | Prove C(y) for some y that you have, then you get to use P(y). |
| There exists x such that C(x) | Use x and the fact that C(x) somehow (helpful, right? ;-). |
| P and Q | Therefore P / Therefore Q. |
| P or Q | If P then <goal>. If Q then <same goal>. (Or sometimes prove not P, then you know Q) |
| not P | Prove P. Then you’re done! (You have inconsistent assumptions, from which anything follows) |
| X = Y | If you are stuck and have an X somewhere in your goal, try substituting Y. And vice versa. |
| Something obvious from your other assumptions. | Throw it away, it doesn’t help you. |
| Something else | Find definition, substitute it in, continue. |

Let’s try some examples. First some definitions/lemmas to work with:

Definition (extensionality): If X and Y are sets, then X = Y if and only if for all x, x \in X if and only if x \in Y.
Definition: X \subseteq Y if and only if for every a, a \in X implies a \in Y.

Theorem: X = Y if and only if X \subseteq Y and Y \subseteq X.

Follow your nose proof.

  • (→) Show X = Y implies X \subseteq Y and Y \subseteq X.
    • Assume X = Y. Show X \subseteq Y and Y \subseteq X.
    • Substitute: Show X \subseteq X and X \subseteq X.
    • We’re done.
  • (←) Show X \subseteq Y and Y \subseteq X implies X = Y.
    • Assume X \subseteq Y and Y \subseteq X. Show X = Y.
    • (expand definition of = by extensionality)
    • Show forall x, x \in X if and only if x \in Y.
    • Given x.
    • (→) Show x \in X implies x \in Y.
      • Follows from the definition of our assumption X \subseteq Y.
    • (←) Show x \in Y implies x \in X.
      • Follows from the definition of our assumption Y \subseteq X.

See how we are mechanically disassembling the statement we have to prove? Most proofs like this don’t take any deep insight; you just execute this algorithm. Such a process is assumed when reading and writing proofs, so in the real world you will see something more like the following proof:

Proof. (→) trivial. (←) By extensionality, x \in X implies x \in Y since X \subseteq Y, and x \in Y implies x \in X since Y \subseteq X.

We have left out saying that we are assuming things that you would naturally assume from the follow your nose proof. We have also left out the unfolding of definitions, except perhaps saying the name of the definition. But when just getting started proving things, it’s good to write out these steps in detail, because then you can see what you have to work with and where you are going. Then begin leaving out obvious steps as you become comfortable.

We have also just justified a typical way to show that two sets are equal: show that they are subsets of each other.
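For readers who like to have a machine check their nose-following, here is a minimal Lean 4 sketch of the same theorem. It makes some assumptions that are not in the post: sets over a type α are modelled as predicates α → Prop, the names IsSubset and eq_iff_subsets are made up for illustration, and "X = Y" is stated directly in its extensional form (for all x, x \in X if and only if x \in Y), following the extensionality definition above. Each tactic corresponds to one row of the "shape of sentence" table.

```lean
-- Assumption of this sketch: a "set" over α is a predicate α → Prop.
def IsSubset {α : Type} (X Y : α → Prop) : Prop :=
  ∀ a, X a → Y a

theorem eq_iff_subsets {α : Type} (X Y : α → Prop) :
    (∀ x, X x ↔ Y x) ↔ (IsSubset X Y ∧ IsSubset Y X) := by
  constructor                 -- "P if and only if Q": prove (→) and (←)
  · intro h                   -- (→) Suppose X and Y have the same elements.
    constructor               -- "P and Q": prove each half
    · intro a ha              -- "for all a, a ∈ X implies a ∈ Y"
      exact (h a).mp ha
    · intro a ha
      exact (h a).mpr ha
  · intro h x                 -- (←) Suppose both inclusions. Given x.
    constructor               -- "P if and only if Q" again
    · intro hx
      exact h.1 x hx          -- use the assumption X ⊆ Y
    · intro hx
      exact h.2 x hx          -- use the assumption Y ⊆ X
```

Notice that the (→) direction, which the idiomatic proof calls "trivial", still takes a few mechanical steps when everything is spelled out.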

Let’s see one more example:

Definition: Given sets A and B, a function f : A → B is a surjection if for every y \in B, there exists an x \in A such that f(x) = y.

Definition: Two functions f,g : A → B are equal if and only if for all x \in A, f(x) = g(x).

Definition: (g \circ f)(x) = g(f(x)).

Definition: For any set A, the identity \mathit{Id}_A is defined by \mathit{Id}_A(x) = x.

Theorem: Given f : A → B. If there exists f^{-1} : B → A such that f \circ f^{-1} = \mathit{Id}_B, then f is a surjection.

Follow your nose proof.

  • Given f : A → B.
  • Suppose there exists f^{-1} : B → A such that f \circ f^{-1} = \mathit{Id}_B. Show f is a surjection.
  • By definition, show that for all y \in B, there exists x \in A such that f(x) = y.
  • Given y \in B. Show there exists x \in A such that f(x) = y.
  • Now we have to find an x in A. Well, we have y \in B and a function from B to A, let’s try that:
  • Let x = f^{-1}(y). Show f(x) = y.
  • Substitute: Show f(f^{-1}(y)) = y.
  • We know f \circ f^{-1} = \mathit{Id}_B, so by the definition of two functions being equal, we know f(f^{-1}(y)) = \mathit{Id}_B(y) = y, and we’re done.

Again, notice how we are breaking up the task based on the structure of what we are trying to prove. The only non-mechanical things we did were to find x and apply the assumption that f \circ f^{-1} = \mathit{Id}_B. In fact, usually the interesting parts of a proof are giving values to “there exists” statements and using assumptions (in particular, saying what values you use “for all” assumptions with). Since those are the interesting parts, those are the only parts that an idiomatic proof would say:

Proof. Given y \in B. Let x = f^{-1}(y). f(x) = f(f^{-1}(y)) = y since f \circ f^{-1} = \mathit{Id}_B.
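The same argument can be sketched in Lean 4. The assumptions here are mine: the theorem name is hypothetical, g plays the role of f^{-1}, and the hypothesis f \circ f^{-1} = \mathit{Id}_B is stated pointwise, which is what the definition of function equality above unfolds to. The only steps that need thought are the same two as in the prose proof: choosing the witness and applying the assumption at y.

```lean
theorem surjective_of_right_inverse {A B : Type} (f : A → B)
    (h : ∃ g : B → A, ∀ y : B, f (g y) = y) :
    ∀ y : B, ∃ x : A, f x = y := by
  intro y                     -- Given y ∈ B.
  cases h with                -- Use the "there exists" assumption.
  | intro g hg =>
    exact ⟨g y, hg y⟩         -- Let x = g y; then f x = y by hg at y.
```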

Remember to take it step-by-step; at each step, write down what you learned and what you are trying to prove, and try to make a little progress. These proofs are easy if you follow your nose.

Constructions on Typeclasses, Part 1: F-Algebras

This post is rendered from literate Haskell. I recommend doing the exercises inline, so use the source.

> {-# LANGUAGE DeriveFunctor
>            , DeriveFoldable
>            , DeriveTraversable
>            , TypeOperators #-}
> import Control.Applicative
> import Data.Foldable
> import Data.Traversable

Certain kinds of typeclasses have some very regular instances. For example, it is obvious how to implement (Num a, Num b) => Num (a,b) and (Monoid a, Monoid b) => Monoid (a,b), and similarly if F is some applicative functor, (Num a) => Num (F a) and (Monoid a) => Monoid (F a) are obvious. Furthermore, these instances (and many others) seem to be obvious in the same way.

(+) a b     = (+)     <$> a <*> b
mappend a b = mappend <$> a <*> b

fromInteger n = pure (fromInteger n)
mempty        = pure mempty

And take them on pairs:

(x,x')     +     (y,y')  = (x     +     y, x'     +     y')
(x,x') `mappend` (y,y')  = (x `mappend` y, x' `mappend` y')

fromInteger n = (fromInteger n, fromInteger n)
mempty        = (mempty       , mempty)

It would be straightforward for these cases to derive the necessary implementations from the type signature. However, it would be nice if there were a more abstract perspective, such that we didn’t have to inspect the type signature to find the operations – that they could arise from some other standard construction. Further, it is not quite as obvious from the type signature how to automatically instantiate methods such as

mconcat :: (Monoid m) => [m] -> m

without making a special case for [], whereas hopefully a more abstract perspective would inform us what kinds of type constructors would be supported.

In this post, we will see such an abstract perspective. It comes from (surprise!) category theory. I disclaim that I’m still a novice with category theory (but in the past few weeks I have gained competence by studying). So we will not get very deep into the theory, just enough to steal the useful concept and leave the rest behind. I welcome relevant insights from the more categorically educated in the comments.


The unifying concept we will steal is the F-algebra. An F-algebra is a Functor f and a type a together with a function f a -> a. We can make this precise in Haskell:

> type Algebra f a = f a -> a

I claim that Num and Monoid instances are F-algebras over suitable functors. Look at the methods of Monoid:

mempty :: m
mappend :: m -> m -> m

We need to find a functor f such that we can recover these two methods from a function of type f m -> m. With some squinting, we arrive at:

> data MonoidF m
>     = MEmpty
>     | MAppend m m
>     deriving (Functor, Foldable, Traversable)
> memptyF :: Algebra MonoidF m -> m
> memptyF alg = alg MEmpty
> mappendF :: Algebra MonoidF m -> (m -> m -> m)
> mappendF alg x y = alg (MAppend x y)
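
To make this concrete, here is a small standalone sketch (not part of the literate source). It repeats the definitions above so it can be loaded on its own, and the names sumAlg, concatAlg and mconcatF are made up for illustration. It shows two hand-written MonoidF-algebras and a fold that only ever touches the carrier through the recovered methods.

```haskell
-- Repeated from the post so that this block stands alone.
type Algebra f a = f a -> a

data MonoidF m
    = MEmpty
    | MAppend m m

memptyF :: Algebra MonoidF m -> m
memptyF alg = alg MEmpty

mappendF :: Algebra MonoidF m -> (m -> m -> m)
mappendF alg x y = alg (MAppend x y)

-- Hypothetical example algebra: integers under addition.
sumAlg :: Algebra MonoidF Integer
sumAlg MEmpty        = 0
sumAlg (MAppend x y) = x + y

-- Hypothetical example algebra: strings under concatenation.
concatAlg :: Algebra MonoidF String
concatAlg MEmpty        = ""
concatAlg (MAppend x y) = x ++ y

-- A fold written only against the methods recovered from the algebra.
mconcatF :: Algebra MonoidF m -> [m] -> m
mconcatF alg = foldr (mappendF alg) (memptyF alg)
```

In GHCi, mconcatF sumAlg [1,2,3] adds the list up and mconcatF concatAlg ["ab","c"] concatenates it; the algebra argument is playing the role that the instance dictionary normally plays.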

Exercise 1: work out the functor NumF over which Num instances are F-algebras, and write the methods of Num in terms of it.

Exercise 2: for each of the standard classes Eq, Read, Show, Bounded, and Integral, work out whether they are expressible as F-algebras. If so, give the functor; if not, explain or prove why not.

Exercise 3: write a function toMonoidAlg which finds the MonoidF-algebra for a given instance m of the Monoid class.

Combining Instances

Motivated by the examples in the introduction, we can find the “instance” for pairs given instances for each of the components.

> pairAlg :: (Functor t) => Algebra t a -> Algebra t b -> Algebra t (a,b)
> pairAlg alga algb tab = (alga (fmap fst tab), algb (fmap snd tab))
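
As a quick check that pairAlg does what the introduction promised, here is a standalone sketch (definitions repeated from above; sumAlg, concatAlg and sumAndConcat are hypothetical names). It assumes MonoidF has a Functor instance, obtained here with DeriveFunctor.

```haskell
{-# LANGUAGE DeriveFunctor #-}

type Algebra f a = f a -> a

data MonoidF m
    = MEmpty
    | MAppend m m
    deriving (Functor)

pairAlg :: (Functor t) => Algebra t a -> Algebra t b -> Algebra t (a,b)
pairAlg alga algb tab = (alga (fmap fst tab), algb (fmap snd tab))

-- Hypothetical component algebras.
sumAlg :: Algebra MonoidF Integer
sumAlg MEmpty        = 0
sumAlg (MAppend x y) = x + y

concatAlg :: Algebra MonoidF String
concatAlg MEmpty        = ""
concatAlg (MAppend x y) = x ++ y

-- The "instance" for pairs, built from the component "instances".
sumAndConcat :: Algebra MonoidF (Integer, String)
sumAndConcat = pairAlg sumAlg concatAlg
```

Applying sumAndConcat to MAppend (1,"a") (2,"b") projects out each component with fmap, runs the matching algebra on it, and pairs the results back up, which is exactly the componentwise mappend from the introduction.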

Also, we hope we can find the instance for an applicative functor given an instance for its argument

applicativeAlg :: (Functor t, Applicative f)
               => Algebra t a -> Algebra t (f a)

but there turns out to be trouble:

applicativeAlg alg tfa = ...

We need to get our hands on a t a somehow, and all we have is a t (f a). This hints at something from the standard library:

sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)

which indicates that our functor needs more structure to implement applicativeAlg.

> applicativeAlg :: (Traversable t, Applicative f)
>                => Algebra t a -> Algebra t (f a)
> applicativeAlg alg tfa = fmap alg (sequenceA tfa)
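
Here is a standalone sketch of applicativeAlg in use, with Maybe as the applicative functor. The definitions are repeated from above; sumAlg and maybeSum are hypothetical names, and the block assumes a GHC whose Prelude exports Traversable and sequenceA (7.10 or later).

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

type Algebra f a = f a -> a

data MonoidF m
    = MEmpty
    | MAppend m m
    deriving (Functor, Foldable, Traversable)

applicativeAlg :: (Traversable t, Applicative f)
               => Algebra t a -> Algebra t (f a)
applicativeAlg alg tfa = fmap alg (sequenceA tfa)

-- Hypothetical base algebra: integers under addition.
sumAlg :: Algebra MonoidF Integer
sumAlg MEmpty        = 0
sumAlg (MAppend x y) = x + y

-- The lifted algebra: Maybe Integer, with failure propagating.
maybeSum :: Algebra MonoidF (Maybe Integer)
maybeSum = applicativeAlg sumAlg
```

The derived sequenceA turns MAppend (Just 1) (Just 2) into Just (MAppend 1 2), and MEmpty into pure MEmpty, so the lifted methods agree with the mappend <$> a <*> b and pure mempty definitions from the introduction.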

Now we should be able to answer the query from the beginning:

Exercise 4: For what kinds of type constructors c is it possible to automatically derive instances for (a) pairs and (b) Applicatives for a typeclass with a method of type c a -> a (e.g. mconcat :: [a] -> a)? Demonstrate this with an implementation.

Combining Classes

Intuitively, joining the methods of two classes which are both expressible as F-algebras should give us another class expressible as an F-algebra. This is demonstrated by the following construction:

> data (f :+: g) a = InL (f a) | InR (g a)
>     deriving (Functor, Foldable, Traversable)
> coproductAlg :: (Functor f, Functor g)
>              => Algebra f a -> Algebra g a -> Algebra (f :+: g) a
> coproductAlg falg _ (InL fa) = falg fa
> coproductAlg _ galg (InR ga) = galg ga
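
A standalone sketch of the construction in use follows. To avoid giving away Exercise 1, it does not use NumF; instead it invents a second signature functor, InverseF, with a single unary method. InverseF, GroupF, negateAlg and groupAlg are all hypothetical names.

```haskell
{-# LANGUAGE DeriveFunctor, TypeOperators #-}

type Algebra f a = f a -> a

data (f :+: g) a = InL (f a) | InR (g a)
    deriving (Functor)

coproductAlg :: (Functor f, Functor g)
             => Algebra f a -> Algebra g a -> Algebra (f :+: g) a
coproductAlg falg _ (InL fa) = falg fa
coproductAlg _ galg (InR ga) = galg ga

data MonoidF m
    = MEmpty
    | MAppend m m
    deriving (Functor)

-- Hypothetical second signature: one unary "inverse" method.
newtype InverseF a = Inverse a
    deriving (Functor)

sumAlg :: Algebra MonoidF Integer
sumAlg MEmpty        = 0
sumAlg (MAppend x y) = x + y

negateAlg :: Algebra InverseF Integer
negateAlg (Inverse x) = negate x

-- The joined class has the methods of both signatures.
type GroupF = MonoidF :+: InverseF

groupAlg :: Algebra GroupF Integer
groupAlg = coproductAlg sumAlg negateAlg
```

The InL and InR tags simply say which class the method came from, so groupAlg (InL (MAppend 2 3)) dispatches to sumAlg and groupAlg (InR (Inverse 4)) dispatches to negateAlg.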

So now we can model a subclass of both Num and Monoid by type NumMonoidF = NumF :+: MonoidF.

Exercise 5: We hope to be able to recover Algebra NumF a from Algebra NumMonoidF a, demonstrating that the latter is in fact a subclass. Implement the necessary function(s).

Exercise 6: Given the functor product definition

> data (f :*: g) a = Pair (f a) (g a)
>     deriving (Functor, Foldable, Traversable)

find a suitable combinator for forming algebras over a product functor. It may not have the same form as coproduct’s combinator! What would a typeclass formed by a product of two typeclasses interpreted as F-algebras look like?

Free Constructions

One of the neat things we can do with typeclasses expressed as F-algebras is form free monads over them – i.e. form the data type of a “syntax tree” over the methods of a class (with a given set of free variables). Begin with the free monad over a functor:

> data Free f a
>     = Pure a
>     | Effect (f (Free f a))
>     deriving (Functor, Foldable, Traversable)
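> -- GHC 7.10 and later require an Applicative instance for every Monad.
> instance (Functor f) => Applicative (Free f) where
>     pure = Pure
>     Pure g <*> x = fmap g x
>     Effect g <*> x = Effect (fmap (<*> x) g)
> instance (Functor f) => Monad (Free f) where
>     return = Pure
>     Pure a >>= t = t a
>     Effect f >>= t = Effect (fmap (>>= t) f)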

(Church-encoding this gives better performance, but I’m using this version for expository purposes)

Free f a can be interpreted as a syntax tree over the typeclass formed by f with free variables in a. This is also called an “initial algebra”, a term you may have heard thrown around in the Haskell community from time to time. We demonstrate that a free construction over a functor is a valid F-algebra for that functor:

> initialAlgebra :: (Functor f) => Algebra f (Free f a)
> initialAlgebra = Effect

And that it is possible to “interpret” an initial algebra using any other F-algebra over that functor.

> initiality :: (Functor f) => Algebra f a -> Free f a -> a
> initiality alg (Pure a) = a
> initiality alg (Effect f) = alg (fmap (initiality alg) f)
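
To see initiality act as an interpreter, here is a standalone sketch that builds one syntax tree and evaluates it under two different algebras. The definitions are repeated from above (without the Monad instance, which is not needed here); term, sumAlg and concatAlg are hypothetical names.

```haskell
{-# LANGUAGE DeriveFunctor #-}

type Algebra f a = f a -> a

data MonoidF m
    = MEmpty
    | MAppend m m
    deriving (Functor)

data Free f a
    = Pure a
    | Effect (f (Free f a))

initiality :: (Functor f) => Algebra f a -> Free f a -> a
initiality _   (Pure a)   = a
initiality alg (Effect f) = alg (fmap (initiality alg) f)

-- Hypothetical term: the syntax tree for  x `mappend` (mempty `mappend` y).
term :: a -> a -> Free MonoidF a
term x y =
    Effect (MAppend (Pure x)
                    (Effect (MAppend (Effect MEmpty) (Pure y))))

sumAlg :: Algebra MonoidF Integer
sumAlg MEmpty        = 0
sumAlg (MAppend x y) = x + y

concatAlg :: Algebra MonoidF String
concatAlg MEmpty        = ""
concatAlg (MAppend x y) = x ++ y
```

The same tree shape means addition under sumAlg and concatenation under concatAlg: initiality sumAlg (term 1 2) and initiality concatAlg (term "a" "b") replace every Effect node with the corresponding method of the chosen algebra. Note that in this form the variables must already have the carrier type a.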

Exercise 7: Give a monoid isomorphism (a bijection that preserves the monoid operations) between Free MonoidF and lists [], ignoring that Haskell allows infinitely large terms. Then, using an infinite term, show how this isomorphism fails.

Next time: F-Coalgebras

How GADTs inhibit abstraction

Today I want to talk about this snippet:

This program ought to be well-behaved: it has no recursion (or recursion-encoding tricks), no undefined or error, and no incomplete pattern matches, so we should expect our types to be theorems. And yet we can derive an inconsistency. What is going on here?

Exercise: Identify the culprit before continuing.

The problem lies in the interaction between GADTs and generalized newtype deriving. Generalized newtype deriving seems to be broken here: we created a type B which claims to be just like A including instances, but one of A's instances relied on it being exactly equal to A. And so we get a program which claims to have non-exhaustive patterns (in unSwitchB), even though the pattern we omitted should have been impossible. And this is not the worst that generalized newtype deriving can do. When combined with type families, it is possible to write unsafeCoerce. This has been known since GHC 6.7.

In this post I intend to explore generalized newtype deriving and GADTs more deeply, from a more philosophical perspective, as opposed to just trying to plug this inconsistency. There are a few different forces at play, and by looking at them closely we will see some fundamental ideas about the meaning of types and type constructors.

Generalized newtype deriving seems reasonable to us by appealing to an intuition: if I have a type with some structure, I can clone that structure into a new type — basically making a type synonym that is a bit stricter about the boundaries of the abstraction. But the trouble is that you can clone parts of the structure without other parts; e.g. if X is an applicative and a monad, and I declare newtype Y a = Y (X a) deriving (Monad), then go on to define a different Applicative instance, I have done something wrong. Monad and applicative are related, so you can’t just change them willy nilly as though they were independent variables. But at the very least it seems reasonable that you should be able to copy all the structure, essentially defining a type synonym but giving it a more rigorous abstraction boundary. But in Haskell, this is not possible, and that is because, with extensions such as GADTs and type families, not all of a type’s structure is clonable.

I’m going to be talking a lot about abstraction. Although the kind of abstraction I mean here is simple, it is one of the fundamental things we do when designing software. To abstract a type is to take away some of its structure. We can abstract Integer to Nat by taking away the ability to make negatives — we still represent as Integer, but because the new type has strictly fewer operations (it must be fewer — after all we had to implement the operations somehow!) we know more about its elements, and finely tuning that knowledge is where good software engineering comes from.

When implementing an abstraction, we must define its operations. An operation takes some stuff in terms of that abstraction and gives back some stuff in terms of that abstraction. Its implementation must usually use some of the structure of the underlying representation — we define addition on Nat by addition on Integer. We may take it for granted that we can do this; for example, we do not have trouble defining:

    sum :: [Nat] -> Nat

even though we are not given any Nats directly, but instead under some type constructor ([]).

One of the properties of type constructors that causes us to take this ability to abstract for granted is that if A and B are isomorphic (in a sense that will become clear in a moment), then F A and F B should also be isomorphic. Since we, the implementers of the abstraction, are in possession of a bijection between Nats and the Integers that represent them, we can use this property to implement whatever operations we need: if they could be implemented on Integer, they can be implemented on Nat.

This isomorphism property looks like a weak version of saying that F is a Functor. Indeed, F is properly a functor from a category of isomorphisms in which A and B are objects. Every type constructor F is a functor from some category; which category specifically depends on the structure of F. F's flexibility to work with abstractions in its argument is determined by that category, so the more you can do to that category, the more you can do with F. Positive and negative data types have all of Hask as their source category, so any abstractions you make will continue to work nicely under them. Invariant functors like Endo require bijections, but fortunately when we use newtype to create abstractions, we have a bijection. This is where generalized newtype deriving gets its motivation -- we can just use that bijection to substitute the abstraction for its representation anywhere we like.
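A minimal sketch of this, under some assumptions of mine: Nat is a newtype over Integer, toRep and fromRep are hypothetical names for the two halves of the bijection, and Endo is redeclared locally (it mirrors the one in Data.Monoid) so the block has no imports. For a covariant constructor like [] one direction at a time is enough; for an invariant one like Endo we need both directions at once.

```haskell
-- Hypothetical abstraction: naturals represented by Integers.
newtype Nat = Nat Integer
    deriving (Eq, Show)

-- The two halves of the bijection held by the implementer.
toRep :: Nat -> Integer
toRep (Nat n) = n

fromRep :: Integer -> Nat
fromRep = Nat  -- sketch only: trusts that the argument is non-negative

-- Covariant case: lift through [] with map, then come back.
sumNat :: [Nat] -> Nat
sumNat = fromRep . sum . map toRep

-- Local copy of Endo, an invariant type constructor.
newtype Endo a = Endo { appEndo :: a -> a }

-- Transporting an Endo needs BOTH directions of the bijection.
mapEndoIso :: (a -> b) -> (b -> a) -> Endo a -> Endo b
mapEndoIso to from (Endo f) = Endo (to . f . from)

-- Successor on Nat, implemented on the representation.
succNat :: Endo Nat
succNat = mapEndoIso fromRep toRep (Endo (+ 1))
```

Nothing here looks inside the argument type beyond the bijection itself, which is why a newtype abstraction passes through these constructors unharmed.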

But GADTs (and type families) are different. A functor like Switch b has an even smaller category as its domain: a discrete category. The only thing which is isomorphic to A in this category is A itself -- whether there is a bijection is irrelevant. This violates generalized newtype deriving's assumption that you can always use bijections to get from an abstraction to its representation and back. GADTs that rely on exact equality of types are completely inflexible in their argument; they do not permit abstractions. This, I claim, is bad -- you want to permit the user of your functor to make abstractions.

(Aside: If you have a nice boundary around the constructors of the GADT so they cannot be observed directly, one way to do this when using GADTs is to simply insert a constructor that endows it with the necessary operation. E.g. if you want it to be a functor from Hask, just insert

    Fmap :: (a -> b) -> F a -> F b

If you want it to be a functor from Mon (category of monoids), insert:

    Fmap :: (Monoid n) => MonoidHom m n -> F m -> F n

(presumably F m already came with a `Monoid` dictionary). These, I believe, are free constructions -- giving your type the structure you want in the stupidest possible way, essentially saying "yeah it can do that" and leaving it to the consumers of the type to figure out how.)
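Here is a small sketch of the first variant of that aside. Everything in it is hypothetical: Val stands in for the F of the aside, IntVal is a constructor that pins its index to exactly Int, and extract is one possible consumer that decides what Fmap means. It assumes the constructors are hidden behind a module boundary, since fmap id is only equal to id up to what consumers like extract can observe.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical GADT. IntVal relies on exact type equality (a ~ Int);
-- Fmap freely adds the structure of a functor from Hask.
data Val a where
    IntVal :: Int -> Val Int
    Fmap   :: (a -> b) -> Val a -> Val b

-- "Yeah, it can do that": fmap just records the request.
instance Functor Val where
    fmap = Fmap

-- A consumer of the type has to say what Fmap actually does.
extract :: Val a -> a
extract (IntVal n) = n
extract (Fmap g v) = g (extract v)

-- A user-side abstraction over Int.
newtype Nat = Nat Int
    deriving (Eq, Show)

-- Without Fmap there would be no way to build a Val Nat at all.
natVal :: Val Nat
natVal = fmap Nat (IntVal 3)
```

With the extra constructor, the user's newtype can pass under Val even though IntVal itself still only ever produces a Val Int.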

In any case, we are seeing something about GADTs specifically that simple data types do not have -- they can give a lot of different kinds of structure to their domain, and in particular they can distinguish specific types as fundamentally different from anything else, no matter how similarly they may behave. There is another way to see this: defining a GADT which mentions a particular type gives the mentioned type unclonable structure, such that generalized newtype deriving and other abstraction techniques which clone some of a type's structure no longer succeed.