# Generalization

Okay, here’s the latest on my type inference algorithm: (1) it rocks, (2) it is still having trouble with generalization. My current example is the following (using ML-style reference syntax):

```
fix \f x { x := !x + 1; if 10 < !x then x else f x }
```

That fix business is how you implement recursion under the hood. Now, the correct type of this function is:

```
^a ^b (b -> b) [ b <: Ref a ] [ Int <: a <: Num ]
```

This type may not even be obvious to type theorists. Let me explain: the caret (^) means “the greatest lower bound over”, which can be read roughly as “for all”. The form of such a type is ^variable (type_involving_variable) [constraints_for_variable]. That circumfix syntax is awkward, since the constraints for a are way at the end of the expression. Still, let’s try to read it: for all a, for all b, the type is a function which takes a b and returns a b, as long as b is a subtype of a reference to an a, and as long as a is at least an Int and at most a Num. Yowch, that’s a mouthful. To paraphrase: this function takes any reference to a number (at least an integer; i.e. no “just even numbers” subtypes or anything) and returns that reference (in this case, changing its contents first). You can see that this is correct given the code.

I manually entered the equations for that expression into my type inferencer (it doesn’t do AST traversal yet, but that part is trivial). I looked over the two-odd pages of output for the reduced equations, and they were right on. That’s the (1) it rocks part. Then it tried to generalize those equations, and this is what it came up with:

```
^a ^b (b -> b) [ b <: Ref Int, b <: Ref a ] [ Int <: a <: Num ]
```

It looks pretty close. There’s just this pesky little b <: Ref Int clause, which is wrong! It says that whatever you pass this function has to be a subtype of Ref Int. But a Ref Num is not a subtype of a Ref Int; in fact, Ref x is only a subtype of Ref Int when x is itself Int. Some type checkers would actually type it this way, saying that the function’s type is Ref Int -> Ref Int, but for mine, that is not acceptable. So what’s going wrong with the algorithm?
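To make the variance story concrete, here is a hypothetical sketch (in Python, not my actual inferencer) of a subtype check over atoms, arrows encoded as ("->", dom, cod), and references encoded as ("Ref", a). The single atomic fact Int <: Num is an assumption for illustration:

```python
# Hypothetical sketch, not the actual inferencer: Ref must be invariant,
# while -> is contravariant in its domain and covariant in its codomain.

ATOMS = {("Int", "Num")}  # assumed atomic subtyping facts

def subtype(s, t):
    if s == t:
        return True
    if isinstance(s, str) and isinstance(t, str):
        return (s, t) in ATOMS
    if isinstance(s, tuple) and isinstance(t, tuple):
        if s[0] == "->" and t[0] == "->":
            # contravariant in the domain, covariant in the codomain
            return subtype(t[1], s[1]) and subtype(s[2], t[2])
        if s[0] == "Ref" and t[0] == "Ref":
            # a Ref is both read (covariant) and written (contravariant),
            # so Ref x <: Ref y only when x and y are the same type
            return s[1] == t[1]
    return False
```

Under this sketch, subtype(("Ref", "Num"), ("Ref", "Int")) is False in both directions, which is exactly why the b <: Ref Int clause above is wrong.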

Okay, so the generalization algorithm is given a bunch of equations involving type variables and other types. For each variable, the algorithm wants to find either (a) a direct substitution, where you can safely replace that variable with another type, or (b) a minimal set of constraints on that variable which must form acyclic dependencies (cyclic dependencies would lead to an infinite loop when we generalize over those variables).

It currently does this by finding all constraints on each variable, and then gradually reducing them. That is, it looks through every equation; when it sees an equation that looks like a <: v, where v is a variable, it adds a as a “lower bound”, and when it sees v <: a, it adds a as an “upper bound”. What you end up with is a mapping from variables to their bounds, like, say: a = { Int | Num }, b = { a | c }, c = { Ref b | a, Num, Str }, etc. Then it reduces them by eliminating any lower bounds which are lower than other lower bounds. That is, if you had a = { Int, Num | }, it would eliminate Int, because Num being a lower bound already implies that Int is. It does this for variables and other types alike, consulting the equations to see what is a subtype of what.
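A minimal sketch of that bound-collection and reduction pass might look like this (hypothetical Python, not the real implementation; the `subtype` oracle is assumed to be supplied by the equations):

```python
# Hypothetical sketch of the bound-collection pass described above.
# Equations are pairs (s, t) meaning s <: t; variables are given explicitly.

def collect_bounds(equations, variables):
    bounds = {v: {"lower": set(), "upper": set()} for v in variables}
    for s, t in equations:
        if t in bounds:
            bounds[t]["lower"].add(s)   # s <: v: s is a lower bound of v
        if s in bounds:
            bounds[s]["upper"].add(t)   # v <: t: t is an upper bound of v
    return bounds

def reduce_lower(lower, subtype):
    # drop any lower bound implied by another lower bound (Int is
    # redundant if Num is already a lower bound, since Int <: Num)
    return {b for b in lower
            if not any(b != c and subtype(b, c) for c in lower)}
```

For example, with the single fact Int <: Num, reduce_lower({"Int", "Num"}, ...) keeps only Num, matching the a = { Int, Num | } example above.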

After that, it looks for substitutions. That is, if you have a constraint that looks like a = { b | } or a = { | b } or a = { b | b }, then we say “okay, just bind a to b” and we substitute b for a everywhere (including in the other constraints). Then it goes back to the reduction phase. It alternates between these two phases until there is nothing left to do. You end up with a substitution and constraints that meet the requirements.

But I think the fact that it creates substitutions out of a = { b | } and a = { | b } is wrong. a = { b | b } is correct; that means that a and b are the same type. But a = { b | } says that a could be b, or any supertype of b, so making a substitution out of that is wrong. But I can’t just take out those two rules, because then we end up generalizing over every variable we solved for (the types end up looking like ^a ^b ^c ^d ^e ^f ^g ^h ^i ...).

Something I have noticed but haven’t figured out how to incorporate yet is that some reductions depend on what kind of generalization you are doing (greatest lower bound, which is typical, or least upper bound, which is quite uncommon but still supported). For example, consider ^a (a) [ Int <: a ]: the greatest lower bound of all types which are supertypes of Int is obviously just Int. I’m wondering if I can take that idea and run with it to get a good generalization. However, it is not entirely simple, because ^a (a -> a) [ Int <: a ] does not reduce to Int -> Int; in fact it is not an arrow type at all, which means it must be written as a greatest lower bound! I proved something similar back when I was first exploring this calculus.

So I suppose I could say: when you try to generalize over a variable, look at the type in which it is used. If it only appears in covariant position (on the right of an arrow, or not involved in an arrow), substitute its greatest lower bound if it has one. If it only appears in contravariant position (on the left of an arrow), substitute its least upper bound if it has one. If it appears in both positions (appearing as an argument to e.g. Ref counts as both), then you cannot substitute it, so just generalize. Let’s take our example from above. I’ll be working from this dump of my algorithm’s thought process, namely the equations section at the beginning.
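The covariant/contravariant occurrence check could be sketched like this (again a hypothetical encoding: arrows as ("->", dom, cod) tuples, Ref as ("Ref", a)):

```python
# Hypothetical polarity analysis for the substitution rule above: walk a
# type, flipping polarity under the left of an arrow, and count Ref's
# argument at both polarities (Ref is invariant).

def polarities(ty, var, pol=+1, acc=None):
    """Return the set of polarities (+1 covariant, -1 contravariant)
    at which `var` occurs in `ty`."""
    if acc is None:
        acc = set()
    if ty == var:
        acc.add(pol)
    elif isinstance(ty, tuple) and ty[0] == "->":
        polarities(ty[1], var, -pol, acc)   # the domain flips polarity
        polarities(ty[2], var, pol, acc)
    elif isinstance(ty, tuple) and ty[0] == "Ref":
        polarities(ty[1], var, +1, acc)     # Ref is invariant:
        polarities(ty[1], var, -1, acc)     # count both polarities
    return acc
```

A variable with {+1} only would get its greatest lower bound, {-1} only its least upper bound, and {+1, -1} (e.g. a in a -> a, or anything under Ref) would have to stay generalized.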

We are trying to generalize to the greatest lower bound of 20 (that is the variable I assigned the result). 20 is only directly involved in one equation: 107 <: 20, so we can safely replace 20 by 107. Now we are trying to find the greatest lower bound of 107. 107 has several upper bounds, but we don’t care about those (unless it had no lower bounds, then we would). It only has one lower bound, namely (10 -> 19). Now here’s where it gets tricky. I have implemented it (several times, followed by a head slap and delete) where it just tries to find the least upper bound of 10 and the greatest lower bound of 19, but that doesn’t work, because, say, 19 could depend on 10. Only after you’re sure that they’re independent can you do that. So… what do we do now?

# Complete Lattice Type System

I implemented a simple-minded type inferencer for meme (all monotypes, no polymorphism). Then I started implementing “Builtin” AST nodes so that I could have polymorphism in specific examples, like if. That got me wondering whether I could do polymorphism in general, as long as it was declared. I ran through the type inferencer algorithm by hand for hours, trying out different features to handle that case (an extremely tedious process). Finally I took my simple type inferencer and added a bunch of features to experiment with in code, rather than by hand. It went through several iterations, the first being that universal types instantiate themselves on the left side of a <:, and “generalize” on the right. This generalization was similar to instantiation, but it would become an “uppercase” variable, which would need to be instantiated again if it ever passed through a function argument.

Of course, that first iteration was a miserable failure. After messing around with it and racking my brain a bit more, I eventually came up with a way to do suprema and infima à la my last type theory post. And after fixing a few key things, IT WORKED. That is to say, it worked on my example. But it wasn’t a totally trivial example; it underlined the key thing I wanted it to do. Namely, this expression: (\id -> (id 1, id "hi")) ((\x -> x) :: forall a. a -> a), which is more than Haskell can do :-).

I made it so you mark the type variables you are solving for as either supremum or infimum types. Function parameters can safely be infima, and function return values can safely be suprema. All the rest, I think, can be marked singular (i.e. standing for an atomic or an arrow type). Then there are a few laws that reduction must follow with these types. Namely: limit types (the name I give to infima, suprema, and universal types together) cannot instantiate in equations involving other limit types (this makes sense mathematically: if the two types are equal, and they truly are nonsingular, then there is in fact no valid instantiation). Then there are rules for instantiating limit types when they are in equations with singular types.

However, the algorithm isn’t totally clean. It does some hackish stuff to make sure that it terminates (in this example; I’m nowhere near proving that the algorithm always terminates). It marks instantiated variables as “I have been instantiated, don’t instantiate me again”, which I don’t consider to be a very clean or correct solution. However, it does seem to work (and I’ve tried giving the algorithm input that I thought would make it infinite loop, such as self-referential types, and it didn’t).

Anyway, I’m not going to go too in-depth at the moment. I have to get to bed. Suffice it to say that it’s very cool. The source code is here (needs GHC 6.6 to run). To see that the output worked, note that the result of id 1 above is the variable v4, and the result of id "hi" is the variable v6. Somewhere in the big spew of output, you’ll see the lines Int <: v4 and Str <: v6, and (very important) not Int <: v6 or vice versa.

# Perl 6 Rules: Elementary Compositionality and More Contrived Vocabulary

Larry Wall made a change to Synopsis 5, section “Longest-token matching”, splitting the alternation operator | in two: | and ||. The idea is that || works just like the old one: try to match the left any way you can, and if that fails, then try the right. This makes sense to me, and the double-or name makes sense too (that’s how it works in Perl code). On the other hand, the single | operator has some wildly strange semantics:

> So for regex matching purposes we define token patterns as those patterns containing no whitespace that can be matched without side-effects or self-reference.
>
> To that end, every regex in Perl 6 is required to be able to distinguish its “pure” patterns from its actions, and return its list of initial token patterns (transitively including the token patterns of any subrule called by the “pure” part of that regex, but not including any subrule more than once, since that would involve self reference). A logical alternation using | then takes two or more of these lists and dispatches to the alternative that matches the longest token prefix. This may or may not be the alternative that comes first lexically.

He’s going for two things here: raw speed and dwimming on the longest-token rule. I think he hit the first one pretty much smack on, and missed the second one, shooting himself in the ankle instead.

This longest token rule violates something which I just made up now called elementary compositionality. Even though it is nothing more than a figment of my mind at the moment, I believe it to be very important. It guarantees that you won’t get into trouble when you try to refactor; and let’s face it, refactoring is what modern software engineering is all about.

Let me try to define in very precise terms what elementary compositionality is. First, let’s define the language of a rule to be the set of all strings that it matches exactly. That is, $a ∈ Lang(rule) iff $a ~~ /^<rule>$/. Then, to borrow a term from model theory, two rules are elementarily equivalent if they have exactly the same language. Finally, elementary compositionality is a property of a matching strategy where: if a, a', b, and c are rules, where b is elementarily equivalent to c, and a' is obtained by replacing every occurrence of b with c in a, then a is elementarily equivalent to a'.

Stated less formally: if you replace every call to a rule with a call to an elementarily equivalent rule, nothing changes.
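As a toy illustration (Python regexes standing in for Perl 6 rules, which is a big simplification), elementary equivalence can be brute-forced over all strings up to some length:

```python
# Hypothetical brute-force check of elementary equivalence over a small
# alphabet: two rules are elementarily equivalent when they match exactly
# the same strings.

import re
from itertools import product

def lang(pattern, alphabet="01", max_len=6):
    """Lang(rule), restricted to strings of length up to max_len."""
    return {"".join(w)
            for n in range(max_len + 1)
            for w in product(alphabet, repeat=n)
            if re.fullmatch(pattern, "".join(w))}

def elementarily_equivalent(p, q, **kw):
    return lang(p, **kw) == lang(q, **kw)
```

So `0*1` and `0{0,}1` come out equivalent, while `0*1` and `0+1` do not (the latter rejects the bare string "1").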

Now for the pragmatics. Perl 6 rules are very rich, and we can’t expect elementary compositionality everywhere. For example, the <commit> directive violates it, and it can be violated if you rename your captures (another thing that bugs me: there is no encapsulation in rules, but that is a discussion for another day). But those are somewhat predictable. I mean, you added <commit> precisely to change large-scale matching semantics: that’s what it’s for. Also, people are accessing your $<result>, so if you call it $<mother> instead, people aren’t going to be able to access it. The renaming thing is also fairly refactor-friendly, since you can rename and rebind captures as you wish. However, what is unpredictable (and also extremely hard to remedy) is the following: say you have a rule which matches any IPv4 address:

```
token byte { <[0-1]> <digit>**{2} | 2 <[0-4]> <digit> | 25<[0-5]> }
token ip   { <byte> \. <byte> \. <byte> \. <byte> }
```

You realize what an ugly and unclear way to write byte that is, so you rewrite it:

```
token byte { (<digit>**{3}) { fail unless $1 < 256 } }
```

Ahh, much nicer. Except one problem. The rule:

```
rule literal { <float> | <ip> }
```

has just changed semantics! On the input 123.456.789.012, literal used to match the entire string as an IP, but now it matches 123.456 only as a float (then probably fails due to ratcheting when the grammar doesn’t accept .789 as the next thing).
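Python’s re makes a handy stand-in here, since its | is ordered like Perl 6’s ||: whichever alternative comes first wins, even when a later one would match a longer prefix. A sketch (the float and IP patterns below are my own simplifications, not the grammar above):

```python
# Hypothetical demonstration of ordered (||-style) alternation: the
# first alternative that matches wins, regardless of match length.

import re

float_pat = r"\d+\.\d+"                    # simplified float literal
ip_pat = r"\d{1,3}(?:\.\d{1,3}){3}"        # simplified dotted quad

s = "123.456.789.012"

# float first: matches only the "123.456" prefix
first_float = re.match(f"{float_pat}|{ip_pat}", s).group()
# ip first: matches the whole string
first_ip = re.match(f"{ip_pat}|{float_pat}", s).group()
```

Longest-token | dispatch would pick the IP branch in both orderings; ordered || dispatch depends on lexical order, which is exactly the behavior the refactored byte falls back to.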

Now, unfortunately, your little refactor has just disabled an optimization. The early, ugly byte was expressed as a regular expression, so it could be combined with ip and the other common prefixes of literal (and sibling rules) into a DFA. The new, pretty byte cannot be compiled that way. I think this is one of the reasons Larry went with the “automatically detect tokens” approach. But I don’t think this is sufficient reason to violate elementary compositionality everywhere (to drive home the point: run this example in reverse, and in trying to optimize your grammar, you have also changed semantics).

The solution I have in mind at the moment, and I admit it is not totally optimal, is to have the user specify what he thinks the tokens are, so that the compiler can really match the longest token, not the longest “token-looking-ish thing”. The reason that is not optimal is that sometimes it is nice to inline syntax into rules without going through the token abstraction, and we’re not really sure which of those to call tokens. Maybe we could make up a simple heuristic, such as considering all constant strings to be tokens (at the rule level, not the transitive closure level), which isn’t too different from the first version of the | / || separation. But the point is that I should be able to take control from Perl’s dwimmery, say “no, you’re wrong”, and ensure that I can refactor without fear from then on.

# Multiple Mice Support for Games

Some time ago, we had a two-player game whose most natural input method was the mouse. That’s too bad, because most OSes make it exceedingly difficult to handle two mice separately; they just clump the mice together, so both mice control the same cursor.

Today I stumbled upon a teeny little cross-platform library called manymouse, which handles different mice separately! The interface is really easy (a little low-level, but workable); I got it incorporated hackishly into our SDL sword game in about twenty minutes. So, that’s cool.

# Complete Upper Bound Type System

I have been working on a type system for the last couple of days, in case you haven’t noticed. But now I have one concrete and axiomatized, which is strong enough to infer the type of (\f -> f 0) id. “Oh boy,” you might say, “what an accomplishment”. However, the generality of the system makes it seem like it can check many, if not all, other types. I just don’t know whether it’s decidable (but at least now that it is axiomatized, I can start to analyze it formally). And again, I don’t care if it’s decidable, as long as I can get an algorithm that gets it right “most of the time”.

Haskell types the identity function as id :: forall a. a -> a. That is to say, there is one identity function for every type a, and it has type a -> a. But arguments to functions must be “monomorphic”; that is, when you pass something which has a whole set of values or types, Haskell has to pick one before passing. That’s the reason (\f -> (f 42, f "hello")) id does not compile: f has type a -> a for some type a, not for every type (it has to be a specific function, not a whole class of them). id is perfectly capable of working on both integers and strings, so it is safe to pass its polymorphic form as f, but it is not well typed.

This type system is different: the type forall a. a -> a is a single type. In fact, nothing is typed with a whole set of types; every value that could be thus typed in Haskell has a single, unique type in this type system. This is done by defining the type of forall a. a -> a to be the least upper bound of the set of all types of that form.

A little type theory background: There is a type Top which is a supertype of every type (i.e. it admits any value), and there is a type Bot which is a subtype of every type (i.e. it admits no values).
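For concreteness, here is a hypothetical four-point model of such a lattice (Bot <: Int <: Num <: Top, an assumption of this sketch, not the real calculus) with a brute-force least-upper-bound:

```python
# Hypothetical four-point model: Bot <: Int <: Num <: Top.

EDGES = {("Bot", "Int"), ("Int", "Num"), ("Num", "Top")}

def subtype(a, b):
    """Reflexive-transitive closure of EDGES, by search (EDGES is acyclic)."""
    if a == b:
        return True
    return any(x == a and subtype(y, b) for x, y in EDGES)

def lub(types, universe=("Bot", "Int", "Num", "Top")):
    """Least upper bound: the upper bound that is below every other one."""
    uppers = [t for t in universe if all(subtype(s, t) for s in types)]
    return next(u for u in uppers if all(subtype(u, v) for v in uppers))
```

In a complete lattice every such lub exists, which is what lets a whole family like {a -> a : a a type} collapse to a single type.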

Here are my axioms. The non-logical symbols are (<:) for “subtype” and A(x,y) for the arrow type “x -> y”.

1. ∀ a ∀ b ∀ c ((a <: b ∧ b <: c) ⇒ a <: c)
2. ∀ a ∀ b ((a <: b ∧ b <: a) ⇔ a = b)
3. ∀ a ∀ b ∀ c ∀ d (A(a,b) <: A(c,d) ⇔ (c <: a ∧ b <: d))
4. for each atomic type A in the calculus: ∀ a ∀ b (A ≠ A(a,b))
5. Schema: for each wff φ(x): ∃ a such that the following hold:
    1. ∀ x (φ(x) ⇒ x <: a)
    2. ∀ x (∀ y (φ(y) ⇒ y <: x) ⇒ a <: x)
    3. ∀ x (x <: a ⇒ (x = a ∨ ∃ y (x <: y ∧ φ(y))))
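Axioms 1 and 2 can be sanity-checked by brute force on a small finite model (a hypothetical four-point chain, not anything from the real calculus):

```python
# Hypothetical brute-force check of axioms 1 (transitivity) and
# 2 (antisymmetry) over a finite model with Bot <: Int <: Num <: Top.

from itertools import product

# the full strict order, closed under transitivity
ORDER = {("Bot", "Int"), ("Bot", "Num"), ("Bot", "Top"),
         ("Int", "Num"), ("Int", "Top"), ("Num", "Top")}

def sub(a, b):
    return a == b or (a, b) in ORDER

TYPES = ("Bot", "Int", "Num", "Top")

# axiom 1: (a <: b and b <: c) implies a <: c
transitive = all(sub(a, c)
                 for a, b, c in product(TYPES, repeat=3)
                 if sub(a, b) and sub(b, c))

# axiom 2: (a <: b and b <: a) iff a = b
antisymmetric = all((a == b) == (sub(a, b) and sub(b, a))
                    for a, b in product(TYPES, repeat=2))
```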

And here is an explanation of each of the axioms:

# Lambda Constraint Typing

Here’s an idea for a type inference algorithm. For my project, I do not need an algorithm which determines types, just an algorithm which can answer questions about them. Also, I am not checking types; the fundamental assumption of my language is that the programmer is always right. I am simply getting as much information as I can in order to help understand what the programmer meant.

This is a form of polymorphism which, instead of generalizing during unification, just carries its constraints around. It is a sort of let-polymorphism which works on all lambda abstractions. I’ll show by example. Consider the Haskell expression (\f -> (f 0, f "")) id. Haskell rejects this program as ill-typed, but it clearly makes sense. So let’s try to make sense of it.

```
We want to type infer the expression:

(λf -> (f 0, f "")) id

-- Give types to the subexpressions of the lambda
f :: a
f 0 :: b
f "" :: c
(f 0, f "") :: (b,c)

-- Find constraints
a <: Int -> b   -- from the application f 0
a <: Str -> c   -- from the application f ""

-- Generalize to lambda (bind all free variables under forall)
-- The | is to be read "as long as"
λf -> (f 0, f "")
:: forall a b c. a -> (b,c) | a <: Int -> b
; a <: Str -> c

-- This is the type of id (a builtin, so we knew that)
id :: forall d. d -> d

(λf -> (f 0, f "")) id
-- Instantiate the lambda, substituting the type of id for a
:: forall b c. (b,c) | (forall d. d -> d) <: Int -> b
; (forall d. d -> d) <: Str -> c
-- Expand the equations into dependent sets
:: forall b c. (b,c) | (exists d. Int <: d ; d <: b)
; (exists d. Str <: d ; d <: c)
-- Reduce constraints
:: forall b c. (b,c) | Int <: b
; Str <: c
-- Constraint specification
:: (Int, Str)
```
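The key reduction step, instantiating (forall d. d -> d) against Int -> b and chaining Int <: d, d <: b into Int <: b, can be mimicked in a few lines of hypothetical Python (this is a toy transcription of the derivation above, not a general solver):

```python
# Toy transcription of the reduction above (hypothetical, not a solver):
# a constraint (forall d. d -> d) <: (X -> v) instantiates d fresh;
# contravariance gives X <: d, covariance gives d <: v, and transitivity
# chains them into the single bound X <: v.

def reduce_forall_id_arrow(arg_type, result_var):
    """Reduce (forall d. d -> d) <: arg_type -> result_var."""
    return (arg_type, result_var)  # read as: arg_type <: result_var

constraints = [reduce_forall_id_arrow("Int", "b"),
               reduce_forall_id_arrow("Str", "c")]

# constraint specification: each variable's lone lower bound becomes
# its value in the final type
solution = {var: low for low, var in constraints}
pair_type = (solution["b"], solution["c"])
```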

It looks pretty and powerful. After playing with this system a little bit, I discovered exactly how powerful it is. When you start putting constrained types into constraints, it becomes clear that you are embedding all of first-order logic. (For those who aren’t familiar with this kind of thing, the most common problem with type systems is that they are too powerful, not that they are not powerful enough: once you can embed PA, it is possible to write a program that will never finish compiling!)

# Semantic Inference

I’m beginning to think about the finer details of my experimental language. In particular, I think that most of the semantic inference will come from type inference. And that is tricky, because I want subtyping and I want references (for type theory virgins, that’s a really tough combination; i.e. it is undecidable in general). And it is even trickier because some of the usage information is going to be missing.

However, I am not trying to do type inference in order to do type checking, which makes it a bit nicer. I am doing type inference in order to get whatever information I can out of the code to reconstruct the semantics the user was intending. So if you say something like:

```
let x = 4
print x
```

No type inference need be done, because the program is already semantically unambiguous.

What I want is a calculus which can pick objects of various types and combine them together to get a particular type. For example:

```
list = [1,2,3,4,5]
total = 0
for i {
    total += i
}
```

We can figure out that list :: List Int, total :: Int, and i :: Int. We also have that (after desugaring) for :: (a -> (), List a) -> (). The block given to for is of type Int -> () (by our inference on i), so it is looking for a List Int to fill the gap. list is clearly our candidate.
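That gap-filling step can be sketched as a small unification-plus-lookup (hypothetical Python; types are encoded as tuples, lowercase strings are type variables, and the scope is a plain dict — all assumptions of this sketch):

```python
# Hypothetical sketch of the gap-filling step: unify the block's type
# with `for`'s first parameter to learn a, then search the scope for a
# value whose type is List a.

def unify(pattern, concrete, subst=None):
    """Match a type pattern (lowercase strs are variables) to a type."""
    if subst is None:
        subst = {}
    if isinstance(pattern, str) and pattern.islower():
        if subst.setdefault(pattern, concrete) != concrete:
            return None                      # conflicting binding
        return subst
    if pattern == concrete:
        return subst
    if isinstance(pattern, tuple) and isinstance(concrete, tuple) \
            and len(pattern) == len(concrete):
        for p, c in zip(pattern, concrete):
            if unify(p, c, subst) is None:
                return None
        return subst
    return None

def fill_gap(param_pattern, known_arg_type, gap_pattern, scope):
    subst = unify(param_pattern, known_arg_type)
    wanted = tuple(subst.get(t, t) if isinstance(t, str) else t
                   for t in gap_pattern)
    return [name for name, ty in scope.items() if ty == wanted]
```

Unifying the block type Int -> () against a -> () binds a = Int, the gap becomes List Int, and list is the only candidate in scope.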

My guess is that this is not decidable. But this isn’t the kind of language that cares about that. It’ll just say “uh, I can’t figure this one out, please give me more info” and then the user will be all “ok, here you go”. It’s not about getting everything right, it’s just about reducing the programmer’s workload in common cases.

Okay, so how do we deal with references (a side note, Firefox 2 has a spell checker and it doesn’t think that “Okay” is a word. Hmph.)? You can’t infer them if you have a type calculus with subtyping (you can use colored local type inference to get a lot of them right, though). First, because I’m a pure functional kind of dude nowadays, I think I’ll do references the ML way: you can only assign variables once, but you can put a reference in a variable and change the reference. That, of course, does not solve the problem. So I’m going to recommend that references be explicitly typed. If they aren’t, the type will default to Any; i.e. no semantic inference can be done with them.

So I think the bulk of my language will be a lambda calculus with a semantic/type inferencer as described above, plus some nice name correspondence rules.

Here’s something that’s a little strange: name resolution can depend on type inference. Consider this code:

```
class Point {
    has x, y;
}
pt = getLocation();
print x
```

This will not compile if getLocation’s type cannot be determined. But if you annotate it getLocation :: Point, this will compile: when the inferencer sees the unresolved identifier x, it will look for things in its scope which have an x member. Since that looks so funny, I may want to reserve that kind of inference for .x instead.

Even weirder: type inference can depend on naming. This is the inflector module, which I haven’t talked about much. If you see code like this:

```
class Point {
    has x, y;
}
point = getLocation();
```

Without any information about getLocation (or any other way to determine the type of point), the language will assume that point is of type Point. I would also like it to assume that points is of type List Point (or maybe something more general like Aggregate Point). Finally, I’d like it to build a connection that infers foo :: a implies foos :: List a (and vice versa).
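The foo/foos connection could be prototyped with a naive inflector like this (hypothetical sketch; real English pluralization is of course messier than appending an s):

```python
# Hypothetical inflector sketch: foo :: a implies foos :: List a, and
# vice versa, using naive append-an-s pluralization.

def infer_from_name(name, known):
    """known maps names to types; infer a type for `name` by inflection."""
    if name in known:
        return known[name]
    if name.endswith("s") and name[:-1] in known:
        return ("List", known[name[:-1]])          # foos :: List a
    if (name + "s") in known and isinstance(known[name + "s"], tuple) \
            and known[name + "s"][0] == "List":
        return known[name + "s"][1]                # foo :: a, from foos
    return None
```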

I’m not certain if this incredibly presumptuous type inferencer plus a semantic inferencer based on that will give me everything I want, but it is possible that it will do an awful lot.

UPDATE: Ooh, I was lying in bed thinking about this, and I realized that the sort of type inference we are doing is much weaker than usual type inference. The fact that we don’t care what the type is, only asking questions about it (e.g. does it fit in an Aggregate?), makes the job much simpler (doable with an equation reduction algorithm) and also allows for a more versatile type system. Cool!

# Love, part 2

Okay, I just took the bus back from Karlin’s house, and I wanted to post before the high wears off.

She makes me feel wonderful now. I am really lucky to have someone like that. Who cares about the future; worrying about it takes attention away from the present. I had forgotten that I am really enjoying the present.

We make deep eye-contact a lot. During dinner, she pulled one of those “oh crap, he looked up, I’d better look back down” moves. She walked with me back to the bus stop, about ten minutes through the cold (that means a lot coming from her). Love, schmuv. It’s just a word. She really likes me, and we have a powerful connection. It’s totally stupid to be sad about this.

# Improv Band

I just posted the following to craigslist Denver:

I’m a keyboardist who has been playing for 12 years. I’m interested in starting an improv band, but somewhat different from a “jam band”. I want to see how improv goes when there is always one person not playing his instrument, but instead “conducting” the whole process (but that person changes from jam to jam). I think this could allow for dynamics and solidity that isn’t often heard in “jam music”, and could be quite a unique sound.

I am open to many styles. The most important thing is that you have to be _capable_ of dynamic playing (eg. being able to bring it waaay down low now and then), and you have to be a good listener. All instrumentalists are welcome: I have no preference for the instrumentation (every instrument brings its own unique contribution, and improv has a way of accommodating that nicely).

We’ll see how it goes.