Generalization

Okay, here’s the latest on my type inference algorithm: (1) it rocks, (2) it is still having trouble with generalization. My current example is the following (using ML-style reference syntax):

```    fix \f x { x := !x + 1; if 10 < !x then x else f x }
```

That fix business is how you implement recursion under the hood. Now, the correct type of this function is:

```    ^a ^b (b -> b) [ b <: Ref a ] [ Int <: a <: Num ]
```

This type may not even be obvious to type theorists. Let me explain: the caret (^) means “the greatest lower bound over”, which can be read roughly as “for all”. The form of such a type is ^variable (type_involving_variable) [constraints_for_variable]. That circumfix syntax is awkward, since the constraints for a are way at the end of the expression. Still, let’s try to read it: for all a, for all b, the type is a function which takes a b and returns a b, as long as b is a subtype of a reference to an a, as long as a is at least an Int and at most a Num. Yowch, that’s a mouthfull. The paraphrase of that is, this function takes any reference to a number (at least an integer; i.e. no “just even number” subtypes or anything) and returns that reference (in this case, changing its contents first). You can see that that is correct given the code.

I manually entered the equations for that expression into my type inferencer (it doesn’t do AST traversal yet, but that part is trivial). I looked over the two-odd pages of output for the reduced equations, and they were right on. That’s the (1) it rocks part. Then it tried to generalize those equations, and this is what it came up with:

```    ^a ^b (b -> b) [ b <: Ref Int, b <: Ref a ] [ Int <: a <: Num ]
```

It looks pretty close. There’s just this pesky little b <: Ref Int clause, which is wrong! It says that whatever you pass this function has to be a subtype of Ref Int. But a Ref Num is not a subtype of a Ref Int; in fact, Ref x is only a subtype of Ref Int when x is itself Int. Some type checkers would actually type it this way, saying that the function’s type is Ref Int -> Ref Int, but for mine, that is not acceptable. So what’s going wrong with the algorithm?

Okay, so the generalization algorithm is given a bunch of equations involving type variables and other types. For each variable, the algorithm wants to find either (a) a direct substitution, where you can safely replace that variable with another type, or (b) a minimal set of constraints on that variable which must form acyclic dependencies (cyclic dependencies would lead to an infinite loop when we generalize over those variables).

It currently does this by finding all constraints on each variable, and then gradually reducing them. That is, it looks through every equation, and when it sees an equation that looks like a <: v, where v is a variable, it adds a as a “lower bound”, and when it sees v <: a, then it adds a as an ‘upper bound”. What you end up with is a mapping from variables to their bounds, like, say: a = { Int | Num }, b = { a | c }, c = { Ref b | a, Num, Str }, etc. Then it reduces them by eliminating any lower bounds which are lower than other lower bounds. That is, if you had a = { Int, Num | }, it would eliminate Int, because Num being a lower bound already implies that Int is. It will do this for variables and other types, consulting the equations to see what is a subtype of what.

After that, it looks for substitutions. That is, if you have a constraint that looks like a = { b | } or a = { | b } or a = { b | b }, then we say “okay, just bind a to b” and we substitute b for a everywhere (including in the other constraints). Then it goes back to the reduction phase. It alternates between these two phases until there is nothing left to do. You end up with a substitution and constraints that meet the requirements.

But I think the fact that it creates substitutions out of a = { b | } and a = { | b }is wrong. a = { b | b } is correct; that means that a and b are the same type. But a = { b | } says that a could be b, or any supertype of b, so making a substitution out of that is wrong. But I can’t just take out those two rules, because then we end up generalizing over every variable we solved for (the types end up looking like ^a ^b ^c ^d ^e ^f ^g ^h ^i ...).

Something that I have noticed that I haven’t figured out how to incorporate yet is some reductions you can make based on what kind of generalization you are doing (greatest lower bound, whch is typical, and least upper bound, which is quite uncommon but still supported). For example, ^a (a) [ Int <: a ]. the greatest lower bound of all types which are supertypes of Int, is obviously just Int. I’m wondering if I can take that idea and run with it to get a good generalization. However, it is not entirely simple, because ^a (a -> a) [ Int <: a ] does not reduce to Int -> Int; in fact it is not an arrow type at all, which means it must be written as a greatest lower bound! I proved something similar back when I was first exploring this calculus.

So I suppose I could say when you try to generalize over a variable, look at the type in which it is used. If it only appears in covariant position (on the right of an arrow, or not involved in an arrow), then substitute its greatest lower bound if it has one. If it only appears in contravariant position (on the left of an arrow), substitute its least upper bound if it has one. If it appears in both positions (appearing as an argument to eg. Ref counts as both positions) then you cannot substitute it, so just generalize. Let’s take our example from above. I’ll be working from this dump of my algorithm’s thought process, namely the equations section at the beginning.

We are trying to generalize to the greatest lower bound of 20 (that is the variable I assigned the result). 20 is only directly involved in one equation: 107 <: 20, so we can safely replace 20 by 107. Now we are trying to find the greatest lower bound of 107. 107 has several upper bounds, but we don’t care about those (unless it had no lower bounds, then we would). It only has one lower bound, namely (10 -> 19). Now here’s where it gets tricky. I have implemented it (several times, followed by a head slap and delete) where it just tries to find the least upper bound of 10 and the greatest lower bound of 19, but that doesn’t work, because, say, 19 could depend on 10. Only after you’re sure that they’re independent can you do that. So… what do we do now?