Awesome Efficient MMD

I’m currently implementing Class::Abstract, which will become my Perl 6 multimethod proposal. I pounded my head against a wall for a couple hours, and I came up with a really awesome multimethod dispatch algorithm. It isn’t Manhattan distance, of course, because I hate Manhattan distance. Actually, it’s more general: it’s an algorithm for quickly finding all possible candidates, without using any algorithm to disambiguate which ones. It is Θ(a m t) (where a is arity, m is number of methods, and t is number of types) in space, which beats the hell out of Parrot’s current Θ(t a), especially since t is usually much larger than the other parameters. It’s also fast-O(a+m) in time, where fast-O is technically O(a+m), but the bounding constant is quite small relative to most asymptotic measures. That is, it can be implemented in a really tight, CPU register-only loop, on top of the fact that a and m are generally small.

On to the algorithm. There’s some massaging you have to do at compile time first. First organize your types into states. In traditional OO this isn’t very hard, in Class::Abstract it’s a bit more difficult, but nothing past elementary graph theory. Anyway, any state an object could be in gets an identifier. I haven’t given it a lot of thought in the traditional model, but I think that just means that every concrete class gets an identifier. So if you have B and C derived from A, but A is an abstract class (has unimplemented methods), then you have two identifiers: B and C.

Let’s take this hierarchy: B and C derived from A, D derived from C, all concrete. We’ll just call the states by the same name as the classes. Note, though, that when an object is in state A, that means that it is exactly A, not just a derivative of A. Let’s consider these methods:

    foo(A, A)
    foo(A, C)
    foo(B, A)

Now what you have to do is replace each of your type names with the set of states that they represent. For instance, if you’re a D then you’re also a C, so C is written CD. Correspondingly, A is written ABCD. So our candidate methods turn into these in terms of states (I’ve also assigned each one a number):

    0 foo(ABCD, ABCD)
    1 foo(ABCD,   CD)
    2 foo( B  , ABCD)

Now we turn it on its side and make a m-element map for each parameter, indexed by state. The element with index s is a bit vector that represents in which variants s appears in the current parameter. Best explained by example:

    Parameter 1     Parameter 2
    -----------     -----------
    A  011          A  101
    B  111          B  101
    C  011          C  111
    D  011          D  111

1A has the vector 011 because A appears in the first parameter in variants 0 and 1 (bit positions 0 and 1) and not in variant 2. See how that works?

Now we’re finished with the massaging. It comes time to dispatch the method. Look up the bit vector for each state and parameter accordingly (for instance, if given C, B, you’d pick 1C and 2B), and then bitand them together. The remaining 1s correspond to your candidate methods. Let’s try, oh, C and B.

    1C => 011
    2B => 101
          001 => variant 0

Let’s try another one. Given the parameters B and A:

    1B => 111
    2A => 101
          101 => variant 0 or 2

That last one is ambiguous. That’s when you’d sick your disambiguator on it to find out that 2 is better. For Manhattan distance algorithms that’ll cost you a single Θ(t2) table for the whole program. In Class::Abstract’s mathematically sane algorithm, the methods themselves are partially ordered, which I’ve shown takes Θ(m) space and O(1) time: two simple integer comparisons.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s