Monthly Archives: September 2020

Propositional Documentation

It’s arguable that the main purpose of a compiler is to give error messages, and producing executable code is only its secondary function. In my day-to-day work, the number of times I compile knowing there will be errors (in order to find them) far outnumbers the number of times I compile to produce a running program.

One of the most useful and, when it’s fast enough, enjoyable workflows when working with a compiler is that of adding or removing a parameter to a function. This function needs extra context about X, so we add it as a parameter, and then go into a quick compile cycle in which the compiler immediately tells us all the sites where X needs to be synthesized. If we can’t synthesize it from a given site, we add it as a parameter and keep going. Once all the errors are gone, the change is usually correct.

In principle many types of contract modification follow this same pattern: A function now requires that one of its parameters never be null; so we check its use sites, and if the non-nullness can’t be inferred at that site, we check sites transitively until we have inferred the result. But the compiler offers us no help here, so without some external tracking tool it is hard to tell when you have actually checked all sites. Furthermore, in another branch somebody may have added a new use of the function, and (especially in a distributed setting) we cannot check all branches, and there is no way to guarantee that the error path is not reintroduced when one of these branches is merged.

One could argue that you should just use a better language that doesn’t allow null pointers (in fact the frequency of the flow I mentioned above is a major boon to languages that make nullness explicit). But there are always assumptions that outthink the type system of whatever (mainstream-ish) language you choose.

  • A number or vector is nonzero
  • An integer is within a certain range
  • An object has already been initialized
  • Two parameters are related in some particular way, such as being two representations of the same conceptual object
  • A resource handle is in a particular state

In order to make effective use of compiler-assisted sanity-checks, we do not need to launch ourselves into dependently-typed la-la land where our program is guaranteed always to terminate and never to produce any errors. (In fact sometimes it is nicer to have programs produce runtime errors than have a perfect compiler, so you can get concrete information from the debugger about the circumstances of the error.) It would get us 80% of the way there, and perhaps without the headache of proving that many obviously impossible things really are impossible, if we could make use of this “transitive” structure of assumptions in a more general context than traditional type systems provide.

So that brings me to a half-idea that has been bouncing around in my head for a number of years, which I call propositional documentation. The essential idea is to formalize many of the types of assumptions and guarantees that we already see in documentation, such that they are visible in an abstract form to the compiler. Yes in a “proofy” way, but not necessarily in a “hereditarily verified” way––only in a way such that it would be enough for a developer to insert a snippet indicating that they have thought about it and verified a certain assumption, or otherwise propagated the requirement outward to their users.

Along with the type signature of your functions, you include assumptions and guarantees (or preconditions and postconditions, if you are a temporal thinker). The assumptions are given to you to work with, and the guarantees need to be synthesized before the function returns, or else it is a compile-time error. Guarantees can be synthesized in a loose variety of ways, again playing more of a role of “checked documentation” than airtight verification.

  • By a runtime assertion connected to the property.
  • By a runtime randomized property-check if the property is too sophisticated for an assertion.
  • By a proper proof term in some proof language for manipulating these propositions.
  • By a “trust me” proof term, annotated by some checked assumptions, with the implication of connecting them using informal reasoning.

And, like documentation, it is up to the developer and her team to balance the formality and security of these syntheses against the effort they demand.

I am picturing a semi-verbose and flexible syntax which allows propositions to also look something like documentation. Without editor support this could be irritating, but with the support of a good IDE I think it could be quite natural. For example (I apologize, my dayjob is in C++ so that’s my native tongue at the moment):

// Introduce the property [ _ is non-null ], and indicate that
// it can be synthesized by an assertion if necessary.
property [ 'foo' is non-null ]
   assertion (foo != nullptr);

property [ 'foo' is initialized ]
   requires [ 'foo' is non-null ]
   assertion (foo->isInitialized);

//assumes [ 'foo' is non-null ]
//guarantees [ 'foo' is initialized ] 
void initialize (Foo* foo)
{
    foo->counter = 0;
    foo->isInitialized = true;
    prop_assert [ 'foo' is initialized ];
}

//assumes [ 'foo' is initialized ]
void increment (Foo* foo)
{
    ++foo->counter;
}

int main()
{
    Foo foo;

    increment (&foo); 
      // error, could not prove [ '&foo' is initialized ]
    
    // -----

    prop_assert [ '&foo' is initialized ];
      // error, could not prove [ '&foo' is non-null ]

    // -----

    prop_admit [ '&foo' is non-null ], "It's a stack variable, dude!"; 
    prop_assert [ '&foo' is initialized ];
      // assertion inserted, runtime error

    // ----

    prop_admit [ '&foo' is non-null ], "It's a stack variable, dude!";
    initialize (&foo);  // Produces [ '&foo' is initialized ]
    increment (&foo);  // OK
} 

The advantages are clear: not only does it provide more information to the compiler to help coordinate developers and make changes, it also explicates the programmer’s reasoning process. The challenge in such a system is how to make it detailed enough to be useful without so detailed that it is cumbersome. Nonetheless I think a system like this would be useful even at fairly basic levels of sophistication, only needing to integrate with the host language’s name resolution and parameter passing semantics (though for C++ even this would be a major effort…).

I would be curious to see what challenges arise as this vague half-idea meets the complex reality of a general purpose programming language.