Playground Programming

Every now and then, I have a small idea for a development environment feature I’d like to see. At that point, I usually tell myself to “make a prototype to see if I like it and/or show the world by example,” and start putting pressure on myself to make it. But of course, there are so many ideas and so little time, and at the end of the day, instead of a prototype, all I manage to produce is some guilt.

This time, I’m just going to share my idea without any self-obligation to make it.

I’m working on Chrome’s build system at Google. We are switching the build scripts to a new system which uses an ingenious testing system that I’ve never seen before (though it seems like the kind of thing that would be well-known). For each build script, we have a few test inputs to run it on. The tests run all of our scripts on all of their test inputs, but rather than running the commands, they simply record the commands that would have been run into “test expectation” files, which we then check into source control.
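
A minimal sketch of the “record instead of run” idea follows. It assumes build scripts are written against an abstract command interface with two interpreters: one that really runs commands and one that only records them. `MonadCommand`, `Recorder`, `compileTarget`, and `expectationFor` are hypothetical names for illustration. This is not how Chrome’s build scripts are actually written.

```haskell
import System.Process (callProcess)

-- | Anything that can run a command given as argv. Build scripts are
-- written against this interface instead of calling the shell directly.
-- (Hypothetical interface, for illustration only.)
class Monad m => MonadCommand m where
  command :: [String] -> m ()

-- | Production: actually run the command.
instance MonadCommand IO where
  command []           = pure ()
  command (exe : args) = callProcess exe args

-- | Testing: a minimal writer monad that only records the commands.
newtype Recorder a = Recorder (a, [[String]])

instance Functor Recorder where
  fmap f (Recorder (a, w)) = Recorder (f a, w)

instance Applicative Recorder where
  pure a = Recorder (a, [])
  Recorder (f, w1) <*> Recorder (a, w2) = Recorder (f a, w1 ++ w2)

instance Monad Recorder where
  Recorder (a, w1) >>= k =
    let Recorder (b, w2) = k a in Recorder (b, w1 ++ w2)

instance MonadCommand Recorder where
  command argv = Recorder ((), [argv])

-- | A hypothetical build script for one C target.
compileTarget :: MonadCommand m => String -> Bool -> m ()
compileTarget name debug = do
  command (["cc", "-c", name ++ ".c"] ++ ["-g" | debug])
  command ["cc", "-o", name, name ++ ".o"]

-- | Render what the script would have run as expectation-file text,
-- one command per line.
expectationFor :: Recorder () -> String
expectationFor (Recorder ((), cmds)) = unlines (map unwords cmds)
```

Here `expectationFor (compileTarget "foo" True)` yields the two lines `cc -c foo.c -g` and `cc -o foo foo.o`, which is the kind of text that would be checked in. Because the script only ever talks to `command`, the production and test paths share all of the script’s logic.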

Checking in these auto-generated files is the ingenious part. Now, when we want to change or refactor anything about the system, we simply make the change, regenerate the expectations, and do a git diff. This will tell us what the effects of our change are. If it’s supposed to be a refactor, then there should be no expectation diffs. If it’s supposed to change something, we can look through the diffs and make sure that it changed exactly what it was supposed to. These expectation files are a form of specification, except they live at the opposite end of the development chain.
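
The “regenerate and diff” step normally leans on `git diff`, but the same classification can be sketched directly. This assumes expectations are kept per test case in a `Data.Map` from case name to expected text. `Change`, `diffExpectations`, and `isPureRefactor` are hypothetical names.

```haskell
import qualified Data.Map as Map

-- | How one test case's expectation changed between two generations.
data Change
  = Added String           -- ^ new test case, with its expectation
  | Removed String         -- ^ test case that disappeared
  | Changed String String  -- ^ old expectation, new expectation
  deriving (Eq, Show)

-- | Compare checked-in expectations (old) with regenerated ones (new),
-- keyed by test-case name. Unchanged cases are omitted.
diffExpectations
  :: Map.Map String String -> Map.Map String String -> Map.Map String Change
diffExpectations old new =
  Map.unions
    [ Map.map Removed (old `Map.difference` new)
    , Map.map Added (new `Map.difference` old)
    , Map.mapMaybe id (Map.intersectionWith changed old new)
    ]
  where
    changed o n
      | o == n    = Nothing
      | otherwise = Just (Changed o n)

-- | A refactor should leave every expectation untouched.
isPureRefactor :: Map.Map String String -> Map.Map String String -> Bool
isPureRefactor old new = Map.null (diffExpectations old new)
```

An empty result from `diffExpectations` is exactly the “no expectation diffs” condition for a refactor. Anything else lists the cases to inspect.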

This fits in nicely with a Haskell development flow that I often use. The way it usually goes: I write a function, get it to compile cleanly, then go to ghci and try it on a few conspicuous inputs. Does it work with an empty list? What about an infinite list? (If the output is also infinite, I trim it to sanity-check it.) I give it enough examples that I have pretty high confidence that it’s correct. Then I move on, assuming it’s correct, and do the same with the next function.
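
As a concrete example of that loop, here is a hypothetical `runningSum` with the conspicuous inputs written out as the ghci interactions they correspond to. The infinite case only terminates because `scanl1` produces its output lazily, so `take` can stop after five elements.

```haskell
-- | Running totals of a list (hypothetical example function).
runningSum :: [Int] -> [Int]
runningSum = scanl1 (+)

-- The playground checks, as they would look in ghci:
--
-- ghci> runningSum []
-- []
-- ghci> runningSum [5]
-- [5]
-- ghci> take 5 (runningSum [1 ..])   -- infinite input, trimmed output
-- [1,3,6,10,15]
```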

I really enjoy this way of working. It’s “hands on”.

What if my environment recorded my playground session, so that whenever I changed a function, I could see its impact? It would mark the specific cases that changed, so that I could make sure I’m not regressing. It’s almost the same as unit tests, but with a friendlier workflow and less overhead (reading rather than writing). Maybe a little less intentional and therefore a little more error-prone, but it would be less error-prone than the regression testing strategy I currently use for my small projects (read: none).
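
A rough sketch of such a recorder follows, assuming each playground line is kept as its expression text plus its rendered result. A real tool would need to re-evaluate the saved text itself (for example through the GHC API). Here, re-evaluation is simulated by a `probes` list that is recomputed whenever the module is recompiled. `Probe`, `Session`, `record`, `changedSince`, and `dedupe` are all hypothetical.

```haskell
-- | One line of a playground session: the expression as typed, and its
-- result rendered with 'show'. The result is an ordinary lazy String, so
-- it reflects the current code each time the module is recompiled.
data Probe = Probe
  { probeExpr :: String
  , probeEval :: String
  }

-- | A saved session: each expression paired with the output seen last time.
type Session = [(String, String)]

record :: [Probe] -> Session
record = map (\p -> (probeExpr p, probeEval p))

-- | Re-run the probes against the current code and report every expression
-- whose output changed, as (expression, old output, new output).
-- Probes with no saved output yet are ignored.
changedSince :: Session -> [Probe] -> [(String, String, String)]
changedSince saved probes =
  [ (e, old, new)
  | Probe e new <- probes
  , Just old <- [lookup e saved]
  , old /= new
  ]

-- | Hypothetical function under development: drop adjacent duplicates.
dedupe :: [Int] -> [Int]
dedupe (x : y : rest)
  | x == y    = dedupe (y : rest)
  | otherwise = x : dedupe (y : rest)
dedupe xs = xs

-- | The "playground session" for dedupe.
probes :: [Probe]
probes =
  [ Probe "dedupe []" (show (dedupe []))
  , Probe "dedupe [1,1,2,3,3]" (show (dedupe [1, 1, 2, 3, 3]))
  , Probe "take 3 (dedupe (cycle [1,2]))" (show (take 3 (dedupe (cycle [1, 2]))))
  ]
```

Saving `record probes` somewhere (say, to a file under source control via `show`) and calling `changedSince` on it after an edit gives the “mark the specific cases that changed” behaviour. An empty list means no recorded example was affected.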

It’s bothersome to me that this is hard to make. It seems like such a simple idea. Maybe it is easy and I’m just missing something.


6 thoughts on “Playground Programming”

  1. You might want to look at the IDE that is distributed with the Inform language. It records all of your interactive sessions… then every time you re-compile, it re-runs all of your past commands to get you back to where you were. I think it might have diffy things of some sort, but I never looked too deeply into it.

  2. I like to write short tests in comments above the function and then evaluate them with doctest. It’s really nice to see a few simple examples alongside the type annotation when you want to know what a function is supposed to do.

    So, another similar workflow could be to have an interpreter with your program running in the background and have your IDE call it to evaluate the test expression just after you write it. The IDE would insert the result into the comment and you would just check whether it’s right. It would be just like a recorded playground session, but you would have better control over which of the examples gets into the test-suite. And they would also serve as handy documentation.

  3. Sounds like these “expectation tests” are similar to mocks (usually mock objects in the context of OOP).

    In traditional testing, we throw some known inputs into a function, then check the output (possibly using ‘stubs’, which contain only the structure needed for the test, e.g. using “undefined” for unneeded record fields). This works well when passing around data.

    When we’re passing around functions (including objects, which are full of methods), we can instead use ‘mocks’, which essentially test that they have been used correctly. For example, we might specify that a particular function must be called twice, with arguments that obey some relation. In that case, the test passes iff this ‘mock function’ is called exactly twice, with arguments that obey the relation.

    This can be quite natural in OOP, since objects (at least in Kay’s original formulations) are ‘mini-algebras’ with clear, first-order boundaries (higher-order code usually gets ‘flattened’ into distinct classes, which leads to OO stereotypes like “FooFactoryBuilderHelper”). This lets us easily mock their limited interface using reflection.

    Functions in FP are more powerful (more use of higher-order functions), and the scope of their algebras also tends to be larger (e.g. modules, although type-classes would work well with mocks), so this would be more difficult in general, since we usually can’t use reflection to ‘look under a lambda’. Augmenting the execution context to trace which calls are made sounds like a cleaner implementation, treating the whole program as the algebra.

    Also, the part about checking traces into version control is a clever idea!

  4. I think checking in expected outputs is a fairly standard technique – I’ve seen it in multiple contexts before.

  5. I like this idea and I’ve wondered about something similar. It would be possible to use QuickCheck to generate random data to feed into a function, and the inputs and outputs could be stored. Later versions of the function could be run on the inputs to check that they produce the same outputs.
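
The QuickCheck suggestion in comment 5 and the doctest examples in comment 2 can be combined in one small sketch. It uses QuickCheck’s `generate`, `vectorOf`, and `arbitrary` to build the stored inputs. `recordGolden` and `checkGolden` are hypothetical names, and writing the pairs to a checked-in file (e.g. with `show` and `read`) is left out. The `>>>` line in the Haddock comment uses the format the doctest tool reads.

```haskell
import Test.QuickCheck (Arbitrary, arbitrary, generate, vectorOf)

-- | Record: feed n random inputs to the current version of a function
-- and keep the input/output pairs, to be saved and checked in.
recordGolden :: Arbitrary a => Int -> (a -> b) -> IO [(a, b)]
recordGolden n f = do
  inputs <- generate (vectorOf n arbitrary)
  pure [ (x, f x) | x <- inputs ]

-- | Check: re-run a later version on the stored inputs and return
-- (input, recorded output, new output) for every disagreement.
--
-- >>> checkGolden (* 2) [(1, 2), (3, 7)]
-- [(3,7,6)]
checkGolden :: Eq b => (a -> b) -> [(a, b)] -> [(a, b, b)]
checkGolden f golden =
  [ (x, old, new) | (x, old) <- golden, let new = f x, new /= old ]
```

Because `generate` draws fresh random inputs on every run, the recorded pairs have to be saved and reused rather than regenerated. Otherwise, each comparison would be against different inputs.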
