Programming for a culture approaching singularity

In 2001, Ray Kurzweil published his inspiring, controversial essay The Law of Accelerating Returns. I acknowledge that the essay sometimes seems far-fetched, and perhaps some of its details are too idealistic (or not idealistic enough), but in essence I find it very convincing. I will take it as an assumption for this article, and I also assume my readers are familiar with the gist of the essay.

I read Kurzweil’s essay about six weeks ago, and it has set off a chain reaction in my consciousness, causing me to reconsider my perceptions of technology. At first, it seemed to murder my denotational idealism and destroy my hopes for Dana (my purely functional operating system). It filled me with an excited fear, as if I were being carried down the rapids of a great river, trying not to hit the next rock.

Now the ideas have settled and my idealism is born again with a new perspective. The spirit of Dana still has a place, and I will describe what I think that is.

We are at a unique inflection point in the process of technological acceleration. In the last 50-100 years, orders of magnitude of progress began to occur in a single human lifetime. Technology is folding upon itself, each new invention making use of more and more recent knowledge and tools.

Dana was originally envisaged to facilitate this growth, starting from the observation that the vast majority of software out there is not compositional, so we keep re-inventing wheels. Dana’s goal was to provide a framework in which everything must be compositional. But my early execution of the idea was misguided. It may have had a noble goal, but I was pretending that I was at the beginning of an acceleration of software, not in the middle of one. Dana’s implementation did not make use of modern technology, and thus its development was an order of magnitude or two slower than the rate of modern software development. Further, its despotic purism would never catch on: it was trying to change the direction of an avalanche by throwing a snowball.

A truly modern compositional environment must embrace the flow of knowledge. Those who try to reverse the paradigm will end up reinventing all the technology that the mainstream is already using, and by the time they are finished the mainstream will be lightyears ahead. The state of the art may be ugly, but it exists, which is more than the purists can say. One person — one team — cannot develop software anymore; to do something new, we must employ the labor of the entire world.

Currently, the most constraining property of software is that for it to be reusable, it must be designed for reuse. It is hard to reuse libraries in a way for which they were not intended. And in fact most of the coding that goes on is not in libraries at all — people aren’t intending their code to be reused by anyone. They just want to get their product out the door and get some cash flowing in. Below is a reified mental picture of this.

The state of progress in reusable software

The bulb on the bottom is the software designed for reuse. It grows slowly and uniformly, mostly occurring in the open-source world, created by people who like to write good software. The rays at the top are individual, result-oriented projects. This code is not typically designed to be reused. These projects grow quickly in a single direction, but if someone else wants to go in a similar direction, they have to start at the bottom. This is an economic phenomenon, and the economic goals of developers are not likely to change.

To accelerate progress, we must find a way to re-use those chutes (the result-oriented rays above), the goal being to transform the picture into something more like this:

In this picture, each results-oriented chute carries with it a net of all the support code that was written for it, upon which anyone can build. Of course, this happens only with the consent of the creator (people will find a way to protect their information), but if it is useful, then people will buy it.

I have thus become a supporter of Donald Knuth’s mantra:

I also must confess to a strong bias against the fashion for reusable code. To me, “re-editable code” is much, much better than an untouchable black box or toolkit.

Forking a project in order to use one of its features in another project is no simple task. Depending on the complexity of the feature, you might gain some time by doing things this way, but many competent software engineers prefer to simply rewrite from scratch. One reason is to understand the code, but another is simply economics. Saying that I rewrite features because it is fun misses half of the equation. I rewrite features because it is more fun than spending a week fighting integration issues (which is very much not fun). I claim that to support this new picture, our primary goal should be to vastly reduce integration issues.

Here is where my purist ideals kick back in, having found a new purpose. Pure functions have no integration issues, at least semantically speaking (performance is always a bit more subtle). A pure function makes explicit all of its inputs and outputs, so to use it, you can just rip it out of its context, provide the inputs, and transform the outputs. Depending on the situation that may be a lot of work, but compare it to finding all the places a singleton was modified in order to recreate its state at some point in the execution of the program. What if the singleton returned a reference to an object that was stored away and modified by an obscure corner of the program, upon which your feature later depended? This sounds like a terribly inflexible design, but most software is terribly designed, and we still want to reuse it.
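To make the contrast concrete, here is a minimal Haskell sketch (the names and the Config type are hypothetical, purely for illustration):

    import Data.IORef

    data Config = Config { scale :: Double }

    -- Impure style: the result depends on a hidden, mutable singleton.
    -- To reuse this function elsewhere you must first reconstruct whatever
    -- state the IORef happened to hold at the right moment.
    renderImpure :: IORef Config -> Double -> IO Double
    renderImpure configRef x = do
      cfg <- readIORef configRef   -- input smuggled in through mutable state
      return (scale cfg * x)

    -- Pure style: every input appears in the signature, so the function can
    -- be ripped out of its context and reused by supplying a Config directly.
    renderPure :: Config -> Double -> Double
    renderPure cfg x = scale cfg * x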

However, just saying “use pure functions, voilà!” is no solution. The trend of the world is gradually shifting toward more functional idioms, but asking everyday programmers (whom we care about, because they are doing our work for us) to switch to purity is asking a lot. Retraining brains is too hard.

So that is my open research question. How do we introduce the immense advantage of purity, from this perspective, into programming pop culture? Perhaps it is a new virtual machine that does dynamic dataflow analysis. Perhaps it is a static analysis tool. Perhaps it is a new language that can interface with many popular existing languages. That route would be slow to take off because it requires migration (and thus risks never taking off at all), but because of interop and the good reuse properties (with tool support) it might be faster to achieve critical mass. Clojure and Scala are doing kind of okay.

Once you have a way to track assumptions, the rest follows easily. It would be possible to assemble a database of every function ever written, with the ability to just drag and drop it into your program (ahem, Haskellers, why don’t we have this?). The natural version control system, another step in the direction of the DVCSes, tracks changes between individual objects, and how those changes induce changes in the objects that reference them (this is the part I have thought through the most, but I will forgo describing it, for it is a small detail and this post is already very long).

A singularity-aware culture should keep its eye on ways to bring about the next inevitable order of magnitude of progress. I believe the elimination of integration issues is one such way, and Haskell, for the most part, has already eliminated them. The first step is to bring this property to the popular languages in any way we can (this includes making Haskell a popular language… LOL).

Did you enjoy reading this article? Let me know! :-)

24 thoughts on “Programming for a culture approaching singularity”

  1. Pure functions have no integration issues? I don’t agree, except for the simplest ones. Functions work on data types, which are usually non-trivial. And data types don’t necessarily integrate well. Suppose I have two libraries, written independently, that have very similar but not quite identically structured data types. Then integration requires writing lots of conversion functions – no fun. It’s worse with statically typed languages, where even different names for structurally identical data create a need for conversion functions. I have an example in this blog post: http://khinsen.wordpress.com/2009/05/12/static-typing-and-code-clutter/.
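    For concreteness, a small hypothetical Haskell sketch of the kind of conversion clutter this creates (the types are invented for illustration):

        -- Two libraries, written independently, each define "the same" idea.
        data PointA = PointA { ax :: Double, ay :: Double }
        data PointB = PointB { bx :: Double, by :: Double }

        -- Structurally identical, yet rightly treated as distinct types, so
        -- gluing the libraries together means writing conversion functions.
        aToB :: PointA -> PointB
        aToB (PointA x y) = PointB x y

        bToA :: PointB -> PointA
        bToA (PointB x y) = PointA x y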

    Python has been a big success as an integration language because it defines a set of data access protocols (iteration, arithmetic, string conversion, …) that cover most of the data interfacing needs of many applications. They are almost universally respected because the protocols are part of the language definition and easy to implement. And dynamic typing makes this easier again. Clojure goes in a similar direction with its protocols (new in the 1.2 release), which offer even better integration support and require not much more effort.

    For statically typed languages such as Haskell, it would perhaps help to have an interface definition language for translating data types, which would be used by the compiler to generate and insert the conversion functions automatically. But I don’t expect this to be a trivial job.

  2. @Konrad, you have a good point. I didn’t reveal my whole master plan here, but Haskell actually fails at being referentially transparent at the type level — exactly the “different names for structurally identical types” problem you mention. This hinders metaprogramming techniques that would reduce the pain of some conversion functions.

    Of course, conversion functions are necessary no matter what language you pick. You make a(nother) good point about Python. Standardization continues to have its advantages whether you are in a pure or impure setting.

    However, my point about purity is really about orders of magnitude. Let’s say you have a decently-sized commercial product, say 5M lines of code. In imperative code, the number of lines that could affect some output is 5M. In functional code, you can (always) trace exactly what can affect the output, and it will (usually) be approximately the path from the main program to your function, about log(5M). This is obviously a hideously rough approximation, and may fall down in the face of very poorly designed pure code (with which I have no experience).
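    To put a rough number on it, under the crude assumption that relevance fans out like a balanced binary call tree:

        -- Back-of-the-envelope: depth of a balanced binary tree over 5M lines.
        relevantLayers :: Double
        relevantLayers = logBase 2 5000000   -- roughly 22, versus 5,000,000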

    So I suppose I overgeneralized when I said “no integration issues”. Being more precise amounts to repeating the definition of purity: “same input -> same output” hereditarily; i.e. you pull pieces out of their context and they will still work, in contrast to imperative languages. I’ll revise my thesis to “when working with pure languages, integration is tractable and predictable”.

  3. @Luke: I actually agree that pure functions are a tremendous help in isolating pieces of code and making them reusable. As you say, integration becomes tractable and predictable. But it’s still not easy, so programmers will continue to reinvent wheels because that’s the method of least (immediate) effort. Until someone seriously tackles the data type issue.

    It’s surprising to me that the data type integration problem is well known to most practicing programmers (from personal painful experience) and yet seems to be completely absent from the research on programming languages. Could it be that academics care little about code reuse? Of course I’d like to be proven wrong about this.

  4. Kurzweil’s idea is not new; exponential inventiveness was taught to me in middle school. I agree that reusable code is wonderful. We know this because 1) Perl is extremely popular, 2) Perl sucks, 3) CPAN seems to make up for Perl sucking.

    However, convergence is a trend, not a law. Windows and Unix still use different line endings. C integers are not guaranteed to be 32 bits wide in x86. Windows uses Control, Mac uses Command, Java uses Alt. Mac programs quit with Command + Q, Linux programs quit with Control + Q, Windows programs quit with Alt + F4. The U.S. refuses to use metric units, and every single phone uses a different freaking charger.

    Those specific differences will resolve themselves when converting to public standards seems a profitable endeavor. I do not defend this kind of divergence; however, I do believe that experimentation will always fly in the face of convergence.

    Example:

    Mozilla Firefox has brothers, including Wyzo, Classilla, SeaMonkey, Flock, Camino, Iceweasel, Swiftweasel, Swiftfox, XeroBank, Skyfire, Epic, and more.

    Look at any computer timeline. PC hardware, Unix, and C each diverge into thousands of variants. Each variant has a goal different enough from its ancestor’s goal that it exists, and prospers. Sometimes projects converge, but more often, developers tweak an existing product and prefer to call it libfredc rather than attempt to merge with libstdc.

  5. > How do we introduce the immense advantage of purity from this perspective into programming pop culture?

    I have made two attempts to convince the average programmer[1] that purity might be a good idea. Why: http://www.loup-vaillant.fr/articles/assignment and how: http://www.loup-vaillant.fr/tutorials/avoid-assignment

    Nothing new of course, but the goal there is persuasiveness, not originality.

    [1]: The programmer unaware of ML or Haskell, or Lisp.

  6. For problems in software development, the first instinct is to reach for a technical solution. However, to change a culture it is people’s mindset that needs to change, in this case the mindset of software developers. The habit of building everything from scratch has to give way to a shared codebase that developers actually use and maintain.

    If you can change this paradigm, you’ll have collaborating developers sharing and re-using each other’s code.

  7. This post reminds me of Chris Barker’s Direct Compositionality on Demand. Wouldn’t it be nice if anyone could draw any closed curve around any code (delineating a region) and get it as a standalone piece that can enter composition elsewhere (automatically abstracting over the dependencies that cross the curve)? The regions don’t have to contain each other hierarchically.

  8. From that article: “You will get $40 trillion just by reading this essay and understanding what it says.” Are you seriously considering that article, and that it may offer sane ideas for computer science? ;)

  9. > Perhaps it is a new language, which can interface with many popular existing languages.

    That is, at least, a vision for Perl 6: changing the ecology of programming language evolution and code reuse. Perl 6 is basically a next-generation Common Lisp, with the reader strengthened to handle the grammars of arbitrary languages, and a flexible runtime kitchen-sinked to run them. So FORTRAN, C, Java, Haskell, et al, become interoperable DSLs. Allowing exploration of “when you are not crippling yourself up front, what are the *next* set of bottlenecks you hit in software engineering?”

    But CL was a human generation ago. And investment in language development, by government, industry, and FOSS developers, remains tiny and uneven. So even if language power someday goes exponential, as at least seems plausible, we may not live to see it. But the odds of tunnelling through the barrier increase with time, as high-investment backends become increasingly flexible, ideas spread out, and societal resources gather.

  10. How come nobody yet mentioned REST web services?

    In the spirit of using what is available now, I think REST provides the best option for taking advantage of other people’s work. No need for developers to release source code if they don’t want to, as would be needed for source-translation approaches; no dev environment or build-time integration issues, as could be the case for virtual machines; and no language constraints, as in OSGi and other component frameworks.

    Granted, support – as in API versioning, uptime, etc. – is a problem. But what if REST service providers could be deployed locally, turning individual computers into small, private “machine Internets”? For many years now, infrastructure software such as web and application servers has been doing something similar, embedding web servers through which they provide browser access to web-like consoles.

    The nice part is everyone could join in quickly: as interprocess protocols go, REST is quite simple, and every language from C to BF already has an embeddable web server library these days. Also, as mainstream development has been steadily moving towards a browser-centered approach, it seems only natural that local applications also start using web-like interfaces, for communicating among each other as well as with users.

  11. If you take away the GUI it is possible to increase the amount of reuse to an extremely high level. Right now every piece of non-GUI code that I develop can be used in any application (assuming the functionality is relevant). The solution is to focus on a much higher level of integration than language features and algorithms, which is the big mistake in most approaches IMHO. My approach does not rely on a single language feature and in theory could be implemented in just about any language (my choices at the moment are Delphi and C#).

    Without spending hours describing this, the necessary features are:
    *) Decomposition into coarse-granular functional components
    *) All communication between components via structured messages
    *) A message structure that allows any type of information to be packaged
    *) A “forgiving” message processing design that allows a component to ignore messages it does not need to process and includes deterministic default behaviour for reading message information that is not contained in the message
    *) A method of “glueing” the components together via message consumers and handlers that is not part of the code (I use configuration files)

    The result is a suite of components that operate in their own little “bubble” by processing inputs and generating outputs, without reference to where the inputs came from, or where the outputs should go. These components can run in their own independent thread or process, and can be connected in arbitrary ways via system configuration. Each component can be built, tested, and deployed independently.
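    A rough sketch of the shape of the idea in Haskell (all names hypothetical; this is not the actual CSI framework):

        import qualified Data.Map as M

        -- A structured message: named fields carrying text values.
        type Message = M.Map String String

        -- Read a field, falling back to a deterministic default when absent.
        field :: String -> String -> Message -> String
        field def key msg = M.findWithDefault def key msg

        -- A component is a function from incoming messages to outgoing ones.
        -- It ignores messages it does not recognize and never assumes who
        -- sent them or who consumes its output; that wiring lives in config.
        logger :: Message -> [Message]
        logger msg = case M.lookup "type" msg of
          Just "log" -> [M.fromList [("type", "logged"), ("text", field "" "text" msg)]]
          _          -> []   -- forgiving: unknown messages are simply ignored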

    I could go on for hours, but I would just like to say that it really works, and is unlike just about any other framework around. It has taken me 3 years of development time (spread over the last 11 years), but in reality knowing what I know now I could build the framework a fair bit quicker. If you are interested just search for the CSI application development framework.

  12. I had trouble understanding how your thinking was influenced by Kurzweil. What ideas came from thinking about the Singularity? How did thinking about the Singularity lead to these thoughts? Seems you skipped three weeks of thinking… :-)

    When I think of programming and the Singularity, I project further out than what I’ll be doing next week. I think about how programming might look in 30-40 years… and that’s very interesting stuff to ponder for sure.

  13. > Once you have a way to track assumptions, the rest follows easily.

    Agreed, I really think this is the key issue. But in order to track assumptions related to modern computing, you need some kind of vocabulary for these kinds of assumptions. Where’s our standard ontology for things like files, directories, character encodings, ELF executables, blocked processes, menu shortcuts, etc.?

    I’m skeptical that much progress can be made toward breaking down application silos until we can at least write the equivalent of autoconf in terms of a widely shared (or at least fairly task-neutral) ontology that captures a good fraction of our high-level, semi-intuitive computing concepts. I’d be delighted to be proven wrong about this.

  14. Pure functions help, but documentation is the key. You can reuse the worst written code if it does exactly what you need done, but you can’t reuse the best written code if you don’t know what it does. Function purity doesn’t help you know what a function does nor how it does it, only that, whatever it does, there are no unseen side effects.

    I find myself deciding to reimplement purely with a cost analysis view. When the cost of integrating with a library and learning what exactly it does outweighs reimplementing just the functionality I need out of it, I implement just the functionality I need myself. When the analysis is the opposite, I use the library.

  15. I think the language is not the problem. Stop thinking about languages! What a waste of time… I think you guys should focus on a new type of compiler of sorts… something that can take code from any language and translate it into some common, generic language (maybe some sort of general bytecode-like language that most other languages could target). This would allow us to collect all the code from the net and pack it into a huge database of functions.

  16. @pepe: We actually do have such a language, of sorts. Compilers and interpreters all have to eventually produce machine code, and we glue different platforms together with socket IPCs and what-not. But it’s a horrible kludge, and marshalling is difficult to work with. (Especially in impure software.)

    The most difficult problem, I think, is establishing isomorphisms between the various data models. We don’t even have languages that can take a list of isomorphisms and connect functions together to be useful somehow. (At least, none that I’m aware of.) This problem becomes much more difficult when trying to map a string in Perl into a string in Haskell, for instance. Languages like Java exacerbate it because they’re data-focused; every nontrivial class is another data type to which you have to assign semantic meaning.
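    As a sketch of what a "list of isomorphisms" might even look like (hypothetical Haskell, just to fix ideas; composing such mappings automatically is the hard part):

        -- An invertible mapping between two representations of the same data.
        data Iso a b = Iso { to :: a -> b, from :: b -> a }

        -- Isomorphisms compose, so a path of known conversions between data
        -- models could in principle be chained mechanically.
        compose :: Iso a b -> Iso b c -> Iso a c
        compose f g = Iso (to g . to f) (from f . from g)

        -- Example: Celsius and Fahrenheit as two "models" of temperature.
        celsiusToF :: Iso Double Double
        celsiusToF = Iso (\c -> c * 9 / 5 + 32) (\f -> (f - 32) * 5 / 9)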

    Programming platforms are messy enough that design is frequently compromised by integration issues. Identifying the semantics of already-written code is a significant challenge for humans. The fact that compilers don’t do the integration-glue step just makes things worse. (Otherwise, we’d write the semantically significant code, push the “autogenerate the rest” button for integration, and be done with it — semantically interesting stuff is clearly separated.)

    Finally, the programs we write these days often aren’t algorithmically interesting. The top-10 apps out there are huge pieces of glue that tie services together. There’s a clever algorithm behind there somewhere, but it’s buried so deep inside a library that you’ll never need to know about it. It’s the reason that languages like Perl and Ruby make sense; performance comes second to ease of integration. It’s also why agile makes sense, and why you’re more employable if you know PHP than if you know the lambda calculus. Inside each app is a semantically interesting composition, but the question is how you tease it out from the noise of integration. I don’t see a solution to that problem anytime soon.

  17. That said, an automatic isomorphism/homomorphism composer would probably remove 90% of the glue code out there. Though knowing software, you’d have type system integration issues most likely :)

  18. I very much liked your second-to-last paragraph:
    “Once you have a way to track assumptions, the rest follows easily. It would be possible to assemble a database of every function ever written, with the ability to just drag and drop it into your program (ehem, Haskellers, why don’t we have this?). The natural version control system, another step in the direction of the DVCSes, tracks changes between individual objects, and how those changes induce changes in the objects that reference them (this is the part I have thought through the most, but I will forgo describing it for it is a small detail and this post is already very long).”

    My question is how do you “induce changes in the objects that reference them”?

  19. I love the idea, Luke.

    Related:

    1. Object capability languages are weaker than purity, but have very few ‘integration issues’ because they can only interact with objects obtained through their constructor or through earlier interactions. There is no singleton, no global FFI, no global state. In addition to being good for reuse, this is also a powerful basis for security and testability (unit and integration testing).

    2. Gilad Bracha has recently done a lot of work on modularity for his language NewSpeak. He makes a convincing argument on his blog that ‘import’ declarations should be banned in order to significantly improve independent reusability of libraries. The use of imports can be replaced by arguments, though the language would need to be designed with this in mind. See the explanation here: http://gbracha.blogspot.com/2009/06/ban-on-imports.html

    3. Another tried and true way to improve reuse is to make the application components more loosely coupled and pluggable. This has been achieved for publish-subscribe systems, tuple spaces, various multi-agent systems, and so on. Pure code doesn’t really lend itself to the act of ‘plugging in’, but imperative code has a lot of nasty properties. I believe there is a middle path – i.e. supporting highly restricted side-effects with properties like monotonicity, idempotence, and commutativity. Idempotence and commutativity together give 80% of the advantages of purity.
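    A tiny Haskell illustration of that last point (a hypothetical example, not taken from the systems above): when the only update is commutative and idempotent, such as set union, updates can be reordered or duplicated without changing the result, which removes most ordering headaches when plugging components together.

        import qualified Data.Set as S

        -- An update that only ever adds elements to a shared set.
        type Update = S.Set Int

        apply :: S.Set Int -> Update -> S.Set Int
        apply = S.union

        -- Union is commutative and idempotent, so any order and any amount
        -- of duplication of the same updates yields the same final state:
        --   apply (apply s u1) u2 == apply (apply s u2) u1
        --   apply (apply s u1) u1 == apply s u1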

  20. Emlyn O’Regan has begun developing, and has running alpha code for, a system in Python designed for software editing and reuse at the individual function level, called Social Code. It’s very much a pragmatic approach, versus your idealistic approach, but I think your ideas about how to avoid integration issues would be helpful.

    With Social Code, programmers would see the advantage of purity in terms of how easy pieces are to reuse without breakage as they are updated. Unit testing is used for enforcing behavior conformance, rather than anything more academic.

    You can read about it and get a link to the actual running implementation at:

    http://appenginedevelopment.blogspot.com/2011/12/its-so-like-social.html
