This StackOverflow question has generated a buzz of zealous reactions in the Haskell community. Here are the important bits of the question:
I often find this pattern in Haskell code:options :: MVar OptionRecord options = unsafePerformIO $ newEmptyMVar doSomething :: Foo -> Bar doSomething = unsafePerformIO $ do opt <- readMVar options doSomething' where ...
Basically, one has a record of options or something similar, that is initially set at the programs beginning. As the programmer is lazy, he don’t wants to carry the options record all over the program. … Now each part of the program has to use unsafePerformIO again, just to extract the options.
In my opinion, such a variable is considered pragmatically pure (don’t beat me). …
In this post I will give my own zealous reaction.
To ask a question like this assumes something about the nature of software. The assumption is hiding in these phrases: all over the program, each part of the program. Here, the poster assumes that a program is a large, monolithic beast such that every part of it will need access to this variable, and yet the definition of this variable is not known to the programmer. That means that the program depends on this value. If we purely model the structure of the above example program, we see that every function depends on OptionRecord. So we have (taking the context of a compiler):
parse :: OptionRecord -> String -> AST compile :: OptionRecord -> AST -> LinkerObject simplify :: OptionRecord -> AST -> AST freeVars :: OptionRecord -> AST -> Set Variable safeName :: OptionRecord -> Set Variable -> Variable -> Variable
These are perhaps not the cleanest signatures for parse, compile, and simplify, but they are conceivable in the real world. There is some junk — surely not all three of those functions depend on every option of OptionRecord. It would be cleaner to declare that they depend on exactly the things they actually depend on.
But the problem becomes much more unsettling at freeVars. freeVars takes an OptionRecord — staying true to the original problem description, it must, because it or a function it calls may end up depending options. But what on earth could a global OptionRecord determine about a free variables function? Perhaps there are multiple ways to find free variables — do we count type variables, what scoping mechanism to use — but those are not global options. Different functions will require different behaviors out of that function depending on what they are doing.
We even get such pathologies as shortestPath :: OptionRecord -> Graph -> Node -> Node -> [Node] — a plain, simple, reusable graph algorithm which somehow depends on this global options record. We have no way of telling the compiler — or, more importantly, ourselves — that this algorithm really has nothing to do with the specific compiler we are implementing. Somewhere deep in shortestPath‘s call chain, there is a call to some function which calls an error function which depends on one of the options. Suddenly this beautiful, well-defined function is not reusable. To take it out and use it in another project means to include OptionsRecord in that project, and OptionsRecord has things about compiler and type system extensions, configuration files, who-knows-what, but certainly having nothing to do with graphs. Sure, we can go and dig out the OptionsRecord, replace it with a record that is more suited to the program we are reusing the code in. But you have to go read, understand, mutate code that you just want to work please so you can get on with your project. We have all suffered the head-throbbing pain of integration problems. This is their source.
When I think of software as thousands of lines of specification for something, my mind jumps to problems like the original question. How am I going to write something so huge purely without it being really inconvenient? I see the need for global options, often global state, things ending with Manager (often a global trying to convince you it is a good abstraction), big systems talking about big ideas which are only applicable to my project.
But I have begun to think about software another way. Consider 100 lines. That is potentially a lot of information. The only reason 100 lines is not very much in the code world is because we lack the vocabulary to say what we mean. We are caught up in the details of manipulating lists of identifiers, building sorting trees, defining what we mean by “first” in this or that context. Could you describe your project in 100 lines of English? Perhaps not, but you could get a lot closer than with 100 lines of code.
I’m beginning to think that my latest greatest software project should be as small as possible. I need to build up vocabulary so I can describe it in a small space, but that vocabulary is not a part of my project. That vocabulary belongs to everyone in the same-ish domain of software. And nobody else cares about the meaning of OptionsRecord.
When I think of software this way, the idea that I need to pass around an OptionsRecord as a parameter to every function in my project starts to make sense. Every part of my project depends on its options, but my project is just a little thing that is standing on the shoulders of the giants of vocabulary used to get it there. I don’t mind passing options around between a few functions after I load them.
This is an ideal. I hope to move software closer to this ideal with my CodeCatalog project. Your current project probably cannot be phrased in a few hundred lines right now. But think about what it would look like if you structured it as a small main program + a lot of supporting vocabulary. Think of the supporting vocabulary as something any project could use. What does that mean for the modularity, reusability, and adaptability of your code? What does it mean for the clarity of your specification? What does it mean about the concept of “architecture”?