Here is a familiar context-free grammar for arithmetic expressions:
S ::= add add ::= mul | add + mul mul ::= term | mul * term term ::= number | ( S ) number ::= digit | digit number digit ::= 0 | 1 | ... | 9
I have a challenge for you: write a program (in a language of your choice) which enumerates all strings accepted by this grammar. That is, your program runs forever, but if I have a string this grammar accepts, then your program should output it in a finite amount of time.
(This really is a fun challenge, so I encourage interested readers to try it themselves)
This is a tricky problem. Indeed it is sufficiently tricky that I cannot think of a clean imperative solution to it (which I’m sure is also related to being very out-of-practice in imperative programming). I’m interested to see any such solutions that people came up with. (And I’ll try it myself in the meantime)
But I’m going to present a functional solution using a neat little monad called
Omega which I just uploaded to Hackage.
Let’s step back and consider a simpler motivating example. Consider the following list comprehension (and recall that list comprehensions are just the list monad behind a little bit of sugar):
pairs = [ (x,y) | x <- [0..], y <- [0..] ]
This looks like it generates all pairs of naturals, but it does not. It generates the list
[(0,0), (0,1), (0,2), ...], so the first element of the pair will never get to 1. If Haskell allowed us to use ordinal numbers as indices then we could:
pairs !! ω == (1,0). :-)
Conceptually what we have is a lattice of pairs that we’re “flattening” poorly:
(0,0) (0,1) (0,2) (0,3) ... (1,0) (1,1) (1,2) (1,3) ... (2,0) (2,1) (2,2) (2,3) ... . . . .
We’re flattening it by taking the first row, then the second row, and so on. That’s what
concat does. But anybody who’s had a brief introduction to set theory knows that that’s not how you enumerate lattices! You take the positive diagonals:
(0,0), (1,0), (0,1), (2,0), (1,1), (0,2), .... That way you hit every element in a finite amount of time. This is the trick used to show that there are only countably many rational numbers.
Omega is the monad that comes from this concept. We define
join to be this “diagonal” function. What we get back is a monad with a nice property:
If x occurs at a finite position in xs and y occurs at a finite position in f x, then y occurs at a finite position in f =<< xs
It’s hard to know what that means without knowing what =<< is supposed to mean. It means if you have a (possibly infinite) list of items, and to each one you apply a function which generates another (possibly infinite) list, and you merge them all together, you’ll be able to reach every result in a finite time.
More intuitively, it means if you write a multiple branching recursive function where each branch recurses infinitely (generating values), you will not get stuck in the first branch, but rather generate values from all branches.
And that is exactly what we need to write the context-free enumerator. Given a simple data structure representing a context-free grammar:
data Symbol a = Terminal a | Nonterminal [[Symbol a]] -- a disjunction of juxtapositions
Then we can write an enumerate function in a straightforward depth-first way:
enumerate (Terminal a) = return [a] enumerate (Nonterminal alts) = do alt <- each alts -- for each alternative -- (each is the Omega constructor :: [a] -> Omega a) rep <- mapM enumerate alt -- enumerate each symbol in the sequence return $ concat rep -- and concatenate the results
Omega monad will do some heavy lifting and stagger those generators for us. Defining the grammar above:
arithGrammar = s where s = Nonterminal [[add]] add = Nonterminal [[mul], [add, Terminal '+', mul]] mul = Nonterminal [[term], [mul, Terminal '*', term]] term = Nonterminal [[number], [Terminal '(', s, Terminal ')']] digit = Nonterminal $ map (map Terminal . show) [0..9] number = Nonterminal [[digit], [digit, number]]
And then running our enumerator:
runOmega $ enumerate arithGrammar
We get a very encouraging list of results:
0 1 0+0 0*0 0+1 (0) 1+0 0*1 0+0*0 00 ...
Notice how each type of node is represented in that initial prefix. That’s a good sign that there won’t be degenerate repetitive behavior. But of course we know there won’t be by the guarantee
Omega makes (unless I implemented it wrong).