My YOW! talk, “Designing Distributed Systems with TLA+”, is now available! You can watch it here.
The other day I was talking with a friend about structured editing and literate programming came up. LP was one of Donald Knuth’s ideas, to structure programs as readable documents instead of just machine docs. He was interested in it, I was cautiously skeptical. We both knew the famous story about it:
In 1986, Jon Bentley asked Knuth to demonstrate the concept of literate programming by writing a program in WEB. Knuth came up with an 8-pages long monolithic listing that was published together with a critique by Douglas McIlroy of Bell Labs. McIlroy praised intricacy of Knuth’s solution, his choice of a data structure (Frank M. Liang’s hash trie), but noted that more practical, much faster to implement, debug and modify solution of the problem takes only six lines of shell script by reusing standard Unix utilities. McIlroy concluded:[12]
Knuth has shown us here how to program intelligibly, but not wisely. I buy the discipline. I do not buy the result. He has fashioned a sort of industrial-strength Faberge egg—intricate, wonderfully worked, refined beyond all ordinary desires, a museum piece from the start.
The program was “print out the top K most-used words in a text.” The six lines of shell script were
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
A damning counter. But neither of us had ever read the paper. And as you know, I’m all about primary sources. We pulled up the paper here and read through it together. And it left us with a very different understanding of literate programming, and the challenge, than the famous story gave.
First of all, we found that Literate Programming as conceived by Knuth not just “text, code, text, code”. You could say “now we add foo
to section <<bar>>
” even after having already created bar
and modified it several times before. Writing the code out-of-order is the whole point. You’d use a tool to take the narrative, or “weave” and synthesize the code, the “tangle”. Very roughly, you could write:
Blah blah blah
<<foo>>
<<bar>>
Blah blah
<<bar>> = print("world
Blah
<<foo>> = print("hello ");
Blah
<<bar>> += !");
And it would weave that into the actual program.
The actual paper paints LP in a much better light than we thought it did.
[rant] McIlroy could only write such a tight script because we’re doing something that’s near perfect for shell scripting: simple processing of text. If we choose a slightly different problem, like “find the top K pairs of words and print the Levenshtein distance between each pair” then shell scripting that would be a lot harder.[/rant]
Not to say that McIlroy is entirely wrong, and he raises a lot of good points. But I don’t think this is the damning indictment of LP or a total vindication of the Unix model. It also got me to look into the Leo Editor, which seems a more modern version of LP. I’ll report on how it goes.
This was sent as part of an email newsletter; you can subscribe here. Common topics are software history, formal methods, the theory of software engineering, and silly research dives. Updates are usually 1x a week. I also have a website where I put my more polished and heavily-edited writing (the newsletter is more for off-the-cuff stuff).