Fine, make me a blog

It's a blog, okay?

May 18

My Re-thinking of Literate Programming

As long as I can remember, I wanted to be an author.  So the legend goes, when I was born, the delivery room (mom, dad, grandmas, grandpas, aunts, uncles) all came to the consensus that my full name would be “John Alexander Zabroski”.  My pop’s grandma said, “Wow, with a name like that, he is going to be an author.”  Nostalgia aside, I actually want to be much more than an author.

I want to fundamentally change the way people communicate and express ideas.

I want to be the alpha author, the farmer who plants the seeds.

Since I am a professional programmer, probably my biggest time sink is learning the answer to WHY something is the way it is.  I think a good code base should read like the Bible or a Michael Lewis book like Moneyball.  (Yes, I just put Moneyball and the Bible in the same sentence.)

In order to make this work, I need to break some existing mental models on how things are supposed to work.  People have poor habits when writing code that they simply do not notice.  Almost nobody in computer science has paid attention to this topic.  Donald Knuth is an exception: he coined the term Literate Programming and even wrote a book about it.  Knuth also came up with a model for how Literate Programming should be done, which I call The Knuth Model of Literate Programming.

Knuth’s model was largely unsuccessful.  In an interview with Software Development Times’ Andrew Binstock, Knuth reflects on its unfortunate lack of wide-scale adoption, and shares Jon Bentley’s theory of LP’s failure:

Literate programming is certainly the most important thing that came out of the TeX project. Not only has it enabled me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since the 1980s—it has actually been indispensable at times…

Jon Bentley probably hit the nail on the head when he once was asked why literate programming hasn’t taken the whole world by storm. He observed that a small percentage of the world’s population is good at programming, and a small percentage is good at writing; apparently I am asking everybody to be in both subsets…

Jon Bentley’s theory seems silly to me.  Most programmers are good writers, but most programmers are hired to simply squish out programs, somehow, some way, just so long as it’s Right Now.  Let me go on the record that nothing can save a system that is rushed.  The Bible was written not just over centuries, but over millennia, and people argue thousands of years after its completion that it is an inconsistent, bug-ridden hodge-podge.  The Bible is sort of like the Microsoft Office of shrinkwrapped literature, except nobody is crazy enough to reorganize the Gospels into a Ribbon and confuse everybody.

We have to rule out Bentley’s theory. Why?  Because I can’t turn bad code into good code. Literate programming is a technique useful in making systems more portable, stable, and reliable by making them more human-understandable.  If the code for a system is squished out so fast nobody knows what is going on, then no human being will be able to go back later and understand how it all hangs together.  I know of no better widely acknowledged example of human-understandable code than Richard Hipp’s SQLite SQL DBMS.  It has to be ported to more platforms than any other SQL DBMS out there, and you probably interact with it every day, because Firefox now uses it to serialize most of its information.  It is portable not just because it is “Lite”, but also because its failure modes are meticulously documented.  (If you want to become a better programmer, especially in C and *NIX, then read Richard’s code.)

In addition, there are signs that lunch-pail programmers do care about well-documented systems, as evidenced by Test-Driven Development and the proliferation of unit testing libraries.  More recently, Dan North has championed Behavior-Driven Development, which argues that we should not only test code, but that the tests should be literate programs that describe some behavior in the application using a story-telling DSL.  I don’t know when it happened, but some time after Kent Beck wrote Test Driven Development in 2002, testing became popular, and it set the stage for a tipping point where we shifted from testing code to testing behaviors.  Here is a quote from Dan North’s initial blog entry Introducing BDD:

I had a problem. While using and teaching agile practices like test-driven development (TDD) on projects in different environments, I kept coming across the same confusion and misunderstandings. Programmers wanted to know where to start, what to test and what not to test, how much to test in one go, what to call their tests, and how to understand why a test fails.

The deeper I got into TDD, the more I felt that my own journey had been less of a wax-on, wax-off process of gradual mastery than a series of blind alleys. I remember thinking “If only someone had told me that!” far more often than I thought “Wow, a door has opened.” I decided it must be possible to present TDD in a way that gets straight to the good stuff and avoids all the pitfalls.

Suddenly, Bentley’s theory is beginning to unravel.  Why?  Because older developers have finally learned the pains of not documenting systems and not testing systems, and have now realized that testing and documenting systems at the same time kills two birds with one stone.  Thanks to guys like Dan North, this lesson is now passed on to a younger generation of programmers who come out of college knowing how to test software.  Some graduates can even test software in a literate style.  Yet we have not quite reached the tipping point where the core code itself is written in a literate style; as an industry, we only practice literacy in our tests to bail our asses out of the fire.
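To make "testing in a literate style" a little more concrete, here is a minimal sketch in plain Python.  It is not Dan North's tooling or any particular BDD library; the Account class and both scenarios are hypothetical, invented only for illustration.  The point is that the test names read as sentences and the bodies follow a Given/When/Then story.

```python
class Account:
    """Hypothetical bank account, used only for illustration."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Refuse any request larger than the current balance.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


def test_account_in_credit_lets_the_holder_withdraw_cash():
    # Given an account with a balance of 100
    account = Account(balance=100)
    # When the holder withdraws 20
    account.withdraw(20)
    # Then the balance drops to 80
    assert account.balance == 80


def test_overdrawn_request_is_refused_and_balance_is_untouched():
    # Given an account with a balance of 10
    account = Account(balance=10)
    # When the holder asks for 50
    try:
        account.withdraw(50)
        refused = False
    except ValueError:
        refused = True
    # Then the request is refused and the balance stays at 10
    assert refused
    assert account.balance == 10
```

Notice that the story lives only in the tests.  The Account class itself still reads like ordinary code, which is exactly the gap described above.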

Aside from Bentley’s theory, there is one other interesting comment from Knuth in the above quote.  Let me repeat it in case you missed me emboldening it the first time:

Literate programming is certainly the most important thing that came out of the TeX project.

Literate Programming started with the TeX project!  Now, let’s play devil’s advocate… what if, What if, What If, WHAT IF… Knuth made something much better than TeX at the time?  What if Literate Programming were invented today? Would we still be using TeX?  I don’t think so.

TeX has a number of shortcomings that make it undesirable for documenting software systems.

  1. TeX is limited to drawing horizontal and vertical lines.  Graphics packages for Literate Programming, such as Eitan Gurari’s DraTeX and AlDraTeX, use obscenely disgusting hacks in order to draw things such as circles. You simply can’t do the kinds of visualizations you can do in Mathematica, which powers Wolfram Alpha.  Instead, you have to reserve space for a figure, which you paste from a graphics program into your TeX document as a linked resource.  The paste is also merely a monolithic, black-box snapshot rather than a continuously integrated, openable object embedded in the source text.  Which brings us to the next problem…
  2. TeX is fundamentally a batch-mode system.  This is the killer problem.  The human mind can be trained to accept the fact that in order to get a table of contents or cross-references for your document, you have to hit compile twice.  However, it is important to realize that this doesn’t map to our brain’s notion of adding a section and expecting a new section to appear.  Our expectations increase iteratively and incrementally, but TeX meets those expectations in waterfalls.  TeX is a transformational system, when what we need is a reactive system that changes on-the-fly in response to what we do.
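The compile-twice problem can be sketched with a toy model in Python.  This is not how TeX is implemented; the "section:" and "ref:" line prefixes, the batch_pass function, and the ReactiveDocument class are all assumptions made up for illustration.  The batch pass can only resolve a forward reference using labels recorded by the previous run, which is roughly the role LaTeX's .aux file plays.

```python
def batch_pass(lines, labels):
    """One batch run: number the sections and resolve references
    using only the labels recorded by the PREVIOUS run."""
    new_labels = {}
    out = []
    section = 0
    for line in lines:
        if line.startswith("section:"):
            section += 1
            name = line.split(":", 1)[1]
            new_labels[name] = section
            out.append(f"{section}. {name}")
        elif line.startswith("ref:"):
            name = line.split(":", 1)[1]
            # A label this run has not been told about prints as '??'.
            out.append(f"see section {labels.get(name, '??')}")
        else:
            out.append(line)
    return out, new_labels


class ReactiveDocument:
    """Toy 'reactive' wrapper: every edit leaves a consistent rendering."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.rendered = self._render()

    def insert(self, index, line):
        self.lines.insert(index, line)
        self.rendered = self._render()

    def _render(self):
        # Re-run the batch pass until the labels stop changing.
        labels = {}
        while True:
            out, new_labels = batch_pass(self.lines, labels)
            if new_labels == labels:
                return out
            labels = new_labels


doc = ["ref:Tools", "section:Intro", "section:Tools"]
first, labels = batch_pass(doc, {})       # forward reference is unresolved
second, _ = batch_pass(doc, labels)       # second run fills it in
live = ReactiveDocument(doc)
live.insert(1, "section:Preface")         # numbering updates with the edit
```

The wrapper here cheats by re-running the whole batch pass on every edit; a genuinely reactive system would presumably update only what changed.  Still, it shows the difference from the author's side: with the batch function you have to remember to run twice, while with the wrapper adding a section simply makes a section appear.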

TeX has also proven a failure in large open source projects.  LaTeX, the logical markup layer built on top of TeX, was used to document Guido van Rossum’s Python project.  Requiring TeX knowledge of every documenter proved too high an entry barrier for Joe Blow off the street looking to chip in a few man-hours.  Again, even when people are capable writers, they find the learning curve of LaTeX intimidating.  So it would be more accurate to say that capable writers are not necessarily capable typesetters, editors, publishers, and so on.  The publishing industry has more roles than “just write the darn thing” in order to make sure things work smoothly, and unless you self-publish (rare), you never fill all of these roles.  The Python community is highly ideological, believing there is always One True Way, and it is absolutely fascinating to me that the Python community has abandoned TeX’s you-must-know-and-do-everything publishing model as part of its One True Way.

In a nutshell, I think Literate Programming needs a redo.  We can’t repeat Donald Knuth’s mistakes.  We can’t use TeX, LaTeX, or any other fundamentally batch-mode technology like SVG, PostScript, etc.

We need something that is interactive, or at least trivially has the potential to be interactive.  We need something with a short learning curve, something that can be easily bootstrapped into a WYSIWYG or WYSIWYM (What You See Is What You Mean) editor environment.  We also need something more expressive than the batch-mode Web, Tangle and Weave operations that comprise the Knuth Model of LP.
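For readers who have not seen the Knuth Model, here is a rough Python sketch of what tangle and weave do.  It is a toy, not Knuth's WEB: the "@code" and "@doc" markers are a made-up convention for this example, and the real tools also reorder and expand named chunks, which this sketch does not attempt.

```python
def split_chunks(source):
    """Split a literate source into ('doc' | 'code', text) chunks.

    Toy convention (an assumption, not WEB syntax): a line '@code'
    starts a code chunk and a line '@doc' starts a prose chunk.
    """
    chunks = []
    kind, buf = "doc", []
    for line in source.splitlines():
        if line in ("@code", "@doc"):
            if buf:
                chunks.append((kind, "\n".join(buf)))
            kind, buf = line[1:], []
        else:
            buf.append(line)
    if buf:
        chunks.append((kind, "\n".join(buf)))
    return chunks


def tangle(source):
    """Keep only the code chunks: the program meant for the machine."""
    return "\n".join(text for kind, text in split_chunks(source) if kind == "code")


def weave(source):
    """Keep everything: the document meant for the human,
    with code chunks indented to set them apart from the prose."""
    parts = []
    for kind, text in split_chunks(source):
        if kind == "code":
            text = "\n".join("    " + line for line in text.splitlines())
        parts.append(text)
    return "\n\n".join(parts)


src = "Greet the reader.\n@code\nprint('hello')\n@doc\nThat is all."
program = tangle(src)
document = weave(src)
```

Both operations are one-shot transformations of a source file into an output file, which is the batch-mode character I am complaining about.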

This blog post mainly concentrates on why TeX is a poor environment for Literate Programming.  In a future blog post I’ll discuss why the LP tools built on top of TeX suffer from another set of difficulties that has discouraged widespread literate programming adoption, and I’ll talk about how the authors of these tools spend extraordinary amounts of energy trying to solve parsing and lexing problems created by programming languages that simply weren’t designed with literate programming in mind.

For now, you can head over to Lambda the Ultimate and check out my call for LP prior art: What is the best literate programming tool/environment/research work? The LtU community has given me plenty of quests to journey on, and in another blog post I’ll summarize my findings and provide a State of the Union for LP.

