Fine, make me a blog

It's a blog, okay?

Aug 9

Flexible Order of Elaboration

A core tenet of Literate Programming is flexible order of elaboration.

Let’s start with some examples to explain this core tenet.

Suppose you have a file helloworld.c that looks like this:

public function foo()
{
    bar();
}
private function bar()
{
    print("Hello World");
}

What happens if you want all function names alphabetically ordered in the file?  You could manually make that change like so:

private function bar()
{
    print("Hello World");
}
public function foo()
{
    bar();
}

However, we’ve not changed any of the text in the file; we’ve simply reordered it by sorting on function name.  We might want to reorder it in other clever ways, such as a multi-key sort on name (ascending) and then on visibility modifier (private first, then public).
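Here is a rough sketch of that multi-key sort in C.  The func_entry record, the visibility enum, and sort_functions are names I made up for illustration, and the sketch assumes something else has already split the file into one record per function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one function in a source file. */
typedef enum { VIS_PRIVATE = 0, VIS_PUBLIC = 1 } visibility;

typedef struct {
    const char *name;   /* function name, the primary sort key */
    visibility vis;     /* visibility modifier, the tie-breaker */
    const char *text;   /* full source text, moved around as one unit */
} func_entry;

/* Order by name ascending; on equal names, private comes before public. */
static int compare_entries(const void *a, const void *b)
{
    const func_entry *x = a;
    const func_entry *y = b;
    int by_name = strcmp(x->name, y->name);
    if (by_name != 0)
        return by_name;
    return (int)x->vis - (int)y->vis;
}

/* Reorder the entries in place; the text of each function is untouched. */
static void sort_functions(func_entry *entries, size_t count)
{
    assert(entries != NULL || count == 0);
    qsort(entries, count, sizeof entries[0], compare_entries);
}
```

Sorting the two entries for foo and bar from the first listing puts bar first, which is the same order as the hand-edited second listing.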

Vertically reordering program parts such as functions and object instance variables is the most basic form of refactoring.  Some software companies explicitly require a specific vertical ordering in their coding guidelines.  Not only can a computer automate this refactoring, it can also check whether the refactoring’s goal state has been reached, which is a different problem from working out how to refactor automatically.  Such a check is one way to verify that coding guidelines are met.
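Checking the goal state is the easier half.  As a minimal sketch in C, assuming the function names have already been pulled out of the file in the order they appear (names_are_sorted is a hypothetical helper, not part of any real tool):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when the function names, listed in file order, are already
 * in ascending alphabetical order.  This only checks the goal state; it
 * makes no attempt to reorder anything. */
static bool names_are_sorted(const char *const names[], size_t count)
{
    assert(names != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (strcmp(names[i - 1], names[i]) > 0)
            return false;   /* a later name sorts before an earlier one */
    }
    return true;
}
```

A guideline checker could run something like this on every commit and reject a file whose functions are out of order, without ever knowing how to fix the file itself.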

Flexible order of elaboration says that we should be able to do these vertical reorderings as we please, with the intended goal of making our programs read more like English prose.  We should even be able to reorder things horizontally, and even transform a vertical order into a horizontal order!  Food for thought: What text editor do you know of that allows you to do that?  More importantly, what programming language has a syntax parser that allows you to organize functions into columns, thus laying out code horizontally?

Using more horizontal screen real estate sounds heretical: we’re not sympathetic to those still using 80-column-wide green-screen terminals, and besides, we can always JustUseReallyLongVariableNamesLikeThisOne instead.  To avoid being labeled heretics, we’ll focus on what we can do today with vertically reordering source code.

Today we have a problem: if the helloworld.c file was under source control, then the version control system would see two different versions, even though one version is really just a flexibly ordered elaboration of the other.  State-of-the-art version control systems operate on a single, monolithic data structure: file contents.  This allows just about anything to be stored in the source code repository, including things other than source code (images, HTML documentation, etc.).  Not only that, but it also leverages the underlying file system to do the heavy lifting.  Accessing stored items means accessing the file system directly.  The version control system treats every file as unstructured syntax, and the upshot of this decision is that it knows exactly how to store every file committed to the repository: as text differences.  Unstructured syntax allows for arbitrary decisions in how to store syntactic differences, since there are no optimization boundaries.  A whole file X can be diff’ed against another whole file Y, and we only need to store X plus the differences between Y and X.
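To make the mismatch concrete, here is a small C sketch of the two ways of comparing versions.  It assumes each version has already been split into an array of function texts, which is exactly the structure a file-based version control system does not have; same_sequence and same_up_to_order are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* How a text-based system sees it: same units in the same order. */
static bool same_sequence(const char *const *v1, const char *const *v2, size_t n)
{
    assert(n == 0 || (v1 != NULL && v2 != NULL));
    for (size_t i = 0; i < n; i++)
        if (strcmp(v1[i], v2[i]) != 0)
            return false;
    return true;
}

/* Number of times one unit occurs in a version. */
static size_t count_of(const char *unit, const char *const *units, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(unit, units[i]) == 0)
            count++;
    return count;
}

/* Order-insensitive view: both versions hold the same units, in any order.
 * Both arrays have n entries, so matching counts for every unit of v1
 * leave no room for extra units in v2. */
static bool same_up_to_order(const char *const *v1, const char *const *v2, size_t n)
{
    assert(n == 0 || (v1 != NULL && v2 != NULL));
    for (size_t i = 0; i < n; i++)
        if (count_of(v1[i], v1, n) != count_of(v1[i], v2, n))
            return false;
    return true;
}
```

For the two listings of helloworld.c, same_sequence reports a difference while same_up_to_order reports none.  The quadratic counting keeps the sketch short; a real tool would more likely sort or hash the units.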

In the general case, reordering function names could mean something significant.  For example, in the C programming language, identifiers must be declared before first use.  Thus, in C, version 1 would not compile, but version 2 would compile.  (Ignore the modifiers public, private, and function; for now, assume they are C preprocessor identifiers that evaluate harmlessly.)
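For the curious, here is one way the example could be turned into real, compilable C.  The macro definitions are my own assumptions (the only requirement above is that they evaluate harmlessly), and I have written the empty parameter lists as (void), which is the idiomatic C spelling.  The forward declaration of bar is the standard C way to decouple definition order from use order: with it, the version 1 ordering compiles too.

```c
#include <stdio.h>

/* Assumed definitions; any harmless expansion would do. */
#define public                /* expands to nothing: external linkage */
#define private static        /* visible only inside this file */
#define function void         /* the functions here return nothing */
#define print(s) puts(s)      /* puts writes the string and a newline */

/* Forward declaration (prototype): bar is now declared before first use. */
private function bar(void);

public function foo(void)
{
    bar();
}

private function bar(void)
{
    print("Hello World");
}
```

Calling foo() prints Hello World.  Remove the prototype and the compiler is back to complaining about the version 1 ordering.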

Modelling the general case prevents us from taking advantage of basic assumptions of how we think about storing code.  In the C programming model, version 1 to version 2 is a meaningful change, as it represents a bug fix.  Yesterday, the programmer saved the first version of his code (version 1) at the end of the day, before getting it to compile.  Today, he debugged his code and realized that C’s strict identifier declaration rules were preventing his code from compiling.  He saves version 2 as a bug fix, and when he commits the change to version control he writes a log message explaining what the problem was and how it was solved.  The commit log now has a message describing a meaningful change to the system.

May 01, 2009
helloworld.c modified 09:11 EST
Message: Bug fix.  Now compiles.  Apparently the C programming language requires identifiers to be declared prior to first use.  XXX: The code still needs to be tested.
April 30, 2009
helloworld.c created 04:53 EST
Message: This code does not compile, but is being committed to my working branch on the source code repository where it will be automatically backed up over night.  XXX: This bug needs to be fixed before adding new features.

Initialization

Section 6, Declarations, and in particular sub-section 6.2, Initialization, of Sun’s Code Conventions for the Java™ Programming Language describes how to deal with initializing local variables (emphasis mine):

Try to initialize local variables where they’re declared. The only reason not to initialize a variable where it’s declared is if the initial value depends on some computation occurring first.

In my humble opinion, programming languages should allow us to always initialize a variable where it’s declared, even if the value depends on some computation occurring first.  Separating initialization from declaration doesn’t make sense in purely functional programming languages; it is administrative debris.  Guidelines like Sun’s don’t leverage the fact that programmers are sitting in front of a computer.  Sub-section 6.3 Placement tells the programmer to do the heavy lifting of placing declarations.
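One partial answer already exists in languages that allow declarations in the middle of a block, as Java does and C has done since C99.  The sketch below is in C rather than Java, and mean is a hypothetical function of my own: result is declared and initialized in a single statement, after the loop it depends on.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of count values; returns 0.0 for an empty input. */
static double mean(const double *values, size_t count)
{
    assert(values != NULL || count == 0);

    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];

    /* Declared and initialized together, after the computation it needs. */
    const double result = (count > 0) ? sum / (double)count : 0.0;
    return result;
}
```

Because result never exists in an uninitialized state, it can also be marked const.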

Put declarations only at the beginning of blocks.  (A block is any code surrounded by curly braces “{” and “}”.) Don’t wait to declare variables until their first use; it can confuse the unwary programmer and hamper code portability within the scope.

First, Smart Bear Software has a free paperback book, Best Kept Secrets of Peer Code Review, discussing the impacts of this decision.  They discuss how people read code, in particular variable identifiers.

RockScroll totally changes the way people look at placement of identifiers.