Explain Yourself (but Don't Repeat Yourself)

Feb 11

The Making of Pod::Literate

Some time ago I encountered the concept of Literate Programming. I was intrigued by the idea but never really explored taking up the practice. I don't really remember why, though it probably had something to do with a lack of tuits and no obvious support for it in my main language, Perl. I also was not really at a point in my programming career where it would have occurred to me to implement tools for Literate Perl.

I was reacquainted with it more recently when I started to learn Haskell, which has native support for literate programs. Haskell is a language that has some intrinsic appeal for me that I can't quite put a finger on, but again between not having yet grasped the Haskell mindset and not having time for a new project I let that slip by too (I suspect the former will not occur until I have time for the latter).

More recently I've been on a DRY (Don't Repeat Yourself) kick in my day-to-day programming. Of course I'd always practiced DRY to some degree, but lately I've been getting more insistent with myself that I really keep on top of duplicate or near-duplicate code.

This probably came to a head when I read Steve Yegge's rant about code-base size. The code base for my main project is large enough that I don't visit all its layers, nooks and crannies on a regular basis. The main problem of code maintenance, as many programmers will attest, is memory loss. Our own, that is. Remembering why you wrote code the way you did when you haven't seen it in a week, much less months, is a major hurdle when returning to it. The two solutions I see to reduce the effects of memory loss are solid documentation and reducing the amount of code that needs remembering in the first place. Naturally, I chose to look at the latter.

Dispensing with Boilerplate

I imagine that any programmer who takes DRY seriously will at some point take a hard look at boilerplate code. More incantation than instruction, boilerplate code tends to have a very low signal-to-noise ratio; most of it is there only to make the compiler happy.

Being Perl, most of the boilerplate code was of my own making. So I hacked and refactored and experimented with shorter ways of expressing the same intent (another story for another time) and made some good headway towards weeding out the verbosity of my code, without sacrificing (and in some cases enhancing) clarity.

As all good things must be taken to extremes, eventually I started eyeing some of the standard Perl incantations and wondered about doing away with them.

First, library files (modules) in Perl must end with a statement that evaluates to true, or they fail to load. I am not clear on where this legacy comes from, but the convention is to append a "1;" at the end of the file to keep the compiler happy. A whole whopping 2 characters, but it wasn't adding any meaning to my code, and I aimed to remove it.

Second, best practices recommend coding under "strict" mode, and at least during active development, under "warnings" mode as well. This makes the Perl interpreter as intolerant as possible of your errant ways, and I take these best practices to heart. In fact, the forthcoming 6th version of Perl makes "strict" the default state, so rather than pronouncing strict mode at the top of every file, you instead only declare when you intend to be naughty by explicitly shutting off strictures. I (like many others) am impatient for Perl 6's arrival, and I decided I wanted this little bit of code reduction here and now.

A Preprocessor

These got me thinking about writing a preprocessor. I didn't want to give up strictures, and I needed the compiler to accept my modules, but I also did not feel it was doing me any meaningful good adding these incantations to each of my files. The "1;" for the module was legacy nonsense, and the strict and warnings pragmas were policies I'd rather set once across my whole project than repeat in each file. So I figured I could add these in programmatically.
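As a rough illustration of that idea (not the actual Pod::Literate code), the boilerplate could be added by a small function along these lines. The name `add_boilerplate` and its `strict`, `warnings` and `module` options are hypothetical, chosen only for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap raw source in project-wide boilerplate.
# The option names (strict, warnings, module) are illustrative only.
sub add_boilerplate {
    my ( $source, %opt ) = @_;

    # Pragmas go at the top so they cover the whole file.
    my $header = '';
    $header .= "use strict;\n"   if $opt{strict};
    $header .= "use warnings;\n" if $opt{warnings};

    # Make sure the body ends with a newline before appending anything.
    $source .= "\n" unless $source =~ /\n\z/;

    # Only modules need the trailing true value.
    my $footer = $opt{module} ? "1;\n" : '';

    return $header . $source . $footer;
}

my $module = add_boilerplate(
    "package Greeter;\nsub hello { return 'hello' }\n",
    strict   => 1,
    warnings => 1,
    module   => 1,
);

print $module;
```

Because the policy lives in one place, changing it (say, dropping warnings for a release build) would mean changing one call rather than every file.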

At about this point I ran across the literate programming idea again. I figured that as long as I was already running my code through a preprocessor, I may as well have it convert my code from a literate style at the same time. I can't really recall what brought it up, but by some route I found myself reading an older article by Mark-Jason Dominus, "Pod is not Literate Programming", about how POD lacks certain key features of a literate programming system. He pointed to noweb as a language-agnostic tool for Literate Programming, but for some reason that did not appeal to me. I thought, well, this is Perl! There must be something on CPAN.

And indeed there were several attempts on CPAN at implementing Literate Perl, but none addressed one of MJD's core issues with Pod, which was that it doesn't let you rearrange your code from human-reading order to compiler-reading order.
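To make the reordering idea concrete, here is a minimal sketch of one way it can work, assuming noweb-style `<<chunk name>>` anchors that sit alone on a line. The function `expand_anchors` and the chunk names are hypothetical, and this is not Pod::Literate's actual implementation:

```perl
use strict;
use warnings;

# Hypothetical sketch: recursively replace <<chunk name>> anchor lines
# with the named chunk's code, carrying over the anchor's indentation.
# $chunks maps chunk names to code that may itself contain anchors.
sub expand_anchors {
    my ( $code, $chunks, $seen ) = @_;
    $seen ||= {};

    my @out;
    for my $line ( split /\n/, $code ) {
        if ( $line =~ /^(\s*)<<\s*(.+?)\s*>>\s*$/ ) {
            my ( $indent, $name ) = ( $1, $2 );
            die "Unknown chunk '$name'\n"  unless exists $chunks->{$name};
            die "Circular chunk '$name'\n" if $seen->{$name};

            my $body = expand_anchors( $chunks->{$name}, $chunks,
                { %$seen, $name => 1 } );
            push @out, map { length $_ ? "$indent$_" : $_ } split /\n/, $body;
        }
        else {
            push @out, $line;
        }
    }
    return join( "\n", @out ) . "\n";
}

# Chunks can be written in whatever order suits the explanation.
my %chunks = (
    'print words' => 'print join( " ", @words ), "\n";',
    'load words'  => 'my @words = qw(literate perl);',
    'report'      => "<<load words>>\n<<print words>>",
);

my $expanded = expand_anchors( "sub main {\n    <<report>>\n}\n", \%chunks );
print $expanded;
```

The anchors let the prose introduce `print words` before `load words` if that reads better, while the expansion still hands the compiler the statements in the order it needs.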

The closest I found to what I wanted was Audrey Tang's Filter::LiterateComments. She chose a style very much in keeping with Haskell's notation, which is unsurprising given her background with the language. But being a source code filter it never had a chance to do code reordering, and on its own it would not be able to create typeset documentation.

Pretty Documentation

Now that I was committed (in both senses of the word) to writing a preprocessor, I wanted to one-up the typesetting question by prettifying the code as well as the explanation in the documentation. I knew there had been some work done with PPI to do syntax highlighting, and I had already decided to rely on Perltidy to enforce code formatting conventions. Further, I recognized that there were several Perl operators written in ASCII (such as -> and =>) that were really just stand-ins for untypeable symbols. I figured in the interest of legibility I ought to replace them with their intended entities in the documentation.

So the plan involved extracting each code block in the source code, running it through perltidy, then PPI::HTML, and then doing a search-and-replace on operators that were stand-ins for more legible symbols. That would turn something like this:

    sub dump_code {
        my ( $parser, $extension ) = @_;
        my $source;
            <<replace anchors with code snippets >>
            
            $source .= "use strict;\n"   if $parser->{_use_strict};
            $source .= "use warnings;\n" if $parser->{_use_warnings};
            
            $source .= join(
                '',

                map( { $parser->{_code}{$_} }
                    grep { defined $parser->{_code}{$_} }
                        @{ $parser->{_code_sections} } )
            );
                                            
            <<source code amendments based on file type>>
                                            
            return tidy( $source );

    }

into this:

    sub dump_code {
        my ( $parser, $extension ) = @_;
        my $source;

        «replace code anchors with code snippets»

        $source ⋅= "use strict;\n"   if $parser→{_use_strict};
        $source ⋅= "use warnings;\n" if $parser→{_use_warnings};

        $source ⋅= join(
            '',

            map( { $parser→{_code}{$_} }
                grep { defined $parser→{_code}{$_} }
                  @{ $parser→{_code_sections} } )
        );

        «source code amendments based on file type»

        return tidy($source);

    }
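The final search-and-replace step might look roughly like the sketch below. The function name `prettify_operators` is hypothetical, and the choice of ⇒ for `=>` is an assumption; the other symbols follow the example output above. It works on plain text rather than on PPI tokens, so it would also rewrite operators that appear inside strings or regexes:

```perl
use strict;
use warnings;

# Illustrative mapping from ASCII operators to display symbols.
my %SYMBOL = (
    '->' => "\x{2192}",     # rightwards arrow
    '=>' => "\x{21D2}",     # rightwards double arrow (assumed choice)
    '.=' => "\x{22C5}=",    # dot operator followed by '='
);

# Hypothetical helper: prettify code for display only, never for execution.
sub prettify_operators {
    my ($code) = @_;

    # Anchors that sit alone on a line become guillemet-quoted names.
    $code =~ s/^([ \t]*)<<\s*(.+?)\s*>>[ \t]*$/$1\x{AB}$2\x{BB}/mg;

    # Swap each stand-in operator for its symbol.
    $code =~ s/(->|=>|\.=)/$SYMBOL{$1}/g;

    return $code;
}

binmode STDOUT, ':encoding(UTF-8)';
my $line = q{$source .= "use strict;\n" if $parser->{_use_strict};};
print prettify_operators($line), "\n";
```

Restricting the replacement to operator tokens (PPI exposes these as `PPI::Token::Operator`) would avoid touching string contents, at the cost of a more involved pass.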

Bootstrapping

As a tool for literate programming, naturally I wanted to be able to write the library itself as literate Perl. The dilemma, of course, is that the literate format can't be executed until it's translated into regular Perl. What I found was that I had to first develop the code in a non-literate style, and get to a point where I could process literate Perl. At that point I could then run the "compiled" non-literate version on the new literate copy of the module in order to get a runnable version of the newest code.

Where To Go From Here

The system I have written works well, but there are still a few things I need to do before I can release it to CPAN:

The preprocessor does not spit out a pure POD version of the docs, which is what CPAN wants for module documentation. Indeed, the literate documentation isn't really even the sort of documentation CPAN users are looking for. It may mean putting the overview documentation in a separate pod file, or marking a certain portion of the literate document as appropriate for extraction for use on CPAN.
I actually need to write a bit more of the documentation. While the Pod::Literate module and the literate preprocessing script are written in the literate style, there are still some longer code sections that I have not augmented with appropriate documentation.
Similarly, I have not written any tests. Running it on its own source code has been a pretty good test in itself, but I know I will get frowns from the community if I ship without a good test suite. For the tests-first segment, I'm sure I'm already getting those frowns.

Then there are a few things I'd like to do (or have done) that aren't critical. I'd really like a literate-aware version of perltidy that I could use to clean up the literate source. Right now it wouldn't know what to do with it. Similarly, I'd like to investigate what it would take to make Perl::Critic compatible with the literate style files.

Pod::Literate source

Stephen Howard