Every once in a while I get mail from someone asking for a downloadable version of one or another of the “books” from the MDC Wiki. Since last March, I’ve gotten this request fewer than a dozen times, but it’s something I occasionally sit and try to bend my brain around.
There are many questions involved.
The complicated bit here is that in order to maintain a completely up-to-date downloadable/printable version of any particular collection of content within the wiki, the process of generating that content would have to be wholly automated. One would think that computers would be good at that sort of thing, but evidence appears to show otherwise. I’m certainly not the first or smartest person to think about this problem, and as far as I can tell every other project started towards a solution has been abandoned well before completion.
(Aside: if you know of a current and/or complete project that does what I’m talking about, be it wiki->PDF or wiki->docbook or wiki->xml, please send me a note.)
The good folk over at the Hula project have sort of addressed this issue with their Single Page Administration Guide (warning: it’s a long page…164 pages when saved to PDF). Using wiki includes, they’ve simply collated all the disparate pages into a single long page, which you can then print or save to PDF or what-have-you.
This include-everything-in-a-single-page trick is an OKish solution, in that it does allow people to get a copy of the content that they can then use offline. There are also problems. The table of contents has no page numbers. The page section headings don’t have numbers, so the section numbers in the TOC aren’t very useful. Links aren’t clickable (or even rendered as links) so things like “see Message Store” in the HTML version show up simply as “see Message Store” in the PDF. And so forth.
I think the trick will be figuring out how to turn wiki pages into DocBook XML fragments (using only a simple subset of DocBook elements), then patching those fragments together into full DocBook books. Once the DocBook book is available, there are a host of different tools that can be used to convert it into a variety of formats, including much-more-useful PDFs.
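To make the idea concrete, here is a minimal sketch in Python of the two steps: turning one page into a fragment, then stitching fragments into a book. Everything in it is an assumption for illustration. It pretends the wiki markup consists only of `== Heading ==` lines and blank-line-separated paragraphs, it maps each page to a `<chapter>`, and the function names `wiki_to_chapter` and `build_book` are made up. Real wiki markup (links, templates, tables, nested headings) is exactly where this gets hard.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: only level-2 headings ("== Title ==") are recognised.
# Deeper headings ("=== Title ===") fall through and become paragraph text.
HEADING = re.compile(r"^==\s*([^=]+?)\s*==$")


def _flush(container, pending):
    """Emit the accumulated lines as one <para> and reset the buffer."""
    if pending:
        ET.SubElement(container, "para").text = " ".join(pending)
        pending.clear()


def wiki_to_chapter(page_title, wikitext):
    """Convert one wiki page into a DocBook <chapter> element."""
    chapter = ET.Element("chapter")
    ET.SubElement(chapter, "title").text = page_title
    container = chapter  # paragraphs attach here until a heading appears
    pending = []
    for line in wikitext.splitlines():
        line = line.strip()
        match = HEADING.match(line)
        if match:
            _flush(container, pending)
            container = ET.SubElement(chapter, "section")
            ET.SubElement(container, "title").text = match.group(1)
        elif not line:
            _flush(container, pending)
        else:
            pending.append(line)
    _flush(container, pending)
    return chapter


def build_book(book_title, pages):
    """Patch per-page chapters together into a single <book>."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = book_title
    for page_title, wikitext in pages:
        book.append(wiki_to_chapter(page_title, wikitext))
    return book


if __name__ == "__main__":
    pages = [("Intro", "Hello\nworld.\n\n== Setup ==\nInstall it.")]
    print(ET.tostring(build_book("Guide", pages), encoding="unicode"))
```

The resulting XML could then be handed to whichever DocBook toolchain you prefer. The sketch does not validate its output against the DocBook schema, so a section with a title and no body, for example, would slip through.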
While that seems simple enough on the surface, the number of dead projects that have attempted to do this in a fully automated fashion seems to indicate otherwise.
So, if it can’t be fully automated, could it be partially automated? Could wiki markup be turned into a rough approximation of DocBook fragments which could then be finessed and pieced together by hand?
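One way to picture the partially automated version is a converter that handles only what it understands and leaves a visible marker wherever a human needs to step in. The sketch below is hypothetical throughout: the `rough_convert` name, the list of "unsupported" markup (templates, tables, list and indent prefixes), and the choice of DocBook's `<remark>` element as the marker are all my own assumptions, not anything an existing tool does.

```python
import re
from xml.sax.saxutils import escape

# Assumption: anything that looks like a template, a table delimiter,
# or a list/indent prefix is too risky to convert automatically.
UNSUPPORTED = re.compile(r"\{\{|\}\}|^\{\||^\|\}|^[*#:;]")


def rough_convert(wikitext):
    """Return (docbook_fragment, review_notes) for one wiki page.

    Plain lines become <para> elements. Lines containing markup this
    sketch does not understand are wrapped in <remark> so an editor
    can find and fix them by hand.
    """
    out, notes = [], []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        text = line.strip()
        if not text:
            continue
        if UNSUPPORTED.search(text):
            # Leave a marker rather than guess at a conversion.
            out.append(f"<remark>REVIEW line {number}: {escape(text)}</remark>")
            notes.append((number, text))
        else:
            out.append(f"<para>{escape(text)}</para>")
    return "\n".join(out), notes


if __name__ == "__main__":
    fragment, notes = rough_convert("Plain text.\n{{Note|careful}}\n* item")
    print(fragment)
    print(f"{len(notes)} line(s) need hand editing")
```

The length of the review list gives a rough measure of how much hand-finessing a given page would need, which is really the question: if most lines end up flagged, the automation isn't buying much.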
I don’t know. Maybe. Maybe not. What do you think? Is it worth it? Is there an easy way to do this wiki->DocBook or wiki->PDF conversion that would produce a proper book without requiring a lot of human involvement?
The wiki has been an awesome boon for the state of Mozilla developer documentation. In less than a year over 22,000 edits and additions have been made, each of which has served to improve the content we deliver. The web version of the content is XHTML compliant (with occasional markup errors in editing), and it’s relatively usable and friendly with a nice layout. The kicker is trying to turn this incredible resource into usable offline formats. Obviously we don’t want to stop using the wiki, so if we want to generate offline content, we have to figure out how to do that given the tools at our disposal.
And this is apparently what I spend my Friday evenings thinking about.