Stale HOWTOs (the case of Modem-HOWTO)
by David S. Lawyer, Mar. 7, 2001
Out-of-date (stale) documentation is a major problem for Linux. This
is also a problem in the Linux Documentation Project (LDP). One
well-known reason for stale documents is that document authors sometimes
don't revise their documents frequently enough. But even if they are
revised frequently, people searching for information may not find
up-to-date versions.
Here's why. Even though the Linux Documentation Project (LDP) has
the most recent versions of its documents on over 200 mirror sites,
several hundred other sites also carry LDP documents. Unfortunately,
most of these have stale documentation. Why don't people just go to
the mirror sites and avoid the other sites? The reason is that many
people search for information about Linux using one of the many search
engines available on the Internet. More likely than not, such a
search engine will find out-of-date Linux documents. While the LDP
sites have a search engine for searching the LDP site, it's often
advantageous to search the entire Web since there are many other
documents available besides just LDP's. But doing so is likely to
find stale documentation.
Suppose one finds an LDP HOWTO by using a search engine. Can't one
just look at the date of the document and then click on a link to a
mirror site that will have the latest version? Unfortunately, this
isn't easy to do. What people usually find with a search engine
is not the entire document, but only one chapter of it. The
HTML documents are usually split up into chapters so that they will
download fast.
Each chapter doesn't contain version or date information (perhaps it
should). While there may be a chapter in the document that contains a
link to the latest version, it's not likely to be in the chapter that
one finds with a search engine. To find such a link (if it exists)
requires first clicking on the "contents" link to get to the
table-of-contents page. Then one might browse the contents to try to
find a link to another chapter which itself might contain a link to
the most recent version. This is neither simple, certain, nor fast,
so few readers are likely to do it.
I did a quick survey to find out which versions of Modem-HOWTO were on
the Internet. Here are the results. (The last column is the number of
sites on the Web carrying that version, per Google on Mar. 2, 2001.)
Version | Date | Count |
v0.14 | Feb. 2001 | 0 |
v0.13 | Feb. 2001 | 0 |
v0.12 | Dec. 2000 | 76 |
v0.11 | June 2000 | 118 |
v0.10 | May 2000 | 60 |
v0.09 | Mar. 2000 | 18 |
v0.08 | Jan. 2000 | 61 |
v0.07 | Nov. 1999 | 3 |
v0.06 | Nov. 1999 | 2 |
v0.05 | Oct. 1999 | 17 |
v0.04 | Aug. 1999 | 64 |
v0.03 | May 1999 | 11 |
v0.02 | Mar. 1999 | 73 |
v0.01 | Jan. 1999 | 58 |
v0.00 | Dec. 1998 | 63 |
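The counts in the table can be totaled with a few lines of Python. This is only a sketch for illustration: the numbers are copied from the table, while the variable names and the choice to single out the versions dated 1999 or earlier are assumptions made here, not part of the original survey.

```python
# Listing counts per version, copied from the table (Google, Mar. 2, 2001).
counts = {
    "v0.14": 0, "v0.13": 0, "v0.12": 76, "v0.11": 118, "v0.10": 60,
    "v0.09": 18, "v0.08": 61, "v0.07": 3, "v0.06": 2, "v0.05": 17,
    "v0.04": 64, "v0.03": 11, "v0.02": 73, "v0.01": 58, "v0.00": 63,
}

LATEST = "v0.14"  # newest version listed in the table

# Every listing found, regardless of version.
total = sum(counts.values())

# Listings of any version other than the latest one.
stale = sum(n for version, n in counts.items() if version != LATEST)

# Listings of versions dated 1999 or earlier in the table (v0.00 to v0.07).
# Plain string comparison is safe here because every label is "v0." + 2 digits.
old_1999 = sum(n for version, n in counts.items() if version <= "v0.07")

print(f"total listings found: {total}")
print(f"listings older than {LATEST}: {stale}")
print(f"listings dated 1999 or earlier: {old_1999}")
```

Since Google reported no sites for the two newest versions, every listing it found counts as stale in this tally.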
The situation is not quite as dire as shown above since in some cases
Google doesn't have the latest info: the site has been updated but
Google doesn't know about it, or the site may be dead. But a spot
check indicated that roughly 80% of them still exist as listed. The
sites that were supposed to have v0.12 frequently had the latest
version.
For a small minority of cases there's double counting since some sites
have HOWTOs in more than one format. Also, a small minority of sites
have stale HOWTOs in a directory named "archives", "old", etc. This
is OK since they are being correctly classified.
In another respect the situation is even worse than described above
since the Modem-HOWTO was a fork from the Serial-HOWTO. Over 200 old
versions of Serial-HOWTO (prior to the first version of Modem-HOWTO)
are still on the Internet. They all contain quite obsolete
information about modems.
Here are some details on how I did the search. I searched using
google.com with the search terms: Modem-HOWTO "modulation details" v0.xx
where xx = 00, 01, 02, etc. The phrase "modulation details" comes from
the table of contents, and it is included so that the search always
selects the HTML table-of-contents file (for split HTML HOWTOs). This
is needed since v0.xx sometimes also appears in chapter 1, where it is
given so that readers can click on a link to LDP to see if they have
the latest version. If "modulation details" were omitted there would
be double counting. Also, "modulation details" removes hits on
lists/catalogs of HOWTOs.
There are more details on how I did it, but they're not of
general interest and are thus omitted.
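The search strings described above follow a fixed pattern, so they can be generated rather than typed by hand. The sketch below only builds the strings; it does not contact any search engine. The function name build_queries and its last_minor parameter are hypothetical, and the upper bound of 14 is taken from the newest version in the table.

```python
def build_queries(last_minor=14):
    """Return one search string per version, v0.00 through v0.<last_minor>."""
    return [
        # The quoted phrase restricts hits to the table-of-contents page.
        f'Modem-HOWTO "modulation details" v0.{minor:02d}'
        for minor in range(last_minor + 1)
    ]


queries = build_queries()
for query in queries:
    print(query)
```

Each string would then be pasted into the search engine, and the reported number of hits recorded by hand.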
Thus there are a lot of out-of-date versions of LDP docs (and other
documentation) on the Internet. One way to try to lessen this problem
would be to put some requirement into the license so that when a
document becomes outdated it must be clearly labeled as such. Such
labeling needs to be seen before one clicks on the document. But how
can this be assured? What might help would be to add a suffix to the
name of the document to indicate that it's outdated.
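As one illustration of the suffix idea, a site maintainer could rename superseded files with a small script. This is only a sketch under assumptions: the suffix text "-OUTDATED" and the function name mark_outdated are made up here, since no particular label or tool is proposed above.

```python
from pathlib import PurePosixPath

SUFFIX = "-OUTDATED"  # hypothetical label; no specific suffix is proposed above


def mark_outdated(filename: str) -> str:
    """Return filename with SUFFIX inserted before the extension, if any."""
    path = PurePosixPath(filename)
    if path.stem.endswith(SUFFIX):
        return filename  # already labeled, leave it alone
    return str(path.with_name(path.stem + SUFFIX + path.suffix))


print(mark_outdated("Modem-HOWTO.html"))
print(mark_outdated("HOWTO/Modem-HOWTO-3.html"))
```

Because the label becomes part of the file name, it shows up in the URL that a search engine lists, so a reader can see it before clicking on the document.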