This week's challenge begins with a real-world example.
Today I tried to transfer a bunch of files off of an ancient PC. This particular machine has a full disk, no networking card, no SCSI bus, and no tape or other removable mass storage media. I wasn't about to copy the entire hard disk to floppies. But I do have a version of the Unix "tar" program that runs under MS-DOS. I thought I would run the command
tar cf com2 .
to recursively write the contents of entire directory subtrees, in tar format, to the serial port. I had the serial port connected to a modern Linux box, where the command
stty raw -echo; cat > pcdump.tar
was running, to collect the characters arriving over the serial port and store them in a file.
This is, of course, the dreaded "Spray & Pray" technique, possibly the least sophisticated file transfer protocol in the known universe, but I'd run a few small experiments and convinced myself that it ought to work. It would take a while, but I didn't mind leaving it running all day if necessary, since I'm not using the machine for anything else. Setting the tty device on the Linux side to raw mode was supposed to ensure that all characters (including nulls, control-C's, characters with the high bit set, etc.) would be properly written to the output file. Naturally, the tar program on the MS-DOS side opens its output file (here COM2) in binary mode, to avoid the various CRLF and control-Z problems which MS-DOS systems are notorious for having in text mode by default.
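In C terms, "binary mode" just means opening the stream with a mode string that tells the library not to translate the bytes passing through it. A minimal sketch of what that looks like under a typical MS-DOS C compiler follows; the missing archive logic and the token error handling are illustrative only, not the actual tar source.

    #include <stdio.h>

    int main(void)
    {
        /* "wb" (write, binary) instead of "w" (write, text): the C library
         * then performs no newline-to-CR/LF translation and attaches no
         * special end-of-file meaning to control-Z (0x1A). */
        FILE *out = fopen("COM2", "wb");
        if (out == NULL) {
            perror("COM2");
            return 1;
        }

        /* ...the archive blocks would be fwrite()'d to out here... */

        fclose(out);
        return 0;
    }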
It didn't work. The very first control-Z that tar encountered (in a binary file among those in the directory it was archiving) screwed everything up. Not only was the control-Z not written to the serial port, but the next 210 or so characters weren't written, either. For whatever reason, opening the output file in binary mode wasn't sufficient.
I ended up modifying the tar program (as it was one I'd written, so I had the source for it) to inspect every byte it was about to write to its output "file", and to translate control-Z's to some innocuous bit pattern. (Of course, this was a nuisance and a waste of time, and it meant wasting still more time on the other end undoing this little encoding so that a stock tar program on the Linux side could disassemble the archive.)
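The general idea is simple byte-stuffing. The encoding sketched below is not the one I actually used (the escape byte and the function names are made up for illustration), but it shows the shape of it: every control-Z becomes a two-byte escape sequence on the way out, and the receiving side reverses the substitution.

    #include <stdio.h>

    #define CTRL_Z 0x1A
    #define ESC    0x1B   /* arbitrary escape byte, for illustration only */

    /* Sending side: never let a literal control-Z reach the output. */
    void put_escaped(int c, FILE *out)
    {
        if (c == CTRL_Z || c == ESC) {
            putc(ESC, out);
            putc(c ^ 0x40, out);   /* any reversible change of bit pattern will do */
        } else {
            putc(c, out);
        }
    }

    /* Receiving side: undo the substitution to recover the original byte. */
    int get_unescaped(FILE *in)
    {
        int c = getc(in);
        if (c == ESC) {
            c = getc(in);
            if (c != EOF)
                c ^= 0x40;
        }
        return c;
    }

(Escaping the escape byte itself is what keeps the scheme reversible; a straight one-for-one substitution of control-Z alone would not be.)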
There are a number of lessons here. (I'll probably cover at least one of the other ones, namely the danger of in-band control characters, in another of these challenges.) The lesson I want to focus on today is: Avoid pointless exceptions, strive for generality.
Programming is all about toolbuilding. Sometimes we're writing programs which are explicitly labeled as tools, but even when we're writing end-user application programs, just about all of the functions we write can be thought of as tools, too, inasmuch as they're intended to accomplish some part of the task of making that application run.
One of the defining characteristics of a good tool is (or ought to be) a certain amount of generality. You certainly don't want tools with ridiculous special cases which pointlessly restrict the problems they can be used to solve. Under certain circumstances, you might want a special-purpose tool optimized for one particular task out of 256 similar tasks, but why would you ever want to use a tool that could perform 255 out of 256 similar tasks, but not the 256th? Of what use is a serial port driver that can write any 8-bit byte to the serial port, except control-Z?
Someone had to go out of their way to program the MS-DOS serial port driver not to be able to pass control-Z. That time was wasted, and much worse: it has wasted everyone's time ever since. Everyone who has wanted to write arbitrary 8-bit bytes has had to work around the limitation in some way. No one has ever said, "Say! Good thing the original designer put that special case in, so I didn't have to waste time handling that case specially myself!"
There are always pressures to put certain special cases in. Each special case will allegedly make its system more efficient, or more convenient, or more compatible, or more marketable, or something. The designers assume the special cases won't cause any problems, because they don't seem to conflict with any use anyone can imagine the system being put to. But another of the defining characteristics of a useful and general tool is that it can be successfully put to uses its designers never dreamed of, and a tool can only succeed in that way if its designers made sure that it was, in fact, properly general for its task, and that it didn't contain unnecessary exceptions, limitations, or special cases. The challenge for the rest of us is: see if we can make our code properly general and free of pointless exceptions, too.
This page by Steve Summit.