Programming Challenges

Challenge #1: Stamp out 2-digit year formats!


(First posted January 2, 2000)

Well, we dodged a bullet. The world didn't come to a screeching halt on January 1, 2000 due to Y2K problems. Catastrophic failures were extremely unlikely in any case, but it looks like at least some of the hype and hard work paid off, such that most of the problems in noncritical systems got fixed, too.

As 1999 drew to a close, however, I was thinking about it again, and it occurred to me that there are probably a lot of systems out there which are still using 2-digit year representations internally, and which ducked any immediate Y2K problems by adopting "windowing" techniques and other expedient kludges. This means, of course, that these systems are still vulnerable to the same sorts of problems, perhaps in 2020 or 2050 (or whenever the edge of the window is), and definitely in 2100. That is, the problems have been put off, not solved. I reached a discouraging conclusion: there is going to be a "Y2.1K crisis", smaller than but similar to the one we just went through.

The challenge, then, is to root out and replace all the 2-digit date representations once and for all. We've now got 100 years to do it (not 5 or 10), so it shouldn't be impossible. Virtually all code in use today will be replaced (i.e. rewritten from scratch) sometime during the next century. Many of those systems will be rewritten soon enough that programmers alive today (and with memories of Y2K fresh in their heads) will be involved in the rewriting, and will be able to recommend proper fixes. But I'm afraid that seeing those proper fixes through to actual implementation may still be a challenge.

The problem, of course, is that few systems are ever replaced all at once. Often, new components are introduced one at a time, and must therefore coexist for a while with nearby, not-yet-replaced old components. New code may have to process old data files. Therefore, every not-yet-replaced software component which still uses 2-digit years, and every existing data file which incorporates 2-digit years, will act as a retarding force, encouraging delay in adopting a proper, systemwide fix.

I hope I'm wrong, but I can very easily imagine this scenario, played out numerous times over the next few decades: A near-total rewrite and replacement of a major system is about to begin. One programmer or group of programmers notes that this will be a splendid time to go back and do things right, to adopt a proper, 4-digit year representation systemwide, to discard the windowing techniques and other kludges that were adopted back in '98 and '99, when there wasn't time to do things right. But another group advocates caution.

"We've still got all these data files containing 2-digit years", someone points out. "It will be very hard to arrange a graceful changeover right now, especially since we have numerous auxiliary programs in use, which we weren't planning on replacing, which wouldn't be able to read the hypothetical new 4-digit year formats if we decided to have the new system write them that way. Even though we're discarding an old system and designing a new one, it's obvious that none of the code we write today will still be in use in 2100, and we'll all be dead by then anyway. It's a nice idea, but it turns out that we don't have to fix the 2-digit year problem right now, after all. Someone else will have to fix it later, when the time is right."

The problem is, of course, that with this kind of conservative, pessimistic thinking, the time is never right. At any given instant, any time during the next 100 years, there will always be at least one old program or data format that's still desperately clinging to the notion of using 2-digit years. Using those old programs and data formats as an excuse to perpetuate 2-digit year formats into each next generation of software guarantees that the problem won't go away, and will be very much alive when the year 2100 rolls around. It may be very true that "none of the code we write today will still be in use in 2100", but it's also true that, just about no matter what, the code in use in 2100 will be based on code that was based on code that was based on code that was based on code that was written today. Just as in the rest of the world, certain aspects of programming systems have a way of propagating themselves almost indefinitely. If conservative, excessively backwards compatible thinking is allowed to reign supreme, grave errors can be propagated indefinitely, too.

At any given time, it will be possible to find an excuse for perpetuating those long-ago errors, so at any given time, the time to fix a stubborn error once and for all is now. So, here's the challenge: if you're responsible for any piece of code which still uses 2-digit year representations (either internally or externally), and if the code in question is up for any kind of major revision at any time when you're around, insist that it be updated to use 4-digit years. If there are external interfaces or data file formats which still use 2-digit years, figure out a backwards-compatibility scheme which is more elegant than simply perpetuating the mistake: a scheme which will not continue to impose the use of 2-digit years on new code and new data files, a scheme which makes some progress towards the day when all programs and files have been updated. (In the case of data files, you might want to use some kind of self-describing data, which is likely to be the subject of another of these challenges.)
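Just as one illustration of what such a scheme might look like (this is only a sketch, in C, and the mm/dd/yy text field and the pivot year 70 are made-up assumptions, not anything prescribed here): the reader tolerates both the old 2-digit form and the new 4-digit form, while the writer emits only 4-digit years, so that every file the new code touches moves toward the proper format.

    #include <stdio.h>

    /* Parse a date field that may be "mm/dd/yy" (old files) or
     * "mm/dd/yyyy" (new files).  Returns 1 on success, 0 on failure. */
    int parse_date(const char *field, int *y, int *m, int *d)
    {
        int yy;
        if (sscanf(field, "%d/%d/%d", m, d, &yy) != 3)
            return 0;
        if (yy >= 100)
            *y = yy;                /* already a full 4-digit year */
        else
            *y = (yy < 70) ? 2000 + yy : 1900 + yy;  /* legacy field: fixed, documented pivot */
        return 1;
    }

    /* Always write the unambiguous 4-digit form. */
    void write_date(FILE *fp, int y, int m, int d)
    {
        fprintf(fp, "%02d/%02d/%04d", m, d, y);
    }

The point of the asymmetry is that old files remain readable, but no new 2-digit years are ever created.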

I hope it's obvious that using 4-digit years internally (that is, using some integer format which represents 1999 as 1999, and 2000 as 2000) is the only sane, unambiguous way to represent years. When you have to make exceptions, make sure they're restricted to external interfaces or data formats. (The distinction between internal and external representations is likely to be the subject of another of these challenges.) For example, if your user interface requires dates to be entered, and if the dates being entered will always be contemporary dates (that is, dates within a few years of the date they're being entered), you might allow them to be entered as two digits, and have your code convert the 2-digit year entered to a 4-digit year internally, intuiting the century using a robust windowing technique (i.e. not by simply prepending the century that's current at the time you write the code!). But you might also allow (or even encourage) the user to type in an unambiguous 4-digit year. Similarly, on output, if you must, take the internal 4-digit format and truncate it to two digits for display. (But, all things considered, this is a silly thing to do. Me, if I were given a specification which stipulated that dates be displayed using a 2-digit format, I'd quietly implement it to display 4-digit years instead, and refuse to fix the "bug" even if someone did notice and complain about it.)
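To make the "robust windowing" idea a little more concrete, here's a minimal sketch in C. The width of the window (here, 50 years on either side of the current year) is just an illustrative assumption; what matters is that the century is computed at run time, from the clock, rather than being frozen in by the programmer.

    #include <time.h>

    /* Expand a 2-digit year (0-99) to the 4-digit year closest to the
     * current year.  A year already entered as 4 digits is returned
     * unchanged. */
    int expand_year(int year)
    {
        if (year >= 100)
            return year;                    /* already unambiguous */

        time_t now = time(NULL);
        int thisyear = localtime(&now)->tm_year + 1900;
        int century = thisyear - thisyear % 100;
        int candidate = century + year;

        if (candidate < thisyear - 50)
            candidate += 100;               /* e.g. "10" entered in 2095 means 2110 */
        else if (candidate > thisyear + 50)
            candidate -= 100;               /* e.g. "99" entered in 2040 means 1999 */

        return candidate;
    }

Called today, expand_year(99) yields 1999 and expand_year(2) yields 2002, and the same code will keep doing something sensible no matter what century it happens to be running in.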

While we're talking about switching from 2- to 4-digit years in computer code, it's worth thinking about making the same change in real life, too. I live in the U.S., where the m/d/y format is popular, but for at least the past year I've been trying to get in the habit of writing mm/dd/yyyy instead of mm/dd/yy. (That is, the last day of last year was 12/31/1999, and today is 1/2/2000.) Not to be defensive about the profession or anything, but I believe that at least half of the instances of "the Y2K bug" (which as we know was not a singular bug but rather a huge set of independent bugs, all of which just happened to strike at the same time) were not due to programmers trying to save two bytes of precious disk space, but rather due to programmers unthinkingly using the same sorts of date formats inside computer code as they (and everyone else) use in real life.

(Along the same lines, yet another good habit to get into is using an internationally accepted date format, such as ISO 8601's yyyy-mm-dd, to avoid the eternal ambiguity over whether a date is m/d/y or d/m/y or y/m/d.)
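For what it's worth, producing that form from a program is easy; in standard C, for instance, strftime's %Y-%m-%d conversion specifiers do it directly. A minimal example:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        char buf[11];                       /* "yyyy-mm-dd" plus '\0' */
        time_t now = time(NULL);

        strftime(buf, sizeof buf, "%Y-%m-%d", localtime(&now));
        printf("%s\n", buf);                /* e.g. "2000-01-02" */
        return 0;
    }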


This page by Steve Summit