Programming Challenges

Challenge #4: Beware "Case and Paste"

(First posted January 23, 2000)

Code reuse is usually a Good Thing, but one kind of reuse that's not so good is to use an editor's cut-and-paste feature to create a block of code that's almost exactly like another block of code, except for n slight differences.

Whenever you find yourself using cut-and-paste when writing code, beware! There are two problems that are all too easy to run into. One, of course, is that you'll neglect to edit the pasted block (the copy of the original code) completely, and overlook one of the n slight differences. For example, here's a scrap of code I wrote just last night:

	switch(partcode)
		{
		case inUpButton:
			if(rsbp->rsb_cur > rsbp->rsb_min)
				rsbp->rsb_cur--;
			break;

		case inDownButton:
			if(rsbp->rsb_cur < rsbp->rsb_min)
				rsbp->rsb_cur++;
			break;

Do you see the bug? When I copied the code for case inUpButton and pasted it into the inDownButton case, I remembered to change the > to a <, and I remembered to change the -- to ++, but I forgot to change rsb_min to rsb_max. The symptom was that the "down" button on a scroll bar never worked, but since (during my first tests) it was taking at least 10 clicks on a scroll button to get the scroll bar to move visibly at all, I didn't detect the problem with the down button right away.
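
For comparison, here is a sketch of the corrected scrap, wrapped in just enough scaffolding to compile. The struct layout, the part-code values, and the function name rsb_click are assumptions made for illustration; only the field names and the two cases come from the scrap above.

```c
#include <assert.h>

/* Hypothetical stand-ins: the original shows only the field names
   rsb_cur, rsb_min, and rsb_max, not the struct or the part codes. */
enum { inUpButton = 1, inDownButton = 2 };

struct rsb {
	int rsb_min;
	int rsb_cur;
	int rsb_max;
};

/* Move the scroll position one step for the given part code,
   keeping it within [rsb_min, rsb_max]. */
void rsb_click(struct rsb *rsbp, int partcode)
{
	assert(rsbp->rsb_min <= rsbp->rsb_max);

	switch(partcode)
		{
		case inUpButton:
			if(rsbp->rsb_cur > rsbp->rsb_min)
				rsbp->rsb_cur--;
			break;

		case inDownButton:
			if(rsbp->rsb_cur < rsbp->rsb_max)	/* the pasted copy said rsb_min */
				rsbp->rsb_cur++;
			break;
		}
}
```

A test that clicks "down" once from the minimum position and checks that rsb_cur actually changed would have caught the original mistake immediately.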

The second problem with case-and-paste is more subtle, but more serious. Suppose you replicate, not the code for one case in a switch statement, but the code for an entire function, to create a second copy of the function that is almost identical, except for some number of significant differences. Not only have you bloated the code, you've also made it far too easy for a later programmer (who might even be yourself) to come along and make a change to one function but overlook the necessity of making the same change to the clone. (Or, even worse, a later programmer might notice the existence of both functions, and use that precedent as an excuse to clone off a third copy.)

For example, I am currently making some changes to a data logging system. Depending on the primary event which has just been logged, the program must occasionally prompt for one or two additional pieces of data. Among the changes I had to make to this program was to expand the repertoire of pieces of additional data which it might prompt for. (This actually involved adding several new but similar cases to a switch, but this time I managed not to make any case-and-paste mistakes.) The problem was that I didn't realize that the program has a "revise" mode, which allows overlooked events to be inserted farther back in the log. Furthermore, the code which implements the "revise" function has its own switch statement which prompts for any additional data required by the inserted event. But since I didn't know that, I ended up with a modified system which properly prompts for the new additional pieces of information when an event is logged normally, but not when it is inserted in "revise" mode. (The code still has that bug as I write this, since I refuse to add the new code a second time; among other things the code is getting too large for the embedded system it runs on. Coalescing the two divergent switch statements back into one central "get additional data required by event (whether real-time or inserted)" function is a project for this afternoon.)
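
The shape of that coalesced function might look something like the following sketch. Every name in it is hypothetical (the event codes, the NEED_ flags, and all three functions), since the real program is described here only in prose; the point is simply that the switch exists once and both paths go through it.

```c
/* All names here are hypothetical, invented for illustration. */
enum event { EV_START, EV_STOP, EV_SAMPLE, EV_NOTE };

enum {
	NEED_NONE     = 0,
	NEED_OPERATOR = 1 << 0,
	NEED_READING  = 1 << 1,
	NEED_COMMENT  = 1 << 2
};

/* The one and only place that knows which additional data each
   event requires, whether the event is real-time or inserted. */
unsigned additional_data_needed(enum event ev)
{
	switch(ev)
		{
		case EV_START:	return NEED_OPERATOR;
		case EV_SAMPLE:	return NEED_OPERATOR | NEED_READING;
		case EV_NOTE:	return NEED_COMMENT;
		default:	return NEED_NONE;
		}
}

/* Normal logging path.  A real program would go on to prompt for
   each item; this sketch just reports what it would ask for. */
unsigned log_event(enum event ev)
{
	return additional_data_needed(ev);
}

/* "Revise" path: no switch of its own, so it cannot diverge. */
unsigned revise_insert_event(enum event ev)
{
	return additional_data_needed(ev);
}
```

With this arrangement, adding a new kind of additional data means touching one switch, and "revise" mode picks up the change without anyone having to know it exists.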

The challenge, then, is not necessarily to completely eliminate the use of cut-and-paste when programming. (If a switch statement has a bunch of parallel similar-but-different cases, you'd like to make it easy to see the similarities and the differences, and cut-and-paste is probably the best way to achieve this; if you stubbornly typed each case in from scratch, their similar aspects might end up looking different for cosmetic reasons, and of course you might end up with the other kind of bug, involving some aspect of two cases which was supposed to be the same, but accidentally wasn't.) But whenever you find yourself using cut-and-paste, a little mental alarm bell should go off, reminding you to be extra careful.

Furthermore, in general, cut-and-paste should probably be reserved for small scraps of code which are and will remain right next to each other, e.g. case statements in a switch. If you find yourself using cut-and-paste to create a clone of an entire function, it's very likely that you're setting a future programmer up for the kind of blunder I described above. (Also, you're contributing to the problem of global bloating.) The ideal, of course, is to rearrange the code so that the differences remain different, while the similarities are identical -- that is, so that the "similarities" are in one place, not two.

Rather than creating a second entire copy of an original function, where the copy has certain differences but many similarities, there are several things you can do. At the very least, perhaps you can keep just the one original function, and add a flag which tells it which of two similar actions to perform. But mode flags are cheesy, and they're not your only alternative. Perhaps you can place the common code in a third function. Or if one of the two functions is a subset of the other, perhaps you can make that relationship explicit by having one function call the other. (If these abstract descriptions don't make sense, here are some schematic examples.)
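
As a rough illustration, the three alternatives might look like this in C. The functions are invented purely for the purpose (none of them comes from any real program mentioned here), and they are kept deliberately tiny so that the structure is easier to see than the arithmetic.

```c
#include <stdlib.h>

/* 1. One function plus a mode flag (the cheesy option):
      use_abs selects which of two similar sums to compute. */
int sum_values(const int *a, int n, int use_abs)
{
	int i, sum = 0;
	for(i = 0; i < n; i++)
		sum += use_abs ? abs(a[i]) : a[i];
	return sum;
}

/* 2. Common code placed in a third function, which the two
      similar-but-different functions both call. */
static int clamp(int x, int lo, int hi)
{
	if(x < lo) return lo;
	if(x > hi) return hi;
	return x;
}

int clamp_percent(int x) { return clamp(x, 0, 100); }
int clamp_byte(int x)    { return clamp(x, 0, 255); }

/* 3. One function is a subset of the other, so the larger one
      calls the smaller one instead of repeating its loop. */
int sum_plain(const int *a, int n)
{
	return sum_values(a, n, 0);
}

double average(const int *a, int n)
{
	return n > 0 ? (double)sum_plain(a, n) / n : 0.0;
}
```

In each case the similar part is written exactly once, and the differences are reduced to an argument, a pair of bounds, or one extra line.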

During initial development, when everything is changing fast, it's not always easy to see exactly how the functionality in some code should be partitioned, and "niceties" such as long-term maintainability often take a back seat. (Indeed, if code is undergoing sweeping changes every day, it might be a waste of time to polish the code too much, if a polished bit might be abandoned tomorrow, before anyone had a chance to appreciate it.) One technique I've occasionally managed to use (with some success) is:

1. Temporarily, during initial development, make a duplicate copy of a function, making whatever changes to the copy are required, leaving the original intact.
2. When things have settled down, but before I've forgotten about the issue, extract a copy of each function's source code to a temporary file, and run a mechanical diff program to see exactly how the two functions have diverged. Then, merge the two using some appropriate technique suggested by the particular form of the revealed similarities and differences.

This is a risky technique, of course, because if I forget to apply step 2, pretty soon I forget about the situation, and eventually I might come back and make some change to one of the two functions which ought to be made to both.

Beware of "case and paste"!

(See also the Jargon File entry.)

This page by Steve Summit