section 1.9: Character Arrays

Pay attention to the way this program is developed first in ``pseudocode,'' and then refined into real C code. A clear pseudocode statement not only makes it easier to think about the structure of the eventual real code, but if you make the eventual real code mimic the pseudocode, the real code will be equally straightforward and easy to read.

The function getline, introduced here, is extremely useful, and we'll have as much use for it in our own programs as the authors do in theirs. (In other words, they have succeeded in their goal of making it ``useful in other contexts.'' In fact, I've been using a getline function much like this one ever since I learned C from K&R, and I generally find it preferable to the standard library's line-reading function.)

Pages 28 through 30 introduce quite a lot of material all at once; you'll probably want to read it several times, especially if arrays or character strings are new to you.

Earlier we said that C provided no particular built-in support for composite objects such as character strings, and here we begin to see the significance of that omission. A string is just an array of characters, and you can access the characters within a string exactly as easily (because you use exactly the same syntax) as you access the elements within any other array.
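As a minimal sketch of that point, here is a small function that walks a string with ordinary array subscripts, stopping at the '\0' that marks the end. The name count_char is our own invention for illustration; it is not from the book or the standard library.

```c
/* count_char: return how many times c occurs in the string s.
   (Hypothetical helper for illustration only.) */
int count_char(const char s[], char c)
{
    int i, n = 0;

    for (i = 0; s[i] != '\0'; i++)   /* s[i] is an ordinary array access */
        if (s[i] == c)
            n++;
    return n;
}
```

With this definition, count_char("banana", 'a') returns 3.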

If you've used BASIC, you will probably wonder where C's SUBSTR function is. C doesn't have one, for two reasons. First of all, there's less of a need for one, because it's so easy to get at the individual characters within a string in C. More importantly, a SUBSTR function implies that you take a string and extract a substring as a new string. However, creating a new string (i.e. the extracted substring) involves allocating arbitrary amounts of memory to hold the string, and C rarely if ever allocates memory implicitly for you.
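If you did want something like SUBSTR, one way to write it (a sketch under our own assumptions, not anything the book provides) is to make the caller supply the memory for the result. The function name substr and its parameter order are hypothetical.

```c
#include <string.h>

/* substr: copy up to len characters of src, starting at index start,
   into dest, which has room for destsize chars (including the '\0').
   Returns the number of characters copied.
   (Hypothetical helper for illustration; the caller supplies the memory.) */
int substr(char dest[], int destsize, const char src[], int start, int len)
{
    int srclen = (int)strlen(src);
    int i = 0;

    if (destsize <= 0)
        return 0;                    /* no room even for the '\0' */
    if (start < 0 || start > srclen)
        start = srclen;              /* out-of-range start yields an empty string */
    while (i < len && i < destsize - 1 && src[start + i] != '\0') {
        dest[i] = src[start + i];
        i++;
    }
    dest[i] = '\0';                  /* the result is always a proper string */
    return i;
}
```

Given char word[6], the call substr(word, sizeof word, "hello, world", 7, 5) leaves "world" in word. Notice that the function never allocates anything; it only fills in an array the caller already owns.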

If anything, it's too easy to access the individual characters within strings in C. String handling illustrates one of the potentially frustrating aspects of C we mentioned earlier: the language doesn't define any high-level string handling features for you, so you're free to do whatever low-level string processing you wish. The downside is that constantly manipulating strings down at the character level, and always having to remember to allocate memory for new strings, can get tedious after a while.

The preceding paragraph is not meant to discourage you, but just to point out a reality: any C program which manipulates strings (and this includes most C programs) will find itself doing a certain amount of character-level fiddling and a certain amount of memory allocation. It will also find that it can do just about anything it wants to do (and that its programmer has the patience to do) with the strings it manipulates.

Since string processing, and at this relatively low level, is so common in C, you'll want to pay careful attention to the discussion on page 30 of how strings are stored in character arrays, and particularly to the fact that a '\0' character is always present to mark the end of a string. (It's easy to forget to count the '\0' character when allocating space for a string, for instance.) Notice the nice picture on page 30; this is a good way of thinking about data structures (and not just simple character arrays, either).
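To make the counting concrete: the string "hello" has five characters as far as strlen is concerned, but it occupies six chars in memory. The sketch below allocates a copy of a string with room for the '\0'. It uses pointers and malloc, which the book covers later, and the name copy_of is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* copy_of: return a newly allocated copy of s, or NULL if out of memory.
   The caller must free() the result.
   (Hypothetical helper for illustration.) */
char *copy_of(const char *s)
{
    size_t len = strlen(s);          /* strlen does not count the '\0' */
    char *p = malloc(len + 1);       /* so allocate one extra char for it */

    if (p != NULL)
        memcpy(p, s, len + 1);       /* copy the characters and the '\0' */
    return p;
}
```

Writing malloc(len) instead of malloc(len + 1) is exactly the easy-to-make mistake mentioned above.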

page 29

Note that the program explicitly allocates space for the two strings it manipulates: the current line, line, and the longest line, longest. (It only needs these two strings at any one time, even though the input consists of arbitrarily many lines.) Note that it cannot simply assign one string to another (because C provides no built-in support for composite objects such as character strings); the program calls the copy function to do so. (The authors write their own copy function for explanatory purposes; the standard library contains a string-copying function, strcpy, which would normally be used.) The only strings that aren't explicitly allocated are the arrays in the getline and copy functions; as the discussion briefly mentions, these do not need to be allocated because they're already allocated in the caller. (There are a number of subtleties about array parameters to functions; we'll have more to say about them later.)
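Here is a sketch of the "remember the longest line so far" step, using the standard library's strcpy in place of the book's copy. The function keep_longer is hypothetical; the book does this step inline in main. Note that the function declares no array storage of its own: it writes into the array that the caller allocated.

```c
#include <string.h>

#define MAXLINE 1000    /* maximum line size, as in the book's program */

/* keep_longer: if line (of length len) is longer than max, copy it into
   longest and return len; otherwise return max unchanged.
   (Hypothetical helper; longest must have room for MAXLINE chars, and
   line, with its '\0', must fit in it.) */
int keep_longer(char longest[], int max, const char line[], int len)
{
    if (len > max) {
        strcpy(longest, line);   /* plain assignment cannot copy an array's contents */
        return len;
    }
    return max;
}
```

A caller would declare char longest[MAXLINE] itself and then write max = keep_longer(longest, max, line, len) for each line it reads.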

The code on page 29 contains a number of examples of compressed assignments and tests; evidently the authors expect you to get used to this style in a hurry. The line

	while ((len = getline(line, MAXLINE)) > 0)
is similar to the getchar loops earlier in this chapter; it calls getline, saves its return value in the variable len, and tests it against 0.

The comparison

	i<lim-1 && (c=getchar())!=EOF && c!='\n'
in the for loop in the getline function does several things: it makes sure there is room for another character in the array; it calls, assigns, and tests getchar's return value against EOF, as before; and it also tests the returned character against '\n', to detect end of line. The surrounding code is mildly clumsy in that it has to check for \n a second time; later, when we learn more about loops, we may find a way of writing it more cleanly. You may also notice that the code deals correctly with the possibility that EOF is seen without a \n.
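One possible shape for a version that tests for \n only once is sketched below. It is not the book's code: it uses the break statement, which the book introduces later; it reads from a FILE * stream (also a later topic) rather than calling getchar; and it is named read_line so that it cannot clash with the getline that some systems declare in <stdio.h>. All of those choices are our own assumptions.

```c
#include <stdio.h>

/* read_line: read a line from fp into s (at most lim-1 characters plus
   the '\0'), keeping the '\n' if one was read; return the length, or 0
   at end of file. lim must be at least 1.
   (Hypothetical variant of the book's getline.) */
int read_line(FILE *fp, char s[], int lim)
{
    int c, i = 0;

    while (i < lim - 1 && (c = getc(fp)) != EOF) {
        s[i++] = c;
        if (c == '\n')       /* the only test for end of line */
            break;
    }
    s[i] = '\0';
    return i;
}
```

The main loop would then read while ((len = read_line(stdin, line, MAXLINE)) > 0). As in the book's version, a final line that ends at EOF without a \n is still returned, and a line too long for the array is returned in pieces over successive calls.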

The line

	while ((to[i] = from[i]) != '\0')
in the copy function does two things at once: it copies characters from the from array to the to array, and at the same time it compares the copied character against '\0', so that it stops at the end of the string. (If you think this is cryptic, wait 'til we get to page 106 in chapter 5!)
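If the compressed form is hard to read at first, it may help to see the same loop with the assignment and the test pulled apart. This is only a sketch for comparison; the name copy_long is hypothetical, and like the book's copy it assumes that to is big enough.

```c
/* copy_long: same effect as the book's copy, with the assignment and the
   test written as separate steps.
   (Hypothetical name; to must be big enough to hold from.) */
void copy_long(char to[], const char from[])
{
    int i = 0;

    to[i] = from[i];              /* copy the first character */
    while (to[i] != '\0') {       /* test the character just copied */
        i++;
        to[i] = from[i];          /* copy the next one, eventually the '\0' */
    }
}
```

The assignment has to be written twice here, once before the loop and once inside it; folding it into the while condition is what lets the book's version say it only once.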

We've also just learned another printf conversion specifier: %s prints a string.

page 30

Deep sentence:

There is no way for a user of getline to know in advance how long an input line might be, so getline checks for overflow.
Because dynamically allocating memory for arbitrary-length strings is mildly tedious in C, it's tempting to use fixed-size arrays. (It's so tempting, in fact, that that's what most programs do, and since fixed-size arrays are also considerably easier to discuss, all of our early example programs will use them.) Using fixed-size arrays is fine, as long as some assurance is made that they don't overflow. Unfortunately, it's also tempting (and easy) to forget to guard against array overflow, perhaps by deluding yourself into thinking that too-long inputs ``can't happen.'' Murphy's law says that they do happen, and the various corollaries to Murphy's law say that they happen in the most unpleasant way and at the least convenient time. Don't be cavalier about arrays; do make sure that they're big enough and that you guard against overflowing them. (In another mark of C's general insensitivity to beginning programmers, most compilers do not check for array overflow; if you write more data to an array than it is declared to hold, you quietly scribble on other parts of memory, usually with disastrous results.)
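As one illustration of guarding a fixed-size array, here is a sketch of a copy function that is told how big the destination is and refuses to write past it. The name copy_bounded and its return convention are our own assumptions, not something from the book.

```c
/* copy_bounded: copy from into to, which holds size chars; never writes
   more than size chars. Returns 1 if all of from fit, 0 if it had to be
   truncated.
   (Hypothetical helper for illustration.) */
int copy_bounded(char to[], int size, const char from[])
{
    int i;

    if (size <= 0)
        return 0;                 /* no room at all, not even for the '\0' */
    for (i = 0; i < size - 1 && from[i] != '\0'; i++)
        to[i] = from[i];
    to[i] = '\0';                 /* always leave a properly terminated string */
    return from[i] == '\0';
}
```

Given char buf[4], copy_bounded(buf, 4, "abcdef") stores "abc" and returns 0, so the caller can tell that the input did not fit and decide what to do about it, instead of silently scribbling on memory.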



This page by Steve Summit // Copyright 1995, 1996