Functions and Program Structure
Functions break large computing tasks into smaller ones, and enable people to
build on what others have done instead of starting over
from scratch.
Appropriate functions hide details of operation from parts of the program that
don't need to know about them, thus clarifying the whole, and easing the pain
of making changes.
C has been designed to make functions efficient and easy to use; C programs
generally consist of many small functions rather than a few big ones. A
program may reside in one or more source files. Source files may be compiled
separately and loaded together, along with previously compiled functions from
libraries. We will not go into that process here, however, since the details vary
from system to system.
Function declaration and definition is the area where the ANSI standard has
made the most changes to C. It is now possible to declare the types of arguments
when a function is declared. The syntax of function definition also changes, so
that declarations and definitions match. This makes it possible for a compiler to
detect many more errors than it could before.
Furthermore, when arguments are properly declared, appropriate type coercions
are performed automatically.
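As a small illustration of that coercion, here is a sketch in which square and square_of_three are hypothetical example functions, not part of the text. Because the prototype declares the parameter as double, an int argument such as 3 is converted to 3.0 at the call.

```c
/* square: hypothetical example function; its parameter is declared
   double, so the compiler converts int arguments to double at each call */
double square(double x);

double square(double x)
{
    return x * x;
}

/* hypothetical caller that passes an int: with the prototype above in
   scope, the argument 3 arrives in square as 3.0 */
double square_of_three(void)
{
    return square(3);
}
```

In pre-ANSI C, with no prototype in scope, the int would be passed unconverted and square would misinterpret its bits.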
The standard clarifies the rules on the scope of names; in particular, it requires
that there be only one definition of each external object. Initialization is more
general: automatic arrays and structures may now be initialized.
The C preprocessor has also been enhanced. New preprocessor facilities
include a more complete set of conditional compilation directives, and better
control over the macro expansion process.
To begin with, let us design and write a program to print each line of its input
that contains a particular ``pattern'' or string of characters. (This is a special
case of the UNIX program grep.)
The job falls neatly into three pieces:
    while (there's another line)
        if (the line contains the pattern)
            print it
Although it's certainly possible to put the code for all of this in main, a better
way is to use the structure to advantage by making each part a separate function.
Three small pieces are better to deal with than one big one, because irrelevant
details can be buried in the functions, and the chance of unwanted interactions
is minimized. And the pieces may even be useful in other programs.
``While there's another line'' is getline, and ``print it'' is printf, which someone has
already provided for us. This means we need only write a routine to decide
whether the line contains an occurrence of the pattern.
We can solve that problem by writing a function strindex(s,t) that returns the
position or index in the string s where the string t begins, or -1 if s does not
contain t. Because C arrays begin at position zero, indexes will be zero or
positive, and so a negative value like -1 is convenient for
signaling failure.
When we later need more sophisticated pattern matching, we only have to
replace strindex; the rest of the code can remain the same. (The standard library
provides a function strstr that is similar to strindex, except that it returns a
pointer instead of an index.) Given this much design, filling in the details of the
program is straightforward. Here is the whole thing, so you can see how the
pieces fit together. For now, the pattern to be searched for is a literal string,
which is not the most general of mechanisms. We will return shortly to a
discussion of how to initialize character arrays, and will show how to make the
pattern a parameter that is set when the program is run. There is also a slightly
different version of getline.
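#include <stdio.h>

#define MAXLINE 1000   /* maximum input line length */

/* Some systems' <stdio.h> also declare a POSIX getline with a different
   signature; if the compiler reports conflicting types, compile in strict
   ISO C mode (e.g. -std=c99 with gcc and glibc) or rename this function. */
int getline(char line[], int max);
int strindex(char source[], char searchfor[]);

char pattern[] = "ould";   /* pattern to search for */

/* find all lines matching pattern */
int main(void)
{
    char line[MAXLINE];
    int found = 0;

    while (getline(line, MAXLINE) > 0)
        if (strindex(line, pattern) >= 0) {
            printf("%s", line);
            found++;
        }
    return found;
}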
/* getline: get line into s, return length */
int getline(char s[], int lim)
{
    int c = 0, i;   /* c starts at 0 in case no character is read (lim <= 1) */

    i = 0;
    while (--lim > 0 && (c = getchar()) != EOF && c != '\n')
        s[i++] = c;
    if (c == '\n')
        s[i++] = c;
    s[i] = '\0';
    return i;
}
/* strindex: return index of t in s, -1 if none */
int strindex(char s[], char t[])
{
    int i, j, k;

    for (i = 0; s[i] != '\0'; i++) {
        for (j = i, k = 0; t[k] != '\0' && s[j] == t[k]; j++, k++)
            ;   /* empty body: just advance while characters match */
        if (k > 0 && t[k] == '\0')
            return i;
    }
    return -1;
}
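Since the standard library's strstr returns a pointer rather than an index, an index-returning replacement for strindex can be sketched by subtracting pointers. The name strindex_strstr is hypothetical, chosen only to keep it apart from the strindex above.

```c
#include <string.h>

/* strindex_strstr: hypothetical variant of strindex built on the standard
   strstr; converts the returned pointer into an index, or -1 if t does
   not occur in s */
int strindex_strstr(const char s[], const char t[])
{
    const char *p = strstr(s, t);   /* first occurrence of t, or NULL */

    return (p != NULL) ? (int) (p - s) : -1;
}
```

One difference worth noting: strstr reports an empty t as found at position 0, while strindex above returns -1 for it.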
Each function definition has the form:
return-type function-name(argument declarations)
{
declarations and statements
}
Various parts may be absent; a minimal function is
dummy() {}
which does nothing and returns nothing. A do-nothing function like this is
sometimes useful as a place holder during program development. If the return
type is omitted, int is assumed.
A program is just a set of definitions of variables and functions.
Communication between the functions is by arguments and values returned by
the functions, and through external variables. The functions can occur in any
order in the source file, and the source program can be split into multiple files,
so long as no function is split.
The return statement is the mechanism for returning a value from the called
function to its caller. Any expression can follow return:
return expression;
The expression will be converted to the return type of the function if necessary.
Parentheses are often used around the expression, but they are optional.
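A short sketch of that conversion, where half is a hypothetical function not taken from the text: the expression x / 2.0 has type double, but because half is declared to return int, the value is converted to int on return, which discards the fractional part.

```c
/* half: hypothetical example; x / 2.0 is a double expression, but it is
   converted to int, the declared return type, before control returns
   (the conversion truncates toward zero) */
int half(double x)
{
    return x / 2.0;
}
```

Some compilers can be asked to warn about this implicit narrowing (gcc's -Wconversion, for example); an explicit (int) cast states that it is intended.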
The calling function is free to ignore the returned value. Furthermore, there
need be no expression after return; in that case, no value is returned to the
caller. Control also returns to the caller with no value when execution ``falls off
the end'' of the function by reaching the closing right brace. It is not illegal, but
probably a sign of trouble, if a function returns a value from one place and no
value from another.
In any case, if a function fails to return a value, its ``value'' is certain to be
garbage.
The pattern-searching program returns a status from main, the number of
matches found. This value is available for use by the environment that called
the program.
The mechanics of how to compile and load a C program that resides on
multiple source files vary from one system to the next. On the UNIX system,
for example, the cc command does the job. Suppose that the three
functions are stored in three files
called main.c, getline.c, and strindex.c. Then the command
cc main.c getline.c strindex.c
compiles the three files, placing the resulting object code in files main.o,
getline.o, and strindex.o, then loads them all into an executable file called
a.out. If there is an error, say in main.c, the file can be recompiled by itself and
the result loaded with the previous object files, with the command
cc main.c getline.o strindex.o
The cc command uses the ``.c'' versus ``.o'' naming convention to distinguish
source files from object files.
So far our examples of functions have returned either no value (void) or an int.
What if a function must return some other type? Many numerical functions like
sqrt, sin, and cos return double; other specialized functions return other
types.
To illustrate how to deal with this, let us write and use the function atof(s),
which converts the string s to its double-precision floating-point equivalent.
atof is an extension of atoi. It handles an optional sign and decimal point, and
the presence or absence of either integer part or fractional part. Our version is not a
high-quality input conversion routine; that would take more space than we care
to use. The standard library includes an atof; the header <stdlib.h> declares it.
First, atof itself must declare the type of value it returns, since it is not int. The
type name precedes the function name.
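A minimal sketch of such a function, assuming plain decimal input with no exponent part, shows the return type double written before the function name. It is named my_atof here, a hypothetical name chosen so it does not clash with the library's atof.

```c
#include <ctype.h>

/* my_atof: hypothetical name; convert string s to double.
   Handles leading blanks, an optional sign, an integer part, and an
   optional fractional part; exponents are not supported. */
double my_atof(const char s[])
{
    double val, power;
    int i, sign;

    for (i = 0; isspace((unsigned char) s[i]); i++)   /* skip white space */
        ;
    sign = (s[i] == '-') ? -1 : 1;
    if (s[i] == '+' || s[i] == '-')
        i++;
    for (val = 0.0; isdigit((unsigned char) s[i]); i++)   /* integer part */
        val = 10.0 * val + (s[i] - '0');
    if (s[i] == '.')
        i++;
    for (power = 1.0; isdigit((unsigned char) s[i]); i++) {   /* fraction */
        val = 10.0 * val + (s[i] - '0');
        power *= 10.0;
    }
    return sign * val / power;
}
```

The digits after the point are accumulated into val as if they were part of the integer, and power records how far to scale back down at the end.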
Second, and just as important, the calling routine must know that atof returns a
non-int value. One way to ensure this is to declare atof explicitly in the calling
routine. The declaration is shown in this primitive calculator (barely adequate
for check-book balancing), which reads one number per line, optionally
preceded with a sign, and adds them up, printing the running sum after each
input:
#include <stdio.h>

#define MAXLINE 100

/* rudimentary calculator */
int main(void)
{
    double sum, atof(char []);
    char line[MAXLINE];
    int getline(char line[], int max);

    sum = 0;
    while (getline(line, MAXLINE) > 0)
        printf("\t%g\n", sum += atof(line));
    return 0;
}
The declaration
double sum, atof(char []);
says that sum is a double variable, and that atof is a function that takes one
char[] argument and returns a double.
The function atof must be declared and defined consistently. If atof itself and
the call to it in main have inconsistent types in the same source file, the error
will be detected by the compiler. But if (as is more likely) atof were compiled
separately, the mismatch would not be detected, atof would return a double that
main would treat as an int, and meaningless answers would result. In the light
of what we have said about how declarations must match definitions, this might
seem surprising. The reason a mismatch can happen is that if there is no
function prototype, a function is implicitly declared by its first appearance in an
expression, such as
sum += atof(line)
If a name that has not been previously declared occurs in an expression and is
followed by a left parenthesis, it is declared by context to be a function name,
the function is assumed to return an int, and nothing is assumed about its
arguments. Furthermore, if a function declaration does not include arguments,
as in
double atof();
that too is taken to mean that nothing is to be assumed about the arguments of
atof; all parameter checking is turned off. This special meaning of the empty
argument list is intended to permit older C programs to compile with new
compilers. But it's a bad idea to use it with new C programs. If the function
takes arguments, declare them; if it takes no arguments, use void. Given atof,
properly declared, we could write atoi (convert a string to int) in terms of it:
/* atoi: convert string s to integer using atof */
int atoi(char s[])
{
    double atof(char s[]);

    return (int) atof(s);
}
Notice the structure of the declarations and the return statement. The value of
the expression in
return expression;
is converted to the type of the function before the return is taken. Therefore,
the value of atof, a double, is converted automatically to int when it appears in
this return, since the function atoi returns an int. This operation does
potentially discard information, however, so some compilers warn of it. The cast
states explicitly that the operation is intended, and suppresses any warning.
A C program consists of a set of external objects, which are either variables or
functions. The adjective ``external'' is used in contrast to ``internal'', which
describes the arguments and variables defined inside functions. External
variables are defined outside of any function, and are thus potentially
available to many functions. Functions themselves are always external, because
C does not allow functions to be defined inside other functions. By default,
external variables and functions have the property that all references to them by
the same name, even from functions compiled separately, are references to the
same thing. (The standard calls this property external linkage.) In this sense,
external variables are analogous to Fortran COMMON blocks or variables in
the outermost block in Pascal. We will see later how to define external variables
and functions that are visible only within a single source file.
Because external variables are globally accessible, they provide an alternative
to function arguments and return values for communicating data between
functions. Any function may access an external variable by referring to it by
name, if the name has been declared somehow. If a large number of variables
must be shared among functions, external variables are more convenient and
efficient than long argument lists. However, this reasoning should be applied
with some caution, for it can have a bad effect on program structure, and lead to
programs with too many data connections between functions. External
variables are also useful because of their greater scope and lifetime. Automatic
variables are internal to a function; they come into existence when the function
is entered, and disappear when it is left. External variables, on the other hand,
are permanent, so they can retain values from one function invocation to the
next. Thus if two functions must share some data, yet neither calls the other, it
is often most convenient if the shared data is kept in external variables rather
than being passed in and out via arguments. Let us examine this issue with a
larger example. The problem is to write a calculator program that provides the
operators +, -, * and /. Because it is easier to implement, the calculator will use
reverse Polish notation instead of infix. (Reverse Polish notation is used by
some pocket calculators, and in languages like Forth and Postscript.) In reverse
Polish notation, each operator follows its operands; an infix expression like
(1 - 2) * (4 + 5)
is entered as
1 2 - 4 5 + *
Parentheses are not needed; the notation is unambiguous as long as we know
how many operands each operator expects.
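The role of external variables in such a calculator can be sketched as follows. push and pop share the stack val and its index sp, which are defined outside both functions so they keep their values between calls. The function bodies and the driver eval_rpn are illustrative assumptions, not the text's program: tokens are assumed to be separated by blanks, numbers must start with a digit, and a '-' is always read as the subtraction operator.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define MAXVAL 100      /* maximum depth of the value stack */

int sp = 0;             /* next free stack position (external) */
double val[MAXVAL];     /* value stack (external) */

/* push: push f onto the value stack */
void push(double f)
{
    if (sp < MAXVAL)
        val[sp++] = f;
    else
        printf("error: stack full, can't push %g\n", f);
}

/* pop: pop and return the top value from the stack */
double pop(void)
{
    if (sp > 0)
        return val[--sp];
    printf("error: stack empty\n");
    return 0.0;
}

/* eval_rpn: hypothetical driver that evaluates one blank-separated
   reverse Polish expression using the external stack above */
double eval_rpn(const char *s)
{
    char *end;
    double op2;

    sp = 0;                              /* start with an empty stack */
    while (*s != '\0') {
        if (isspace((unsigned char) *s)) {
            s++;                         /* skip blanks between tokens */
        } else if (isdigit((unsigned char) *s)) {
            push(strtod(s, &end));       /* operand: push its value */
            s = end;
        } else {
            switch (*s++) {              /* operator: apply to the top two values */
            case '+':
                push(pop() + pop());
                break;
            case '*':
                push(pop() * pop());
                break;
            case '-':                    /* order matters: pop the right operand first */
                op2 = pop();
                push(pop() - op2);
                break;
            case '/':
                op2 = pop();
                if (op2 != 0.0)
                    push(pop() / op2);
                else
                    printf("error: zero divisor\n");
                break;
            default:
                printf("error: unknown command %c\n", s[-1]);
                break;
            }
        }
    }
    return pop();                        /* the result is left on top of the stack */
}
```

For example, eval_rpn("1 2 - 4 5 + *") evaluates (1 - 2) * (4 + 5) and returns -9. Neither push nor pop calls the other, yet both see the same sp and val.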
What are getch and ungetch? It is often the case that a program cannot
determine that it has read enough input until it has read too much. One instance
is collecting characters that make up a number: until the first non-digit is seen,
the number is not complete. But then the program has read one character too far,
a character that it is not prepared for.
The problem would be solved if it were possible to ``un-read'' the unwanted
character. Then, every time the program reads one character too many, it could
push it back on the input, so the rest of the code could behave as if it had never
been read. Fortunately, it's easy to simulate un-getting a character, by writing a
pair of cooperating functions. getch delivers the next input character to be
considered; ungetch remembers the characters put back on the input, so that
subsequent calls to getch will return them before reading new input. How they
work together is simple. ungetch puts the pushed-back characters into a shared
buffer -- a character array. getch reads from the buffer if there is anything
there, and calls getchar if the buffer is empty. There must also be an
index variable that records the position of the current character in the buffer.
Since the buffer and the index are shared by getch and ungetch and must retain
their values between calls, they must be external to both routines.
The standard library includes a function ungetc that provides one character of
pushback. We have used an array for the pushback, rather than a single
character, to illustrate a more general approach.
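A minimal sketch of the pair described above, assuming a fixed buffer size of 100 (BUFSIZE, buf, and bufp are illustrative names): the buffer and its index are external so both functions share them and they persist between calls.

```c
#include <stdio.h>

#define BUFSIZE 100     /* assumed capacity of the pushback buffer */

char buf[BUFSIZE];      /* buffer for ungetch (external, shared) */
int bufp = 0;           /* next free position in buf */

/* getch: get a (possibly pushed-back) character */
int getch(void)
{
    return (bufp > 0) ? buf[--bufp] : getchar();
}

/* ungetch: push character back on input; since buf is a char array,
   pushing back EOF is not supported in this sketch */
void ungetch(int c)
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}
```

Characters come back in last-in, first-out order: after ungetch('x') and ungetch('y'), the next two calls to getch return 'y' and then 'x' before any new input is read.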
The functions and external variables that make up a C program need not all be
compiled at the same time; the source text of the program may be kept in
several files, and previously compiled routines may be loaded from libraries.
Among the questions of interest are
• How are declarations written so that variables are properly declared during compilation?
• How are declarations arranged so that all the pieces will be properly connected when the program is loaded?
• How are declarations organized so there is only one copy?
• How are external variables initialized?
Let us discuss these topics by reorganizing the calculator program into several
files. As a practical matter, the calculator is too small to be worth splitting, but
it is a fine illustration of the issues that arise in larger programs.
The scope of a name is the part of the program within which the name can be
used. For an automatic variable declared at the beginning of a function, the
scope is the function in which the name is declared. Local variables of the same
name in different functions are unrelated. The same is true of the parameters of
the function, which are in effect local variables.
The scope of an external variable or a function lasts from the point at which it
is declared to the end of the file being compiled. For example, if main, sp, val,
push, and pop are defined in one file, in the order shown above, that is,