10.1 Naming Library Function Global Variables

Due to the way the awk language evolved, variables are either global (usable by the entire program) or local (usable just by a specific function). There is no intermediate state analogous to static variables in C.

Library functions often need to have global variables that they can use to preserve state information between calls to the function—for example, getopt()’s variable _opti (see Processing Command-Line Options). Such variables are called private, as the only functions that need to use them are the ones in the library.

When writing a library function, you should try to choose names for your private variables that will not conflict with any variables used by either another library function or a user’s main program. For example, a name like i or j is not a good choice, because user programs often use variable names like these for their own purposes.

The example programs shown in this chapter all start the names of their private variables with an underscore (‘_’). Users generally don’t use leading underscores in their variable names, so this convention immediately decreases the chances that the variable names will be accidentally shared with the user’s program.

In addition, several of the library functions use a prefix that helps indicate what function or set of functions use the variables; for example, _pw_byname in the user database routines (see Reading the User Database). This convention is recommended, as it further decreases the chance of inadvertent conflict among variable names. Note that this convention works equally well for variable names and for private function names.(69)
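As a minimal sketch of the underscore-plus-prefix convention, consider a hypothetical library that hands out sequential numbers. The names seq_next(), seq_reset(), and _seq_count are invented for this illustration and are not part of the library presented in this chapter:

```awk
# Hypothetical library: hands out sequential ID numbers.
# _seq_count is private state. The leading underscore keeps it away
# from typical user names, and the "seq_" prefix ties it to the
# seq_*() functions.
function seq_next()
{
    return ++_seq_count
}

function seq_reset()
{
    _seq_count = 0
}

BEGIN {
    i = 10              # the user's own variable
    print seq_next()    # prints 1
    print seq_next()    # prints 2
    seq_reset()
    print seq_next()    # prints 1
    print i             # prints 10; the library never touches i
}
```

Because the private variable is still global as far as awk is concerned, nothing prevents a user program from assigning to _seq_count. The naming convention only makes an accidental clash unlikely.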

As a final note on variable naming, if a function makes global variables available for use by a main program, it is a good convention to start those variables’ names with a capital letter; for example, getopt()’s Opterr and Optind variables (see Processing Command-Line Options). The leading capital letter indicates that the variable is global, while the fact that the name is not all capital letters indicates that the variable is not one of awk’s predefined variables, such as FS.
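The following sketch shows how the public and private conventions might be combined in one hypothetical library. The functions word_init() and word_next() and the variables Wordind, _word_list, and _word_total are assumptions made up for this example:

```awk
# Hypothetical library: returns the words of a string one at a time.
#   Wordind      public (capitalized): index of the next word
#   _word_list   private: the words themselves
#   _word_total  private: how many words there are
function word_init(str)
{
    _word_total = split(str, _word_list, " ")
    Wordind = 1
}

function word_next()
{
    if (Wordind > _word_total)
        return ""
    return _word_list[Wordind++]
}

BEGIN {
    word_init("alpha beta gamma")
    # The main program may read Wordind, much as it would Optind
    while ((w = word_next()) != "")
        print Wordind - 1, w
}
```

Run as is, this prints ‘1 alpha’, ‘2 beta’, and ‘3 gamma’, one per line.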

It is also important that all variables in library functions that do not need to save state are, in fact, declared local.(70) If this is not done, the variables could accidentally be used in the user’s program, leading to bugs that are very difficult to track down:

function lib_func(x, y,    l1, l2)
{
    # ...
    # some_var should be local but by oversight is not
    some_var = x + y
    # ...
}
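To make the danger concrete, here is a small sketch with two hypothetical functions, sum_to_bad() and sum_to_good(), which differ only in whether the loop counter is declared local:

```awk
# Buggy: i is not in the parameter list, so it is global
function sum_to_bad(n,    total)
{
    total = 0
    for (i = 1; i <= n; i++)
        total += i
    return total
}

# Fixed: i is listed as an extra parameter, which makes it local
function sum_to_good(n,    total, i)
{
    total = 0
    for (i = 1; i <= n; i++)
        total += i
    return total
}

BEGIN {
    i = 100
    sum_to_bad(3)
    print i     # prints 4: the caller's i was overwritten

    i = 100
    sum_to_good(3)
    print i     # prints 100: the caller's i is unchanged
}
```

Both functions return the same sum. Only the side effect on the caller’s i differs, which is why this kind of bug tends to go unnoticed until a user program happens to use the same name.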

A different convention, common in the Tcl community, is to use a single associative array to hold the values needed by the library function(s), or “package.” This significantly decreases the number of actual global names in use. For example, the functions described in Reading the User Database might have used array elements PW_data["inited"], PW_data["awklib"], PW_data["total"], and PW_data["count"], instead of _pw_inited, _pw_awklib, _pw_total, and _pw_count.
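A sketch of the single-array convention, using a hypothetical stack package, might look like the following. The names stk_push(), stk_pop(), and STK_data are invented for this example:

```awk
# Hypothetical stack package. All of its state lives in one global
# array, STK_data, rather than in several separate variables:
#   STK_data["top"]      number of items on the stack
#   STK_data["item", n]  the n-th item
function stk_push(val)
{
    STK_data["item", ++STK_data["top"]] = val
}

function stk_pop(    val)
{
    if (STK_data["top"] < 1)
        return ""       # empty stack
    val = STK_data["item", STK_data["top"]]
    delete STK_data["item", STK_data["top"]]
    STK_data["top"]--
    return val
}

BEGIN {
    stk_push("a")
    stk_push("b")
    print stk_pop()     # prints b
    print stk_pop()     # prints a
}
```

With this layout the package adds only one variable name, STK_data, to the global namespace, no matter how many values it needs to remember.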

The conventions presented in this section are exactly that: conventions. You are not required to write your programs this way—we merely recommend that you do so.

Beginning with version 5.0, gawk provides a powerful mechanism for solving the problems described in this section: namespaces. Namespaces and their use are described in detail in Namespaces in gawk.


Footnotes

(69)

Although all the library routines could have been rewritten to use this convention, this was not done, in order to show how our own awk programming style has evolved and to provide some basis for this discussion.

(70)

gawk’s --dump-variables command-line option is useful for verifying this.