When we first start thinking about how to count the words in a function definition, the first question is (or ought to be): what are we going to count? When we speak of “words” with respect to a Lisp function definition, we are actually speaking, in large part, of symbols. For example, the following multiply-by-seven function contains the five symbols defun, multiply-by-seven, number, *, and 7 (strictly speaking, 7 is a number rather than a symbol, but for counting purposes it is a word like the others). In addition, its documentation string contains the four words ‘Multiply’, ‘NUMBER’, ‘by’, and ‘seven’. The symbol ‘number’ appears twice, once in the argument list and once in the body, so the definition contains a total of ten words and symbols.
(defun multiply-by-seven (number)
  "Multiply NUMBER by seven."
  (* 7 number))
However, if we mark the multiply-by-seven definition with C-M-h (mark-defun) and then call count-words-example on it, we will find that count-words-example claims the definition has eleven words, not ten! Something is wrong!
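The miscount can be reproduced without marking a region. The sketch below is an assumption-laden illustration, not the canonical count-words-example: it places the text of the definition in a temporary buffer and counts successive matches of the search pattern that the canonical count-words-example uses, "\\w+\\W*". The helper name my-count-regexp-matches is hypothetical, and the sketch assumes Emacs Lisp mode's syntax table decides what counts as a word.

```elisp
(defun my-count-regexp-matches (regexp text)
  "Hypothetical helper: count successive matches of REGEXP in TEXT."
  (with-temp-buffer
    ;; Assumption: use Emacs Lisp mode's syntax table, so that
    ;; "word constituent" means what it means when editing Lisp.
    (with-syntax-table emacs-lisp-mode-syntax-table
      (insert text)
      (goto-char (point-min))
      (let ((count 0))
        ;; Move point forward one match at a time, as a word-counting
        ;; loop does.  REGEXP must not match the empty string, or
        ;; point would never advance.
        (while (and (< (point) (point-max))
                    (re-search-forward regexp nil t))
          (setq count (1+ count)))
        count))))

(my-count-regexp-matches
 "\\w+\\W*"
 "(defun multiply-by-seven (number)\n  \"Multiply NUMBER by seven.\"\n  (* 7 number))")
;; => 11
```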
The problem is twofold: count-words-example does not count the ‘*’ as a word, and it counts the single symbol multiply-by-seven as containing three words. The hyphens are treated as if they were interword spaces rather than intraword connectors: ‘multiply-by-seven’ is counted as if it were written ‘multiply by seven’. Losing one word and gaining two turns the expected ten into eleven.
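The following sketch shows where the extra words and the missing ‘*’ come from. It records the word part of each successive match on a shortened version of the definition with the documentation string left out. The helper name my-list-word-matches is hypothetical, and the capture group around \\w+ is an addition made only so the matched word can be recorded; it is not part of the canonical regexp.

```elisp
(defun my-list-word-matches (text)
  "Hypothetical helper: return the word runs found one by one in TEXT."
  (with-temp-buffer
    ;; Assumption: Emacs Lisp mode's syntax table decides which
    ;; characters are word constituents.
    (with-syntax-table emacs-lisp-mode-syntax-table
      (insert text)
      (goto-char (point-min))
      (let (words)
        ;; Same pattern as "\\w+\\W*", with the word part wrapped in
        ;; group 1 so it can be recorded; the non-word tail is
        ;; consumed and discarded.
        (while (re-search-forward "\\(\\w+\\)\\W*" nil t)
          (push (match-string-no-properties 1) words))
        (nreverse words)))))

(my-list-word-matches "(defun multiply-by-seven (number) (* 7 number))")
;; => ("defun" "multiply" "by" "seven" "number" "7" "number")
```

The hyphenated symbol yields three entries, and ‘*’ never appears: after ‘number’, the trailing \\W* swallows the closing parenthesis, the space, the opening parenthesis, and the asterisk all at once.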
The cause of this confusion is the regular expression search within the count-words-example definition that moves point forward word by word. In the canonical version of count-words-example, the regexp is:
"\\w+\\W*"
This regular expression is a pattern matching one or more word-constituent characters followed by zero or more characters that are not word constituents. (The backslashes are doubled because the regexp is written as a Lisp string, in which ‘\\’ stands for a single backslash.) What is meant by “word constituent characters” brings us to the issue of syntax, which is worth a section of its own.
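As a small preview of what “word constituent” means in practice, this sketch asks Emacs for the syntax class of a few characters from the example, again assuming Emacs Lisp mode's syntax table. char-syntax returns a character naming the class, such as w for a word constituent or _ for a symbol constituent. The helper name my-syntax-classes is hypothetical.

```elisp
(defun my-syntax-classes (chars)
  "Hypothetical helper: pair each of CHARS with its syntax class."
  (with-temp-buffer
    ;; Assumption: classify characters as Emacs Lisp mode does.
    (with-syntax-table emacs-lisp-mode-syntax-table
      (mapcar (lambda (c)
                ;; Turn both the character and its class into strings
                ;; so the result is easy to read.
                (cons (string c) (string (char-syntax c))))
              chars))))

(my-syntax-classes '(?m ?7 ?- ?*))
;; => (("m" . "w") ("7" . "w") ("-" . "_") ("*" . "_"))
```

Letters and digits are word constituents, while the hyphen and the asterisk are symbol constituents, which is why \\w+ stops at each hyphen and never matches ‘*’.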