18.6.1.6 Searching with Fastmaps

If you’re searching through a long string, you should use a fastmap. Without one, the searcher tries to match at consecutive positions in the string. Generally, most of the characters in the string could not start a match. It takes much longer to try matching at a given position in the string than it does to check in a table whether or not the character at that position could start a match. A fastmap is such a table.

More specifically, a fastmap is an array indexed by the characters in your character set. Under the ASCII encoding, therefore, a fastmap has 256 elements. If you want the searcher to use a fastmap with a given pattern buffer, you must allocate the array and assign the array’s address to the pattern buffer’s fastmap field. You either can compile the fastmap yourself or have re_search do it for you; when fastmap is nonzero, it automatically compiles a fastmap the first time you search using a particular compiled pattern.

By setting the buffer’s fastmap field before calling re_compile_pattern, you can reuse a buffer data structure across multiple searches with different patterns, and allocate the fastmap only once. Nonetheless, the fastmap must be recompiled each time the buffer has a new pattern compiled into it.

To compile a fastmap yourself, use:

int
re_compile_fastmap (struct re_pattern_buffer *pattern_buffer)

pattern_buffer is the address of a pattern buffer. If the character c could start a match for the pattern, re_compile_fastmap makes pattern_buffer->fastmap[c] nonzero. It returns 0 if it can compile a fastmap and -2 if there is an internal error. For example, if ‘|’ is the alternation operator and pattern_buffer holds the compiled pattern for ‘a|b’, then re_compile_fastmap sets fastmap['a'] and fastmap['b'] (and no others).

re_search uses a fastmap as it moves along in the string: it checks the string’s characters until it finds one that’s in the fastmap. Then it tries matching at that character. If the match fails, it repeats the process. So, by using a fastmap, re_search doesn’t waste time trying to match at positions in the string that couldn’t start a match.

If you don’t want re_search to use a fastmap, store zero in the fastmap field of the pattern buffer before calling re_search.

Once you’ve initialized a pattern buffer’s fastmap field, you need never do so again—even if you compile a new pattern in it—provided the way the field is set still reflects whether or not you want a fastmap. re_search will still either do nothing if fastmap is null or, if it isn’t, compile a new fastmap for the new pattern.