Appendix D Querying using regular expressions

See also Query expressions.

Unfortunately, we do not have room in this manual for a complete exposition on regular expressions. The following is a basic summary of some regular expressions you might wish to use.

NOTE: When you use query expressions containing regular expressions as part of an ordinary query-pr shell command line, you need to enclose them in single quotes ('...'). Otherwise the shell tries to interpret the special characters itself, with unpredictable results.

See Regular Expression Syntax (Regex), for details on regular expression syntax. Also see Syntax of Regular Expressions (GNU Emacs Manual), but beware that the syntax for regular expressions in Emacs is slightly different.

All search criteria options to query-pr rely on regular expression syntax to construct their search patterns. For example,

     query-pr --expr 'State="open"' --format full

matches all PRs whose State values match the regular expression open.

We can substitute the expression o for open, according to GNU regular expression syntax. This matches all values of State which begin with the letter o.

We see that

     query-pr --expr 'State="o"' --format full

is equivalent to

     query-pr --expr 'State="open"' --format full

in this case, since the only value for State which matches the expression o in a standard installation is open. State="o" also matches o, oswald, and even oooooo, but none of those values is a valid state for a Problem Report in a default GNATS installation.
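The prefix behavior described here can be tried outside GNATS. The following is only a sketch: Python's re.match, which also anchors at the start of the string, stands in for the matching GNATS performs. The state list is an assumption about a default installation, and prefix_matches is a hypothetical helper, not part of GNATS.

```python
import re

# Assumed default GNATS states; adjust this list for your installation.
STATES = ["open", "analyzed", "suspended", "feedback", "closed"]

def prefix_matches(pattern, values):
    """Return the values whose beginning matches the pattern."""
    return [v for v in values if re.match(pattern, v)]

# Both patterns select the same single state.
short_form = prefix_matches("o", STATES)
long_form = prefix_matches("open", STATES)
print(short_form, long_form)

# Values that are not valid states would match "o" as well.
others = prefix_matches("o", ["o", "oswald", "oooooo"])
print(others)
```

Because the match only has to succeed at the start of the value, anything beginning with o qualifies.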

We can also use the expression operator | to signify a logical OR, such that

     query-pr --expr 'State="o|a"' --format full

matches all open or analyzed Problem Reports.
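As a rough check of the alternation, the sketch below applies o|a to an assumed list of default states, again using Python's re.match as a stand-in for the anchored match in GNATS.

```python
import re

# Assumed default GNATS states; adjust this list for your installation.
STATES = ["open", "analyzed", "suspended", "feedback", "closed"]

# "o|a" succeeds when the value begins with o or with a.
open_or_analyzed = [s for s in STATES if re.match("o|a", s)]
print(open_or_analyzed)
```

feedback contains an a but does not begin with one, so it is not selected.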

Regular expression syntax considers a regexp token surrounded with parentheses, as in (regexp), to be a group. This means that (ab)* matches any number (including zero) of contiguous instances of ab. Matches include the empty string, ab, and ababab.
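A small sketch of the group behavior, using Python's re module as a stand-in for GNU regex. It uses fullmatch so that the whole candidate string must consist of repetitions of ab; the candidate strings are illustrative only.

```python
import re

# fullmatch requires the entire string to match, so (ab)* here means
# "nothing but zero or more copies of ab".
group_pattern = re.compile(r"(ab)*")

candidates = ["", "ab", "ababab", "aba", "ba"]
whole_matches = [c for c in candidates if group_pattern.fullmatch(c)]
print(whole_matches)
```

The empty string is accepted because zero repetitions are allowed.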

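Regular expression syntax considers characters surrounded with square brackets, as in [abc], to be a list. A list matches any single one of the characters it contains, so Char[lb] matches Charl or Charb. To choose between whole strings, use a group with the | operator instead: Char(ley|lene|broiled) matches any of the words Charley, Charlene, or Charbroiled (case is significant; charbroiled is not matched).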

Using groups and lists, we see that

     query-pr --expr 'Category="gcc|gdb|gas"' --format full

is equivalent to

     query-pr --expr 'Category="g(cc|db|as)"' --format full

and is also very similar to

     query-pr --expr 'Category="g[cda]"' --format full

with the exception that this last search matches any values which begin with gc, gd, or ga.
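The difference between the three forms can be seen with a short sketch. Python's re.match stands in for the GNATS match; the names gawk and glibc are hypothetical categories added only to show where the patterns diverge.

```python
import re

# Hypothetical category names; only gcc, gdb, and gas come from the text.
CATEGORIES = ["gcc", "gdb", "gas", "gawk", "glibc"]

def matching(pattern):
    """Return the categories whose beginning matches the pattern."""
    return [c for c in CATEGORIES if re.match(pattern, c)]

alternation = matching(r"gcc|gdb|gas")
group = matching(r"g(cc|db|as)")
char_list = matching(r"g[cda]")   # g followed by any one of c, d, a

print(alternation)
print(group)
print(char_list)
```

The list form also picks up gawk, because it only constrains the second character.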

The . character is known as a wildcard. . matches any single character except a newline. * matches the previous character, list, or group any number of times, including zero. Therefore, we can understand .* to mean “match zero or more instances of any character.”

     query-pr --expr 'State=".*a"' --format full

matches all values for State which contain an a. (These include analyzed and feedback.)
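To check which states this selects, the sketch below applies .*a to an assumed list of default states, with Python's re.match standing in for the anchored match in GNATS.

```python
import re

# Assumed default GNATS states; adjust this list for your installation.
STATES = ["open", "analyzed", "suspended", "feedback", "closed"]

# The leading .* lets the a appear anywhere in the value.
contains_a = [s for s in STATES if re.match(r".*a", s)]
print(contains_a)
```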

Another way to understand what wildcards do is to follow them on their search for matching text. By our syntax, .* matches any character any number of times, including zero. Therefore, .*a matches any run of characters that ends with a, ignoring whatever follows in the field. .*a matches analyzed (the match ends at an a, and the remaining characters are ignored) as well as feedback.

Note: When using fieldtype:Text or fieldtype:Multitext (see Query expressions), you do not have to specify the token .* at the beginning of your expression to match the entire field. For the technically minded, this is because these queries use re_search rather than re_match. re_match anchors the search at the beginning of the field, while re_search does not anchor the search.
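The distinction between anchored and unanchored matching can be sketched with Python's re.match and re.search, which behave analogously to re_match and re_search in this respect. This is an illustration only, not the code GNATS runs.

```python
import re

description = "The defrobulator component returns a nil value."

# Anchored at the start of the field: fails, because the text begins with "The ".
anchored = re.match(r"defrobulator", description)

# Unanchored: scans forward until the pattern is found.
unanchored = re.search(r"defrobulator", description)

# An anchored match needs a leading .* to skip over the preceding text.
anchored_with_prefix = re.match(r".*defrobulator", description)

print(anchored, unanchored.start(), anchored_with_prefix is not None)
```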

For example, to search in the >Description: field for the text

     The defrobulator component returns a nil value.

we can use

     query-pr --expr 'fieldtype:Multitext="defrobulator.*nil"' --format full

To also match newlines, we have to include the expression (.|^M) instead of just a dot (.). (.|^M) matches “any single character except a newline (.) or (|) any newline (^M).” This means that to search for the text

     The defrobulator component enters the bifrabulator routine
     and returns a nil value.

we must use

     query-pr --expr 'fieldtype:Multitext="defrobulator(.|^M)*nil"' \
              --format full
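The effect of adding the newline alternative can be sketched as follows. Python's re.search stands in for the unanchored search, and \n plays the role of the literal newline typed as ^M in the query above.

```python
import re

text = ("The defrobulator component enters the bifrabulator routine\n"
        "and returns a nil value.")

# A plain dot does not cross the line break, so this search fails.
dot_only = re.search(r"defrobulator.*nil", text)

# Allowing "any character or a newline" lets the match span both lines.
dot_or_newline = re.search(r"defrobulator(.|\n)*nil", text)

print(dot_only)
print(repr(dot_or_newline.group()))
```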

To generate the newline character ^M, type the following depending on your shell:

csh
     control-V control-M
tcsh
     control-V control-J
sh (or bash)
     Use the <RETURN> key, as in
          (.|
          )

Again, see Regular Expression Syntax (Regex), for a much more complete discussion on regular expression syntax.