2.4 grep Programs

grep searches the named input files for lines containing a match for the given patterns. By default, grep prints the matching lines. A file named - stands for standard input. If no input is specified, grep searches the working directory (.) when given a command-line option specifying recursion; otherwise, grep searches standard input.

There are four major variants of grep, controlled by the following options.

-G
--basic-regexp

Interpret patterns as basic regular expressions (BREs). This is the default.

-E
--extended-regexp

Interpret patterns as extended regular expressions (EREs). (-E is specified by POSIX.)

-F
--fixed-strings

Interpret patterns as fixed strings, not regular expressions. (-F is specified by POSIX.)

-P
--perl-regexp

Interpret patterns as Perl-compatible regular expressions (PCREs).

For documentation, refer to https://www.pcre.org/, with these caveats:

  • In a UTF-8 locale, Perl treats data as UTF-8 only under certain conditions, e.g., if perl is invoked with the -C option or the PERL_UNICODE environment variable set appropriately. Similarly, pcre2grep treats data as UTF-8 only if invoked with -u or -U. In contrast, in a UTF-8 locale grep and git grep always treat data as UTF-8.
  • In Perl and git grep -P, ‘\d’ matches all Unicode digits, even if they are not ASCII. For example, ‘\d’ matches “٣” (U+0663 ARABIC-INDIC DIGIT THREE). In contrast, in ‘grep -P’, ‘\d’ matches only the ten ASCII digits, regardless of locale. In pcre2grep, ‘\d’ ordinarily behaves like Perl and git grep -P, but when given the --posix-digit option it behaves like ‘grep -P’. (On all platforms, ‘\D’ matches the complement of ‘\d’.)
  • The pattern ‘[[:digit:]]’ matches all Unicode digits in Perl, ‘grep -P’, git grep -P, and pcre2grep, so you can use it to get the effect of Perl’s ‘\d’ on all these platforms. In other words, in Perl and git grep -P, ‘\d’ is equivalent to ‘[[:digit:]]’, whereas in ‘grep -P’, ‘\d’ is equivalent to ‘[0-9]’, and pcre2grep ordinarily follows Perl but when given --posix-digit it follows ‘grep -P’.

    (On all these platforms, ‘[[:digit:]]’ is equivalent to ‘\p{Nd}’ and to ‘\p{General_Category: Decimal_Number}’.)

  • If grep is built with PCRE2 version 10.43 (2024) or later, ‘(?aD)’ causes ‘\d’ to behave like ‘[0-9]’ and ‘(?-aD)’ causes it to behave like ‘[[:digit:]]’.
  • Although PCRE tracks the syntax and semantics of Perl’s regular expressions, the match is not always exact. Perl evolves and a Perl installation may predate or postdate the PCRE2 installation on the same host, or their Unicode versions may differ, or Perl and PCRE2 may disagree about an obscure construct.
  • By default, grep applies each regexp to a line at a time, so the ‘(?s)’ directive (making ‘.’ match line breaks) is generally ineffective. However, with -z (--null-data) it can work:
    $ printf 'a\nb\n' | grep -zP '(?s)a.b'
    a
    b
    

    But beware: with -z (--null-data) and a file containing no NUL byte, grep must read the entire file into memory before processing any of it. Thus, it can exhaust memory and fail on sufficiently large files.
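
As a minimal sketch of how the -G, -E, and -F variants described above differ, the following feeds the same two lines to grep three times with the same pattern ‘a+b’. The sample input and the variable names (input, bre_out, ere_out, fixed_out) are illustrative assumptions, not part of grep itself.

```shell
# Two sample lines: one contains a literal '+', the other a run of 'a's.
input=$(printf 'a+b\naab')

# -G (the default): in a BRE an unescaped '+' is an ordinary character,
# so only the line containing the literal text "a+b" matches.
bre_out=$(printf '%s\n' "$input" | grep -G 'a+b')

# -E: in an ERE '+' means "one or more of the preceding item",
# so the pattern matches "ab", "aab", ... but not the literal "a+b".
ere_out=$(printf '%s\n' "$input" | grep -E 'a+b')

# -F: the pattern is a fixed string, so no character is special.
fixed_out=$(printf '%s\n' "$input" | grep -F 'a+b')

printf 'BRE:   %s\nERE:   %s\nfixed: %s\n' "$bre_out" "$ere_out" "$fixed_out"
```

With this input, -G and -F should both select the line ‘a+b’, while -E should select the line ‘aab’. The sketch leaves out -P because it requires a grep built with PCRE2 support.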