The LC_CTYPE locale specifies the encoding of characters in
patterns and data, that is, whether text is encoded in UTF-8, ASCII,
or some other encoding. See Environment Variables.
In the ‘C’ or ‘POSIX’ locale, every character is encoded as
a single byte and every byte is a valid character. In more-complex
encodings such as UTF-8, a sequence of multiple bytes may be needed to
represent a character, and some bytes may be encoding errors that do
not contribute to the representation of any character. POSIX does not
specify the behavior of grep when patterns or input data contain
encoding errors or null characters, so portable scripts should avoid
such usage. As an extension to POSIX, GNU grep treats null characters
like any other character. However, unless the -a (--binary-files=text)
option is used, the presence of null characters in input or of encoding
errors in output causes GNU grep to treat the file as binary and
suppress details about matches. See File and Directory Selection.
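
The effect of -a on input containing a null character can be sketched as follows. This is only an illustration, assuming GNU grep and a POSIX shell with mktemp available; the sample file and the variable names are hypothetical, and the exact wording of the binary-file notice differs between grep versions.

```shell
# Hypothetical sample file: one line with a NUL byte in the middle.
sample=$(mktemp)
printf 'foo\0bar\n' > "$sample"

# Default: the file is treated as binary, so grep only reports that it
# matches. The wording and the output stream of that notice vary by
# version, hence the 2>&1.
default_out=$(LC_ALL=C grep foo "$sample" 2>&1)

# With -a the file is treated as text; -c counts the matching lines.
text_count=$(LC_ALL=C grep -a -c foo "$sample")

printf '%s\n' "$default_out" "$text_count"
rm -f "$sample"
```

Using -c in the second command keeps the null byte out of the captured output, which some shells would otherwise drop with a warning.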
Regardless of locale, the 103 characters in the POSIX Portable Character Set (a subset of ASCII) are always encoded as a single byte, and the 128 ASCII characters have their usual single-byte encodings on all but oddball platforms.
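
The difference between bytes and characters can be made concrete with a short sketch. It assumes a POSIX shell and GNU grep; the variable names are illustrative only. The octal escapes \303\251 are the two-byte UTF-8 encoding of U+00E9 (é).

```shell
# 'A' belongs to the POSIX Portable Character Set: a single byte.
ascii_bytes=$(printf 'A' | wc -c | tr -d ' ')

# The UTF-8 encoding of U+00E9 occupies two bytes.
utf8_bytes=$(printf '\303\251' | wc -c | tr -d ' ')

# In the C locale each byte is treated as a character of its own, so
# this counts lines that consist of exactly two characters (bytes).
c_matches=$(printf '\303\251\n' | LC_ALL=C grep -c '^..$')

echo "$ascii_bytes $utf8_bytes $c_matches"
```

In a UTF-8 locale the same two bytes form a single character, so a pattern for exactly two characters would not be expected to match that line.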