diff, diff3, and sdiff treat each line of
input as a string of characters and encoding errors, where an
encoding error is an input byte that is not part of any
character. Single-byte and multi-byte characters are supported, along
with common character encoding systems like UTF-8. The operating
system’s locale specifies the character encoding; you can select the locale
with the LC_ALL environment variable. You can find which
locales are supported on your system by running the shell command
‘locale -a’.
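
The split of a line into characters and encoding errors can be illustrated with a short Python sketch. This is not how diff is implemented; it only mimics the description above, assuming a UTF-8 locale. The function name split_line is hypothetical, and the sketch relies on Python's "surrogateescape" error handler to keep each undecodable byte as a separate item.

```python
def split_line(raw: bytes, encoding: str = "utf-8"):
    """Split a raw line into ("char", str) and ("error", int) items.

    Illustrative only: an "error" item is a single input byte that is
    not part of any character in the given encoding.
    """
    # surrogateescape maps each undecodable byte 0xNN (0x80..0xFF) to the
    # lone surrogate U+DCNN, so such bytes survive decoding one at a time.
    text = raw.decode(encoding, errors="surrogateescape")
    items = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Recover the original byte value from the escape surrogate.
            items.append(("error", cp - 0xDC00))
        else:
            items.append(("char", ch))
    return items


if __name__ == "__main__":
    # "a", then the two-byte UTF-8 sequence for "é", then a stray 0xFF byte.
    print(split_line(b"a\xc3\xa9\xff"))
```

Here the two bytes 0xC3 0xA9 form one multi-byte character, while 0xFF cannot start or continue any UTF-8 sequence and is reported as an encoding error.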

When counting columns for options like --expand-tabs (-t), diff
consults the locale for the column width of each character,
and assumes that each encoding error occupies a single column.
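
As a rough illustration of this column rule, the Python sketch below approximates character widths from Unicode properties rather than from the locale, so it is an assumption-laden stand-in for what diff actually consults. The names column_width and line_width are hypothetical, and tabs and other control characters are not handled.

```python
import unicodedata


def column_width(ch: str) -> int:
    """Approximate the display width of one decoded item."""
    cp = ord(ch)
    if 0xDC80 <= cp <= 0xDCFF:
        # An escaped undecodable byte: count it as one column.
        return 1
    if unicodedata.combining(ch):
        # Combining marks are drawn over the previous character.
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        # Wide and fullwidth characters take two columns.
        return 2
    return 1


def line_width(raw: bytes, encoding: str = "utf-8") -> int:
    """Sum the approximate column widths of a raw line."""
    # Undecodable bytes are kept as one escape surrogate each.
    text = raw.decode(encoding, errors="surrogateescape")
    return sum(column_width(ch) for ch in text)


if __name__ == "__main__":
    print(line_width(b"abc"))
    print(line_width("日本".encode("utf-8")))
    print(line_width(b"a\xff"))
```

A real implementation would normally ask the C library (for example through wcwidth) so that the result follows the active locale; the property-based approximation here is only meant to show why a byte count and a column count can differ.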

When ignoring case for --ignore-case (-i), diff
downcases each character before comparing it,
regardless of whether it is multi-byte. See Suppressing Case Differences.
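
The per-character comparison can be sketched in Python as follows. This is a simplified illustration, not diff's code: it uses Python's locale-independent str.lower in place of the locale's lowercase mapping, and the function name equal_ignoring_case is hypothetical.

```python
def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two lines after lowercasing each character separately."""
    # Lowercase one character at a time, whether or not it would be
    # multi-byte once encoded.
    lower_a = "".join(ch.lower() for ch in a)
    lower_b = "".join(ch.lower() for ch in b)
    return lower_a == lower_b


if __name__ == "__main__":
    print(equal_ignoring_case("École", "ÉCOLE"))
    print(equal_ignoring_case("Straße", "STRASSE"))
```

In this sketch the first pair compares equal because "É" lowercases to "é", while the second pair does not, since lowercasing leaves "ß" unchanged and does not turn it into "ss".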