18.1.1 Handling Multi-byte and Varying-Width Characters

diff, diff3 and sdiff treat each line of input as a string of characters and encoding errors, where an encoding error is an input byte that is not part of any character. Both single-byte and multi-byte characters are supported, as are common character encodings such as UTF-8. The character encoding is determined by the current locale, which you can select by setting the LC_ALL environment variable. You can find which locales are supported on your system by running the shell command ‘locale -a’.
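
The following Python sketch illustrates the idea of a line as a sequence of characters and encoding errors. It is not how diff is implemented: the function name split_line is hypothetical, the encoding is assumed to be UTF-8 rather than taken from the locale, and Python's "surrogateescape" error handler stands in for diff's own decoding.

```python
def split_line(line: bytes, encoding: str = "utf-8"):
    """Split a line into ("char", str) and ("error", byte_value) items.

    Assumption: UTF-8 input. Each byte that is not part of a valid
    character is reported on its own as an encoding error.
    """
    # "surrogateescape" maps every undecodable byte 0xNN to the lone
    # surrogate U+DCNN, so such bytes can be told apart from real characters.
    text = line.decode(encoding, errors="surrogateescape")
    items = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            items.append(("error", cp - 0xDC00))  # recover the raw byte value
        else:
            items.append(("char", ch))
    return items


if __name__ == "__main__":
    # "café", a stray 0xFF byte, then "ok"
    for kind, value in split_line(b"caf\xc3\xa9 \xff ok"):
        print(kind, repr(value))
```

In this sketch the two bytes 0xC3 0xA9 form the single character ‘é’, while the stray byte 0xFF is reported as one encoding error.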

When counting columns for options like --expand-tabs (-t), diff consults the locale for the column width of each character, and assumes that each encoding error occupies a single column.
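
As a rough illustration of column counting, the Python sketch below expands tabs while tracking display columns. It rests on several assumptions that are not taken from diff itself: the input is UTF-8, tab stops are every 8 columns, and Unicode's East Asian Width property approximates the width the locale would report. The names char_width and display_columns are hypothetical.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate column width of one decoded item (an assumption,
    not the locale's actual width data)."""
    if 0xDC80 <= ord(ch) <= 0xDCFF:
        return 1  # encoding error: assumed to occupy a single column
    if unicodedata.combining(ch):
        return 0  # combining mark: drawn over the preceding character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide or fullwidth character
    return 1


def display_columns(line: bytes, tabsize: int = 8) -> int:
    """Return the column reached after the line, with tabs expanded."""
    col = 0
    for ch in line.decode("utf-8", errors="surrogateescape"):
        if ch == "\t":
            col += tabsize - col % tabsize  # advance to the next tab stop
        else:
            col += char_width(ch)
    return col


if __name__ == "__main__":
    print(display_columns("日本\tx".encode("utf-8")))
    print(display_columns(b"\xff\tx"))
```

Under these assumptions, two wide characters occupy four columns, so the tab that follows them advances to column 8 just as it would after four ASCII letters. A single encoding error occupies one column before the tab.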

When ignoring case for --ignore-case (-i), diff converts each character to lowercase before comparing it, regardless of whether the character is single-byte or multi-byte. See Suppressing Case Differences.
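
The Python sketch below models a case-insensitive line comparison that treats multi-byte characters the same as single-byte ones. It is only an approximation under stated assumptions: the input is UTF-8, Python's str.lower() stands in for the locale's lowercase mapping (the two can differ for some characters), and encoding errors are compared by their raw byte value. The name equal_ignoring_case is hypothetical.

```python
def equal_ignoring_case(a: bytes, b: bytes) -> bool:
    """Compare two lines character by character after lowercasing.

    Assumption: UTF-8 input. Undecodable bytes become lone surrogates,
    which str.lower() leaves unchanged, so they match only identical bytes.
    """
    chars_a = a.decode("utf-8", errors="surrogateescape")
    chars_b = b.decode("utf-8", errors="surrogateescape")
    # Lowercase each character individually, then compare the sequences.
    return [ch.lower() for ch in chars_a] == [ch.lower() for ch in chars_b]


if __name__ == "__main__":
    print(equal_ignoring_case("ÉCOLE".encode("utf-8"), "école".encode("utf-8")))
    print(equal_ignoring_case(b"A\xff", b"a\xfe"))
```

Here the first comparison succeeds because ‘É’ and ‘é’ share a lowercase form even though each is encoded as two bytes. The second fails because the two encoding errors are different bytes.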