When comparing two files, diff finds sequences of lines common to
both files, interspersed with groups of differing lines called
hunks. Comparing two identical files yields one sequence of
common lines and no hunks, because no lines differ. Comparing two
entirely different files yields no common lines and one large hunk that
contains all lines of both files. In general, there are many ways to
match up lines between two given files. diff tries to minimize
the total hunk size by finding large sequences of common lines
interspersed with small hunks of differing lines.
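As a rough illustration of this idea, the following Python sketch finds one longest common subsequence of lines with a textbook dynamic-programming table and groups the leftover lines into hunks. It is not the algorithm diff itself uses, and the names lcs_matches and hunks are hypothetical, chosen only for this sketch. Each hunk is represented as a pair (lines only in the first file, lines only in the second file).

```python
def lcs_matches(a, b):
    """Return index pairs (i, j) of one longest common subsequence of lines."""
    n, m = len(a), len(b)
    # length[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = length[i + 1][j + 1] + 1
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])

    # Walk the table from the start, recording each matched pair of lines.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def hunks(a, b):
    """Group the lines between consecutive common lines into hunks."""
    result = []
    prev_i = prev_j = 0
    # A sentinel pair past the end of both files flushes the final hunk.
    for i, j in lcs_matches(a, b) + [(len(a), len(b))]:
        if i > prev_i or j > prev_j:
            result.append((a[prev_i:i], b[prev_j:j]))
        prev_i, prev_j = i + 1, j + 1
    return result


print(hunks(["a", "b"], ["a", "b"]))  # identical files: no hunks
print(hunks(["a", "b"], ["x", "y"]))  # no common lines: one large hunk
```

Because every line that is not matched ends up in some hunk, a longer common subsequence means fewer lines in hunks, which is why this sketch maximizes the number of matched lines.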
For example, suppose the file F contains the three lines
‘a’, ‘b’, ‘c’, and the file G contains the same
three lines in reverse order: ‘c’, ‘b’, ‘a’. If diff finds the
line ‘c’ as common, then the command
‘diff F G’ produces this output:
     1,2d0
     < a
     < b
     3a2,3
     > b
     > a
But if diff notices the common line ‘b’ instead, it produces
this output:
     1c1
     < a
     ---
     > c
     3c3
     < c
     ---
     > a
It is also possible to find ‘a’ as the common line. diff does not
always find an optimal matching between the files; it takes
shortcuts to run faster. But its output is usually close to the
shortest possible. You can adjust this tradeoff with the
--minimal (-d) option (see diff Performance Tradeoffs).
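The F and G example can be checked with a small Python sketch that tries each of the three possible common lines in turn and lists the hunks that result. The helper names hunks_for_match and hunk_size are hypothetical, and "hunk size" is assumed here to mean the number of lines, counted from both files, that fall inside hunks.

```python
F = ["a", "b", "c"]
G = ["c", "b", "a"]


def hunks_for_match(f, g, line):
    """Hunks that result when `line` is the only line treated as common."""
    i, j = f.index(line), g.index(line)
    # Everything before the common line, then everything after it.
    candidates = [(f[:i], g[:j]), (f[i + 1:], g[j + 1:])]
    # Drop empty candidates, where nothing differs on either side.
    return [(old, new) for old, new in candidates if old or new]


def hunk_size(hunks):
    """Total number of lines, from both files, that fall inside hunks."""
    return sum(len(old) + len(new) for old, new in hunks)


for line in "abc":
    found = hunks_for_match(F, G, line)
    print(line, found, hunk_size(found))
```

Under that assumed measure, all three matchings put four lines into hunks, so none of them is shorter than the others; they differ only in which lines are reported as deleted, added, or changed.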