1.1 Hunks

When comparing two files, diff finds sequences of lines common to both files, interspersed with groups of differing lines called hunks. Comparing two identical files yields one sequence of common lines and no hunks, because no lines differ. Comparing two entirely different files yields no common lines and one large hunk that contains all lines of both files. In general, there are many ways to match up lines between two given files. diff tries to minimize the total hunk size by finding large sequences of common lines interspersed with small hunks of differing lines.
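The idea of minimizing total hunk size can be made concrete with a small sketch. It assumes the textbook longest-common-subsequence (LCS) dynamic program, which is not the algorithm diff itself uses; the function names `lcs_length` and `min_hunk_lines` are hypothetical and chosen only for illustration. Every line that is not part of the common subsequence must appear in some hunk, so the smallest possible total hunk size is the combined length of both files minus twice the LCS length.

```python
def lcs_length(f, g):
    """Length of the longest common subsequence of two lists of lines."""
    # table[i][j] holds the LCS length of f[:i] and g[:j].
    table = [[0] * (len(g) + 1) for _ in range(len(f) + 1)]
    for i, x in enumerate(f, 1):
        for j, y in enumerate(g, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(f)][len(g)]


def min_hunk_lines(f, g):
    """Fewest differing lines any matching of f and g can leave in hunks."""
    # Each common line is counted once in f and once in g, hence the factor 2.
    return len(f) + len(g) - 2 * lcs_length(f, g)


if __name__ == "__main__":
    print(min_hunk_lines(["a", "b", "c"], ["a", "b", "c"]))  # identical files
    print(min_hunk_lines(["a", "b"], ["c", "d"]))            # nothing in common
```

Under these assumptions, identical files give a total of zero differing lines, and files with nothing in common give the sum of their lengths, matching the two extreme cases described above.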

For example, suppose the file F contains the three lines ‘a’, ‘b’, ‘c’, and the file G contains the same three lines in reverse order ‘c’, ‘b’, ‘a’. If diff finds the line ‘c’ as common, then the command ‘diff F G’ produces this output:

1,2d0
< a
< b
3a2,3
> b
> a

But if diff notices the common line ‘b’ instead, it produces this output:

1c1
< a
---
> c
3c3
< c
---
> a

It is also possible to find ‘a’ as the common line. diff does not always find an optimal matching between the files; it takes shortcuts to run faster. But its output is usually close to the shortest possible. You can make diff try harder to find a smaller set of changes, at the cost of speed, with the --minimal (-d) option (see diff Performance Tradeoffs).
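To see how the choice of common line shapes the output, the following sketch renders normal-format hunks from an explicitly chosen matching. It is an illustration only, not diff's implementation: the helper names `normal_diff`, `format_hunk`, and `line_range` are hypothetical, and the matching is supplied by hand as 0-based (index in F, index in G) pairs rather than computed.

```python
def line_range(start, stop):
    """1-based inclusive range for the 0-based half-open span [start, stop)."""
    first, last = start + 1, stop
    return str(first) if first == last else f"{first},{last}"


def format_hunk(i0, i, j0, j, dels, adds):
    """Render one hunk: a change (c), a delete (d), or an add (a)."""
    if dels and adds:
        head = f"{line_range(i0, i)}c{line_range(j0, j)}"
    elif dels:
        # j0 is the number of the line in G after which the deleted lines would appear.
        head = f"{line_range(i0, i)}d{j0}"
    else:
        # i0 is the number of the line in F after which the added lines appear.
        head = f"{i0}a{line_range(j0, j)}"
    lines = [head]
    lines += [f"< {x}" for x in dels]
    if dels and adds:
        lines.append("---")
    lines += [f"> {x}" for x in adds]
    return lines


def normal_diff(f, g, matches):
    """Normal-format output lines for f and g, given matched index pairs."""
    out = []
    # A sentinel pair just past the end of both files closes the final hunk.
    pairs = sorted(matches) + [(len(f), len(g))]
    i0, j0 = 0, 0  # first unmatched index in f and in g
    for i, j in pairs:
        dels, adds = f[i0:i], g[j0:j]
        if dels or adds:
            out.extend(format_hunk(i0, i, j0, j, dels, adds))
        i0, j0 = i + 1, j + 1
    return out


if __name__ == "__main__":
    F, G = ["a", "b", "c"], ["c", "b", "a"]
    # Treat 'a' (line 1 of F, line 3 of G) as the common line.
    print("\n".join(normal_diff(F, G, [(0, 2)])))
```

Passing `[(2, 0)]` or `[(1, 1)]` instead selects ‘c’ or ‘b’ as the common line and reproduces the two outputs shown earlier. With ‘a’ as the common line, the sketch emits an add hunk headed `0a1,2` followed by a delete hunk headed `2,3d3`. All three matchings leave four differing lines in total, so none is shorter than the others.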