Computer users often find occasion to ask how two files differ. Perhaps one file is a newer version of the other file. Or maybe the two files started out as identical copies but were changed by different people.
You can use the diff command to show differences between two files, or each corresponding file in two directories. diff outputs differences between files line by line in any of several formats, selectable by command line options. This set of differences is often called a diff or patch. For files that are identical, diff normally produces no output; for binary (non-text) files, diff normally reports only that they are different.
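To make the idea of a line-by-line set of differences concrete, here is a small sketch using Python's standard difflib module, which can emit output in the unified format. difflib uses its own matching algorithm rather than the one GNU diff uses, so on larger inputs its hunks may differ from diff's; the function name unified and the file names old.txt and new.txt are hypothetical, chosen only for illustration.

```python
import difflib


def unified(old_text: str, new_text: str,
            old_name: str = "old.txt", new_name: str = "new.txt") -> str:
    """Return a unified-format diff of two texts, compared line by line.

    Illustration only: difflib's matcher is not GNU diff's algorithm.
    """
    # keepends=True preserves newline characters so each output line is complete.
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(difflib.unified_diff(old_lines, new_lines,
                                        fromfile=old_name, tofile=new_name))


if __name__ == "__main__":
    before = "apple\nbanana\ncherry\n"
    after = "apple\nblueberry\ncherry\n"
    print(unified(before, after), end="")
    # Identical inputs yield an empty string, mirroring diff's silence.
    print(repr(unified(before, before)))
```

For the two sample texts, the output marks "banana" with a leading "-" and "blueberry" with a leading "+", surrounded by unchanged context lines.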
You can use the cmp command to show the byte and line numbers where two files differ. cmp can also show all the bytes that differ between the two files, side by side. A way to compare two files character by character is the Emacs command M-x compare-windows. See the section "Other Window" in The GNU Emacs Manual for more information on that command.
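The kind of report cmp produces is simple enough to sketch. The Python functions below, whose names are hypothetical, locate the first differing byte together with its line number, and list every differing byte position in the spirit of cmp's verbose listing. They are an illustration under simplifying assumptions, not cmp's implementation: they take whole inputs as in-memory bytes and do not reproduce cmp's messages or options.

```python
from typing import Iterator, Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[Tuple[int, int]]:
    """Return (byte number, line number) of the first difference, both 1-based.

    Returns None when the inputs are identical. If one input is a proper
    prefix of the other, this sketch reports the position just past the
    shorter input; real cmp prints an end-of-file message instead.
    """
    line = 1
    for i in range(min(len(a), len(b))):
        if a[i] != b[i]:
            return i + 1, line
        if a[i] == 0x0A:  # a newline in the common prefix starts a new line
            line += 1
    if len(a) == len(b):
        return None
    return min(len(a), len(b)) + 1, line


def differing_bytes(a: bytes, b: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (byte number, value in a, value in b) for each differing byte.

    Only the common length is compared. cmp's own verbose listing prints
    the byte values in octal; this sketch yields plain integers.
    """
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            yield i, x, y
```

For example, first_difference(b"abc\ndef\n", b"abc\ndxf\n") reports byte 6 on line 2, because the newline at byte 4 moves the count to the second line.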
You can use the diff3 command to show differences among three files. When two people have made independent changes to a common original, diff3 can report the differences between the original and the two changed versions, and can produce a merged file that contains both persons’ changes together with warnings about conflicts.
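The decision diff3 makes for each changed region can be illustrated with a toy three-way merge. This sketch assumes all three versions have the same number of lines and that no lines were inserted or deleted, so line i of each version corresponds to line i of the others; diff3 itself handles the general case by first computing differences against the original. The function name merge3 and the conflict marker text are hypothetical and only loosely resemble diff3's merged output.

```python
from typing import List


def merge3(original: List[str], mine: List[str], yours: List[str]) -> List[str]:
    """Toy three-way merge of equal-length line lists (no insertions/deletions)."""
    if not (len(original) == len(mine) == len(yours)):
        raise ValueError("toy merge requires equal-length inputs")
    merged: List[str] = []
    for o, m, y in zip(original, mine, yours):
        if m == y:      # both versions agree (unchanged, or changed identically)
            merged.append(m)
        elif m == o:    # only "yours" changed this line: take that change
            merged.append(y)
        elif y == o:    # only "mine" changed this line: take that change
            merged.append(m)
        else:           # both changed it differently: keep both and flag a conflict
            merged.extend(["<<<<<<< mine", m, "=======", y, ">>>>>>> yours"])
    return merged
```

Non-overlapping edits combine silently, while a line changed differently by both people is kept in both forms between markers, which corresponds to the warnings about conflicts described above.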
You can use the sdiff command to merge two files interactively.
You can use the set of differences produced by diff to distribute updates to text files (such as program source code) to other people. This method is especially useful when the differences are small compared to the complete files. Given diff output, you can use the patch program to update, or patch, a copy of the file. If you think of diff as subtracting one file from another to produce their difference, you can think of patch as adding the difference to one file to reproduce the other.
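The subtraction and addition analogy can be checked with a small round trip. The sketch below uses Python's difflib.SequenceMatcher to find which regions of the old line list differ, keeps only those regions, and then applies them back to the old list. The hunk representation and the names make_patch and apply_patch are hypothetical simplifications: real diff output also records context lines and line numbers, and patch can apply hunks to a file whose lines have shifted, which this sketch does not attempt.

```python
import difflib
from typing import List, Tuple

# Hypothetical compact hunk: (start, end, replacement), meaning
# "replace old[start:end] with replacement".
Hunk = Tuple[int, int, List[str]]


def make_patch(old: List[str], new: List[str]) -> List[Hunk]:
    """'Subtract' old from new, keeping only the regions that differ."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_patch(old: List[str], patch: List[Hunk]) -> List[str]:
    """'Add' the difference back to old to reproduce new."""
    result: List[str] = []
    pos = 0
    for start, end, replacement in patch:  # hunks arrive in ascending order
        result.extend(old[pos:start])      # copy the unchanged lines
        result.extend(replacement)         # substitute the changed region
        pos = end
    result.extend(old[pos:])               # copy everything after the last hunk
    return result
```

Because each hunk records positions in the old list, apply_patch only works on an exact copy of the original. That limitation is one reason practical patch formats carry context lines along with the changes.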
This manual first concentrates on making diffs, and later shows how to use diffs to update files.
GNU diff was written by Paul Eggert, Mike Haertel, David Hayes, Richard Stallman, and Len Tower. Wayne Davison designed and implemented the unified output format. The basic algorithm is described by Eugene W. Myers in “An O(ND) Difference Algorithm and its Variations”, Algorithmica Vol. 1, 1986, pp. 251–266, http://dx.doi.org/10.1007/BF01840446; and in “A File Comparison Program”, Webb Miller and Eugene W. Myers, Software—Practice and Experience Vol. 15, 1985, pp. 1025–1040, http://dx.doi.org/10.1002/spe.4380151102. The algorithm was also discovered independently by Esko Ukkonen, as described in “Algorithms for Approximate String Matching”, Information and Control Vol. 64, 1985, pp. 100–118, http://dx.doi.org/10.1016/S0019-9958(85)80046-2. Unless the --minimal option is used, diff uses a heuristic by Paul Eggert that limits the cost to O(N^1.5 log N) at the price of producing suboptimal output for large inputs with many differences. Related algorithms are surveyed by Alfred V. Aho in section 6.3 of “Algorithms for Finding Patterns in Strings”, Handbook of Theoretical Computer Science (Jan van Leeuwen, ed.), Vol. A, Algorithms and Complexity, Elsevier/MIT Press, 1990, pp. 255–300.
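At the core of the Myers algorithm cited above is a greedy search along the diagonals of an edit graph. It finds the smallest number D of insertions and deletions that turn one sequence into another, in O(ND) time. The Python function below is a minimal sketch of that forward greedy search and computes only D. It omits the linear-space refinement, the recovery of the actual edit script, and Paul Eggert's heuristic, so it should not be read as GNU diff's code; the name myers_distance is hypothetical.

```python
from typing import Sequence


def myers_distance(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions plus deletions turning a into b.

    Sketch of Myers's greedy O(ND) search: for each edit count d, record
    the furthest-reaching x on every diagonal k = x - y reachable with
    exactly d non-diagonal moves, then follow matching "snakes" diagonally.
    """
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d                  # shifts diagonal k to a non-negative index
    v = [0] * (2 * max_d + 2)       # v[k + offset] = furthest x on diagonal k
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1 + offset] < v[k + 1 + offset]):
                x = v[k + 1 + offset]        # step down: insert b[y]
            else:
                x = v[k - 1 + offset] + 1    # step right: delete a[x]
            y = x - k
            # Follow the snake: consume equal elements along the diagonal.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k + offset] = x
            if x >= n and y >= m:            # reached the end of both inputs
                return d
    return max_d
```

For the commonly used example pair ABCABBA and CBABAC, the function returns 5. Because D = N + M - 2L, where L is the length of a longest common subsequence, this also gives L = (7 + 6 - 5)/2 = 4.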
GNU diff3 was written by Randy Smith. GNU sdiff was written by Thomas Lord. GNU cmp was written by Torbjörn Granlund and David MacKenzie.
GNU patch was written mainly by Larry Wall and Paul Eggert; several GNU enhancements were contributed by Wayne Davison and David MacKenzie. Parts of this manual are adapted from a manual page written by Larry Wall, with his permission.