Introducing Changesets

The Hackerlab at regexps.com

It is often extremely useful to compare two project trees (usually for the same project) and figure out exactly what has changed between them. A record of such changes is called a changeset or a delta.

Changesets are a central concept in arch: much of arch is defined in terms of operations performed with changesets.

If you have a changeset between an "old tree" and a "new tree", you can "apply the changeset" to the old tree to get the new tree; in other words, you can automatically make the editing changes described by the changeset. If you have some third tree, you can apply the same changeset to it to get an approximation of making the same changes to that tree.
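As a toy illustration only (this is not arch's implementation or its changeset format), the sketch below models a tree as a Python dict from path to file contents. The hypothetical functions `make_changeset` and `apply_changeset` show the two properties described above: applying a changeset to the old tree reproduces the new tree exactly, while applying it to a third tree only approximates the same edits. Unlike arch, which matches files by inventory id, this toy matches files by path.

```python
def make_changeset(old, new):
    """Toy changeset between two trees given as {path: contents} dicts."""
    return {
        "removed": [p for p in old if p not in new],
        "added": {p: c for p, c in new.items() if p not in old},
        "modified": {p: c for p, c in new.items() if p in old and old[p] != c},
    }


def apply_changeset(changeset, tree):
    """Return a copy of `tree` with the toy changeset applied."""
    result = dict(tree)
    for p in changeset["removed"]:
        result.pop(p, None)  # tolerate files this tree no longer has
    result.update(changeset["added"])
    for p, contents in changeset["modified"].items():
        if p in result:  # skip files this tree does not have
            # Whole-file replacement discards local edits; arch records
            # a context diff for text files instead.
            result[p] = contents
    return result


old = {"hello.c": "v1", "README": "hi"}
new = {"hello.c": "v2", "NEWS": "first release"}
cs = make_changeset(old, new)

# Applied to the old tree, the changeset reproduces the new tree exactly.
assert apply_changeset(cs, old) == new

# Applied to a third tree, it only approximates "the same changes".
third = {"hello.c": "v1 plus local edits", "README": "hi", "Makefile": "all:"}
print(apply_changeset(cs, third))
```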

arch includes sophisticated tools for creating and applying changesets.

mkpatch

mkpatch computes a changeset describing the differences between two trees. The basic command syntax is:

        % tla mkpatch ORIGINAL MODIFIED DESTINATION

which compares the trees ORIGINAL and MODIFIED.

mkpatch creates a new directory, DESTINATION, and stores the changeset there.

When mkpatch compares trees, it uses inventory ids. For example, it considers two directories or two files to be "the same directory (or file)" if they have the same id -- regardless of where each is located in its respective tree. (See Inventory Ids for Source.)
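Here is a minimal sketch of id-based comparison. It assumes each tree's inventory is already available as a dict mapping inventory id to relative path; how arch assigns those ids is covered in Inventory Ids for Source. The name `compare_inventories` is hypothetical. Because files are matched by id, a file that moves shows up as a rename, not as a deletion plus an addition.

```python
def compare_inventories(original, modified):
    """Classify inventory ids as added, removed, or renamed.

    `original` and `modified` map inventory id -> path relative to the tree root.
    """
    added = {i: p for i, p in modified.items() if i not in original}
    removed = {i: p for i, p in original.items() if i not in modified}
    renamed = {
        i: (original[i], modified[i])
        for i in original.keys() & modified.keys()
        if original[i] != modified[i]
    }
    return added, removed, renamed


original = {"id_1": "a/x.c", "id_2": "a/bar.c"}
modified = {"id_1": "a/x.c", "id_2": "a/y.c", "id_3": "a/z.h"}
added, removed, renamed = compare_inventories(original, modified)
print(added)    # {'id_3': 'a/z.h'}
print(removed)  # {}
print(renamed)  # {'id_2': ('a/bar.c', 'a/y.c')}
```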

A changeset produced by mkpatch describes what files and directories have been added or removed, which have been renamed, which files have been changed (and how they have been changed), and what file permissions have changed (and how). When regular text files are compared, mkpatch produces a context diff describing the differences. mkpatch can compare binary files (saving complete copies of the old and new versions if they differ) and symbolic links (saving the old and new link targets, if they differ).
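The per-file rules in the paragraph above can be sketched as follows. This illustrates the rules only; it is not the on-disk changeset format described in the appendix. Each side of a file is modeled as a hypothetical `Entry` holding a kind (`"text"`, `"binary"`, or `"symlink"`), its contents or link target, and its permission bits. Python's standard `difflib.context_diff` stands in for the context diff that mkpatch produces.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str  # "text", "binary", or "symlink" (hypothetical model)
    data: str  # file contents, or the link target for a symlink
    mode: int  # permission bits, e.g. 0o644


def describe_change(path, old, new):
    """Return a list of (what, detail) pairs describing how one file changed."""
    changes = []
    if old.data != new.data:
        if old.kind == "text" and new.kind == "text":
            diff = difflib.context_diff(
                old.data.splitlines(keepends=True),
                new.data.splitlines(keepends=True),
                fromfile="orig/" + path,
                tofile="mod/" + path,
            )
            changes.append(("context-diff", "".join(diff)))
        elif old.kind == "symlink" and new.kind == "symlink":
            changes.append(("link-target", (old.data, new.data)))
        else:
            # Binary files (or a change of kind): keep complete old and new copies.
            changes.append(("full-copies", (old.data, new.data)))
    if old.mode != new.mode:
        changes.append(("permissions", (oct(old.mode), oct(new.mode))))
    return changes


old = Entry("text", "hello\nworld\n", 0o644)
new = Entry("text", "hello\nthere\n", 0o755)
for what, detail in describe_change("hello.txt", old, new):
    print(what)
```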

A detailed description of the format of a changeset is provided in an appendix (see The arch Changeset Format).

dopatch

dopatch is used to apply a changeset to a tree:

        % tla dopatch PATCH-SET TREE

If TREE is exactly the same as the "original" tree seen by mkpatch, then the effect is to modify TREE so that it is exactly the same as the "modified" tree seen by mkpatch, with one exception (explained below).

"Exactly the same" means that the directory structure is the same, symbolic link targets are the same, the contents of regular files are the same, and file permissions are the same. Modification times, files with multiple (hard) links, and file ownership are not reliably preserved.

The exception to the "exactly the same" rule is that if the patch requires that files or directories be removed from TREE, those files and directories will be saved in a subdirectory of TREE with an eye-splitting name matching the pattern:

        ++removed-by-dopatch-PATCH--DATE

where PATCH is the name of the patch-set directory and DATE is a timestamp.
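The sketch below is a rough model of this exception. It represents a tree as a dict from relative path to contents and moves "removed" files under a backup directory instead of deleting them. The timestamp format is an assumption, since the text only says DATE is a timestamp, and `removed_dir_name` and `remove_with_backup` are hypothetical names.

```python
from datetime import datetime, timezone


def removed_dir_name(patch_name, when):
    """Build a name following the ++removed-by-dopatch-PATCH--DATE pattern.

    The strftime format is an assumption made for this sketch.
    """
    return f"++removed-by-dopatch-{patch_name}--{when.strftime('%Y%m%d%H%M%S')}"


def remove_with_backup(tree, doomed, patch_name, when):
    """Move the paths in `doomed` under the backup directory instead of deleting them."""
    backup = removed_dir_name(patch_name, when)
    result = dict(tree)
    for p in doomed:
        if p in result:
            result[f"{backup}/{p}"] = result.pop(p)
    return result


tree = {"hello.c": "code", "notes.txt": "obsolete notes"}
when = datetime(2003, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
after = remove_with_backup(tree, ["notes.txt"], "my-patch", when)
print(sorted(after))
```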

Inexact Patching -- How Conflicts are Handled

What if a tree patched by dopatch is not exactly the same as the original tree seen by mkpatch?

Below is a brief description of what to expect. Complete documentation of the dopatch process is included with the source code.

dopatch takes an inventory of the tree being patched. It uses inventory ids to decide which files and directories expected by the changeset are present in or missing from the tree, and to figure out where each file and directory is located in the tree.
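A minimal sketch of this bookkeeping follows. It assumes the changeset can list the inventory ids it patches and that the target tree's inventory is a dict from id to current path. Patches whose id is absent from the tree are set aside rather than applied; see Patches for Missing Files for where dopatch puts them. `locate_patch_targets` is a hypothetical name.

```python
def locate_patch_targets(patched_ids, inventory):
    """Split changeset ids into those present in the tree (with their current
    paths) and those missing from it.

    `inventory` maps inventory id -> path in the tree being patched.
    """
    present = {i: inventory[i] for i in patched_ids if i in inventory}
    missing = [i for i in patched_ids if i not in inventory]
    return present, missing


# The tree has moved things around since mkpatch ran; ids still find them.
inventory = {"id_1": "src/foo.c", "id_2": "lib/zip.c"}
present, missing = locate_patch_targets(["id_1", "id_2", "id_9"], inventory)
print(present)
print(missing)
```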

Simple Patches

If the changeset contains an ordinary patch or metadata patch for a link, directory, or file, and that item is present in the tree, dopatch applies the patch in the ordinary way. If the patch applies cleanly, the modified file, link, or directory is left in place.

If a simple patch fails to apply cleanly, dopatch will always leave behind a .orig file (the file originally in the tree being patched, without any changes) and a .rej file (the part of the patch that could not be applied).

If the patch was a context diff, dopatch will also leave behind the file itself -- partially patched.

If an (unsuccessful) patch was for a binary file, no partially-patched file will be left. Instead, there will be:

        .orig   -- the file originally in the tree being patched,
                   without modifications.

        .rej    -- a complete copy of the file from the modified tree,
                   with permissions copied from `.orig'.

        .patch-orig -- a complete copy of the file from the original
                       tree seen by `mkpatch', with permissions
                       retained from that original

                        -or-

                       the symbolic link from the original tree seen
                       by `mkpatch' with permissions as in the original
                       tree.

If an (unsuccessful) patch was for a symbolic link, no partially patched file will be left. Instead there will be:

        .orig   -- the unmodified file from the original tree

        .rej    -- a symbolic link with the target intended by the
                   patch and permissions copied from .orig

        .patch-orig -- a complete copy of the file from the original
                       tree seen by `mkpatch', with permissions 
                       retained from that original

                        -or-

                       the symbolic link from the original tree seen
                       by `mkpatch' with permissions as in the original
                       tree.
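The leftovers described above can be summarized as a lookup. The hypothetical helper `failed_patch_leftovers` only restates the rules in this section; it does not inspect a real tree or call dopatch.

```python
def failed_patch_leftovers(name, kind):
    """Files left behind when a simple patch to `name` fails, per the rules above.

    `kind` is "text" (context diff), "binary", or "symlink".
    """
    leftovers = [name + ".orig", name + ".rej"]
    if kind == "text":
        # Context-diff patches also leave the partially patched file itself.
        leftovers.insert(0, name)
    elif kind in ("binary", "symlink"):
        # No partially patched file; a copy from the original tree instead.
        leftovers.append(name + ".patch-orig")
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return leftovers


print(failed_patch_leftovers("hello.c", "text"))
print(failed_patch_leftovers("logo.png", "binary"))
```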

Patches for Missing Files

All patches for missing files and directories are stored in a subdirectory of the root of the tree being patched called

        ==missing-file-patches-PATCH-DATE

where PATCH is the basename of the changeset directory and DATE is a timestamp.

Directory Rearrangements and New Directories

Directories are added, deleted, and rearranged much as you would expect, even if you don't know it's what you'd expect.

Suppose that when mkpatch was called, the ORIGINAL tree had:

        Directory or file:              Id:

        a/x.c                           id_1
        a/bar.c                         id_2

but the MODIFIED tree had:

        a/x.c                           id_1
        a/y.c                           id_2

with changes to both files. The patch will want to rename the file with id id_2 to y.c, and change the contents of the files with ids id_1 and id_2.

Suppose, for example, that you have a tree with:

        a/foo.c                         id_1
        a/zip.c                         id_2

and you apply the patch to that tree. After the patch, you'll be left with:

        a/foo.c                         id_1
        a/y.c (was zip.c)               id_2

with patches made to the contents of both files.
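The rename in this example can be replayed with a small sketch. It assumes each tree's inventory is a dict from id to path. To keep things short, it also assumes the changeset records only a new basename for a renamed file, which covers a same-directory rename like this one. Content changes are not modeled, and arch's real changesets carry more than this. `apply_renames` is a hypothetical name.

```python
import posixpath


def apply_renames(inventory, renames):
    """Rename files by inventory id, wherever they currently live.

    `inventory` maps id -> path; `renames` maps id -> new basename.
    Renames for ids absent from `inventory` are simply never used here.
    """
    result = {}
    for ident, path in inventory.items():
        if ident in renames:
            path = posixpath.join(posixpath.dirname(path), renames[ident])
        result[ident] = path
    return result


# The changeset from the example: the file with id_2 becomes y.c.
renames = {"id_2": "y.c"}
tree = {"id_1": "a/foo.c", "id_2": "a/zip.c"}
print(apply_renames(tree, renames))
```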

Here's a sample of some subtleties and ways of handling conflicts:

Suppose that the original tree seen by mkpatch has:

        Directory or file:              Id:

        ./a                             id_a
        ./a/b                           id_b
        ./a/b/c                         id_c

and that the modified directory has:

        ./a                             id_a
        ./a/c                           id_c
        ./a/c/b                         id_b

Finally, suppose that the tree has:

        ./x                             id_a
        ./x/b                           id_b
        ./x/c                           id_new_directory
        ./x/c/b                         id_different_file_named_b
        ./x/c/q                         id_c

When dopatch is done with the tree, it will have:

        ./x                             id_a
                Since the patch doesn't do anything 
                to change the directory with id_a.

        ./x/c.orig                      id_new_directory
        ./x/c.rej                       id_c
                Since the patch wants to make the 
                directory with id_c a subdirectory named "c"
                of the directory with id_a, but the tree
                already had a different directory there,
                with the id id_new_directory.

        ./x/c.rej/b                     id_b
                Since the patch wants to rename the directory
                with id_b to be a subdirectory named "b"
                of the directory with id_c.

        ./x/c.orig/b                    id_different_file_named_b
                Since the patch made no changes to this file,
                it stayed with its parent directory.
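The example above can be reproduced with a simplified model. This sketch illustrates the outcome described here; it is not dopatch's actual algorithm. It rests on three assumptions:

- Each item is recorded as id -> (parent id, name).
- The changeset is reduced to the new (parent id, name) for each item it moves.
- When a move collides with an item the patch does not move, the occupant is renamed to NAME.orig and the incoming item becomes NAME.rej.

`apply_moves`, `paths`, and the root id `"."` are hypothetical.

```python
ROOT = "."


def apply_moves(tree, moves):
    """Relocate items by id, resolving name collisions with .orig/.rej.

    `tree` and the values of `moves` use the form id -> (parent_id, name).
    """
    result = dict(tree)
    placed = set()
    for ident, (parent, name) in moves.items():
        if ident not in result:
            continue  # missing from this tree; not modeled here
        # An item at the target location blocks the move, unless it is itself
        # scheduled to move and has not been placed yet.
        occupant = next(
            (i for i, loc in result.items()
             if loc == (parent, name) and i != ident
             and (i not in moves or i in placed)),
            None,
        )
        if occupant is not None:
            result[occupant] = (parent, name + ".orig")
            name += ".rej"
        result[ident] = (parent, name)
        placed.add(ident)
    return result


def paths(tree):
    """Map each path (built by following parent links) to its id."""
    def path_of(ident):
        parent, name = tree[ident]
        return name if parent == ROOT else path_of(parent) + "/" + name
    return {path_of(i): i for i in tree}


# The changeset: id_c moves under id_a as "c"; id_b moves under id_c as "b".
moves = {"id_c": ("id_a", "c"), "id_b": ("id_c", "b")}
tree = {
    "id_a": (ROOT, "x"),
    "id_b": ("id_a", "b"),
    "id_new_directory": ("id_a", "c"),
    "id_different_file_named_b": ("id_new_directory", "b"),
    "id_c": ("id_new_directory", "q"),
}
for path, ident in sorted(paths(apply_moves(tree, moves)).items()):
    print(path, ident)
```

Running it lists the same layout as the walkthrough: x, x/c.orig, x/c.orig/b, x/c.rej, and x/c.rej/b.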

arch Meets hello-world: A Tutorial Introduction to The arch Revision Control System