regexps.com
Caution: Steep Learning Curve: The concepts and commands introduced in this chapter are likely to be unfamiliar to you, even if you have used other revision control systems. They're really quite simple once you get over the initial learning hurdle -- and after that they're very useful.
In a project tree, some of the files and directories are "part of the source" -- they are of interest to arch. Other files and directories may be scratch files, editor back-up files, and temporary or intermediate files generated by programs. Those other files should be ignored or treated specially by most arch commands.
This chapter discusses how arch recognizes which files to pay attention to, and which to ignore.
The command tla inventory --names --source is used to print a list of source files as determined by the naming conventions. It has many options, including options to print other kinds of file lists (such as a list of all editor backup files, or a list of all files which are not source).
Let's suppose that after some editing, our source tree looks like this:
    % ls
    hw.c    hw.c.~1~    main.c    {arch}
The file hw.c.~1~ is an editor backup file. tla knows that and omits that file from the source inventory:
    % tla inventory --names --source
    ./hw.c
    ./main.c
tla can give you other lists besides lists of source:
    % tla inventory --names --backups
    ./hw.c.~1~
This section describes the default naming conventions used by arch to pick out source files from other kinds of files. A later chapter describes how to customize these conventions for a particular tree (see Customizing the inventory Naming Conventions).
The naming conventions are based on several categories of files:
. and ..: These are simply ignored by arch.
excluded: Excluded files are normally omitted from a listing, but if the --all flag is passed to inventory, then these files are put into one of the categories below and included in the listing.
source: These are apparent source files.
precious: These are non-source files that should not be automatically deleted.
junk: These are non-source files that may be automatically deleted.
backups: These are non-source files that may be automatically deleted, but any program that deletes them should treat them as editor backup files (e.g., keep the oldest and newest of them).
unrecognized: These are files that arch doesn't know how to classify -- they fit none of the naming conventions, or they have names that appear to be "suspicious".
The algorithm for classifying files by name has several rules. For each file name, each of these rules is checked in the order listed here until the first rule is reached that classifies the file.
Exclude Dot Files: The special files . and .. are always excluded from inventory listings.
Non-portable Names are Unrecognized: File names containing whitespace, non-printing characters, or a "globbing character" are always classified as unrecognized. The globbing characters are:
    ? [ ] * \
Excluded File Test: If the --all flag is not given to inventory, the file names matching the pattern for excluded files are dropped from the listing. If the name of a directory is excluded, the entire contents of that directory are skipped. By default, the pattern for excluded files matches control files created by arch itself:
    ^(.arch-ids|\{arch\})$
Junk File Test: All file names reaching this step that begin with two commas (,,) are classified as junk. Temporary files created by arch itself begin with two commas. In addition, any file name matching the junk pattern is classified as junk. By default, that pattern matches any name beginning with (at least) one comma:
    ^,.*$
Incidentally, that default pattern gives rise to a handy trick. If you need to create a scratch file in a source tree, give it a name that begins with a single comma.
Backup File Test: By default, a backup file is any file that reaches this step and matches one of the patterns:
    ^.*(~|\.~[0-9]+~)$
    ^.*(\.bak|\.orig|\.rej|\.original|\.modified|\.reject)$
Precious File Test: By default, a precious file is any that reaches this step and matches one of the patterns:
    ^\+.*$
    ^(\.gdbinit|\.#ckpts-lock)$
    ^(=build\.*|=install\.*)$
    ^(CVS|CVS\.adm|RCS|RCSLOG|SCCS|TAGS)$
Suspicious File Test (Unrecognized): Some file names reaching this step are explicitly treated as unrecognized on the presumption that they should probably not be present in a source tree. By default, names ending with any of these extensions are treated as unrecognized:
    .o .a .so .core
In addition, the filename core is (by default) treated as unrecognized.
Source File Test: Files reaching this step are compared to the pattern for source files. The default pattern is shown below. You should note that this pattern overlaps that for excluded files given above. If the --all flag is given to inventory, the excluded pattern isn't used, and files that would match it instead "fall through" to later steps of this algorithm.
    ^([_=a-zA-Z0-9].*|\.arch-ids|\{arch\}|\.arch-project-tree)$
In other words, by default, the arch control files and directories are source (if not excluded). Files beginning with letters, numbers, underscore, or an equal sign are source.
Unrecognized Files: Any left-over file name reaching this step is treated as unrecognized.
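The ordered rules above can be sketched as a small Python function. This is only an illustration, not arch's implementation: the function name classify and its include_excluded parameter (standing in for the --all flag) are hypothetical, the patterns are the defaults quoted above run through Python's re module rather than arch's own matcher, the second backup pattern is written with a balanced group, and the "suspicious" extensions are folded into a single pattern of my own.

```python
import re

# Default patterns as quoted in the text (assumption: Python's `re`
# treats them closely enough to arch's own regexp matcher).
EXCLUDED = re.compile(r"^(.arch-ids|\{arch\})$")
JUNK = re.compile(r"^,.*$")
BACKUP = [
    re.compile(r"^.*(~|\.~[0-9]+~)$"),
    re.compile(r"^.*(\.bak|\.orig|\.rej|\.original|\.modified|\.reject)$"),
]
PRECIOUS = [
    re.compile(r"^\+.*$"),
    re.compile(r"^(\.gdbinit|\.#ckpts-lock)$"),
    re.compile(r"^(=build\.*|=install\.*)$"),
    re.compile(r"^(CVS|CVS\.adm|RCS|RCSLOG|SCCS|TAGS)$"),
]
# Hypothetical single pattern for the suspicious names .o .a .so .core and core.
SUSPICIOUS = re.compile(r"^(.*\.(o|a|so|core)|core)$")
SOURCE = re.compile(r"^([_=a-zA-Z0-9].*|\.arch-ids|\{arch\}|\.arch-project-tree)$")
GLOB_CHARS = set("?[]*\\")


def classify(name, include_excluded=False):
    """Return the category of one file name, checking the rules in order."""
    if name in (".", ".."):
        return None  # never listed at all
    if any(c.isspace() or not c.isprintable() or c in GLOB_CHARS for c in name):
        return "unrecognized"  # non-portable name
    if not include_excluded and EXCLUDED.match(name):
        return "excluded"
    if name.startswith(",,") or JUNK.match(name):
        return "junk"
    if any(p.match(name) for p in BACKUP):
        return "backups"
    if any(p.match(name) for p in PRECIOUS):
        return "precious"
    if SUSPICIOUS.match(name):
        return "unrecognized"  # suspicious name
    if SOURCE.match(name):
        return "source"
    return "unrecognized"  # left-over name


for name in ["hw.c", "hw.c.~1~", "{arch}", ",scratch", "+notes", "hw.o"]:
    print(name, "->", classify(name))
```

Because the first matching rule wins, hw.c.~1~ is classified as backups even though it would also match the source pattern, and {arch} only reaches the source test when include_excluded is true.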
Using our example, we can illustrate some of the naming conventions.
Recall that our project tree looks like this:
    % ls
    hw.c    hw.c.~1~    main.c    {arch}
So the ordinary source listing is:
    % tla inventory --names --source
    ./hw.c
    ./main.c
And the list of all source files (none excluded) is:
    % tla inventory --names --source --all
    ./hw.c
    ./main.c
    ./{arch}/.arch-project-tree
    ./{arch}/=tagging-method
We can include directories in this listing:
    % tla inventory --names --source --all --both
    ./hw.c
    ./main.c
    ./{arch}
    ./{arch}/.arch-project-tree
    ./{arch}/=tagging-method
    ./{arch}/hello-world
    ./{arch}/hello-world/hello-world--mainline
    [... output trimmed ...]
We can also look at some lists of non-source files:
    % tla inventory --names --backups
    ./hw.c.~1~
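The way an excluded directory hides its whole contents, and reappears under --all, can be imitated with a short Python sketch. It is only a rough model under stated assumptions: list_names, make_example_tree and the include_excluded parameter are hypothetical names, only the excluded test is applied (so the backup file shows up too, since no source classification is done), and the example tree is recreated in a temporary directory.

```python
import os
import re
import tempfile

# Default excluded pattern as quoted in the text.
EXCLUDED = re.compile(r"^(.arch-ids|\{arch\})$")


def list_names(root, include_excluded=False):
    """Return sorted './'-relative file paths under root (hypothetical helper)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_excluded:
            # Editing dirnames in place stops os.walk from descending
            # into excluded directories, so their contents are skipped.
            dirnames[:] = [d for d in dirnames if not EXCLUDED.match(d)]
            filenames = [f for f in filenames if not EXCLUDED.match(f)]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append("./" + rel.replace(os.sep, "/"))
    return sorted(found)


def make_example_tree(root):
    """Create empty files mirroring the example project tree."""
    os.mkdir(os.path.join(root, "{arch}"))
    for rel in ("hw.c", "hw.c.~1~", "main.c",
                "{arch}/.arch-project-tree", "{arch}/=tagging-method"):
        with open(os.path.join(root, *rel.split("/")), "w"):
            pass


with tempfile.TemporaryDirectory() as root:
    make_example_tree(root)
    print(list_names(root))                         # {arch} contents skipped
    print(list_names(root, include_excluded=True))  # {arch} contents listed
```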
The inventory command has many options that you may wish to explore.
You can alter the patterns used by inventory to classify files. This is explained in a later chapter (see Customizing the inventory Naming Conventions).
Many systems provide naming conventions for recognizing source files but users new to arch often wonder why arch needs so many categories of files. Recall that arch has the categories:
excluded source precious junk backups unrecognized
A rationale for each category is explained here:
excluded is provided simply to keep inventory listings brief in the very common case that arch control files are of no particular interest. This is similar to the treatment of "dot files" by ls, and the --all flag to inventory is similar to the -a flag to ls.
source is provided simply so that arch can reliably distinguish those files from others. For example, when comparing two source trees, arch compares only the files in the category source.
precious files are those that arch should make an effort to preserve. For example, if arch needs to make a copy of a project tree for you, it copies the precious files along with the source.
Suppose, for example, that you are taking notes while working on source. You don't want your file of notes to be mistaken for source, but you also don't want them to be lost. A useful trick is to give the file a precious name (e.g. +notes).
junk: Often when working on a project tree, it's convenient to create "throw-away" files. You might want to compile a quick test program or save, for the moment, the output of some command. When enough of these throw-away files have accumulated, it's handy to be able to get rid of them all at once, without having to carefully identify which files to toss, and which to keep. junk names are perfect for this.
When you create one of these throw-away files, give it a name like ,foo. Later, you can feel confident and safe issuing commands like:
    % rm ,*
    % find . -name ',*' | xargs rm
    % tla inventory --junk | xargs rm
From arch's perspective, junk files have two important properties. First, when copying a tree, the junk files are not copied. Second, it is considered safe for arch to overwrite a junk file. In practice, arch will only ever actually overwrite a junk file if that junk file has a name that begins with two commas (,,).
backups: Editor backup files and the backup files created by programs like patch often deserve special treatment. For example, if your editor creates "numbered backups", those are almost junk files, but rather than deleting all of them, you might want to delete only some of them.
For arch, what is important is that when copying a tree, backup files are not copied along with it. For users, what is hopefully most useful is that using the trick:
    % tla inventory --junk | xargs rm
will not delete backup files.
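The idea of deleting only some numbered backups, keeping the oldest and newest of each file, can be sketched in Python as follows. This is not something arch provides: backups_to_delete is a hypothetical helper, it only handles names of the form name.~N~, and it assumes that the lowest number is the oldest backup and the highest is the newest.

```python
import re
from collections import defaultdict

# Numbered backups such as hw.c.~1~ (the base name is captured in group 1).
NUMBERED = re.compile(r"^(.*)\.~([0-9]+)~$")


def backups_to_delete(names):
    """Return numbered backups to remove, keeping the oldest and newest
    backup of each base file (assumption: lower number means older)."""
    groups = defaultdict(list)
    for name in names:
        m = NUMBERED.match(name)
        if m:
            groups[m.group(1)].append((int(m.group(2)), name))
    doomed = []
    for versions in groups.values():
        versions.sort()  # numeric order, so ~10~ sorts after ~3~
        doomed.extend(name for _, name in versions[1:-1])
    return sorted(doomed)


names = ["hw.c.~1~", "hw.c.~2~", "hw.c.~3~", "hw.c.~10~", "main.c.~1~", "hw.c~"]
print(backups_to_delete(names))  # ['hw.c.~2~', 'hw.c.~3~']
```

The function only returns names, so the caller decides whether to actually remove anything; un-numbered backups such as hw.c~ are left alone.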
unrecognized: The appearance in a source tree of a file that doesn't fit any known pattern (or that has a suspicious name) most likely indicates that something has gone wrong. Rather than silently ignoring such files or treating them as precious or junk, arch explicitly flags these exceptions in order to be able to give warnings to users.
Overall, adopting file naming conventions is a discipline that many programmers may not be accustomed to, but it's one I strongly recommend. It's easy to stick to these conventions, and tools like inventory and tree-lint (introduced later) help you to keep your source from getting out of control.