How combine Processes Files

At its base, combine reads records from a data file (or a series of them in a row) and, if there is an output request for data records, writes the requested fields out to a file or to stdout. Here is an example of this simplest case.
combine --write-output --output-fields=1-
This is essentially an expensive pipe. It reads from stdin and writes the entire record back to stdout.
Introducing a reference file gives more options. Now combine reads the reference file into memory before reading the data file. For every data record, combine then checks to see if it has a match. The following example limits the simple pipe above by restricting the output to those records from stdin whose first 10 bytes match the first 10 bytes of a record in the reference file.
combine -w -o 1- -r reference_file.txt --key-fields=1-10 \
        --data-key-fields=1-10 --unique
Note that the option ‘--unique’ is used here to prevent combine from storing more than one copy of a key. Without it, duplicate keys in the reference file, when matched, would result in more than one copy of the matching data record.
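The matching described above can be modelled in a few lines of Python. This is only a sketch of the semantics, not how combine is implemented: the function name `filter_by_reference`, its `unique` parameter, and the sample records are all hypothetical, and records are treated as text so that the first 10 characters stand in for the first 10 bytes.

```python
def filter_by_reference(data_records, reference_records, key_len=10, unique=True):
    """Return data records whose leading key matches a reference key.

    Hypothetical model of the behaviour described in the text,
    not combine's actual code.
    """
    # Take the key from the first key_len characters of each reference record.
    keys = [record[:key_len] for record in reference_records]
    if unique:
        # Keep only one copy of each key, mirroring the effect of --unique.
        keys = list(dict.fromkeys(keys))

    output = []
    for record in data_records:
        data_key = record[:key_len]
        # Every stored copy of a matching key yields one output record,
        # so duplicate keys produce duplicate output when unique is False.
        for key in keys:
            if key == data_key:
                output.append(record)
    return output


# Made-up sample records; the first 10 characters are the key.
reference = ["0000000001 alpha", "0000000001 alpha again", "0000000002 beta"]
data = ["0000000001 first data record", "0000000003 unmatched record"]

print(filter_by_reference(data, reference, unique=True))
print(filter_by_reference(data, reference, unique=False))
```

With the duplicate key stored once, the matching data record appears once; with both copies stored, it appears twice.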
The other option with a reference file is to base the output on the records in that file, with indicators of how the data file records matched them. In the next example, the same match as above is done, but this time we write out a record for every unique key, with a flag set to ‘1’ if it was matched by a data record or ‘0’ otherwise. It still reads the data records from stdin and writes the output records to stdout.
combine -r -f reference_file.txt -k 1-10 -m 1-10 -u -w -o 1-10
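As a rough illustration of reference-based output, the following Python sketch records one flag per unique reference key. It is a model under stated assumptions, not combine's implementation: the function name `flag_reference_keys` and the sample records are hypothetical, the flag is assumed to be appended directly after the 10-character key, and output is assumed to follow the order in which keys first appear in the reference file.

```python
def flag_reference_keys(data_records, reference_records, key_len=10):
    """Return one line per unique reference key, with a 1/0 match flag.

    Hypothetical model; the output layout (key followed by flag) is an
    assumption made for illustration.
    """
    # Store each reference key once, initially unmatched.
    flags = {}
    for record in reference_records:
        flags.setdefault(record[:key_len], "0")

    # Mark a key as matched when any data record carries the same key.
    for record in data_records:
        key = record[:key_len]
        if key in flags:
            flags[key] = "1"

    return [key + flag for key, flag in flags.items()]


# Made-up sample records; the first 10 characters are the key.
reference = ["0000000001 alpha", "0000000001 alpha again", "0000000002 beta"]
data = ["0000000001 first data record", "0000000003 unmatched record"]

for line in flag_reference_keys(data, reference):
    print(line)
```

The data record with key 0000000003 has no counterpart in the reference file, so it has no effect on this kind of output.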
Of course, you might want both sets of output at the same time: the list of data records that matched the keys in the reference file and a list of keys in the reference file with an indication of which ones were matched. In the prior two examples the two different kinds of output were written to stdout. You can still do that if you like, and then do a little post-processing to determine where the data-based records leave off and the reference-based records begin. A simpler way, however, is to let combine write the information to separate files.
In the following example we combine the output specifications from the prior two examples and give them each a filename. Note that the first one has a spelled-out ‘--output-file’ while the second one uses the shorter 1-letter option ‘-t’.
combine -w -o 1- --output-file testdata.txt \
        -r -f reference_file.txt -k 1-10 -m 1-10 \
        -u -w -o 1-10 -t testflag.txt
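To show why separate destinations remove the need for post-processing, here is a minimal Python sketch that produces both kinds of output in one pass over the data. It is a hypothetical model, not combine's code: the function name `write_both_outputs` is invented, in-memory `io.StringIO` buffers stand in for the two output files, and the flag is assumed to be appended directly after the key.

```python
import io


def write_both_outputs(data_records, reference_records, data_out, flag_out, key_len=10):
    """Write matching data records to data_out and flagged keys to flag_out.

    Hypothetical model of producing both outputs from a single run.
    """
    # Load the unique reference keys first, all initially unmatched.
    matched = {}
    for record in reference_records:
        matched.setdefault(record[:key_len], False)

    # One pass over the data: emit matching records and remember the match.
    for record in data_records:
        key = record[:key_len]
        if key in matched:
            matched[key] = True
            data_out.write(record + "\n")

    # After the data is exhausted, emit one flagged line per reference key.
    for key, was_matched in matched.items():
        flag_out.write(key + ("1" if was_matched else "0") + "\n")


# Made-up sample records; the first 10 characters are the key.
reference = ["0000000001 alpha", "0000000002 beta"]
data = ["0000000001 first data record", "0000000003 unmatched record"]

data_file = io.StringIO()   # stands in for the data-based output file
flag_file = io.StringIO()   # stands in for the reference-based output file
write_both_outputs(data, reference, data_file, flag_file)

print(data_file.getvalue(), end="")
print(flag_file.getvalue(), end="")
```

Because each kind of record goes to its own destination, nothing downstream has to work out where one kind ends and the other begins.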
This document was generated by Daniel P. Valentine on July 28, 2013 using texi2html 1.82.