A reference file record is expected to match a data file record on a set of key fields. The parts of a reference file that are needed for processing are read entirely into memory. You can specify as many reference files as you want, limited only by the amount of memory your system can spare to hold them. For each reference file, you must specify at least a file name, the key fields in the reference file, and the matching key fields in the data file.
The following options relate to reference files. They are all positional: each one applies to the processing of the most recently named reference file. (The exception is the reference file name itself, which names the file that the options after it apply to.)
Use filename as a reference file to match to the data file in processing. This option introduces a block of positional options that relate to this reference file’s processing in combine.
Use the fields specified by range_string as a key to match to a corresponding key in the data file.
Use the fields specified by range_string as the corresponding key to a key taken from a reference file.
Use the fields specified by range_string as a key to perform a recursive hierarchical match within the reference file. This key will be matched against values specified in the regular key on the reference file.
Keep only one record for the reference file in memory for each distinct key. By default combine maintains all the records from the reference file in memory for processing. This default allows for cartesian products when a key exists multiple times in both the reference and data files.
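The difference between the default and the one-record-per-key behavior can be illustrated with a small Python sketch. This is not combine's implementation: the function names and the sample records are hypothetical, and keeping the first record seen for a key is an assumption of the sketch.

```python
from collections import defaultdict

def load_reference(records, unique=False):
    """Index (key, value) reference records by key.

    With unique=True only one record is kept per key. Keeping the
    first one seen is an assumption made for this illustration.
    """
    table = defaultdict(list)
    for key, value in records:
        if unique and table[key]:
            continue  # a record for this key is already stored
        table[key].append(value)
    return table

def match(data_records, table):
    """Pair each data record with every stored reference record sharing its key."""
    return [(key, d, r) for key, d in data_records for r in table.get(key, [])]

# Hypothetical sample data: key "A" appears twice in both files.
reference = [("A", "ref1"), ("A", "ref2"), ("B", "ref3")]
data = [("A", "data1"), ("A", "data2"), ("C", "data3")]

all_matches = match(data, load_reference(reference))
unique_matches = match(data, load_reference(reference, unique=True))
```

With all reference records kept, the two data records for key "A" each pair with both reference records, giving a cartesian product of four matches. With one record per key, the same data yields two matches.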
Use the number provided as a base size for allocating a hash table to store the records from this reference file. If this number is too small, combine will fail when it tries to store a record it has no room for. If it is only a little too small, processing will be inefficient, because searching for open space in a nearly full hash table is slow.
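Why an undersized table is slow before it fails outright can be seen in a minimal Python sketch of a fixed-size table. The source does not say how combine searches for open space, so linear probing is assumed here, the class name is hypothetical, and the size is used exactly as given rather than as a base for a larger allocation.

```python
class FixedHashTable:
    """Fixed-capacity open-addressing table (linear probing is an assumption)."""

    def __init__(self, size):
        self.slots = [None] * size
        self.probes = 0  # total number of slots inspected during inserts

    def insert(self, key, record):
        size = len(self.slots)
        start = hash(key) % size
        for step in range(size):
            self.probes += 1
            index = (start + step) % size
            if self.slots[index] is None:
                self.slots[index] = (key, record)
                return index
        # Every slot was inspected and none was free.
        raise MemoryError("hash table is full")

table = FixedHashTable(4)
for key in (0, 4, 8, 12):  # all four keys hash to slot 0 in a table of size 4
    table.insert(key, "record")

try:
    table.insert(16, "record")
    overflowed = False
except MemoryError:
    overflowed = True
```

In this sketch the four colliding inserts inspect 1, 2, 3, and 4 slots respectively, so the cost per insert grows as the table fills, and the fifth insert fails because no slot is left.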
One of the keywords binary, number, beginning, or end, indicating how to turn the key into a number with the best variability and least overlap. A wise choice for this option can cut processing time significantly. The binary option is the default, and treats the last few bytes (8 on most computers) of the key string(s) as a big number. The number option converts the entire key to a number, assuming it is a numeric string. The other two take the least significant 3 bits from each of the first or last few (21 where a 64-bit integer is available) bytes in the key strings and turn them into a number.
Signals the program that output records should be written for every record stored for this reference file. This will either be one record for every record in the reference file or one record for every distinct set of keys in the reference file, depending on the setting of the option ‘--unique’. The record written will include all specified output fields from the reference file record, any specified constant value for this reference file, and any flag, counter, or sums requested.
If provided, write the output based on this reference file to filename. Otherwise the output will go to stdout. This option only makes sense if you plan to write output based on this reference file.
Write the fields specified by range_string as part of the record in any reference-file- or data-file-based output. The range specifications share a common format with all field specifications for combine.
Write string to the reference- or data-file-based output.
When traversing the hierarchy from a given reference-file record, use the values on that record in the ‘--hierarchy-key-fields’ fields to connect to the ‘--key-fields’ fields of other records from the reference file. For most purposes, the presence of the connection on the first record suggests a single parent in a standard hierarchy. The hierarchy traversal stops when the ‘--hierarchy-key-fields’ fields are empty.
If this option is not set, the ‘--key-fields’ fields are used to search for the same values in the ‘--hierarchy-key-fields’ fields of other records in the same file. This allows multiple children of an initial record, and suggests going down in the hierarchy. The hierarchy traversal stops when no further connection can be made. The traversal is depth-first.
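The two traversal directions can be sketched in Python as follows. The record layout, the field names `key` and `parent` (standing in for the ‘--key-fields’ and ‘--hierarchy-key-fields’ values), and the sample data are all hypothetical, and the sketch does not guard against cycles.

```python
# Hypothetical reference records: "parent" plays the role of the hierarchy key.
records = [
    {"key": "root", "parent": ""},
    {"key": "a", "parent": "root"},
    {"key": "b", "parent": "root"},
    {"key": "a1", "parent": "a"},
]

def walk_up(start_key, records):
    """Follow each record's hierarchy key to the record whose key matches it."""
    by_key = {r["key"]: r for r in records}
    path = []
    current = by_key.get(start_key)
    while current is not None:
        path.append(current["key"])
        if not current["parent"]:
            break  # an empty hierarchy key ends the traversal
        current = by_key.get(current["parent"])
    return path

def walk_down(start_key, records):
    """Depth-first search for records whose hierarchy key equals the current key."""
    children = {}
    for r in records:
        children.setdefault(r["parent"], []).append(r["key"])
    visited = []

    def visit(key):
        visited.append(key)
        for child in children.get(key, []):
            visit(child)  # stops when no further connection exists

    visit(start_key)
    return visited

up_path = walk_up("a1", records)
down_path = walk_down("root", records)
```

Going up from `a1` visits one parent at each step, while going down from `root` can branch to several children and visits `a1` before `b` because the traversal is depth-first.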
When traversing a hierarchy, treat only the endpoints as matching records. Nodes that have onward connections are ignored except for navigating to the leaf nodes.
When traversing a hierarchy, behave as with ‘--hierarchy-leaf-only’, but also save information about the intervening nodes. Repeat the ‘--output-fields’ fields number times (leaving them blank if there were fewer levels), starting from the first reference record matched.
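The leaf-only and flattened behaviors can be compared in a short Python sketch. The `children` mapping and the function names are hypothetical, each node's key stands in for its output fields, and truncating a path that is deeper than the requested number of levels is an assumption the text above does not address.

```python
def leaf_paths(start_key, children):
    """Return the path from start_key to each leaf, depth-first."""
    kids = children.get(start_key, [])
    if not kids:
        return [[start_key]]  # no onward connection: this node is a leaf
    return [
        [start_key] + path
        for kid in kids
        for path in leaf_paths(kid, children)
    ]

def flatten(path, levels):
    """Repeat the per-node output `levels` times, padding with blanks."""
    return (path + [""] * levels)[:levels]

# Hypothetical hierarchy: root has children a and b, and a has child a1.
children = {"root": ["a", "b"], "a": ["a1"]}

paths = leaf_paths("root", children)
leaves_only = [path[-1] for path in paths]        # leaf-only matching
flattened = [flatten(path, 3) for path in paths]  # flattened to 3 levels
```

Leaf-only matching reports just `a1` and `b`. Flattening keeps the intervening nodes, so the shorter branch ending at `b` is padded with one blank level.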
This document was generated by Daniel P. Valentine on July 28, 2013 using texi2html 1.82.