A hierarchy is an organization with a directional one-to-many relationship between parent records and their associated children. combine works with hierarchies within reference files when the file has a record for each node and each record points to its parent in the hierarchy.
Because the hierarchy is assumed to be stored in a reference file, it is accessed by matching to a data record. Once an individual reference record has been matched to the data record, its links to other records are followed through the hierarchy until no further links remain.
The standard process is to assume that the key that matched the data file key is at the top of the hierarchy. When traversing the hierarchy, combine looks for the key of the current record in the hierarchy key of other reference records. This repeats until there are no further linkages from one record to the next. Each record linked into the hierarchy in this way is treated as a reference record that matched the data record.
In this section, we’ll use the following hierarchy file. It is a simple hierarchy tree with ‘Grandfather’ as the top node and 2 levels of entries below.
    Grandfather,
    Father,Grandfather
    Uncle,Grandfather
    Me,Father
    Brother,Father
    Cousin,Uncle
If my data file consisted only of a record with the key ‘Grandfather’, then the following command would result in the records listed after it. Each record written includes the entry itself and its parent.
    combine -D ',' -w -d ',' -r test1.tmp -k 1 -m 1 -a 2 -D ',' \
      -o 1-2 test2.tmp
    Grandfather,
    Father,Grandfather
    Me,Father
    Brother,Father
    Uncle,Grandfather
    Cousin,Uncle
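The traversal described above can be modelled in a few lines of Python. This is only a sketch of the idea, not combine's implementation: the names `HIERARCHY`, `parse` and `traverse` are made up for illustration, and the depth-first visiting order is an assumption chosen because it reproduces the order of the sample output.

```python
# Sample hierarchy file from this section: one "key,parent" record per node.
HIERARCHY = """\
Grandfather,
Father,Grandfather
Uncle,Grandfather
Me,Father
Brother,Father
Cousin,Uncle
"""

def parse(text):
    """Split the file into (key, parent) records, keeping file order."""
    return [tuple(line.split(",")) for line in text.splitlines()]

def traverse(records, start):
    """Return the start record and every record below it (assumed depth first)."""
    by_key = {key: (key, parent) for key, parent in records}
    children = {}
    for key, parent in records:
        children.setdefault(parent, []).append((key, parent))

    result = []
    stack = [by_key[start]]
    while stack:
        record = stack.pop()
        result.append(record)
        # Push children in reverse so they are visited in file order.
        stack.extend(reversed(children.get(record[0], [])))
    return result

for key, parent in traverse(parse(HIERARCHY), "Grandfather"):
    print(f"{key},{parent}")
```

Starting from any other key, for example ‘Father’, the same function returns only that record and the records beneath it.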
If we are only interested in the endpoints (in this case all the lowest-level descendants of ‘Grandfather’), we can use the option ‘-l’.
    combine -D ',' -w -d ',' -r test1.tmp -k 1 -m 1 -a 2 -D ',' -o 1 \
      -l test2.tmp
    Me
    Brother
    Cousin
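To make the notion of an endpoint concrete, here is a small Python sketch that keeps only the records with no children below the starting key. It illustrates the idea behind ‘-l’ under the assumption that an endpoint is simply a node that no other record names as its parent; `RECORDS` and `leaves` are hypothetical names, not part of combine.

```python
# (key, parent) pairs from the sample hierarchy file in this section.
RECORDS = [
    ("Grandfather", ""),
    ("Father", "Grandfather"),
    ("Uncle", "Grandfather"),
    ("Me", "Father"),
    ("Brother", "Father"),
    ("Cousin", "Uncle"),
]

def leaves(records, start):
    """Return the keys below `start` that have no children of their own."""
    children = {}
    for key, parent in records:
        children.setdefault(parent, []).append(key)

    found = []

    def walk(key):
        kids = children.get(key, [])
        if not kids:
            found.append(key)  # nothing points to this key as a parent
        for kid in kids:
            walk(kid)

    walk(start)
    return found

print("\n".join(leaves(RECORDS, "Grandfather")))
```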
We can arrive at the same number of records, each containing the entire hierarchy traversed to reach a leaf node, by using the option ‘--flatten-hierarchy’ (‘-F’). This option takes a number as an argument and includes information from that many records found in traversing the hierarchy, starting from the record that matched the data record. This example tells combine to report three levels from the matching ‘Grandfather’ record.
    combine -D ',' -w -d ',' -r test1.tmp -k 1 -m 1 -a 2 -D ',' -o 1 \
      -F 3 test2.tmp
    Grandfather,Father,Me
    Grandfather,Father,Brother
    Grandfather,Uncle,Cousin
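The flattened output can be pictured as one row per path from the matching record down to a leaf. The Python sketch below reproduces the three rows above for a depth of three. It is an illustration only: `RECORDS` and `flatten` are hypothetical names, and what the sketch does when a path is shorter or longer than the requested depth (it emits the path as far as it goes, cut off at the depth) is an assumption, not documented combine behaviour.

```python
# (key, parent) pairs from the sample hierarchy file in this section.
RECORDS = [
    ("Grandfather", ""),
    ("Father", "Grandfather"),
    ("Uncle", "Grandfather"),
    ("Me", "Father"),
    ("Brother", "Father"),
    ("Cousin", "Uncle"),
]

def flatten(records, start, levels):
    """Return one comma-joined row per path of up to `levels` keys from `start`."""
    children = {}
    for key, parent in records:
        children.setdefault(parent, []).append(key)

    rows = []

    def walk(path):
        kids = children.get(path[-1], [])
        # Stop at the requested depth, or earlier if the path runs out
        # (the early-stop behaviour is an assumption of this sketch).
        if len(path) == levels or not kids:
            rows.append(",".join(path))
            return
        for kid in kids:
            walk(path + [kid])

    walk([start])
    return rows

print("\n".join(flatten(RECORDS, "Grandfather", 3)))
```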
As with other areas within combine, the hierarchy manipulation is extensible through Guile. The key fields can be modified as with any other fields. See section Field-specific extensions, for details. The matches within the hierarchy can be further filtered using the ‘h’ suboption of the option ‘-x’ (see section Extending combine). As with matches between reference records and data records, this filtering lets you perform fuzzy comparisons, do more complex calculations to filter the match, or decide when you have gone far enough and would like to stop traversing the hierarchy.
This document was generated by Daniel P. Valentine on July 28, 2013 using texi2html 1.82.