[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

5.4 Hierarchies

A hierarchy tends to be an organization where there is a directional one-to-many relationship between parent records and their associated children. combine works with hierarchies within reference files when the file has a record for each node and each record points to its parent in the hierarchy.

Because the hierarchy is assumed to be stored in a reference file, it is accessed by matching to a data record. Once an individual reference record has been matched to the data record, its relationship to other records within the hierarchy is followed through the hierarchy until there is no further to go.

The standard process is to assume that the key that matched to the data file key is at the top of the hierarchy. When traversing the hierarchy, combine looks for the key on the current record in the hierarchy key of other reference records. This repeats until there are no further linkages from one record to the next. For each record that is linked to the hierarchy, that record is treated as a reference record that matched the data record.

In this section, we’ll use the following hierarchy file. It is a simple hierarchy tree with ‘Grandfather’ as the top node and 2 levels of entries below.

 
Grandfather,
Father,Grandfather
Uncle,Grandfather
Me,Father
Brother,Father
Cousin,Uncle

If my data file consisted only of a record with the key ‘Grandfather’, then the following command would result in the records listed after it. Each record written includes the entry itself and its parent.

 
combine -D ',' -w -d ',' -r test1.tmp -k 1 -m 1 -a 2 -D ',' \
        -o 1-2 test2.tmp

Grandfather,   
Father,Grandfather
Me,Father
Brother,Father
Uncle,Grandfather
Cousin,Uncle

If we are only interested in the endpoints (in this case all the lowest-level descendants of ‘Grandfather’), we can use the option ‘-l’.

 
combine -D ',' -w -d ',' -r test1.tmp -k 1 -m 1 -a 2 -D ',' -o 1 \
        -l test2.tmp

Me
Brother
Cousin

We can arrive at the same number of records, each containing the entire hierarchy traversed to get to the leaf nodes, by using the option ‘--flatten-hierarchy’ (‘-F’). This option takes a number as an argument, and then includes information from that many records found in traversing the hierarchy, starting from the record that matched the data record. This example tells combine to report three levels from the matching ‘Grandfather’ record.

 
combine -D ',' -w -d ',' -r test1.tmp -k 1 -m 1 -a 2 -D ',' -o 1 \
        -F 3 test2.tmp

Grandfather,Father,Me
Grandfather,Father,Brother
Grandfather,Uncle,Cousin

As with other areas within combine, the hierarchy manipulation is extensible through Guile. The key fields can be modified as with any other fields. See section Field-specific extensions, for details. The matches within the hierarchy can be further filtered, using the ‘h’ suboption of the option ‘-x’. (see section Extending combine.) As with matches between reference records and data this filtering can allow you to perform fuzzy comparisons, to do more complex calculations to filter the match, or to decide when you have gone far enough and would like to stop traversing the hierarchy.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]

This document was generated by Daniel P. Valentine on July 28, 2013 using texi2html 1.82.