combine
If combine was built with Guile (GNU’s Ubiquitous Intelligent
Language for Extensibility), you can do anything you want (within reason) to
extend combine. This would have been set up when combine was
compiled and installed on your computer. In a number of places, there
are built-in opportunities to call Guile with the data that is currently in
process. Using these options, you can use your favorite modules or write your
own functions in scheme to manipulate the data and to adjust how combine
operates on it.
The most common method (in my current usage) of extending combine is
to alter the values of fields from the input files before they are used
for matching or for output. This is done inside the field list by
adding the scheme statement after the range and precision. This is
covered in the section on field specifications. See section Field-specific extensions, for details.
Another useful option is the ability to initialize Guile with your own
program. To do this, you can use the ‘--extension-init-file’ (or
‘-X’) option followed by a file name. combine will load that
scheme file into Guile before any processing. That way, your
functions will be available when you need them while the program
runs. It certainly beats writing something complicated on the
command line.
In addition, there are Guile modules included in the distribution, which can be used in extension scripts.
4.1 Extension Options
4.2 Guile Modules

4.1 Extension Options
The remaining extensibility options are called at various points in the program: when it starts, when a file is started, when a match is found, when a record is read, when a record is written, when a file is closed, and at the very end of the program. The options are listed below along with the way to get access to the relevant data.
The various non-field-specific options are as follows. They all occur as arguments to the option ‘--extension’ (or ‘-x’).
Filter records from the current file using the scheme command provided. The scheme command must return ‘#t’ (to keep processing the record) or ‘#f’ (to ignore this record and move on to the next). The variables ‘reference-field-n’ or ‘data-field-n’ will be available to the scheme command, depending on whether the record to be filtered is from the data file or a reference file. In the variable names ‘n’ represents the number of the specified output field, numbered from 1.
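As a rough sketch of the kind of command a filter could call, the predicate below keeps only records whose first field is not blank. The name keep-record? is hypothetical, and binding ‘data-field-1’ by hand only stands in for the variable that combine would supply for a record from the data file.

```scheme
;; Hypothetical helper, for example defined in a file loaded with -X.
;; Returns #t when FIELD contains at least one non-space character.
(define (keep-record? field)
  (not (string-null? (string-trim-both field))))

;; Stand-in for the variable combine would provide for the record.
(define data-field-1 "  12345 ")

;; A filter command must evaluate to #t (keep) or #f (skip).
(keep-record? data-field-1)
```

Keeping the logic in a named function means the command given on the command line can stay as short as (keep-record? data-field-1).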
Validate a proposed match using the scheme command provided. The scheme
command must return ‘#t’ (to confirm that this is a good match) or
‘#f’ (to tell combine that this is not a match). The variables
‘reference-field-n’ and ‘data-field-n’ will be available to the
scheme command from the reference and data records involved in a
particular match. In the variable names ‘n’ represents the number
of the specified output field, numbered from 1. The extension
specification affects the match between the data file and the last named
reference file.
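A match validation command might look something like the following sketch, which accepts a proposed match only when a second field agrees after trimming spaces and ignoring case. The function same-value? is hypothetical, and the two variables are bound by hand here only to stand in for the values combine would supply.

```scheme
;; Hypothetical predicate: two values agree when they are equal after
;; trimming surrounding spaces and ignoring letter case.
(define (same-value? ref-value data-value)
  (string-ci=? (string-trim-both ref-value)
               (string-trim-both data-value)))

;; Stand-ins for the variables combine would supply for a proposed match.
(define reference-field-2 "SMITH   ")
(define data-field-2 "Smith")

;; A validation command must evaluate to #t (good match) or #f (no match).
(same-value? reference-field-2 data-field-2)
```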
Validate a proposed match between two records in the same hierarchy using
the scheme command provided. The scheme command must return ‘#t’
(to confirm that this is a good match) or ‘#f’ (to tell combine
that this is not a match). The variables ‘reference-field-n’ and
‘prior-reference-field-n’ will be available to the scheme command
from the current and prior reference records involved in a particular
match. In the variable names ‘n’ represents the number of the
specified output field, numbered from 1. The extension specification
affects the match while traversing the hierarchy in the last named
reference file.
Modify a record that has just been read using the scheme command provided. The scheme command must return a string, which will become the new value of the input record to be processed. The input record itself can be referred to in the scheme command by using the variable ‘input-record’ wherever the record's value is needed. The records affected by this option are the records from the most recently named reference file, or from the data file if no reference file has yet been named.
As an example, consider that you may have received a file from someone
who strips all the trailing spaces from the end of a record, but you
need to treat it with a fixed-width record layout. Assuming that you
have defined a scheme function rpad in the initialization file
‘util.scm’, you can use the following command to get at the field
in positions 200-219, with spaces in place of the missing rest of the
record.
    combine -X util.scm -x 'r(rpad input-record 219 #\space)' \
            -o 200-219 trimmed_file.txt
The same syntax works with the other ‘--extension’ options.
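The rpad function assumed in the example above is not shown in this section. A minimal sketch of what such a function could look like follows; it is an illustration written for this example and not necessarily the definition found in the distributed ‘util.scm’.

```scheme
;; Hypothetical rpad: extend STR on the right with CH until it is at
;; least LEN characters long; longer strings are returned unchanged.
(define (rpad str len ch)
  (let ((missing (- len (string-length str))))
    (if (> missing 0)
        (string-append str (make-string missing ch))
        str)))

(rpad "abc" 6 #\space)   ; => "abc   "
```

Guile's built-in string-pad-right does something similar, but it also truncates strings longer than the requested length, which may not be wanted when a record runs past the last field of interest.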

4.2 Guile Modules
Here we talk about Guile modules that are distributed with combine.
At the moment, those are limited to date processing.
In addition, the file ‘util.scm’ in the distribution contains a few functions I have found handy. They are not documented here, and the file doesn’t get installed automatically.
4.2.1 Calendar Functions
4.2.2 Calendar Reference
4.2.3 Calendar Parsing

4.2.1 Calendar Functions
Included in the combine package are two Guile modules to work with dates
from a number of calendars, both obscure and common. The basis for them is
the set of calendar functions that are shipped with Emacs.
The reason that these functions deserve special notice here is that date comparisons are a common type of comparison that often cannot be made directly on a character string. For example, I might have trouble knowing if "20030922" is the same date as "22 September 2003" if I compared strings; however, comparing them as dates allows me to find a match. We can even compare between calendars, ensuring that "1 Tishri 5764" is recognized as the same date as "20030927".
The calendar module can be invoked as (use-modules (combine_scm calendar)).
It provides functions for converting from a variety of calendars to and from
an absolute date count, whose 0-day is the imaginary date 31 December 1 B.C.
In the functions, the absolute date is treated as a single number, and the
dates are lists of numbers in (month day year) format unless otherwise
specified.
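To make the idea of an absolute date count concrete, here is a self-contained sketch that computes it for a Gregorian (month day year) list. It is only an illustration of the arithmetic for years from 1 A.D. onward; the function names are hypothetical and this is not the code from the calendar module.

```scheme
;; Illustrative only; not the code from the combine calendar module.
(define (leap-year? year)
  (and (zero? (remainder year 4))
       (or (not (zero? (remainder year 100)))
           (zero? (remainder year 400)))))

;; Day of the year (1-366) for a (month day year) list.
(define (day-number date)
  (let* ((month (car date))
         (day (cadr date))
         (year (caddr date))
         (lengths (list 31 (if (leap-year? year) 29 28)
                        31 30 31 30 31 31 30 31 30 31)))
    (+ day (apply + (list-head lengths (- month 1))))))

;; Days elapsed since the imaginary date 31 December 1 B.C.
(define (absolute-from-gregorian date)
  (let ((prior (- (caddr date) 1)))   ; complete years before this one
    (+ (day-number date)
       (* 365 prior)
       (quotient prior 4)
       (- (quotient prior 100))
       (quotient prior 400))))

(absolute-from-gregorian '(1 1 1))       ; => 1
(absolute-from-gregorian '(9 27 2003))   ; => 731485
```

Because the result is a single number, two dates can be compared with = or < regardless of the calendar they were originally written in.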
The calendar functions are as follows:

4.2.2 Calendar Reference
Here are some variables that can be used as references to get names associated with the numbers that the date conversion functions produce for months.
An associative list giving the weekdays in the Gregorian calendar in a variety of languages. Each element of this list is a list composed of a 2-letter language code (lowercase) and a list of 7 day names.
An associative list giving the months in the Gregorian calendar in a variety of languages. Each element of this list is a list composed of a 2-letter language code (lowercase) and a list of 12 month names.
A list of the months in the Islamic calendar.
A list of the months in the standard Hebrew calendar.
A list of the months in the leap year Hebrew calendar.
A list of the months in the French Revolutionary calendar.
A list of the months in the French Revolutionary calendar, using multibyte codes to represent the accented characters.
A list of the days in the French Revolutionary calendar.
A list of the special days (non weekdays) in the French Revolutionary calendar, using multibyte codes to represent the accented characters.
A list of the special days (non weekdays) in the French Revolutionary calendar.
A list of the months in the Coptic calendar.
A list of the months in the Ethiopic calendar.
A list of the months in the Persian calendar.
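The associative lists described above can be searched with assoc. The sketch below uses a small sample list in the same shape (a language code followed by a list of 12 names); sample-month-names and month-name are hypothetical names made up for this illustration, not variables or functions exported by the module.

```scheme
;; Sample data in the shape described above; NOT the module's variable.
(define sample-month-names
  '(("en" ("January" "February" "March" "April" "May" "June" "July"
           "August" "September" "October" "November" "December"))
    ("es" ("enero" "febrero" "marzo" "abril" "mayo" "junio" "julio"
           "agosto" "septiembre" "octubre" "noviembre" "diciembre"))))

;; Return the name of MONTH (1-12) in language LANG,
;; or #f if LANG is not in the list.
(define (month-name names lang month)
  (let ((entry (assoc lang names)))
    (and entry (list-ref (cadr entry) (- month 1)))))

(month-name sample-month-names "en" 9)   ; => "September"
```

The plain lists of month names can be indexed the same way with list-ref, subtracting 1 from the month number that the conversion functions produce.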

4.2.3 Calendar Parsing
The calendar parsing module can be invoked as (use-modules (combine_scm parse)).
The most useful function in the module is parse-date. It takes as arguments
a date string and an output format. The date string is parsed as well as
possible; in case of ambiguity, candidate formats are tried in descending
order of preference. The function returns the date triplet (or other such
representation) suggested by the format string.
The supported format strings are the words in the function names of the form
calendar-xxxx-from-absolute that would take the place of the xxxx.
See section Calendar Functions, for more information.
The parsing of the date string depends on the setting of a couple of variables. Look inside the file ‘parse.scm’ for details. The list parse-date-expected-order lists the order in which the parser should look for the year, month, and day in case of ambiguity. The list parse-date-method-preference gives more general format preferences, such as 8-digit, delimited, or a word for the month, and the expected incoming calendar.
Here are a few examples of parsing a date and putting it out in some formats:
    guile> (use-modules (combine_scm parse))
    guile> (parse-date "27 September 2003" "gregorian")
    (9 27 2003)
    guile> (parse-date "27 September 2003" "julian")
    (9 14 2003)
The 13-day difference between the calendars is the reason that the Orthodox Christmas falls about 2 weeks after the Roman Catholic Christmas.
    guile> (parse-date "27 September 2003" "hebrew")
    (7 1 5764)
Note that the Hebrew date is Rosh HaShannah, the first day of the year 5764. The reason that the month is listed as 7 rather than 1 is inherited from the Emacs calendar implementation. Using the month list in calendar-hebrew-month-name-array-common-year or calendar-hebrew-month-name-array-leap-year correctly gives "Tishri", but since the extra month (in years that have it) comes mid-year, the programming choice that I carried forward was to cycle the months around so that the extra month would come at the end of the list.
    guile> (parse-date "27 September 2003" "islamic")
    (7 30 1424)
    guile> (parse-date "27 September 2003" "iso")
    (39 6 2003)
This is the 6th day (Saturday) of week 39 of the year.
    guile> (parse-date "27 September 2003" "mayan-long-count")
    (12 19 10 11 7)
I won’t get into the detail, but the five numbers reflect the date in the Mayan calendar as currently understood.
Generally, I’d recommend using the more specific functions if you are sure of the date format you expect. For comparing dates, I would further recommend comparing the absolute day count rather than any more elaborate representation.
This document was generated by Daniel P. Valentine on July 28, 2013 using texi2html 1.82.