recutils supports aggregate functions. These are so called because they accept a record set and a field name as inputs and generate a single result. Usually this result is numerical.
The supported aggregate functions are the following:
Count(FIELD)
Counts the number of occurrences of a field.
Avg(FIELD)
Calculates the average (mean) of the numerical values of a field.
Sum(FIELD)
Calculates the sum of the numerical values of a field.
Min(FIELD)
Calculates the minimum numerical value of a field.
Max(FIELD)
Calculates the maximum numerical value of a field.
The aggregate functions are to be invoked in the field expressions in
recsel
. By default they are applied to the totality of the
records in a record set. For example, using the items database from
the previous section, we can do calculations as in the following examples.
The SQL aggregate functions can be applied to the totality of the
tuples in the relation. For example, using the Count
aggregate
function we can calculate the number of fields named Category
present in the record set as follows:
$ recsel -p "Count(Category)" items.rec Count_Category: 5
The result is a field whose name is derived from the function name and the field passed as its parameter, separated by an underline. This name scheme probably suffices for most purposes, but it is always possible to use a rewrite rule to obtain something different:
$ recsel -p "Count(Category):NumCategories" items.rec NumCategories: 5
You can use different letter case in writing the name of the aggregate, and this will be reflected in the field name:
$ recsel -p "CoUnT(Category)" items.rec CoUnT_Category: 5
It is possible to use more than one aggregate function in the field
expression. Suppose we are also interested in the average price of
the items we sell. We can use the Avg
aggregate:
$ recsel -p "Count(Category),Avg(Price)" items.rec Count_Category: 5 Avg_Price: 4.240000
Now let’s add a field along with an aggregate function to the field expression and see what we get:
$ recsel -p "Type,Avg(Price)" items.rec Type: EC Car Avg_Price: 12.200000 Type: Terria Avg_Price: 0.600000 Type: Typex Avg_Price: 1.200000 Type: Notebook Avg_Price: 1 Type: Sexy Puzzle Avg_Price: 6.200000
We get five records! The reason is that when only aggregate functions are part of the field expression, they are applied to the single record that would result from concatenating all the records in the record set together. However, when a regular field appears in the field expression the aggregate functions are applied to the individual records. This is still useful in some cases, such as a database of maintainers:
Name: Jose E. Marchesi Email: jemarch@gnu.org Email: jemarch@es.gnu.org Name: Luca Saiu Email: positron@gnu.org
Lets see how many emails each maintainer has:
$ recsel -p "Name,Count(Email)" maintainers.rec Name: Jose E. Marchesi Count_Email: 2 Name: Luca Saiu Count_Email: 1
Aggregate functions are most useful when we combine them with grouping. This is when we are interested in some property of a subset of the records in the database. For example, the average prices of each item category stored in the database can be obtained by executing:
$ recsel -p "Category,Avg(Price)" -G Category items.rec Category: Food Avg_Price: 0.600000 Category: Office Avg_Price: 1.100000 Category: Toy Avg_Price: 9.200000
If we were interested in the actual prices that result in each average we can do:
$ recsel -p "Category,Price,Avg(Price)" -G Category items.rec Category: Food Price: 0.60 Avg_Price: 0.600000 Category: Office Price: 1.20 Price: 1.00 Avg_Price: 1.100000 Category: Toy Price: 12.2 Price: 6.20 Avg_Price: 9.200000