If the input does not have a header line, use --header-out to add a header line as the first line of the output, naming the operation performed on each field:
    $ datamash --sort --header-out groupby 2 min 3 max 3 < scores.txt
    GroupBy(field-2)  min(field-3)  max(field-3)
    Arts              46            88
    Business          79            94
    Engineering       39            99
    Health-Medicine   72            100
    Life-Sciences     14            91
    Social-Sciences   27            90
If the input has a header line (a first line containing column names), use --header-in to skip it:
    $ cat scores_h.txt
    Name      Major  Score
    Shawn     Arts   65
    Marques   Arts   58
    Fernando  Arts   78
    Paul      Arts   63
    ...
    $ datamash --sort --header-in groupby 2 mean 3 < scores_h.txt
    Arts             68.947
    Business         87.363
    Engineering      66.538
    Health-Medicine  90.615
    Life-Sciences    55.333
    Social-Sciences  60.266
If the header line is not skipped, datamash will report an error (due to strict input validation):
    $ datamash groupby 2 mean 3 < scores_h.txt
    datamash: invalid numeric value in line 1 field 3: 'Score'
To carry the column names from the input header line over to the output header line, use --headers (or -H); both are equivalent to --header-in --header-out:
    $ datamash --sort --headers groupby 2 mean 3 < scores_h.txt
    GroupBy(Major)   mean(Score)
    Arts             68.947
    Business         87.363
    Engineering      66.538
    Health-Medicine  90.615
    Life-Sciences    55.333
    Social-Sciences  60.266
The same command can be written in short form (-sH instead of --sort --headers):

    $ datamash -sH groupby 2 mean 3 < scores_h.txt
When the input file has a header line, column names can be used instead of column numbers. In the example below, Major is used instead of column number 2, and Score instead of column number 3:
    $ datamash --sort --headers groupby Major mean Score < scores_h.txt
    GroupBy(Major)   mean(Score)
    Arts             68.947
    Business         87.363
    Engineering      66.538
    Health-Medicine  90.615
    Life-Sciences    55.333
    Social-Sciences  60.266
datamash reads the first line of the input and deduces the correct column number from the given name. If the column name is not found, an error is printed:
    $ datamash --sort --headers groupby 2 mean Foo < scores_h.txt
    datamash: column name 'Foo' not found in input file
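To make the name lookup concrete, here is a minimal shell sketch of the same idea, assuming a tab-separated header line (datamash's default delimiter). The helper col_of and the sample file are hypothetical illustrations, not part of datamash and not how datamash implements the lookup; the sketch also ignores backslash-escaped names. It prints the 1-based position of a name in the first line, or fails with an error like the one above.

```shell
#!/bin/sh
# Hypothetical helper (not part of datamash): print the 1-based column
# number of a field name found in the first line of a tab-separated file.
col_of() {
    name=$1
    file=$2
    # Split the header line into one name per line, find the first exact,
    # literal match, and keep only its line number (= column number).
    n=$(head -n 1 "$file" | tr '\t' '\n' | grep -n -x -F -- "$name" | head -n 1 | cut -d: -f1)
    if [ -z "$n" ]; then
        echo "column name '$name' not found in input file" >&2
        return 1
    fi
    echo "$n"
}

# Small sample resembling scores_h.txt (tab-separated).
sample=$(mktemp)
printf 'Name\tMajor\tScore\nShawn\tArts\t65\n' > "$sample"

col_of Major "$sample"          # prints 2
col_of Score "$sample"          # prints 3
col_of Foo "$sample" || true    # error message on stderr, status 1
rm -f "$sample"
```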
Field names must be escaped with a backslash if they start with a digit or contain special characters (dash/minus, colons, commas). Note the interplay between backslash escaping and shell quoting: the backslash must survive the shell to reach datamash. The following equivalent commands sum the values of a field named ‘FOO-BAR’:
    $ datamash -H sum FOO\\-BAR < input.txt
    $ datamash -H sum 'FOO\-BAR' < input.txt
    $ datamash -H sum "FOO\\-BAR" < input.txt
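One way to check which string the shell actually hands to datamash is to pass the same argument to a small function that echoes it back. In the sketch below, show is a hypothetical stand-in for datamash (not a datamash feature); the first three forms all deliver FOO\-BAR, while the last, unescaped form delivers FOO-BAR with the backslash already removed by the shell.

```shell
#!/bin/sh
# Print the single argument exactly as the shell delivers it, wrapped in
# brackets. show() stands in for datamash receiving the field name.
show() { printf '[%s]\n' "$1"; }

show FOO\\-BAR     # [FOO\-BAR]  unquoted \\ becomes a single \
show 'FOO\-BAR'    # [FOO\-BAR]  single quotes keep every character
show "FOO\\-BAR"   # [FOO\-BAR]  \\ inside double quotes becomes a single \
show FOO\-BAR      # [FOO-BAR]   unquoted \- loses its backslash
```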