The datamash program (https://www.gnu.org/software/datamash) performs calculations (e.g. sum, count, min, max, skewness, standard deviation) on input files.
Example: sum up the values in the first column of the input:
$ seq 10 | datamash sum 1
55
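Several operations can be requested in a single invocation, each followed by the column it applies to. A minimal sketch on the same input, assuming datamash is installed and on the PATH (the output fields are separated by tabs):

```shell
# Sum, count, minimum, maximum and mean of column 1 for the numbers 1..10.
seq 10 | datamash sum 1 count 1 min 1 max 1 mean 1
# prints: 55  10  1  10  5.5
```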
datamash can group input data and perform operations on each group. It can sort the file and read header lines.
Example: Given a file with three fields (name, subject, score), find the average score in each subject:
$ cat scores.txt
Name     Subject          Score
Bryan    Arts             68
Isaiah   Arts             80
Gabriel  Health-Medicine  100
Tysza    Business         92
Zackery  Engineering      54
...

$ datamash --sort --headers --group 2 mean 3 sstdev 3 < scores.txt
GroupBy(Subject)  mean(Score)  sstdev(Score)
Arts              68.9474      10.4215
Business          87.3636      5.18214
Engineering       66.5385      19.8814
Health-Medicine   90.6154      9.22441
Life-Sciences     55.3333      20.606
Social-Sciences   60.2667      17.2273
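Since scores.txt is only shown in part, the grouping can also be tried on a small inline data set. The sketch below uses four hypothetical rows (the names and scores are made up for illustration) and assumes tab-separated fields, which is what datamash reads by default:

```shell
# Hypothetical sample: header line plus four tab-separated records.
printf 'Name\tSubject\tScore\nAnn\tArts\t60\nBob\tArts\t80\nCid\tMath\t90\nDee\tMath\t70\n' |
    datamash --sort --headers --group 2 count 3 mean 3
# prints:
# GroupBy(Subject)  count(Score)  mean(Score)
# Arts              2             70
# Math              2             80
```

Here --group 2 selects the Subject column as the grouping key, and each operation is applied to column 3 within every group.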
datamash is designed for interactive exploration of textual data and for automating tasks in shell scripts.
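As an illustration of the scripting use, the result of a datamash call can be captured with command substitution and tested like any other shell value. This is only a sketch: the limit variable and its value are hypothetical, and seq stands in for real input.

```shell
#!/bin/sh
# Hypothetical threshold, chosen only for this example.
limit=50

# Capture the column sum; for the numbers 1..10 this is 55.
total=$(seq 10 | datamash sum 1)

# Report when the sum is above the threshold.
if [ "$total" -gt "$limit" ]; then
    echo "total $total exceeds limit $limit"
fi
```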
datamash has a rich set of statistical functions to quickly assess information in textual input files. An example of calculating basic statistics (mean, 1st quartile, median, 3rd quartile, IQR, sample standard deviation, and p-value of the Jarque-Bera test for normal distribution):
$ datamash -H mean 1 q1 1 median 1 q3 1 iqr 1 sstdev 1 jarque 1 < FILE
mean(x)  q1(x)  median(x)  q3(x)  iqr(x)  sstdev(x)  jarque(x)
45.32    23     37         61.5   38.5    30.4487    8.0113e-09
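FILE in the command above is a placeholder for a real input file. For a run that can be reproduced without one, the following sketch feeds the numbers 1 to 9 from seq as stand-in data. That input has no header line, so -H is omitted here.

```shell
# Mean, quartiles, median and interquartile range of the numbers 1..9.
# The input has a single column and no header line.
seq 9 | datamash mean 1 q1 1 median 1 q3 1 iqr 1
```

The five values are printed on one line, separated by tabs, in the order the operations were given.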