datamash check validates the tabular structure of a file, ensuring
that all lines have the same number of fields. check is meant to be
used in scripting and automation pipelines: it terminates with a
non-zero exit code if the file is not well structured, and also prints
detailed context information about the offending lines:
    $ cat good.txt
    A       1       ww
    B       2       xx
    C       3       yy
    D       4       zz

    $ cat bad.txt
    A       1       ww
    B       2       xx
    C       3
    D       4       zz

    $ datamash check < good.txt && echo ok || echo fail
    4 lines, 3 fields
    ok

    $ datamash check < bad.txt && echo ok || echo fail
    line 2 (3 fields):
      B       2       xx
    line 3 (2 fields):
      C       3
    datamash: check failed: line 3 has 2 fields (previous line had 3)
    fail
check accepts optional expected counts of lines and fields, and returns failure if the input does not have the requested number of lines or fields.
The syntax is:
datamash check [N lines] [N fields]
Usage examples:
    $ cat file.txt
    A       1       ww
    B       2       xx
    C       3       yy
    D       4       zz

    $ datamash check 4 lines < file.txt && echo ok
    4 lines, 3 fields
    ok

    $ datamash check 3 fields < file.txt && echo ok
    4 lines, 3 fields
    ok

    $ datamash check 4 lines 3 fields < file.txt && echo ok
    4 lines, 3 fields
    ok

    $ datamash check 7 fields < file.txt && echo ok
    line 1 (3 fields):
      A       1       ww
    datamash: check failed: line 1 has 3 fields (expecting 7)

    $ datamash check 10 lines < file.txt && echo ok
    datamash: check failed: input had 4 lines (expecting 10)
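Because the expected counts are ordinary command-line arguments, they can also be computed at run time, for example to confirm that a processing step did not drop or add lines. The following is a minimal sketch and is not taken from the datamash manual: the function name same_line_count and the file names are hypothetical, and it assumes both files end with a newline so that wc -l agrees with the number of lines datamash reads.

```shell
#!/bin/sh

# Hypothetical helper (not part of datamash): succeed only if file "$2"
# is well structured and has exactly as many lines as file "$1".
same_line_count() {
    [ -r "$1" ] || return 1
    # wc -l may pad its output with blanks on some systems; strip them.
    expected=$(wc -l < "$1" | tr -d ' ')
    # Uses only the documented form: datamash check [N lines]
    datamash check "$expected" lines < "$2"
}

# Example use (file names are illustrative):
#   same_line_count input.txt output.txt || echo "line count changed" >&2
```

As in the examples above, a mismatch makes datamash exit with a non-zero status, so the helper can be combined with && and || in the usual way.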
For convenience, line, row, rows can be used instead of lines; field, columns, column, col can be used instead of fields. The following are all equivalent:
    datamash check 4 lines 10 fields < file.txt
    datamash check 4 rows 10 columns < file.txt
    datamash check 10 col 4 row < file.txt
In a pipeline or automation context, it is often beneficial to validate files as early as possible (immediately after a file is created, in keeping with the fail-fast methodology). A typical usage in a shell script would be:
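    #!/bin/sh

    # Print an error message prefixed with the script name, then exit.
    die()
    {
        base=$(basename "$0")
        echo "$base: error: $*" >&2
        exit 1
    }

    custom pipeline-or-program > output.txt \
        || die "program failed"

    # Stop immediately if the generated file is not well structured.
    datamash check < output.txt \
        || die "'output.txt' has invalid structure (missing fields)"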
If the generated output.txt file has an invalid structure
(i.e. missing fields), datamash will print enough detail to stderr
to help in troubleshooting (the line numbers and the content of the
offending lines).
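When a pipeline produces several files, the same idea can be applied to each of them in turn. The following is a minimal sketch under stated assumptions, not part of datamash itself: the function name check_all and the log file argument are hypothetical. It discards whatever datamash writes to standard output, appends the stderr diagnostics to the log file, and returns a non-zero status if any file fails.

```shell
#!/bin/sh

# Hypothetical helper (not part of datamash):
#   check_all LOGFILE FILE...
# Runs 'datamash check' on every FILE, appending diagnostics to LOGFILE.
# Returns 0 if all files are well structured, 1 otherwise.
check_all() {
    log=$1
    shift
    failed=0
    for f in "$@"; do
        if ! datamash check < "$f" > /dev/null 2>> "$log"; then
            echo "$f: invalid structure (details in $log)" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example use (file names are illustrative):
#   check_all check.log step1.txt step2.txt || exit 1
```

Checking every file before stopping, instead of exiting at the first failure, is a design choice: it reports all malformed files in a single run, at the cost of slightly delaying the failure.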