combine Manual 0.4.0: 5.1 Rearranging Fields

5.1 Rearranging Fields

When you do not use any reference files, combine still gives you the opportunity to create a new record layout based on the records you read.

This is an advantage over the cut utility because while cut only allows you to omit portions of the record, combine also allows you to reorder those fields you keep and to add a constant field somewhere in the order. In addition, combine gives you the chance to convert between fixed-width and delimited formats, where cut keeps the format you started with (although the GNU version does let you change delimiters).

Clearly, flexible tools such as awk or sed, or any programming language, can also do this kind of thing (and, with a little work, anything combine can do). They may well be a more efficient choice, but I have never tested that.
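To make the fixed-width/delimited conversion concrete, here is a minimal Python sketch of both directions. It illustrates the idea under stated assumptions and is not how combine works internally: the function names `fixed_to_delimited` and `delimited_to_fixed` are hypothetical, fields are described only by their widths, padding is assumed to be trailing spaces, and values too long for their width are simply truncated.

```python
# Hypothetical helpers illustrating the two conversions; not part of combine.


def fixed_to_delimited(line, widths, delimiter="\t"):
    """Cut a fixed-width record into fields and join them with a delimiter."""
    fields = []
    pos = 0
    for width in widths:
        # Take the next `width` characters and drop the trailing space padding.
        fields.append(line[pos:pos + width].rstrip(" "))
        pos += width
    return delimiter.join(fields)


def delimited_to_fixed(line, widths, delimiter="\t"):
    """Pad each delimited field to its width, truncating values that are too long."""
    fields = line.split(delimiter)
    return "".join(field.ljust(width)[:width] for field, width in zip(fields, widths))


if __name__ == "__main__":
    widths = [10, 10, 10]  # phone, last name, first name
    record = "2125551212" + "Doe".ljust(10) + "John".ljust(10)
    tabbed = fixed_to_delimited(record, widths)
    print(repr(tabbed))                                  # '2125551212\tDoe\tJohn'
    print(delimited_to_fixed(tabbed, widths) == record)  # True
```

Trimming the padding on the way in and restoring it on the way out is what makes the round trip lossless for values that fit their widths.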

As an example, here is a fixed-width file whose record layout holds some address book information. If I need to make a tab-delimited file of names and phone numbers to upload into my mobile phone, I can use the command that follows to get the output I want.

$ cat testadd.txt
2125551212Doe       John      123 Main StreetNew York  NY10001
2025551212Doe       Mary      123 Main StreetWashingtonDC20001
3015551212Doe       Larry     123 Main StreetLaurel    MD20707
6175551212Doe       Darryl    123 Main StreetBoston    MA02115
6035551212Doe       Darryl    123 Main StreetManchesterNH02020
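The record layout is only implicit in the file. A small Python sketch can spell it out as a table of field names and 1-based, inclusive column ranges, the same convention the ‘--output-fields’ ranges use in the command below. Only the phone, last-name and first-name ranges appear in this manual; the street, city, state and ZIP ranges are inferred here by counting characters in the sample records, and every field name is hypothetical.

```python
# Hypothetical layout for testadd.txt: (field name, first column, last column).
# Columns are 1-based and inclusive; only the first three ranges come from the
# manual, the rest are inferred from the sample records.
LAYOUT = [
    ("phone", 1, 10),
    ("last_name", 11, 20),
    ("first_name", 21, 30),
    ("street", 31, 45),
    ("city", 46, 55),
    ("state", 56, 57),
    ("zip", 58, 62),
]


def parse_record(line, layout=LAYOUT):
    """Return {field name: value}, with trailing padding removed."""
    # Columns a..b (1-based, inclusive) are the Python slice line[a - 1:b].
    return {name: line[start - 1:end].rstrip() for name, start, end in layout}


if __name__ == "__main__":
    record = "2025551212Doe       Mary      123 Main StreetWashingtonDC20001"
    for name, value in parse_record(record).items():
        print(f"{name:10} {value}")
```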

Here is a command that grabs the first name, last name and phone number, and tacks the word "Home" on the end so that my phone marks the number with a little house.(2)

Note that, unless you say otherwise, both the statistics and the output appear on the screen. The statistics go to stderr and the output to stdout, so you can redirect them separately. You can also use the option ‘--output-file’ (or ‘-t’) to name an output file, and you can suppress the statistics with ‘--no-statistics’.

% combine --write-output --output-field-delimiter="	" \
          --output-fields=21-30,11-20,1-10 \
          --output-constant="Home" testadd.txt
Statistics for data file testadd.txt
  Number of records read:                            5
  Number of records dropped by filter:               0
  Number of records matched on key:                  5
  Number of records written:                         5
John	Doe	2125551212	Home
Mary	Doe	2025551212	Home
Larry	Doe	3015551212	Home
Darryl	Doe	6175551212	Home
Darryl	Doe	6035551212	Home

The delimiter between the quotes, and between the fields of the output, is a tab character. It works as shown, but in some formats of this manual a tab may print as a space.
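For readers who want the option semantics spelled out, here is a rough Python model of the command above. It is a sketch, not combine's implementation: `parse_ranges`, `rearrange` and `run` are hypothetical names; it assumes each ‘--output-fields’ range is a 1-based, inclusive column span, that trailing blanks are trimmed (as the output above suggests), and that the constant goes after the selected fields, as it does in this example. It reports only two of combine's statistics, sending them to stderr while the records go to stdout.

```python
import sys


def parse_ranges(spec):
    """Turn a spec such as "21-30,11-20,1-10" into (start, end) column pairs."""
    ranges = []
    for part in spec.split(","):
        start, end = part.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def rearrange(lines, spec, constant=None, delimiter="\t"):
    """Yield one delimited output record per fixed-width input line."""
    ranges = parse_ranges(spec)
    for line in lines:
        line = line.rstrip("\n")
        # Columns are 1-based and inclusive, so columns a..b are line[a - 1:b];
        # trailing blanks (the fixed-width padding) are trimmed.
        fields = [line[a - 1:b].rstrip(" ") for a, b in ranges]
        if constant is not None:
            fields.append(constant)  # assumption: constant goes last
        yield delimiter.join(fields)


def run(lines, spec, constant=None, delimiter="\t"):
    """Print records on stdout and simple statistics on stderr."""
    lines = list(lines)
    written = 0
    for record in rearrange(lines, spec, constant, delimiter):
        print(record)
        written += 1
    # stderr, so redirecting stdout captures only the data records.
    print(f"Number of records read:    {len(lines)}", file=sys.stderr)
    print(f"Number of records written: {written}", file=sys.stderr)
    return written


if __name__ == "__main__":
    sample = ["2125551212Doe       John      123 Main StreetNew York  NY10001\n"]
    run(sample, "21-30,11-20,1-10", constant="Home")
```

Because the records and the counts use different streams, redirecting this script's stdout to a file would capture only the tab-delimited lines, just as with combine.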

For reference, here is a comparable SQL query that would select the same data, assuming a table had been set up containing the data in the file above.

SELECT First_Name, Last_Name, Phone_Number, 'Home'
  FROM Address_Book;
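The SQL comparison can be tried without a database server using Python's standard `sqlite3` module. This is a sketch: the table and column names come from the query above, but the column types, the in-memory database and the `phone_list` helper are assumptions, and only the three columns the query needs are created.

```python
import sqlite3


def phone_list(rows):
    """Load (phone, last, first) rows into a temporary table and run the query."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumptions; the manual only names the columns.
    conn.execute(
        "CREATE TABLE Address_Book ("
        " Phone_Number TEXT, Last_Name TEXT, First_Name TEXT)"
    )
    conn.executemany("INSERT INTO Address_Book VALUES (?, ?, ?)", rows)
    # Same query as in the text; without ORDER BY, SQL guarantees no row order.
    result = conn.execute(
        "SELECT First_Name, Last_Name, Phone_Number, 'Home' FROM Address_Book"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    rows = [("2125551212", "Doe", "John"), ("2025551212", "Doe", "Mary")]
    for row in phone_list(rows):
        print("\t".join(row))
```

Text columns are used so that values with leading zeros, such as the ZIP codes 02115 and 02020 in the sample file, would survive if those fields were loaded too.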

A comparable gawk program would be something like this.

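BEGIN { OFS = "\t" }
{
  # Trim the fixed-width padding so the fields match combine's output.
  first = substr($0, 21, 10); sub(/ +$/, "", first)
  last  = substr($0, 11, 10); sub(/ +$/, "", last)
  print first, last, substr($0, 1, 10), "Home"
}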


This document was generated by Daniel P. Valentine on July 28, 2013 using texi2html 1.82.