GNU Coreutils version sort implements specialized handling of strings that look like file names with extensions. This enables slightly more natural ordering of file names.
The following additional rules apply when comparing two strings where both begin with non-‘.’. They also apply when comparing two strings where both begin with ‘.’ but neither is ‘.’ or ‘..’.
A suffix (that is, a file extension) is whatever matches the extended regular expression
(\.[A-Za-z~][A-Za-z0-9~]*)*$
in the C locale.
The longest such match is used, except that a suffix is not allowed to match an entire nonempty string.
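The suffix definition above can be tried out with a short Python sketch. The helper name split_suffix is hypothetical, and starting the search at index 1 is only one assumed way of honoring the "not an entire nonempty string" exception; this is not the Coreutils implementation.

```python
import re

# The pattern from the text, with \Z in place of $ so that Python does not
# also accept a match just before a trailing newline.
SUFFIX_RE = re.compile(r"(?:\.[A-Za-z~][A-Za-z0-9~]*)*\Z")


def split_suffix(name):
    """Return (prefix, suffix) for name; the suffix may be empty."""
    if not name:
        return "", ""
    # search() reports the leftmost position where the pattern matches; since
    # the pattern is anchored at the end, that is also the longest match.
    # Assumption: starting at index 1 keeps the first character out of the
    # suffix, so the suffix can never cover the whole string.
    start = SUFFIX_RE.search(name, 1).start()
    return name[:start], name[start:]


for name in ("hello-8.txt", "hello-8.2.txt", "gcc_10.fc9.tar.gz", ".txt"):
    print(name, "->", split_suffix(name))
```

In this sketch ‘.2’ is never part of a suffix, because the character after the dot must be a letter or a tilde, whereas ‘.fc9.tar.gz’ is taken as one suffix because the pattern may repeat.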
Examples for rule 1:
Examples for rule 2:
Example for rule 3:
Examples for rule 4:
How does the suffix-removal algorithm affect ordering results?
Consider the comparison of hello-8.txt and hello-8.2.txt.
Without the suffix-removal algorithm, the strings will be broken down to the following parts:
hello-  vs  hello-   (rule 2, all non-digits)
8       vs  8        (rule 3, all digits)
.txt    vs  .        (rule 2)
empty   vs  2
empty   vs  .txt
Comparing the third parts (‘.txt’ vs ‘.’) determines that the shorter string comes first, so hello-8.2.txt sorts first.
Indeed this is the order in which Debian’s dpkg compares the strings.
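The breakdown into parts can be reproduced with a small Python sketch. The helper name split_parts is hypothetical, and the sketch only performs the splitting into non-digit and digit runs; it does not compare anything.

```python
import re


def split_parts(s):
    """Split s into alternating runs of non-digits and ASCII digits."""
    # A capturing group makes re.split keep the digit runs in the result;
    # empty strings at the edges are dropped.
    return [p for p in re.split(r"([0-9]+)", s) if p]


print(split_parts("hello-8.txt"))
print(split_parts("hello-8.2.txt"))
```

The second name yields two more parts than the first, which is why the third position ends up comparing ‘.txt’ against ‘.’.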
A more natural result is for hello-8.txt to come before hello-8.2.txt, and this is where suffix removal comes into play.
The suffixes (‘.txt’) are removed, and the remaining strings are broken down into the following parts:
hello-  vs  hello-   (rule 2, all non-digits)
8       vs  8        (rule 3, all digits)
empty   vs  .        (rule 2)
empty   vs  2
As empty strings sort before non-empty strings, the result is ‘hello-8’ being first.
A real-world example is listing files such as gcc_10.fc9.tar.gz and gcc_10.8.12.7rc2.fc9.tar.bz2. Debian’s algorithm would list gcc_10.8.12.7rc2.fc9.tar.bz2 first, while ‘ls -v’ lists gcc_10.fc9.tar.gz first.
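The effect of suffix removal on both pairs of names can be approximated in Python. This is a rough sketch under stated assumptions, not the shared ls/sort code: strip_suffix, simple_cmp and suffix_aware_cmp are hypothetical names, non-digit parts are compared as plain strings rather than by the real ordering rules, and ties after suffix removal are assumed to fall back to comparing the full strings.

```python
import re
from functools import cmp_to_key
from itertools import zip_longest

SUFFIX_RE = re.compile(r"(?:\.[A-Za-z~][A-Za-z0-9~]*)*\Z")


def strip_suffix(name):
    # Searching from index 1 keeps a suffix from covering the whole string.
    return name[:SUFFIX_RE.search(name, 1).start()] if name else name


def simple_cmp(a, b):
    """Simplified part-wise comparison; returns -1, 0 or 1."""
    # re.split puts non-digit runs at even indexes and digit runs at odd ones.
    parts_a = re.split(r"([0-9]+)", a)
    parts_b = re.split(r"([0-9]+)", b)
    for i, (x, y) in enumerate(zip_longest(parts_a, parts_b, fillvalue="")):
        if i % 2:
            # Digit parts compare numerically; a missing part counts as 0.
            x, y = int(x or 0), int(y or 0)
        # Assumption: plain string comparison stands in for the real
        # non-digit ordering rules, which this sketch does not model.
        if x != y:
            return -1 if x < y else 1
    return 0


def suffix_aware_cmp(a, b):
    result = simple_cmp(strip_suffix(a), strip_suffix(b))
    # Assumption: on a tie, the full strings decide the order.
    return result or simple_cmp(a, b)


pairs = [
    ["hello-8.txt", "hello-8.2.txt"],
    ["gcc_10.fc9.tar.gz", "gcc_10.8.12.7rc2.fc9.tar.bz2"],
]
for names in pairs:
    print("without suffix removal:", sorted(names, key=cmp_to_key(simple_cmp)))
    print("with suffix removal:   ", sorted(names, key=cmp_to_key(suffix_aware_cmp)))
```

For these two pairs the sketch places the longer version string first without suffix removal and the shorter one first with it, matching the orderings described above.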
These priorities make sense for ‘ls -v’: Versioned files will be listed in a more natural order.
For ‘sort -V’ these priorities might seem arbitrary. However, because the sorting code is shared between the ls and sort programs, the ordering rules are the same.