How can I numerically sort a single line of delimited items?

I have a line (or many lines) of numbers that are delimited by an arbitrary character. What UNIX tools can I use to sort each line’s items numerically, retaining the delimiter?

Examples include:

list of numbers; input: 10 50 23 42; sorted: 10 23 42 50
IP address; input: 10.1.200.42; sorted: 1.10.42.200
CSV; input: 1,100,330,42; sorted: 1,42,100,330
pipe-delimited; input: 400|500|404; sorted: 400|404|500

Since the delimiter is arbitrary, feel free to provide (or extend) an Answer using a single-character delimiter of your choosing.

Answers:

Method 1

With gawk (GNU awk) for the asort() function:

gawk -v SEP='*' '{ i=0; split($0, arr, SEP); len=asort(arr);
    while ( ++i<=len ){ printf("%s%s", i>1?SEP:"", arr[i]) }; 
        print "" 
}' infile

Replace * in SEP='*' with your delimiter.

For a single line, you can also use the following pipeline (for multi-line input, prefer the awk version above, since shell loops are a poor fit for text processing):

tr '.' '\n' <<<"$aline" | sort -n | paste -sd'.' -

Replace the dots (.) with your delimiter.
Add -u to the sort command above to remove duplicates.

Notes:
You may need to use -g, --general-numeric-sort option of sort instead of -n, --numeric-sort to handle any class of numbers (integer, float, scientific, Hexadecimal, etc).

$ aline='2e-18,6.01e-17,1.4,-4,0xB000,0xB001,23,-3.e+11'
$ tr ',' '\n' <<<"$aline" |sort -g | paste -sd',' -
-3.e+11,-4,2e-18,6.01e-17,1.4,23,0xB000,0xB001

The awk version needs no change; it still handles these values.
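
As a rough illustration of what "any class of numbers" means here, the sketch below sorts the same sample line in Python. The names general_numeric_key and sort_items are made up for this example, and the key function is only a loose stand-in for sort -g, not a reimplementation of it: it tries an integer parse that accepts prefixed forms such as 0xB000, then falls back to a float parse.

```python
def general_numeric_key(item):
    """Best-effort numeric value of one item; a loose stand-in for sort -g."""
    try:
        return int(item, 0)    # base 0 accepts prefixed forms such as 0xB000
    except ValueError:
        return float(item)     # decimal and scientific notation


def sort_items(line, delimiter):
    """Split on the delimiter, sort by numeric value, and join back up."""
    return delimiter.join(sorted(line.split(delimiter), key=general_numeric_key))
```

Calling sort_items(aline, ',') on the sample value of aline above should give the same order as the sort -g pipeline. Items that are not numbers at all raise ValueError in this sketch, whereas sort -g would still sort them.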

Method 2

Using perl there’s an obvious version; split the data, sort it, join it back up again.

The delimiter needs to be listed twice (once in the split and once in the join)

For example, for a comma:

perl -lpe '$_=join(",",sort {$a <=> $b} split(/,/))'

echo 1,100,330,42 | perl -lpe '$_=join(",",sort {$a <=> $b} split(/,/))'
1,42,100,330

Since the split is a regex, the character may need quoting:

echo 10.1.200.42 | perl -lpe '$_=join(".",sort {$a <=> $b} split(/\./))'
1.10.42.200

By using the -a and -F options, it’s possible to remove the split.
With the -p loop, as before, assign the result to $_, which is printed automatically:

perl -F'/\./' -aple '$_=join(".", sort {$a <=> $b} @F)'

Method 3

Using Python and a similar idea as in Stephen Harris’ answer:

python3 -c 'import sys; c = sys.argv[1]; sys.stdout.writelines(map(lambda x: c.join(sorted(x.strip().split(c), key=int)) + "\n", sys.stdin))' <delimiter>

So something like:

$ cat foo
10.129.3.4
1.1.1.1
4.3.2.1
$ python3 -c 'import sys; c = sys.argv[1]; sys.stdout.writelines(map(lambda x: c.join(sorted(x.strip().split(c), key=int)) + "\n", sys.stdin))' . < foo
3.4.10.129
1.1.1.1
1.2.3.4

Sadly having to do the I/O manually makes this far less elegant than the Perl version.

Method 4

Using `sed` to sort octets of an IP address

sed does not have a built-in sort function, but if your data is sufficiently constrained in range (such as with IP addresses), you can generate a sed script that manually implements a simple bubble sort. The basic mechanism is to look for adjacent numbers that are out-of-order. If the numbers are out of order, swap them.

The sed script itself contains two search-and-swap commands for each pair of out-of-order numbers: one for the first two pairs of octets (forcing a trailing delimiter to be present to mark the end of the third octet), and a second for the third pair of octets (end with EOL). If swaps occur, the program branches to the top of the script, looking for numbers that are out-of-order. Otherwise, it exits.

The generated script is, in part:

$ head -n 3 generated.sed
:top
s/255\.254\./254.255./g; s/255\.254$/254.255/
s/255\.253\./253.255./g; s/255\.253$/253.255/

# ... middle of the script omitted ...

$ tail -n 4 generated.sed
s/2\.1\./1.2./g; s/2\.1$/1.2/
s/2\.0\./0.2./g; s/2\.0$/0.2/
s/1\.0\./0.1./g; s/1\.0$/0.1/
ttop

This approach hard-codes the period as the delimiter, which has to be escaped, as otherwise it would be “special” to the regular expression syntax (allowing any character).

To generate such a sed script, this loop will do:

#!/bin/bash

echo ':top'

for (( n = 255; n >= 0; n-- )); do
  for (( m = n - 1; m >= 0; m-- )); do
    printf '%s; %s\n' "s/$n\.$m\./$m.$n./g" "s/$n\.$m$/$m.$n/"
  done
done

echo 'ttop'

Redirect the output of that script to another file, say sort-ips.sed.

A sample run could then look like:

ip=$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256))
printf '%s\n' "$ip" | sed -f sort-ips.sed

The following variation on the generating script uses the word boundary markers \< and \> to remove the need for the second substitution. This also cuts the generated script’s size from 1.3 MB to just under 900 KB and greatly reduces the run time of sed itself (to about 50%-75% of the original, depending on which sed implementation is used):

#!/bin/bash

echo ':top'

for (( n = 255; n >= 0; --n )); do
  for (( m = n - 1; m >= 0; --m )); do
      printf '%s\n' "s/\<$n\>\.\<$m\>/$m.$n/g"
  done
done

echo 'ttop'
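
To make the mechanism easier to follow, here is a small Python sketch of the same loop: look for an adjacent pair that is out of order, swap it, and start again from the top until a full pass finds nothing to swap. It is not the sed script itself. The function name bubble_sort_octets and the single lookahead regex are assumptions made for illustration; the sed version instead enumerates every possible pair of numbers as its own substitution.

```python
import re

# Match one number followed by a dot, and peek at the number after it
# without consuming it, so that every adjacent pair gets inspected.
ADJACENT = re.compile(r"(\d+)\.(?=(\d+))")


def bubble_sort_octets(address):
    """Swap adjacent out-of-order numbers until a whole pass changes nothing."""
    swapped = True
    while swapped:                      # plays the role of ":top" ... "ttop"
        swapped = False
        for m in ADJACENT.finditer(address):
            left, right = m.group(1), m.group(2)
            if int(left) > int(right):
                # Rebuild the string with the two numbers exchanged.
                address = (address[:m.start(1)] + right + "." + left
                           + address[m.end(2):])
                swapped = True
                break                   # branch back to the top
    return address
```

For example, bubble_sort_octets("10.1.200.42") first swaps 10 and 1, then 200 and 42, and stops once no pair is out of order.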

Method 5

Bash script:

#!/usr/bin/env bash

join_by(){ local IFS="$1"; shift; echo "$*"; }

IFS="$1" read -r -a tokens_array <<< "$2"
IFS=$'\n' sorted=($(sort -n <<<"${tokens_array[*]}"))
join_by "$1" "${sorted[@]}"

Example:

$ ./sort_delimited_string.sh "." "192.168.0.1"
0.1.168.192

Method 6

Shell

Loading a higher level language takes time.
For a few lines, the shell itself may be a solution.
We can use the external commands sort and tr: one is quite efficient at sorting lines, and the other converts a delimiter to newlines:

#!/bin/bash
shsort(){
           while IFS='' read -r line; do
               echo "$line" | tr "$1" '\n' |
               sort -n   | paste -sd "$1" -
           done <<<"$2"
    }

shsort ' '    '10 50 23 42'
shsort '.'    '10.1.200.42'
shsort ','    '1,100,330,42'
shsort '|'    '400|500|404'
shsort ','    '3 b,2       x,45    f,*,8jk'
shsort '.'    '10.128.33.6
128.17.71.3
44.32.63.1'

This needs bash only because of the <<< here-string. If that is replaced with a here-document, the solution is valid for POSIX shells.
It can sort fields containing tabs, spaces or shell glob characters (*, ?, [), but not newlines, because each line is sorted separately.

To process files, change <<<"$2" to <"$2" and call it like:

shsort '.'    infile

The delimiter is the same for the whole file. If that is a limitation, it could be improved on.

However a file with just 6000 lines takes 15 seconds to process. Truly, the shell is not the best tool to process files.

Awk

For more than a few lines (more than a few tens), it is better to use a real programming language. An awk solution could be:

#!/bin/bash
awksort(){
           gawk -v del="$1" '{
               split($0, fields, del)
               l=asort(fields)
               for(i=1;i<=l;i++){
                   printf( "%s%s" , (i==1)?"":del , fields[i] )
               }
               printf "\n"
           }' <"$2"
         }

awksort '.'    infile

This takes only 0.2 seconds for the same 6000-line file mentioned above.

Note that the <"$2" for files can be changed back to <<<"$2" for lines held in shell variables.

Perl

The fastest solution is perl.

#!/bin/bash
perlsort(){  perl -lp -e '$_=join("'"$1"'",sort {$a <=> $b} split(/['"$1"']/))' <<<"$2";   }

perlsort ' '    '10 50 23 42'
perlsort '.'    '10.1.200.42'
perlsort ','    '1,100,330,42'
perlsort '|'    '400|500|404'
perlsort ','    '3 b,2       x,45    f,*,8jk'
perlsort '.'    '10.128.33.6
128.17.71.3
44.32.63.1'

If you want to sort a file, change <<<"$2" to simply "$2" and add -i to the perl options to edit the file in place:

#!/bin/bash
perlsort(){  perl -lpi -e '$_=join("'"$1"'",sort {$a <=> $b} split(/['"$1"']/))' "$2"; }

perlsort '.' infile; exit

Method 7

Here is some bash that guesses the delimiter by itself:

#!/bin/bash

delimiter="${1//[[:digit:]]/}"
if echo $delimiter | grep -q "^\(.\)\1\+$"
then
  delimiter="${delimiter:0:1}"
  if [[ -z $(echo $1 | grep "^\([0-9]\+"$delimiter"\([0-9]\+\)*\)\+$") ]]
  then
    echo "You seem to have empty fields between the delimiters."
    exit 1
  fi
  if [[ './' == *$delimiter* ]]
  then
    n=$( echo $1 | sed "s/\\"$delimiter"/\\n/g" | sort -n | tr '\n' ' ' | sed -e "s/\\s/\\"$delimiter"/g")
  else
    n=$( echo $1 | sed "s/"$delimiter"/\\n/g" | sort -n | tr '\n' ' ' | sed -e "s/\\s/"$delimiter"/g")
  fi
  echo ${n%$delimiter}
  exit 0
else
  echo "The string does not consist of digits separated by one unique delimiter."
  exit 1
fi

It might not be very efficient nor clean but it works.

Use it like this: bash my_script.sh "00/00/18/29838/2".

Returns an error when the same delimiter is not used consistently or when two or more delimiters follow each other.

If the used delimiter is a special character then it is escaped (otherwise sed returns an error).

Method 8

This answer is based on a misunderstanding of the Q., but in some cases it happens to be correct anyway. If the input is entirely natural numbers, and has only one delimiter per-line, (as with the sample data in the Q.), it works correctly. It’ll also handle files with lines that each have their own delimiter, which is a bit more than what was asked for.

This shell function reads from standard input, uses POSIX parameter substitution to find the specific delimiter on each line (stored in $d), uses tr to replace $d with a newline (\n), sorts that line’s data, and then restores each line’s original delimiters:

sdn() { while read x; do
            d="${x#${x%%[^0-9]*}}"   d="${d%%[0-9]*}"
            x=$(echo -n "$x" | tr "$d" '\n' | sort -g | tr '\n' "$d")
            echo ${x%?}
        done ; }

Applied to the data given in the OP:

printf "%s\n" "10 50 23 42" "10.1.200.42" "1,100,330,42" "400|500|404" | sdn

Output:

10 23 42 50
1.10.42.200
1,42,100,330
400|404|500
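
For comparison, the per-line delimiter detection can be sketched in Python as below. This is only an illustration under the same assumptions as the shell function (natural numbers, one delimiter per line). The name sort_line is made up here, and unlike the shell version it takes just the first non-digit character as the delimiter.

```python
import re


def sort_line(line):
    """Sort one line's numbers, using its first non-digit character as delimiter."""
    found = re.search(r"[^0-9]", line)
    if found is None:
        return line                     # a single number: nothing to sort
    delimiter = found.group(0)
    # int() raises ValueError on empty or non-numeric fields, so malformed
    # lines fail loudly instead of being sorted incorrectly.
    return delimiter.join(sorted(line.split(delimiter), key=int))
```

Applied to each of the four sample lines in turn, it should reproduce the output shown above.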

Method 9

For arbitrary delimiters:

perl -lne '
  @list = /\D+|\d+/g;
  @sorted = sort {$a <=> $b} grep /\d/, @list;
  for (@list) {$_ = shift@sorted if /\d/};
  print @list'

On an input like:

5,4,2,3
6|5,2|4
There are 10 numbers in those 3 lines

It gives:

2,3,4,5
2|4,5|6
There are 3 numbers in those 10 lines
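
The same idea, sorting the numbers and putting them back into the slots where numbers were while leaving everything else untouched, might look like this in Python. The function name sort_numbers_in_place is made up for this sketch.

```python
import re


def sort_numbers_in_place(line):
    """Replace each run of digits with the next-smallest number from the line."""
    numbers = iter(sorted(re.findall(r"\d+", line), key=int))
    # re.sub visits the same digit runs that re.findall returned, in order,
    # so the iterator always has exactly one value left for each match.
    return re.sub(r"\d+", lambda _match: next(numbers), line)
```

Because only the digit runs are rewritten, mixed or multi-character delimiters come through unchanged, just as in the perl version.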

Method 10

The following is a variation on Jeff’s answer in the sense that it generates a sed script that will do a bubble sort, but it is sufficiently different to warrant its own answer.

The difference is that instead of generating O(n^2) basic regular expressions, this generates O(n) extended regular expressions. The resulting script will be about 15 KB big. The running time of the sed script is in fractions of a second (it takes a bit longer to generate the script).

It’s restricted to sorting positive integers delimited by dots, but it’s not limited to the size of the integers (just increase 255 in the main loop), or the number of integers. The delimiter may be changed by changing delim='.' in the code.

It’s done my head in to get the regular expressions right, so I’ll leave describing the details for another day.

#!/bin/bash

# This function creates an extended regular expression
# that matches a positive number less than the given parameter.
lt_pattern() {
    local n="$1"  # Our number.
    local -a res  # Our result, an array of regular expressions that we
                  # later join into a string.

    for (( i = 1; i < ${#n}; ++i )); do
        d=$(( ${n: -i:1} - 1 )) # The i:th digit of the number, from right to left, minus one.

        if (( d >= 0 )); then
            res+=( "$( printf '%d[0-%d][0-9]{%d}' "${n:0:-i}" "$d" "$(( i - 1 ))" )" )
        fi
    done

    d=${n:0:1} # The first digit of the number.
    if (( d > 1 )); then
        res+=( "$( printf '[1-%d][0-9]{%d}' "$(( d - 1 ))" "$(( ${#n} - 1 ))" )" )
    fi

    if (( n > 9 )); then
        # The number is 10 or larger.
        res+=( "$( printf '[0-9]{1,%d}' "$(( ${#n} - 1 ))" )" )
    fi

    if (( n == 1 )); then
        # The number is 1. The only thing smaller is zero.
        res+=( 0 )
    fi

    # Join our res array of expressions into a '|'-delimited string.
    ( IFS='|'; printf '%s\n' "${res[*]}" )
}

echo ':top'

delim='.'

for (( n = 255; n > 0; --n )); do
    printf 's/\\<%d\\>\\%s\\<(%s)\\>/\\1%s%d/g\n' \
        "$n" "$delim" "$( lt_pattern "$n" )" "$delim" "$n"
done

echo 'ttop'

The script will look something like this:

$ bash generator.sh >script.sed
$ head -n 5 script.sed
:top
s/\<255\>\.\<(25[0-4][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.255/g
s/\<254\>\.\<(25[0-3][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.254/g
s/\<253\>\.\<(25[0-2][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.253/g
s/\<252\>\.\<(25[0-1][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.252/g
$ tail -n 5 script.sed
s/\<4\>\.\<([1-3][0-9]{0})\>/\1.4/g
s/\<3\>\.\<([1-2][0-9]{0})\>/\1.3/g
s/\<2\>\.\<([1-1][0-9]{0})\>/\1.2/g
s/\<1\>\.\<(0)\>/\1.1/g
ttop

The idea behind the generated regular expressions is to pattern match for numbers that are less than each integer; those two numbers would be out of order, and so are swapped. The regular expressions are grouped into several OR options. Pay close attention to the repetition counts appended to each item; sometimes they are {0}, meaning the immediately preceding item is omitted from the match. The regex options, from left to right, match numbers that are smaller than the given number by:

the ones place
the tens place
the hundreds place
(continued as needed, for larger numbers)
or by being smaller in magnitude (number of digits)

To spell out an example, take 101 (with additional spaces for readability):

s/ \<101\> \. \<(10[0-0][0-9]{0} | [0-9]{1,2})\> / \1.101 /g

Here, the first alternation allows the numbers 100 through 100; the second alternation allows 0 through 99.

Another example is 154:

s/ \<154\> \. \<(15[0-3][0-9]{0} | 1[0-4][0-9]{1} | [0-9]{1,2})\> / \1.154 /g

Here the first option allows 150 through 153; the second allows 100 through 149, and the last allows 0 through 99.
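
If you want to convince yourself that these alternations really match "every number smaller than n" and nothing else, a brute-force check is easy to sketch in Python. The helper name matches_exactly_smaller is made up for this example. The two patterns are copied from the examples above with the readability spaces removed, and the check only covers numbers written without leading zeros in the range 0 to 255.

```python
import re

# Alternations taken from the two worked examples (101 and 154).
PATTERN_101 = r"10[0-0][0-9]{0}|[0-9]{1,2}"
PATTERN_154 = r"15[0-3][0-9]{0}|1[0-4][0-9]{1}|[0-9]{1,2}"


def matches_exactly_smaller(pattern, n, upper=255):
    """True if pattern matches str(k) exactly when k < n, for k in 0..upper."""
    regex = re.compile(pattern)
    return all(
        (regex.fullmatch(str(k)) is not None) == (k < n)
        for k in range(upper + 1)
    )
```

Both matches_exactly_smaller(PATTERN_101, 101) and matches_exactly_smaller(PATTERN_154, 154) should come out true, which agrees with the ranges described in the text.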

Testing four times in a loop:

for test_run in {1..4}; do
    nums=$(( RANDOM%256 )).$(( RANDOM%256 )).$(( RANDOM%256 )).$(( RANDOM%256 ))
    printf 'nums=%s\n' "$nums"
    sed -E -f script.sed <<<"$nums"
done

Output:

nums=90.19.146.232
19.90.146.232
nums=8.226.70.154
8.70.154.226
nums=1.64.96.143
1.64.96.143
nums=67.6.203.56
6.56.67.203

Method 11

This should handle any non-digit (0-9) delimiter. Example:

x='1!4!3!5!2'; delim=$(echo "$x" | tr -d 0-9 | cut -b1); echo "$x" | tr "$delim" '\n' | sort -g | tr '\n' "$delim" | sed "s/$delim$/\n/"

Output:

1!2!3!4!5

Method 12

With perl:

$ # -a to auto-split on whitespace, results in @F array
$ echo 'foo baz v22 aimed' | perl -lane 'print join " ", sort @F'
aimed baz foo v22
$ # {$a <=> $b} for numeric comparison, {$b <=> $a} will give descending order
$ echo '1,100,330,42' | perl -F, -lane 'print join ",", sort {$a <=> $b} @F'
1,42,100,330

With ruby, which is somewhat similar to perl

$ # -a to auto-split on whitespace, results in $F array
$ # $F is sorted and then joined using the given string
$ echo 'foo baz v22 aimed' | ruby -lane 'print $F.sort * " "'
aimed baz foo v22

$ # (&:to_i) to convert string to integer
$ echo '1,100,330,42' | ruby -F, -lane 'print $F.sort_by(&:to_i) * ","'
1,42,100,330

$ echo '10.1.200.42' | ruby -F'\.' -lane 'print $F.sort_by(&:to_i) * "."'
1.10.42.200

A custom command that takes just the delimiter string (not a regex). It also works if the input contains floating-point data:

$ # by default join uses value of $,
$ sort_line(){ ruby -lne '$,=ENV["d"]; print $_.split($,).sort_by(&:to_f).join' ; }

$ s='103,14.5,30,24'
$ echo "$s" | d=',' sort_line
14.5,24,30,103
$ s='10.1.200.42'
$ echo "$s" | d='.' sort_line
1.10.42.200

$ # for file input
$ echo '123--87--23' > ip.txt
$ echo '3--12--435--8' >> ip.txt
$ d='--' sort_line <ip.txt
23--87--123
3--8--12--435

Custom command for perl

$ sort_line(){ perl -lne '$d=$ENV{d}; print join $d, sort {$a <=> $b} split /\Q$d/' ; }
$ s='123^[]$87^[]$23'
$ echo "$s" | d='^[]$' sort_line 
23^[]$87^[]$123

Further reading – I already had this handy list of perl/ruby one-liners

Method 13

Splitting input into multiple lines

Using tr, you can split the input using an arbitrary delimiter into multiple lines.

This input can then be run through sort (using -n if the input is numerical).

If you wish to retain the delimiter in the output, you can then use tr again to add back the delimiter.

e.g. using space as a delimiter

cat input.txt | tr " " "\n" | sort -n | tr "\n" " "

input: 1 2 4 1 4 32 18 3
output: 1 1 2 3 4 4 18 32

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
