I have a line (or many lines) of numbers that are delimited by an arbitrary character. What UNIX tools can I use to sort each line’s items numerically, retaining the delimiter?
Examples include:
- list of numbers; input: 10 50 23 42; sorted: 10 23 42 50
- IP address; input: 10.1.200.42; sorted: 1.10.42.200
- CSV; input: 1,100,330,42; sorted: 1,42,100,330
- pipe-delimited; input: 400|500|404; sorted: 400|404|500
Since the delimiter is arbitrary, feel free to provide (or extend) an Answer using a single-character delimiter of your choosing.
Answers:
Method 1
With gawk (GNU awk) for the asort() function:
gawk -v SEP='*' '{ i=0; split($0, arr, SEP); len=asort(arr);
while ( ++i<=len ){ printf("%s%s", i>1?SEP:"", arr[i]) };
print ""
}' infile
Replace the * in SEP='*' with your delimiter.
For a single line, you can also use the following pipeline (for many lines, don't wrap it in a shell loop; shell loops are a poor fit for text processing):
tr '.' '\n' <<<"$aline" | sort -n | paste -sd'.' -
Replace the dots (.) with your delimiter.
Add -u to the sort command above to remove duplicates.
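To make the three steps of that pipeline explicit (tr puts one item per line, sort -n orders them, paste -sd glues them back together), here is a rough Python sketch. It assumes every item is an integer, and the `unique` flag only approximates `sort -n -u`, which keeps one item from each group of numerically equal items. The function name `tr_sort_paste` is hypothetical.

```python
def tr_sort_paste(line: str, delim: str, unique: bool = False) -> str:
    """Rough equivalent of: tr DELIM '\\n' | sort -n [-u] | paste -sd DELIM -"""
    # tr: one item per "line", i.e. split on every delimiter
    items = line.split(delim)
    # sort -n: order the items by their integer value
    items.sort(key=int)
    if unique:
        # sort -n -u: keep one item from each run of numerically equal items
        deduped = []
        for item in items:
            if not deduped or int(deduped[-1]) != int(item):
                deduped.append(item)
        items = deduped
    # paste -sd: join the sorted items back with the same delimiter
    return delim.join(items)


print(tr_sort_paste("10.1.200.42", "."))           # 1.10.42.200
print(tr_sort_paste("4,2,4,1", ",", unique=True))  # 1,2,4
```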
Notes:
You may need sort's -g (--general-numeric-sort) option instead of -n (--numeric-sort) to handle other classes of numbers (integers, floats, scientific notation, hexadecimal, etc.):
$ aline='2e-18,6.01e-17,1.4,-4,0xB000,0xB001,23,-3.e+11'
$ tr ',' '\n' <<<"$aline" | sort -g | paste -sd',' -
-3.e+11,-4,2e-18,6.01e-17,1.4,23,0xB000,0xB001
The awk version needs no change; it handles these as well.
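As a rough illustration of what -g adds over -n, the Python sketch below converts each item to a floating-point number and falls back to hexadecimal parsing. It is an approximation under that assumption, not a reimplementation of GNU sort: items that are neither decimal nor hex raise ValueError here, and `general_numeric_key` is a made-up name.

```python
def general_numeric_key(item: str) -> float:
    """Approximate sort -g: parse as decimal/scientific first, then as hex."""
    try:
        # Integers, decimals and scientific notation such as -3.e+11
        return float(item)
    except ValueError:
        # Hexadecimal such as 0xB000 (float.fromhex also accepts hex floats)
        return float.fromhex(item)


aline = "2e-18,6.01e-17,1.4,-4,0xB000,0xB001,23,-3.e+11"
print(",".join(sorted(aline.split(","), key=general_numeric_key)))
# -3.e+11,-4,2e-18,6.01e-17,1.4,23,0xB000,0xB001
```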
Method 2
Using perl there’s an obvious version; split the data, sort it, join it back up again.
The delimiter needs to be listed twice (once in the split and once in the join).
E.g. for a comma:
perl -lp -e '$_=join(",",sort {$a <=> $b} split(/,/))'
So
echo 1,100,330,42 | perl -lp -e '$_=join(",",sort {$a <=> $b} split(/,/))'
1,42,100,330
Since the split is a regex, the character may need escaping:
echo 10.1.200.42 | perl -lp -e '$_=join(".",sort {$a <=> $b} split(/\./))'
1.10.42.200
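The same pitfall exists in any regex-based split. This Python sketch (the helper name `sort_fields` is hypothetical) shows an unescaped . turning every character into a separator, and uses re.escape to get a literal match, much like writing /\./ in Perl:

```python
import re


def sort_fields(line: str, delim: str) -> str:
    """Split on a literal delimiter via a regex, sort numerically, re-join."""
    # re.escape makes the delimiter match literally; unescaped, "." would
    # match every character and "|" would match the empty string.
    fields = re.split(re.escape(delim), line)
    return delim.join(sorted(fields, key=int))


# Unescaped, "." splits the string into nothing but empty fields:
print(set(re.split(".", "10.1.200.42")))  # {''}
print(sort_fields("10.1.200.42", "."))    # 1.10.42.200
print(sort_fields("400|500|404", "|"))    # 400|404|500
```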
By using the -a and -F options, it’s possible to remove the explicit split.
Keep the -p loop as before and assign the result to $_, which -p prints automatically:
perl -F'/\./' -aple '$_=join(".", sort {$a <=> $b} @F)'
Method 3
Using Python and a similar idea as in Stephen Harris’ answer:
python3 -c 'import sys; c = sys.argv[1]; sys.stdout.writelines(map(lambda x: c.join(sorted(x.strip().split(c), key=int)) + "\n", sys.stdin))' <delimiter>
So something like:
$ cat foo
10.129.3.4
1.1.1.1
4.3.2.1
$ python3 -c 'import sys; c = sys.argv[1]; sys.stdout.writelines(map(lambda x: c.join(sorted(x.strip().split(c), key=int)) + "\n", sys.stdin))' . < foo
3.4.10.129
1.1.1.1
1.2.3.4
Sadly having to do the I/O manually makes this far less elegant than the Perl version.
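If the one-liner becomes hard to maintain, the same idea reads more easily as a small script. This is a sketch: the file name `sort_fields.py` is hypothetical, and unlike the one-liner it keeps each line's original ending instead of stripping surrounding whitespace. It only reads stdin when exactly one argument (the delimiter) is given.

```python
import sys


def sort_line(line: str, delim: str) -> str:
    """Sort one line's delimiter-separated integers, keeping its line ending."""
    body = line.rstrip("\n")
    ending = line[len(body):]  # "\n", or "" for a final line without one
    return delim.join(sorted(body.split(delim), key=int)) + ending


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python3 sort_fields.py . < foo
    for line in sys.stdin:
        sys.stdout.write(sort_line(line, sys.argv[1]))
```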
Method 4
Using sed to sort octets of an IP address
sed does not have a built-in sort function, but if your data is sufficiently constrained in range (such as with IP addresses), you can generate a sed script that manually implements a simple bubble sort. The basic mechanism is to look for adjacent numbers that are out-of-order. If the numbers are out of order, swap them.
The sed script itself contains two search-and-swap commands for each pair of out-of-order numbers: one for the first two pairs of octets (requiring a trailing delimiter after the second number, so that only a whole octet matches), and one for the third pair of octets (anchored at the end of the line). If any swap occurs, the program branches back to the top of the script and looks again for out-of-order numbers. Otherwise, it exits.
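Stripped of the sed machinery, the mechanism described above is an ordinary bubble sort over the fields. A minimal Python sketch, assuming integer fields (the name `bubble_sort_fields` is illustrative):

```python
def bubble_sort_fields(line: str, delim: str = ".") -> str:
    """Bubble sort the delimiter-separated integers of one line."""
    fields = line.split(delim)
    swapped = True
    while swapped:  # like the sed script's "t top": repeat while swaps happen
        swapped = False
        for i in range(len(fields) - 1):
            # Adjacent pair out of order? Swap it, as each s/// command does.
            if int(fields[i]) > int(fields[i + 1]):
                fields[i], fields[i + 1] = fields[i + 1], fields[i]
                swapped = True
    return delim.join(fields)


print(bubble_sort_fields("90.19.146.232"))  # 19.90.146.232
```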
The generated script is, in part:
$ head -n 3 generated.sed
:top
s/255\.254\./254.255./g; s/255\.254$/254.255/
s/255\.253\./253.255./g; s/255\.253$/253.255/
# ... middle of the script omitted ...
$ tail -n 4 generated.sed
s/2\.1\./1.2./g; s/2\.1$/1.2/
s/2\.0\./0.2./g; s/2\.0$/0.2/
s/1\.0\./0.1./g; s/1\.0$/0.1/
ttop
This approach hard-codes the period as the delimiter, which has to be escaped, as otherwise it would be “special” to the regular expression syntax (allowing any character).
To generate such a sed script, this loop will do:
#!/bin/bash
echo ':top'
for (( n = 255; n >= 0; n-- )); do
for (( m = n - 1; m >= 0; m-- )); do
printf '%s; %s\n' "s/$n\.$m\./$m.$n./g" "s/$n\.$m$/$m.$n/"
done
done
echo 'ttop'
Redirect the output of that script to another file, say sort-ips.sed.
A sample run could then look like:
ip=$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256))
printf '%s\n' "$ip" | sed -f sort-ips.sed
The following variation on the generating script uses the word boundary markers \< and \> to remove the need for the second substitution. This also cuts the generated script’s size from 1.3 MB to just under 900 KB and greatly reduces the run time of sed itself (to about 50%-75% of the original, depending on which sed implementation is used):
#!/bin/bash
echo ':top'
for (( n = 255; n >= 0; --n )); do
for (( m = n - 1; m >= 0; --m )); do
printf '%s\n' "s/\<$n\>\.\<$m\>/$m.$n/g"
done
done
echo 'ttop'
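A small Python check of why the boundary markers matter, using Python's \b as a stand-in for GNU sed's \< and \> (assumed equivalent for these digit-and-dot inputs). A bare 2\.1\. pattern can fire inside a larger number, while the bounded form matches only whole numbers and, thanks to the trailing boundary, also covers the end-of-line case without a second substitution:

```python
import re

# Without boundaries, the "2.1." rule also fires inside "12.1.", corrupting 12.
print(re.sub(r"2\.1\.", "1.2.", "12.1.5"))   # 11.2.5
# With boundaries, only a whole 2 followed by a whole 1 is swapped ...
print(re.sub(r"\b2\.1\b", "1.2", "12.1.5"))  # 12.1.5
# ... including at the end of the line (no separate s/2\.1$/1.2/ needed) ...
print(re.sub(r"\b2\.1\b", "1.2", "5.2.1"))   # 5.1.2
# ... but not when the 1 is only the start of a longer number such as 10.
print(re.sub(r"\b2\.1\b", "1.2", "2.10"))    # 2.10
```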
Method 5
Bash script:
#!/usr/bin/env bash
join_by(){ local IFS="$1"; shift; echo "$*"; }
IFS="$1" read -r -a tokens_array <<< "$2"
IFS=$'\n' sorted=($(sort -n <<<"${tokens_array[*]}"))
join_by "$1" "${sorted[@]}"
Example:
$ ./sort_delimited_string.sh "." "192.168.0.1"
0.1.168.192
Based on
Method 6
Shell
Loading a higher level language takes time.
For a few lines, the shell itself may be a solution.
We can use the external commands sort and tr: sort is efficient at sorting lines, and tr is effective at converting a delimiter to newlines:
#!/bin/bash
shsort(){
while IFS='' read -r line; do
echo "$line" | tr "$1" '\n' |
sort -n | paste -sd "$1" -
done <<<"$2"
}
shsort ' ' '10 50 23 42'
shsort '.' '10.1.200.42'
shsort ',' '1,100,330,42'
shsort '|' '400|500|404'
shsort ',' '3 b,2 x,45 f,*,8jk'
shsort '.' '10.128.33.6
128.17.71.3
44.32.63.1'
This needs bash only because of the use of <<<. If that is replaced with a here-doc, the solution is valid POSIX shell.
It can sort fields containing tabs, spaces, or shell glob characters (*, ?, [), but not newlines, because each line is sorted separately.
Change <<<"$2" to <"$2" to process a file instead, and call it like:
shsort '.' infile
The delimiter is the same for the whole file. If that is a limitation, it could be improved on.
However, a file with just 6000 lines takes 15 seconds to process. Truly, the shell is not the best tool for processing files.
Awk
For more than a few lines (more than a few tens), it is better to use a real programming language. An awk solution could be:
#!/bin/bash
awksort(){
gawk -v del="$1" '{
split($0, fields, del)
l=asort(fields)
for(i=1;i<=l;i++){
printf( "%s%s" , (i==1)?"":del , fields[i] )
}
printf "\n"
}' <"$2"
}
awksort '.' infile
This takes only 0.2 seconds for the same 6000-line file mentioned above.
Note that the <"$2" used for files can be changed back to <<<"$2" to process lines held in shell variables.
Perl
The fastest solution is perl.
#!/bin/bash
perlsort(){ perl -lp -e '$_=join("'"$1"'",sort {$a <=> $b} split(/['"$1"']/))' <<<"$2"; }
perlsort ' ' '10 50 23 42'
perlsort '.' '10.1.200.42'
perlsort ',' '1,100,330,42'
perlsort '|' '400|500|404'
perlsort ',' '3 b,2 x,45 f,*,8jk'
perlsort '.' '10.128.33.6
128.17.71.3
44.32.63.1'
To sort a file in place, change <<<"$2" to simply "$2" and add -i to the perl options:
#!/bin/bash
perlsort(){ perl -lpi -e '$_=join("'"$1"'",sort {$a <=> $b} split(/['"$1"']/))' "$2"; }
perlsort '.' infile; exit
Method 7
Here is some bash that guesses the delimiter by itself:
#!/bin/bash
delimiter="${1//[[:digit:]]/}"
if echo "$delimiter" | grep -q "^\(.\)\1\+$"
then
delimiter="${delimiter:0:1}"
if [[ $1 == "$delimiter"* || $1 == *"$delimiter" || $1 == *"$delimiter$delimiter"* ]]
then
echo "You seem to have empty fields between the delimiters."
exit 1
fi
if [[ './' == *"$delimiter"* ]]
then
n=$( echo "$1" | sed "s/\\$delimiter/\n/g" | sort -n | tr '\n' ' ' | sed -e "s/\s/\\$delimiter/g")
else
n=$( echo "$1" | sed "s/$delimiter/\n/g" | sort -n | tr '\n' ' ' | sed -e "s/\s/$delimiter/g")
fi
echo "${n%"$delimiter"}"
exit 0
else
echo "The string does not consist of digits separated by one unique delimiter."
exit 1
fi
It might be neither efficient nor clean, but it works.
Use it like: bash my_script.sh "00/00/18/29838/2"
It returns an error when the same delimiter is not used consistently, or when two or more delimiters follow each other.
If the delimiter is . or /, it is escaped before being passed to sed (otherwise sed would treat it as special or return an error).
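For comparison, the same checks (one unique non-digit delimiter, no empty fields) can be sketched in Python. This is not a line-by-line port: the name `guess_and_sort` is hypothetical, it also accepts a string with just one delimiter, and it raises ValueError where the script prints a message and exits.

```python
import re


def guess_and_sort(s: str) -> str:
    """Guess the single delimiter in s, validate the fields, sort them."""
    others = re.sub(r"[0-9]", "", s)  # everything that is not an ASCII digit
    if not others or len(set(others)) != 1:
        raise ValueError("not digits separated by one unique delimiter")
    delim = others[0]
    fields = s.split(delim)
    if "" in fields:  # leading, trailing or doubled delimiter
        raise ValueError("empty field between delimiters")
    return delim.join(sorted(fields, key=int))


print(guess_and_sort("00/00/18/29838/2"))  # 00/00/2/18/29838
```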
Method 8
This answer is based on a misunderstanding of the Q., but in some cases it happens to be correct anyway. If the input is entirely natural numbers, and has only one delimiter per-line, (as with the sample data in the Q.), it works correctly. It’ll also handle files with lines that each have their own delimiter, which is a bit more than what was asked for.
This shell function reads from standard input, uses POSIX parameter substitution to find the specific delimiter on each line (stored in $d), uses tr to replace $d with a newline (\n), sorts that line’s data, and then restores each line’s original delimiters:
sdn() { while read -r x; do
d="${x#${x%%[^0-9]*}}" d="${d%%[0-9]*}"
x=$(printf '%s' "$x" | tr "$d" '\n' | sort -g | tr '\n' "$d")
echo "${x%?}"
done ; }
Applied to the data given in the OP:
printf "%s\n" "10 50 23 42" "10.1.200.42" "1,100,330,42" "400|500|404" | sdn
Output:
10 23 42 50
1.10.42.200
1,42,100,330
400|404|500
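The per-line detection can be sketched in Python too, assuming each line starts with a number. Unlike tr, which works character by character, this sketch treats the whole non-digit run after the first number as the delimiter string, and it sorts with float keys as a loose stand-in for sort -g. `sort_per_line` is an illustrative name.

```python
import re


def sort_per_line(line: str) -> str:
    """Detect this line's delimiter (the run after the first number) and sort."""
    m = re.match(r"[0-9]+([^0-9]+)", line)
    if m is None:  # a single number, or no leading number: leave it alone
        return line
    d = m.group(1)
    return d.join(sorted(line.split(d), key=float))


for line in ["10 50 23 42", "10.1.200.42", "1,100,330,42", "400|500|404"]:
    print(sort_per_line(line))
```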
Method 9
For arbitrary delimiters:
perl -lne '
@list = /\D+|\d+/g;
@sorted = sort {$a <=> $b} grep /\d/, @list;
for (@list) {$_ = shift @sorted if /\d/};
print @list'
On an input like:
5,4,2,3
6|5,2|4
There are 10 numbers in those 3 lines
It gives:
2,3,4,5
2|4,5|6
There are 3 numbers in those 10 lines
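The tokenize-sort-refill idea translates directly to Python's re.findall. A minimal sketch (the function name is made up) that keeps every non-digit run where it was and refills the digit slots, left to right, with the sorted numbers:

```python
import re


def sort_numbers_in_place(line: str) -> str:
    """Sort the numbers of a line while leaving every non-digit run in place."""
    tokens = re.findall(r"[^0-9]+|[0-9]+", line)  # like /\D+|\d+/g

    def is_number(t: str) -> bool:
        return t[0] in "0123456789"

    numbers = iter(sorted((t for t in tokens if is_number(t)), key=int))
    # Put the sorted numbers back into the digit slots, left to right.
    return "".join(next(numbers) if is_number(t) else t for t in tokens)


print(sort_numbers_in_place("6|5,2|4"))  # 2|4,5|6
```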
Method 10
The following is a variation on Jeff’s answer, in the sense that it generates a sed script that does a bubble sort, but it is sufficiently different to warrant its own answer.
The difference is that instead of generating O(n^2) basic regular expressions, this generates O(n) extended regular expressions. The resulting script will be about 15 KB big. The running time of the sed script is in fractions of a second (it takes a bit longer to generate the script).
It’s restricted to sorting positive integers delimited by dots, but it’s not limited to the size of the integers (just increase 255 in the main loop), or the number of integers. The delimiter may be changed by changing delim='.' in the code.
It’s done my head in to get the regular expressions right, so I’ll leave describing the details for another day.
#!/bin/bash
# This function creates an extended regular expression
# that matches a positive number less than the given parameter.
lt_pattern() {
local n="$1" # Our number.
local -a res # Our result, an array of regular expressions that we
# later join into a string.
for (( i = 1; i < ${#n}; ++i )); do
d=$(( ${n: -i:1} - 1 )) # The i:th digit of the number, from right to left, minus one.
if (( d >= 0 )); then
res+=( "$( printf '%d[0-%d][0-9]{%d}' "${n:0:-i}" "$d" "$(( i - 1 ))" )" )
fi
done
d=${n:0:1} # The first digit of the number.
if (( d > 1 )); then
res+=( "$( printf '[1-%d][0-9]{%d}' "$(( d - 1 ))" "$(( ${#n} - 1 ))" )" )
fi
if (( n > 9 )); then
# The number is 10 or larger.
res+=( "$( printf '[0-9]{1,%d}' "$(( ${#n} - 1 ))" )" )
fi
if (( n == 1 )); then
# The number is 1. The only thing smaller is zero.
res+=( 0 )
fi
# Join our res array of expressions into a '|'-delimited string.
( IFS='|'; printf '%s\n' "${res[*]}" )
}
echo ':top'
delim='.'
for (( n = 255; n > 0; --n )); do
printf 's/\\<%d\\>\\%s\\<(%s)\\>/\\1%s%d/g\n' \
"$n" "$delim" "$( lt_pattern "$n" )" "$delim" "$n"
done
echo 'ttop'
The script will look something like this:
$ bash generator.sh >script.sed
$ head -n 5 script.sed
:top
s/\<255\>\.\<(25[0-4][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.255/g
s/\<254\>\.\<(25[0-3][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.254/g
s/\<253\>\.\<(25[0-2][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.253/g
s/\<252\>\.\<(25[0-1][0-9]{0}|2[0-4][0-9]{1}|[1-1][0-9]{2}|[0-9]{1,2})\>/\1.252/g
$ tail -n 5 script.sed
s/\<4\>\.\<([1-3][0-9]{0})\>/\1.4/g
s/\<3\>\.\<([1-2][0-9]{0})\>/\1.3/g
s/\<2\>\.\<([1-1][0-9]{0})\>/\1.2/g
s/\<1\>\.\<(0)\>/\1.1/g
ttop
The idea behind the generated regular expressions is to pattern-match numbers that are less than each integer; those two numbers would be out of order, and so are swapped. The regular expressions are grouped into several OR options. Pay close attention to the ranges appended to each item: sometimes they are {0}, meaning the immediately preceding item is omitted from the search. The regex options, from left to right, match numbers that are smaller than the given number by:
- the ones place
- the tens place
- the hundreds place
- (continued as needed, for larger numbers)
- or by being smaller in magnitude (number of digits)
To spell out an example, take 101 (with additional spaces for readability):
s/ \<101\> \. \<(10[0-0][0-9]{0} | [0-9]{1,2})\> / \1.101 /g
Here, the first alternation allows only the number 100; the second alternation allows 0 through 99.
Another example is 154:
s/ \<154\> \. \<(15[0-3][0-9]{0} | 1[0-4][0-9]{1} | [0-9]{1,2})\> / \1.154 /g
Here the first option allows 150 through 153; the second allows 100 through 149, and the last allows 0 through 99.
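To check the construction, here is an independent Python sketch of the same "less than n" alternation, plus a loop that applies one swap rule per n and repeats until nothing changes, like t top. Assumptions: Python's \b stands in for \< and \>, integers are written without leading zeros, and the names `lt_pattern`, `build_rules` and `regex_bubble_sort` are only illustrative. One deliberate difference: for single-digit n it also emits a plain 0 alternative, since a [1-(d-1)] range alone does not match 0.

```python
import re


def lt_pattern(n: int) -> str:
    """Regex alternation matching any integer from 0 to n - 1 (requires n >= 1)."""
    s = str(n)
    parts = []
    for i in range(1, len(s)):
        d = int(s[-i]) - 1  # i-th digit from the right, minus one
        if d >= 0:
            # Same leading digits, a smaller digit in this place, anything after.
            parts.append(f"{s[:-i]}[0-{d}][0-9]{{{i - 1}}}")
    first = int(s[0])
    if first > 1:
        # Same number of digits, but a smaller leading digit.
        parts.append(f"[1-{first - 1}][0-9]{{{len(s) - 1}}}")
    if len(s) > 1:
        # Fewer digits (this alternative already includes 0).
        parts.append(f"[0-9]{{1,{len(s) - 1}}}")
    else:
        # Single-digit n: 0 is smaller too, and no branch above covers it.
        parts.append("0")
    return "|".join(parts)


def build_rules(max_value: int = 255, delim: str = "."):
    """One compiled swap rule per n, largest n first, like the generated script."""
    rules = []
    for n in range(max_value, 0, -1):
        # \b plays the role of sed's \< and \>: only whole numbers match.
        pattern = rf"\b{n}{re.escape(delim)}\b({lt_pattern(n)})\b"
        rules.append((re.compile(pattern), rf"\g<1>{delim}{n}"))
    return rules


RULES = build_rules()


def regex_bubble_sort(line: str) -> str:
    """Apply every swap rule; start over while anything changed (like t top)."""
    while True:
        before = line
        for rx, repl in RULES:
            line = rx.sub(repl, line)
        if line == before:
            return line


print(lt_pattern(101))                     # 10[0-0][0-9]{0}|[0-9]{1,2}
print(regex_bubble_sort("90.19.146.232"))  # 19.90.146.232
```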
Testing four times in a loop:
for test_run in {1..4}; do
nums=$(( RANDOM%256 )).$(( RANDOM%256 )).$(( RANDOM%256 )).$(( RANDOM%256 ))
printf 'nums=%s\n' "$nums"
sed -E -f script.sed <<<"$nums"
done
Output:
nums=90.19.146.232
19.90.146.232
nums=8.226.70.154
8.70.154.226
nums=1.64.96.143
1.64.96.143
nums=67.6.203.56
6.56.67.203
Method 11
This should handle any non-digit (0-9) delimiter. Example:
x='1!4!3!5!2'; delim=$(echo "$x" | tr -d 0-9 | cut -b1); echo "$x" | tr "$delim" '\n' | sort -g | tr '\n' "$delim" | sed "s/$delim$/\n/"
Output:
1!2!3!4!5
Method 12
With perl:
$ # -a to auto-split on whitespace, results in @F array
$ echo 'foo baz v22 aimed' | perl -lane 'print join " ", sort @F'
aimed baz foo v22
$ # {$a <=> $b} for numeric comparison, {$b <=> $a} will give descending order
$ echo '1,100,330,42' | perl -F, -lane 'print join ",", sort {$a <=> $b} @F'
1,42,100,330
With ruby, which is somewhat similar to perl
$ # -a to auto-split on whitespace, results in $F array
$ # $F is sorted and then joined using the given string
$ echo 'foo baz v22 aimed' | ruby -lane 'print $F.sort * " "'
aimed baz foo v22
$ # (&:to_i) to convert string to integer
$ echo '1,100,330,42' | ruby -F, -lane 'print $F.sort_by(&:to_i) * ","'
1,42,100,330
$ echo '10.1.200.42' | ruby -F'.' -lane 'print $F.sort_by(&:to_i) * "."'
1.10.42.200
A custom command that takes just the delimiter string (not a regex). It also works if the input contains floating-point numbers:
$ # by default join uses value of $,
$ sort_line(){ ruby -lne '$,=ENV["d"]; print $_.split($,).sort_by(&:to_f).join' ; }
$ s='103,14.5,30,24'
$ echo "$s" | d=',' sort_line
14.5,24,30,103
$ s='10.1.200.42'
$ echo "$s" | d='.' sort_line
1.10.42.200
$ # for file input
$ echo '123--87--23' > ip.txt
$ echo '3--12--435--8' >> ip.txt
$ d='--' sort_line <ip.txt
23--87--123
3--8--12--435
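The key point of sort_line is that the delimiter is a literal string rather than a regex. A Python sketch of the same behaviour (the name `sort_line_literal` is hypothetical) relies on str.split, which never interprets its argument as a pattern, so multi-character or regex-special delimiters need no escaping:

```python
def sort_line_literal(line: str, d: str) -> str:
    """Split on a literal delimiter string (not a regex) and sort as floats."""
    # str.split treats d literally, so "--" or "^[]$" need no escaping,
    # which is what \Q...\E achieves in the Perl version.
    return d.join(sorted(line.split(d), key=float))


print(sort_line_literal("103,14.5,30,24", ","))      # 14.5,24,30,103
print(sort_line_literal("123--87--23", "--"))        # 23--87--123
print(sort_line_literal("123^[]$87^[]$23", "^[]$"))  # 23^[]$87^[]$123
```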
Custom command for perl
$ sort_line(){ perl -lne '$d=$ENV{d}; print join $d, sort {$a <=> $b} split /\Q$d/' ; }
$ s='123^[]$87^[]$23'
$ echo "$s" | d='^[]$' sort_line
23^[]$87^[]$123
Further reading – I already had this handy list of perl/ruby one-liners
Method 13
Splitting input into multiple lines
Using tr, you can split the input using an arbitrary delimiter into multiple lines.
This input can then be run through sort (using -n if the input is numerical).
If you wish to retain the delimiter in the output, you can then use tr again to add back the delimiter.
e.g. using space as a delimiter
cat input.txt | tr " " "\n" | sort -n | tr "\n" " "
input: 1 2 4 1 4 32 18 3
output: 1 1 2 3 4 4 18 32
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.