Sort using custom pattern

Is there a way to output a file’s contents sorted by a custom pattern?

For instance, given a file myfile with the following contents:

a
d
b
c

How would one sort it using the following pattern: print lines starting with “b” first, then lines starting with “d”, and then the remaining lines in normal alphabetical order? The expected output is:

b
d
a
c

Answers:


Method 1

When you need data to be sorted beyond sort’s capability, a common approach is to pre-process the data to prepend a sort key, then sort, and finally remove the extra sort key. For example, here, add a 0 if a line starts with b, a 1 if a line starts with d, and a 2 otherwise.

sed -e 's/^b/0&/' -e t -e 's/^d/1&/' -e 't' -e 's/^/2/' |
sort |
sed 's/^.//'
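The same prepend, sort, strip idea can be sketched in Python, which may make the three stages easier to see. This is only an illustration: the function name sort_with_key_prefix is made up here, and it assumes the lines are already in a list rather than read from a file.

```python
def sort_with_key_prefix(lines):
    """Prepend a one-character rank to each line, sort, then strip the rank."""
    decorated = []
    for line in lines:
        if line.startswith("b"):
            rank = "0"      # b lines sort first
        elif line.startswith("d"):
            rank = "1"      # d lines sort second
        else:
            rank = "2"      # everything else sorts last
        decorated.append(rank + line)
    decorated.sort()
    # Drop the single rank character again, like the final sed step.
    return [entry[1:] for entry in decorated]


print(sort_with_key_prefix(["a", "d", "b", "c"]))  # ['b', 'd', 'a', 'c']
```

As with the sed pipeline, lines that share a rank are sorted among themselves.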

Note that this sorts all the b and d lines. If you want these lines in the original order, then the easiest approach is to split off the lines that you want to leave unsorted. You can, however, work the original line number into a sort key with nl, but here it’s more complicated. (Replace \t by a literal tab character throughout if your sed doesn’t understand that syntax.)

nl -ba -nln |
sed 's/^[0-9]* *\t\([bd]\)/\1\t&/; t; s/^[0-9]* *\t/z\t0\t/' |
sort -k1,1 -k2,2n |
sed 's/^[^\t]*\t[^\t]*\t//'
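A rough Python sketch of that line-number trick is shown here. It is not a translation of the pipeline: the function name sort_keeping_bd_order and the tuple layout of the key are assumptions chosen for illustration. Lines starting with b or d are keyed by their original position, and all other lines are keyed by their text.

```python
def sort_keeping_bd_order(lines):
    """Put b lines, then d lines, in input order; sort the rest alphabetically."""
    def key(item):
        index, line = item
        if line.startswith("b"):
            return (0, index, "")   # group 0, ordered by original position
        if line.startswith("d"):
            return (1, index, "")   # group 1, ordered by original position
        return (2, 0, line)         # group 2, ordered by line text

    return [line for _, line in sorted(enumerate(lines), key=key)]


print(sort_keeping_bd_order(["dz", "a", "bz", "da", "c", "ba"]))
# ['bz', 'ba', 'dz', 'da', 'a', 'c']
```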

Alternatively, use a language such as Perl, Python or Ruby that lets you easily specify a custom sort function.

perl -e 'print sort {($b =~ /^[bd]/) - ($a =~ /^[bd]/) ||
                     $a cmp $b} <>'
python -c 'import sys; sys.stdout.writelines(sorted(sys.stdin.readlines(), key=lambda s: (0 if s[0]=="b" else 1 if s[0]=="d" else 2, s)))'

or if you want to leave the b and d lines in the original order:

perl -e 'while (<>) {push @{/^b/ ? \@b : /^d/ ? \@d : \@other}, $_}
         print @b, @d, sort @other'
python -c 'import sys
b = []; d = []; other = []
for line in sys.stdin.readlines():
    if line[0]=="b": b.append(line)
    elif line[0]=="d": d.append(line)
    else: other.append(line)
other.sort()
sys.stdout.writelines(b); sys.stdout.writelines(d); sys.stdout.writelines(other)'

Method 2

You need more than just the sort command here. First grep the b lines, then the d lines, and finally append everything that starts with neither b nor d, sorted.

grep '^b' myfile > outfile
grep '^d' myfile >> outfile
grep -v '^b' myfile | grep -v '^d' | sort >> outfile
cat outfile

will result in:

b
d
a
c

This assumes that the lines start with the ‘pattern’ b or d. If the pattern can appear anywhere in the line, leave out the caret (^).

A one-line equivalent would be:

(grep '^b' myfile ; grep '^d' myfile ; grep -v '^b' myfile | grep -v '^d' | sort)
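The grep approach can be generalised to any ordered list of patterns. The sketch below is one possible way to do that in Python; the function name partition_by_patterns and its arguments are hypothetical. One difference worth noting: each line goes to the first pattern that matches, whereas with the caret removed the separate grep calls would print a line containing both b and d twice.

```python
import re


def partition_by_patterns(lines, patterns):
    """Emit lines grouped by the first matching pattern, then the sorted rest."""
    compiled = [re.compile(p) for p in patterns]
    groups = [[] for _ in compiled]
    rest = []
    for line in lines:
        for group, regex in zip(groups, compiled):
            if regex.search(line):
                group.append(line)   # keep input order inside each group
                break
        else:
            rest.append(line)        # matched no pattern at all
    ordered = [line for group in groups for line in group]
    return ordered + sorted(rest)


print(partition_by_patterns(["a", "d", "b", "c"], ["^b", "^d"]))
# ['b', 'd', 'a', 'c']
print(partition_by_patterns(["cab", "bad", "axe", "odd"], ["b", "d"]))
# ['cab', 'bad', 'odd', 'axe']
```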

Method 3

One way of solving this using awk would be:

sort myfile | awk '$0 ~ /^b/ || $0 ~ /^d/ {print} $0 !~ /^b/ && $0 !~ /^d/ { a[f++] = $0 } END { for (word = 0; word < f; word++) { print a[word] } }'

Method 4

cat file | tr bd '\1\2' | LANG=C sort | tr '\1\2' bd

Where the intermediary contents are (printing Ctrl-A and Ctrl-B as Ⓐ and Ⓑ):

file  | tr-1  | sort  | tr-2
------------------------------
cat     cat     Ⓐat     bat
bed     ⒶeⒷ     ⒶeⒷ     bed
fog     fog     Ⓑay     day
dog     Ⓑog     Ⓑog     dog
egg     egg     cat     cat
day     Ⓑay     egg     egg
kin     kin     fog     fog
lay     lay     get     get
in      in      in      in
bat     Ⓐat     kin     kin
get     get     lay     lay
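For comparison, the translate, sort, translate-back idea might look like this in Python. The function name sort_with_translation is made up for this sketch, and it assumes the input never contains the control characters \x01 or \x02 itself. As in the table, b and d are replaced at every position in the line, not only at the start, so for example "ab" would sort before "aa".

```python
def sort_with_translation(lines):
    """Map b and d to low control characters, sort, then map them back."""
    to_ctrl = str.maketrans("bd", "\x01\x02")
    from_ctrl = str.maketrans("\x01\x02", "bd")
    # Python compares strings by code point, so \x01 and \x02 sort before letters.
    translated = sorted(line.translate(to_ctrl) for line in lines)
    return [line.translate(from_ctrl) for line in translated]


words = ["cat", "bed", "fog", "dog", "egg", "day", "kin", "lay", "in", "bat", "get"]
print(sort_with_translation(words))
# ['bat', 'bed', 'day', 'dog', 'cat', 'egg', 'fog', 'get', 'in', 'kin', 'lay']
```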


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
