Is there a way to output a file’s contents using custom patterns?
For instance, given a file myfile with the following contents:
a
d
b
c
...how would one sort it using the following pattern: print lines starting with “b” first, then lines starting with “d”, and then the remaining lines in normal alphabetical order, so that the expected output is:
b
d
a
c
Answers:
Method 1
When you need data to be sorted beyond sort’s capability, a common approach is to pre-process the data to prepend a sort key, then sort, and finally remove the extra sort key. For example, here, add a 0 if a line starts with b, a 1 if a line starts with d, and a 2 otherwise.
sed -e 's/^b/0&/' -e t -e 's/^d/1&/' -e 't' -e 's/^/2/' | sort | sed 's/^.//'
Note that this sorts all the b and d lines. If you want these lines in the original order, then the easiest approach is to split off the lines that you want to leave unsorted. You can, however, work the original line number into a sort key with nl, but here it’s more complicated. (Replace \t by a literal tab character throughout if your sed doesn’t understand that syntax.)
nl -ba -nln | sed 's/^[0-9]* *\t\([bd]\)/\1\t&/; t; s/^[0-9]* *\t/z\t0\t/' | sort -k1,1 -k2,2n | sed 's/^[^\t]*\t[^\t]*\t//'
Alternatively, use a language such as Perl, Python or Ruby that lets you easily specify a custom sort function.
perl -e 'print sort {($b =~ /^[bd]/) - ($a =~ /^[bd]/) ||
$a cmp $b} <>'
python -c 'import sys; sys.stdout.writelines(sorted(sys.stdin.readlines(), key=lambda s: (0 if s[0]=="b" else 1 if s[0]=="d" else 2, s)))'
or if you want to leave the b and d lines in the original order:
perl -e 'while (<>) {push @{/^b/ ? \@b : /^d/ ? \@d : \@other}, $_}
print @b, @d, sort @other'
python -c 'import sys
b = []; d = []; other = []
for line in sys.stdin.readlines():
    if line[0]=="b": b.append(line)
    elif line[0]=="d": d.append(line)
    else: other.append(line)
other.sort()
sys.stdout.writelines(b); sys.stdout.writelines(d); sys.stdout.writelines(other)'
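The two variants above (sort everything, or leave the b and d lines in input order) can be folded into one reusable function. The following is only a sketch: the name priority_sort and its prefixes and keep_order parameters are made up for illustration and do not come from any library.

```python
def priority_sort(lines, prefixes=("b", "d"), keep_order=False):
    """Hypothetical helper: lines matching an earlier prefix come first."""
    last = len(prefixes)

    def rank(line):
        # Index of the first prefix the line starts with, or `last` if none match.
        for i, prefix in enumerate(prefixes):
            if line.startswith(prefix):
                return i
        return last

    def key(line):
        r = rank(line)
        if keep_order and r != last:
            # Identical keys plus Python's stable sort keep the input order.
            return (r, "")
        return (r, line)

    return sorted(lines, key=key)


lines = ["a\n", "d\n", "b\n", "c\n"]
print(priority_sort(lines))  # ['b\n', 'd\n', 'a\n', 'c\n']
```

With keep_order=True the prefixed lines keep their relative input order, because sorted() is stable and those lines all share the same key within their group. The remaining lines are still sorted alphabetically.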
Method 2
You would need to use something more than just the sort command. First grep the b lines, then the d lines, and finally sort the lines that start with neither b nor d and append them to the end.
grep '^b' myfile > outfile
grep '^d' myfile >> outfile
grep -v '^b' myfile | grep -v '^d' | sort >> outfile
cat outfile
will result in:
b
d
a
c
This assumes that the lines start with the pattern b or d. If the pattern may instead appear anywhere inside the line, you can leave out the caret (^).
A one-line equivalent would be:
(grep '^b' myfile ; grep '^d' myfile ; grep -v '^b' myfile | grep -v '^d' | sort)
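For comparison, here is a rough single-pass sketch of the same idea in Python, using regular expressions so that the caret can be kept or dropped just as with grep. The function name bucket_sort and its patterns parameter are hypothetical. One deliberate difference from the grep pipeline: a line that matches several patterns lands only in the first matching bucket, whereas the grep commands without the caret would print such a line once per matching pattern.

```python
import re


def bucket_sort(lines, patterns=(r"^b", r"^d")):
    """Hypothetical helper: one bucket per pattern, in the order given."""
    compiled = [re.compile(p) for p in patterns]
    buckets = [[] for _ in compiled]
    rest = []
    for line in lines:
        for bucket, regex in zip(buckets, compiled):
            if regex.search(line):
                # First matching pattern wins; input order is preserved.
                bucket.append(line)
                break
        else:
            # No pattern matched: this line is sorted with the remainder.
            rest.append(line)
    return [line for bucket in buckets for line in bucket] + sorted(rest)


print(bucket_sort(["a\n", "d\n", "b\n", "c\n"]))  # ['b\n', 'd\n', 'a\n', 'c\n']
```

Passing patterns=("b", "d") instead matches the letters anywhere in the line, which corresponds to leaving out the caret.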
Method 3
One way of solving this using awk would be:
sort myfile | awk '$0 ~ /^b/ || $0 ~ /^d/ {print} $0 !~ /^b/ && $0 !~ /^d/ { a[f++] = $0 } END { for (word = 0; word < f; word++) { print a[word] } }'
Method 4
cat file | tr bd '\1\2' | LANG=C sort | tr '\1\2' bd
The intermediate contents at each stage are (printing Ctrl-A and Ctrl-B as Ⓐ and Ⓑ):
file | tr-1 | sort | tr-2
------------------------------
cat    cat    Ⓐat    bat
bed    ⒶeⒷ    ⒶeⒷ    bed
fog    fog    Ⓑay    day
dog    Ⓑog    Ⓑog    dog
egg    egg    cat    cat
day    Ⓑay    egg    egg
kin    kin    fog    fog
lay    lay    get    get
in     in     in     in
bat    Ⓐat    kin    kin
get    get    lay    lay
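The same translate, sort, translate-back trick can be sketched in Python with str.translate. This is only an illustration under two assumptions: the input contains no Ctrl-A or Ctrl-B characters of its own, and plain code-point ordering (what LANG=C gives sort for ASCII text) is what you want. The name tr_sort is made up for this example. Note that, as with tr, every b and d in a line is affected, not only the first character.

```python
def tr_sort(lines, first="bd"):
    """Hypothetical helper mirroring: tr bd '\\1\\2' | sort | tr '\\1\\2' bd"""
    # Control characters \x01, \x02, ... sort before every printable character.
    ctrl = "".join(chr(i + 1) for i in range(len(first)))
    to_ctrl = str.maketrans(first, ctrl)
    from_ctrl = str.maketrans(ctrl, first)
    translated = sorted(line.translate(to_ctrl) for line in lines)
    return [line.translate(from_ctrl) for line in translated]


words = ["cat", "bed", "fog", "dog", "egg", "day", "kin", "lay", "in", "bat", "get"]
print(tr_sort(words))
# ['bat', 'bed', 'day', 'dog', 'cat', 'egg', 'fog', 'get', 'in', 'kin', 'lay']
```

The result matches the tr-2 column of the table above.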
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.