I’m entering the world of Linux, and at work I’m using grep more and more. In doing so, I’m finding that sometimes it’s not adequate for what I want.
I was struggling with grep a few days ago, and a colleague of mine, who is a senior Linux admin, told me to use awk. I was stunned by how fast I got a result.
So my question is: when do you choose one over the other? What questions can I ask myself before starting to work with grep and spending a lot of time on it, when I could have done it with awk and saved time?
Answers:
Method 1
sed and awk are supersets of grep, but there are things that are easier to do with one tool than with the others.
grep foo can be written as sed '/foo/!d' or awk /foo/, but consider:
grep -i foo would have to be sed '/[fF][oO][oO]/!d' unless you want to consider non-standard extensions like GNU’s sed '/foo/I!d'. Or with awk: awk 'tolower($0) ~ /foo/' or again using a GNU extension: awk -v IGNORECASE=1 /foo/.
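To make the equivalences above concrete, here is a small POSIX sh sketch that runs the plain and the case-insensitive variants side by side on the same input. The sample text and the variable names (g, s, a, gi, si, ai) are purely illustrative; only standard features are used (no GNU sed I flag, no gawk IGNORECASE).

```shell
#!/bin/sh
# Hypothetical sample input, used only for this demonstration.
sample='foo at the start
nothing here
a FOO in capitals'

# Plain match: all three select only "foo at the start".
g=$(printf '%s\n' "$sample" | grep foo)
s=$(printf '%s\n' "$sample" | sed '/foo/!d')
a=$(printf '%s\n' "$sample" | awk '/foo/')

# Case-insensitive match with standard features only:
# a bracket expression per letter for sed, tolower() for awk.
gi=$(printf '%s\n' "$sample" | grep -i foo)
si=$(printf '%s\n' "$sample" | sed '/[fF][oO][oO]/!d')
ai=$(printf '%s\n' "$sample" | awk 'tolower($0) ~ /foo/')

printf 'plain:\n%s\n' "$g"
printf 'case-insensitive:\n%s\n' "$gi"
```

The sed and awk results are expected to be identical to the grep ones; the point is only how much more typing the portable sed form needs once case folding is involved.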
Things each tool is good at that are cumbersome to do with the others:
grep
grep is a simple tool but has very specialised modes of operation that are harder to reproduce with awk or sed:
- grep -i for case-insensitive matching (see above)
- grep -Fe "$string" for fixed string search (export string; awk 'index($0, ENVIRON["string"])' with awk, no direct equivalent with sed).
- (non standard) grep -r for recursive search
- (non standard) grep -P/pcregrep for perl-like regexps (some sed implementations have perl-like regexp support though not the most major ones)
- (non standard) grep -o to return the matched portion (several lines of awk or sed to do the same)
- (non standard) grep -A/B/C to return context around the match (again painful to do in a similar fashion with sed or awk)
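As an illustration of two of the items above, here is a POSIX sh sketch of the awk workarounds for fixed-string search and for grep -o. The input text, the regexp [0-9]+ and the variable names are assumptions made up for the demo. The grep -o emulation is one possible approach (a match() loop using RSTART/RLENGTH), not the only one, and it assumes the regexp can never match the empty string, otherwise the loop would not advance.

```shell
#!/bin/sh
# Hypothetical input containing regexp metacharacters ("$" and ".").
input='price: $5.00 (approx)
price: 5x00
total $5.00'

# Fixed-string search: "$" and "." are taken literally.
string='$5.00'
fixed_grep=$(printf '%s\n' "$input" | grep -Fe "$string")

# awk equivalent: pass the string through the environment (ENVIRON does
# no backslash processing, unlike -v) and use index() instead of a regexp.
export string
fixed_awk=$(printf '%s\n' "$input" | awk 'index($0, ENVIRON["string"])')

# grep -o (non standard): print every matched portion on its own line.
only_grep=$(printf '%s\n' "$input" | grep -o '[0-9][0-9]*')

# Portable awk version: find a match, print it, continue after it.
only_awk=$(printf '%s\n' "$input" | awk '{
  line = $0
  while (match(line, /[0-9]+/)) {
    print substr(line, RSTART, RLENGTH)
    line = substr(line, RSTART + RLENGTH)
  }
}')

printf '%s\n' "$fixed_awk"
printf '%s\n' "$only_awk"
```

The fixed-string case stays a one-liner, but the grep -o emulation already needs a loop, which is the kind of effort the list above refers to.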
sed
- s/foo/bar/: sed’s s command has features that are hard to implement in awk, like:
  - s/foo\(.*\)bar/\1/g: capturing (though GNU awk has a gensub() extension for that)
  - s/foo/bar/3: replace the 3rd occurrence on each line
- (non-standard) -i: in-place file editing (though it’s also supported by GNU awk now).
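Here is a short POSIX sh sketch of the two s features above, with a portable awk rewrite of the "3rd occurrence" case to show why it is awkward there. The sample strings and variable names are illustrative only, and the awk loop is just one way to do it (it does not use the GNU-only gensub()).

```shell
#!/bin/sh
# Capturing: keep only what sits between "foo" and "bar"
# (basic regexp syntax, so the group is written \( \)).
cap=$(echo 'xfooABCbary' | sed 's/foo\(.*\)bar/\1/g')

# Replace only the 3rd occurrence of "foo" on the line.
third_sed=$(echo 'foo1 foo2 foo3 foo4' | sed 's/foo/bar/3')

# Portable awk needs an explicit loop over the matches.
third_awk=$(echo 'foo1 foo2 foo3 foo4' | awk '{
  n = 0; out = ""; rest = $0
  while (match(rest, /foo/)) {
    n++
    # copy the text before the match, then the match or its replacement
    out = out substr(rest, 1, RSTART - 1)
    if (n == 3) out = out "bar"
    else out = out substr(rest, RSTART, RLENGTH)
    rest = substr(rest, RSTART + RLENGTH)
  }
  print out rest
}')

printf '%s\n' "$cap" "$third_sed" "$third_awk"
```

Both replacement commands should print foo1 foo2 bar3 foo4, but the sed version is a single flag while the awk one is a dozen lines.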
awk
awk is the most feature-rich of the three.
- good for dealing with numbers
- good for dealing with input formatted in columns.
- good for extracting and combining data from different sources, with its associative arrays.
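A minimal sketch of all three points at once: joining two sources by a key column and summing a numeric column with associative arrays. The two files (a user list and per-user disk usage in MB) and their contents are hypothetical. It relies on the common NR == FNR idiom to detect the first file, which assumes that file is not empty, and pipes through sort because the order of for (id in total) is unspecified.

```shell
#!/bin/sh
# Two hypothetical data sources in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '1 alice' '2 bob' > "$tmp/users"
printf '%s\n' '1 120' '2 30' '1 80' > "$tmp/usage"

# While reading the first file (NR == FNR), remember id -> name.
# For the second file, add column 2 (a number) to a per-id total.
report=$(awk '
  NR == FNR { name[$1] = $2; next }
            { total[$1] += $2 }
  END       { for (id in total) print name[id], total[id] }
' "$tmp/users" "$tmp/usage" | sort)

rm -r "$tmp"
printf '%s\n' "$report"
```

With this input the report should read alice 200 and bob 30, one per line.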
perl
perl, as a practical extraction and reporting tool, has the best of all of those. That’s what it was initially designed for (to be the tool that makes sed and awk obsolete).
Mastering perl for text processing does give a serious advantage. I’d recommend spending some time on it, even before looking at the less common sed commands, for instance.
performance
As a rule of thumb, the more specialised the tool, the more efficient it is at its task. But that also depends very much on the implementation, the task and a few other factors, and performance can involve trade-offs that may need to be taken into account.
For instance, some grep or sed implementations are very fast but don’t support multibyte characters, so they only work correctly on US-English text in multibyte locales. Or they’re fast because they work on a small fixed-length buffer and thus can’t handle arbitrary input…
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.