Count total number of occurrences using grep

grep -c is useful for finding how many lines of a file contain a string, but it counts each line only once, no matter how many occurrences it holds. How can I count multiple occurrences per line?

I’m looking for something more elegant than:

perl -e '$_ = <>; print scalar ( () = m/needle/g ), "\n"'

Answers:


Method 1

grep’s -o will only output the matches, ignoring lines; wc can count them:

grep -o 'needle' file | wc -l

This will also match ‘needles’ or ‘multineedle’.
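As a quick check of the difference from grep -c, here is a minimal sketch. The temporary file and its contents are made up for illustration; they are not from the question.

```shell
# Hypothetical sample data: two of the three lines contain "needle",
# with four occurrences in total (the one inside "needles" counts too).
sample=$(mktemp)
printf '%s\n' 'needle needle' 'hay' 'a needle and needles' > "$sample"

lines_with_match=$(grep -c 'needle' "$sample")       # number of matching lines
total_matches=$(grep -o 'needle' "$sample" | wc -l)  # number of matches

echo "matching lines: $lines_with_match"
echo "total matches:  $total_matches"
rm -f "$sample"
```

On this sample, grep -c reports 2 matching lines while the grep -o pipeline reports 4 matches.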

To match only single words use one of the following commands:

grep -ow 'needle' file | wc -l
grep -o '\bneedle\b' file | wc -l
grep -o '\<needle\>' file | wc -l
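The effect of -w can be seen on a small made-up input. This is only a sketch; the sample file is an assumption for illustration.

```shell
# Hypothetical sample data: "needle" appears four times as a substring,
# but only twice as a whole word.
sample=$(mktemp)
printf '%s\n' 'needle needles' 'multineedle needle' > "$sample"

substring_matches=$(grep -o 'needle' "$sample" | wc -l)    # every occurrence
whole_word_matches=$(grep -ow 'needle' "$sample" | wc -l)  # whole words only

echo "substring matches:  $substring_matches"
echo "whole-word matches: $whole_word_matches"
rm -f "$sample"
```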

Method 2

If you have GNU grep (always on Linux and Cygwin, occasionally elsewhere), you can count the output lines from grep -o: grep -o needle | wc -l.

With Perl, here are a few ways I find more elegant than yours (even after it’s fixed).

perl -lne 'END {print $c} map ++$c, /needle/g'
perl -lne 'END {print $c} $c += s/needle//g'
perl -lne 'END {print $c} ++$c while /needle/g'

With only POSIX tools, one approach, if possible, is to split the input into lines with a single match before passing it to grep. For example, if you’re looking for whole words, then first turn every non-word character into a newline.

# equivalent to grep -ow 'needle' | wc -l
tr -c '[:alnum:]' '[\n*]' | grep -c '^needle$'
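Wrapped in a function, the tr approach might look like the sketch below. The function name count_whole_words and the sample data are hypothetical, and the word is assumed to contain no regex metacharacters. It also assumes a tr that understands the [c*] repeat syntax, as GNU tr does.

```shell
# count_whole_words WORD FILE
# Put every alphanumeric run on its own line, then count the lines that
# are exactly WORD. WORD is assumed to contain no regex metacharacters.
count_whole_words() {
  tr -c '[:alnum:]' '[\n*]' < "$2" | grep -c "^$1\$"
}

# Hypothetical sample data: three whole-word matches; "needles" is not one.
sample=$(mktemp)
printf '%s\n' 'needle, needle. needles' 'one more needle' > "$sample"
whole_words=$(count_whole_words needle "$sample")
echo "whole words: $whole_words"
rm -f "$sample"
```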

Otherwise, there’s no standard command to do this particular bit of text processing, so you need to turn to sed (if you’re a masochist) or awk.

awk '{while (match($0, /set/)) {++c; $0=substr($0, RSTART+RLENGTH)}}
     END {print c}'
sed -n -e 's/set/\n&\n/g' -e 's/^/\n/' -e 's/$/\n/' \
       -e 's/\n[^\n]*\n/\n/g' -e 's/^\n//' -e 's/\n$//' \
       -e '/./p' | wc -l
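The awk loop can be wrapped so that the pattern is a parameter. This is a sketch, not part of the original answer: the function name count_occurrences and the awk variable pat are hypothetical. It assumes the pattern never matches the empty string (otherwise the loop would not terminate), and it prints c + 0 so that zero matches yields 0 rather than an empty line.

```shell
# count_occurrences ERE FILE
# Repeatedly match the pattern, count it, and drop everything up to the
# end of the match. Assumes ERE never matches the empty string.
count_occurrences() {
  awk -v pat="$1" '
    { while (match($0, pat)) { ++c; $0 = substr($0, RSTART + RLENGTH) } }
    END { print c + 0 }
  ' "$2"
}

# Hypothetical sample data: three occurrences across two lines.
sample=$(mktemp)
printf '%s\n' 'needleneedle' 'a needle' 'hay' > "$sample"
occurrences=$(count_occurrences needle "$sample")
echo "occurrences: $occurrences"
rm -f "$sample"
```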

Here’s a simpler solution using sed and grep, which works for strings or even by-the-book regular expressions but fails in a few corner cases with anchored patterns (e.g. it finds two occurrences of ^needle or \bneedle in needleneedle).

sed 's/needle/\n&\n/g' | grep -cx 'needle'

Note that in the sed substitutions above, I used \n to mean a newline. This is standard in the pattern part, but in the replacement text, for portability, substitute backslash-newline for \n.
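Written with backslash-newline in the replacement text, the simpler sed and grep command could look like this sketch. The sample file is made up for illustration.

```shell
# Hypothetical sample data: three occurrences, two of them adjacent.
sample=$(mktemp)
printf '%s\n' 'needleneedle' 'a needle here' > "$sample"

# Each "\" followed by a real newline inserts a newline in the replacement,
# so every match ends up alone on its own line; grep -cx counts those lines.
portable_count=$(sed 's/needle/\
&\
/g' "$sample" | grep -cx 'needle')

echo "occurrences: $portable_count"
rm -f "$sample"
```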

Method 3

If, like me, you actually wanted to check for “both; each exactly once” (strictly speaking, this tests “either; twice in total”), then it’s simple:

grep -E "thing1|thing2" -c

and check that the output is 2.

The benefit of this approach (if exactly once is what you want) is that it scales easily.
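One caveat worth illustrating: grep -c counts matching lines, not matches, so this check only works when the two terms sit on separate lines. The sketch below uses a hypothetical helper, both_once, and made-up sample files.

```shell
# both_once FILE
# Succeeds when exactly two lines of FILE match thing1 or thing2.
both_once() {
  [ "$(grep -c -E 'thing1|thing2' "$1")" -eq 2 ]
}

# Hypothetical sample data.
separate=$(mktemp)
together=$(mktemp)
printf '%s\n' 'thing1' 'other' 'thing2' > "$separate"
printf '%s\n' 'thing1 thing2' > "$together"

both_once "$separate" && echo "separate lines: check passes"
both_once "$together" || echo "same line: grep -c gives 1, check fails"
rm -f "$separate" "$together"
```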

Method 4

Another solution using awk and needle as field separator:

awk -F'^needle | needle | needle$' '{c+=NF-1}END{print c}'

If you want to match needle followed by punctuation, change the field separator accordingly i.e.

awk -F'^needle[ ,.?]|[ ,.?]needle[ ,.?]|[ ,.?]needle$' '{c+=NF-1}END{print c}'

Or use the class [^[:alnum:]] to cover all non-alphanumeric characters.

Method 5

Your example only prints out the number of occurrences per-line, and not the total in the file. If that’s what you want, something like this might work:

perl -nle '$c+=scalar(()=m/needle/g);END{print $c}'

Method 6

This is my pure bash solution

#!/bin/bash

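# For each word taken from the unique lines of /tmp/a, count the lines containing it.
B=$(for i in $(sort -u /tmp/a); do
echo "$(grep -c -- "$i" /tmp/a) $i"
done)

# Highest counts first (numeric sort).
echo "$B" | sort -rn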

Method 7

I needed to do this for more than one search term, and I wanted the terms listed in columns with the number of occurrences of each.

My one-line solution is as follows:

grep -o -E 'borp|flarb' flarb.log  | sort | uniq -c
 910 borp
9090 flarb
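The same pipeline can be tried on a small made-up log. This is a sketch; the file contents are hypothetical, and the trailing awk step is only added here to normalize the column padding of uniq -c.

```shell
# Hypothetical sample data: "borp" occurs twice, "flarb" three times.
log=$(mktemp)
printf '%s\n' 'borp flarb borp' 'flarb flarb' 'nothing here' > "$log"

# One match per line, grouped and counted, then printed as term=count.
counts=$(grep -o -E 'borp|flarb' "$log" | sort | uniq -c | awk '{print $2 "=" $1}')

echo "$counts"
rm -f "$log"
```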


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
