How do I count the number of occurrences of a word in a text file with the command line?

I have a large JSON file that is on one line, and I want to use the command line to be able to count the number of occurrences of a word in the file. How can I do that?

Answers:

Method 1

$ tr ' ' '\n' < FILE | grep WORD | wc -l

Where tr replaces spaces with newlines, grep filters all resulting lines matching WORD and wc counts the remaining ones.

One can even save the wc part using the -c option of grep:

$ tr ' ' '\n' < FILE | grep -c WORD

The -c option is defined by POSIX.

If there is no guarantee that the words are separated by spaces, you have to replace some other delimiter character instead. For example, alternative tr parts are

tr '"' '\n'

tr "'" '\n'

if you want to replace double or single quotes. Of course, you can also use tr to replace multiple characters at once (think different kinds of whitespace and punctuation).
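
As a minimal sketch of replacing several delimiters at once, the function below turns every run of non-alphanumeric characters into a single newline and then counts the lines that equal the word exactly. The name count_exact_word and the sample JSON are made up for illustration.

```shell
# count_exact_word WORD: read text on stdin, print how many tokens equal WORD.
# (Hypothetical helper name; a token is a maximal run of letters and digits.)
# tr: -c complements the set and -s squeezes repeats, so every run of
#     non-alphanumeric characters becomes a single newline.
# grep: -x matches whole lines only, -F treats WORD as a fixed string.
count_exact_word() {
    tr -cs '[:alnum:]' '[\n*]' | grep -c -x -F -e "$1"
}

# Sample one-line JSON, invented for this example; OKAY is not counted.
printf '%s\n' '{"status":"OK","note":"OK then not OKAY"}' | count_exact_word OK   # prints 2
```

Because the quotes, colons, commas and braces all act as separators here, there is no need to decide on a single delimiter up front.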

In case you need to count WORD but not prefixWORD, WORDsuffix or prefixWORDsuffix, you can enclose the WORD pattern in begin/end-of-line markers:

grep -c '^WORD$'

In our context, this is equivalent to using word-begin/end markers:

grep -c '\<WORD\>'

Method 2

With GNU grep, this works: grep -o '\<WORD\>' | wc -l

-o prints each matched part of each line on a separate line.

\< asserts the start of a word and \> asserts the end of a word (similar to Perl’s \b), so this ensures that you’re not matching a string in the middle of a word.

For example,

$ python -c 'import this' | grep '\<one\>'
There should be one-- and preferably only one --obvious way to do it.
Namespaces are one honking great idea -- let's do more of those!
$ python -c 'import this' | grep -o '\<one\>'
one
one
one
$ python -c 'import this' | grep -o '\<one\>' | wc -l
3
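
For a file that is a single long line, the same idea can be wrapped in a small helper. This is only a sketch: the name count_whole_word and the sample JSON are invented, the word is interpreted as a basic regular expression, and the \< and \> anchors are the GNU grep feature described above.

```shell
# count_whole_word WORD: print the number of whole-word matches on stdin.
# Hypothetical helper; WORD is used as a basic regular expression.
count_whole_word() {
    # Inside double quotes, \\ becomes a single backslash, so grep
    # receives the pattern \<WORD\>.
    grep -o -e "\\<$1\\>" | wc -l
}

# One line, several matches: "one" and "one-two" count, "someone" does not.
printf '%s\n' '{"a":"one","b":"one-two","c":"someone"}' | count_whole_word one
```

On this input the count is 2, because -o reports every match on the line, not just the line itself.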

Method 3

Unfortunately, this does not work with GNU grep, which still counts matching lines rather than individual matches when -c is combined with -o.

grep -o -c WORD file

If it works on your platform, it’s an elegant and fairly intuitive solution; but the GNU folks are still thinking.
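
To make the difference concrete, here is a small sketch that compares a line count with a match count. The sample text and variable names are invented; with GNU grep, adding -o to -c reports the same number as the plain -c used here.

```shell
# Two lines of invented sample text containing "one" four times in total.
sample='one one one
two one'

# -c counts matching lines: both lines contain "one".
lines=$(printf '%s\n' "$sample" | grep -c -e one)

# -o emits each match on its own line, so wc -l counts individual matches.
matches=$(printf '%s\n' "$sample" | grep -o -e one | wc -l)

echo "matching lines: $lines"    # 2
echo "matches: $matches"         # 4 (some wc versions pad the number with spaces)
```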

Method 4

sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl

This command does the following:

Substitutes every non-alphabetic character with a blank space.
Converts all line breaks to spaces as well.
Reduces each run of multiple blank spaces to one blank space.
Converts all spaces to line breaks, so there is one word per line.
Translates all words to lower case so that ‘Hello’ and ‘hello’ are not counted as different words.
Sorts the text.
Counts and removes the duplicate lines.
Sorts in reverse numeric order so that the most frequent words come first.
Adds a line number to each word to show its rank in the whole list.
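
The steps above can also be sketched in a more compact form. This is an assumed variant, not the pipeline from the answer: tr -cs does the splitting and squeezing in one step, the nl step is left out, and the name word_freq is hypothetical. It assumes plain ASCII text.

```shell
# word_freq: read text on stdin, print "COUNT WORD" lines, most frequent first.
# Hypothetical helper name; only ASCII letters are treated as word characters.
word_freq() {
    # -c complements the set, -s squeezes repeats: every run of
    # non-letters becomes one newline, leaving one word per line.
    tr -cs '[:alpha:]' '[\n*]' |
        sed '/^$/d' |                   # drop the empty line a leading delimiter leaves
        tr '[:upper:]' '[:lower:]' |    # fold case so Hello and hello are the same word
        sort | uniq -c | sort -nr       # group, count, order by count descending
}

# Invented sample: reports hello three times and world twice.
printf 'Hello hello world\nHello, World!\n' | word_freq
```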

For example, if I want to analyze the first Linus Torvalds message:

From: [email protected] (Linus Benedict Torvalds)
Newsgroups: comp.os.minix Subject: What would you like to see most in
minix? Summary: small poll for my new operating system Message-ID:
<[email protected]> Date: 25 Aug 91 20:57:08
GMT Organization: University of Helsinki

Hello everybody out there using minix –

I’m doing a (free) operating system (just a hobby, won’t be big and
professional like gnu) for 386(486) AT clones. This has been brewing
since april, and is starting to get ready. I’d like any feedback on
things people like/dislike in minix, as my OS resembles it somewhat
(same physical layout of the file-system (due to practical reasons)
among other things).

I’ve currently ported bash(1.08) and gcc(1.40), and things seem to
work. This implies that I’ll get something practical within a few
months, and I’d like to know what features most people would want. Any
suggestions are welcome, but I won’t promise I’ll implement them 🙂

Linus ([email protected])

PS. Yes – it’s free of any minix code, and it has a multi-threaded fs.
It is NOT protable (uses 386 task switching etc), and it probably
never will support anything other than AT-harddisks, as that’s all I
have :-(.

I create a file named linus.txt, paste the content into it, and then run in the console:

sed -e 's/[^[:alpha:]]/ /g' linus.txt | tr '\n' " " | tr -s " " | tr " " '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl

The output would be:

 1        7 i
 2        5 to
 3        5 like
 4        5 it
 5        5 and
 6        4 minix
 7        4 a
 8        3 torvalds
 9        3 of
10        3 helsinki
11        3 fi
12        3 any
13        2 would
14        2 won
15        2 what
16        ...

If you want to visualize only the first 20 words:

sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | head -n 20

It is important to note that the command tr 'A-Z' 'a-z' doesn’t support UTF-8 yet, so for text in other languages the word APRÈS would be translated as aprÈs.

If you only want to look up the number of occurrences of one word, you can add a grep at the end:

sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\sword_to_search_for$"

In a script called search_freq:

#!/bin/bash
sed -e 's/[^[:alpha:]]/ /g' text_to_analize.txt | tr '\n' " " | tr -s " " | tr " " '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl | grep "\s$1$"

The script is then called like this:

 search_freq word_to_search_for
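
The script above always reads text_to_analize.txt. As a sketch of one way to lift that restriction, the function below takes the file as a second argument and uses awk instead of grep to pick the row, so the word is compared literally rather than as a pattern. The two-argument interface and the output format are assumptions for illustration, and the splitting step is a condensed stand-in for the sed and tr steps of the original.

```shell
# search_freq WORD FILE: print "RANK COUNT WORD" for WORD in FILE.
# Hypothetical two-argument variant; case-insensitive, ASCII letters only.
search_freq() {
    # Lowercase the search word so it matches the lowercased text.
    word=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # Split into one word per line, fold case, count, sort by count, then
    # let awk print the row whose second field is exactly the word.
    tr -cs '[:alpha:]' '[\n*]' < "$2" |
        sed '/^$/d' |
        tr '[:upper:]' '[:lower:]' |
        sort | uniq -c | sort -nr |
        awk -v w="$word" '$2 == w { print NR, $1, $2 }'
}

# Usage: search_freq minix linus.txt
```

If the word does not occur in the file, nothing is printed.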

Method 5

Depending on whether you’d like to match the word in the keys or in the values of the JSON data, you are likely to want to extract only keys or only values from the data. Otherwise you may count some words too many times if they occur as both keys and values.

To extract all keys:

jq -r '..|objects|keys[]' <file.json

This recursively tests whether the current thing is an object, and if it is, it extracts the keys. The output will be a list of keys, one per line.

To extract all values:

jq -r '..|scalars' <file.json

This works in a similar way, but has fewer steps.

You may then pipe the output of the above through grep -c 'PATTERN' (to match some pattern against the keys or values), or grep -c -w -F 'WORD' (to match a word in the keys or values), or grep -c -x -F 'WORD' (to match a complete key or value), or similar, to do your counting.
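
Put together, the value-counting variant might look like the sketch below. The function name and the sample data in the usage comment are invented; it assumes jq is installed.

```shell
# count_word_in_values WORD: read JSON on stdin and print how many scalar
# values contain WORD as a whole word. Hypothetical helper; requires jq.
count_word_in_values() {
    # '..|scalars' emits every string, number, boolean and null, one per
    # line; -r prints strings without their JSON quotes.
    # grep: -w matches whole words, -F treats WORD as a fixed string.
    jq -r '..|scalars' | grep -c -w -F -e "$1"
}

# Usage, with invented data where OK occurs twice as a value and once as a key:
#   printf '%s\n' '{"a":"OK","b":[{"c":"OK"},{"OK":"no"}]}' | count_word_in_values OK
# This reports 2, because the key named OK is never passed to grep.
```

Note that grep -c counts lines, so a single value that contains the word twice is still counted once.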

Method 6

I have JSON with something like this: "number":"OK","number":"OK" repeated multiple times in one line.

My simple “OK” counter:

sed "s|,|\n|g" response | grep -c OK
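
Splitting on commas and using grep -c counts every piece that contains OK anywhere, so a value such as NOT_OK is counted too. As a hedged alternative, assuming only values that are exactly "OK" should count, the quoted string can be matched with grep -o. The sample data and variable names are made up, and tr stands in for the sed step.

```shell
# Invented one-line sample: two "OK" values and one value that merely contains OK.
json='{"number":"OK","number":"OK","note":"NOT_OK"}'

# Split on commas, then count pieces containing OK anywhere: NOT_OK counts too.
loose=$(printf '%s\n' "$json" | tr ',' '\n' | grep -c -e OK)

# Count only the exact quoted value "OK", one output line per match.
strict=$(printf '%s\n' "$json" | grep -o -e '"OK"' | wc -l)

echo "loose: $loose"      # 3
echo "strict: $strict"    # 2 (some wc versions pad the number with spaces)
```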

Method 7

I have used the awk command below to count the number of occurrences.

Example file:

cat file1

praveen ajay 
praveen
ajay monkey praveen
praveen boy praveen

command:

awk '{sum += gsub(/praveen/, "&")} END {print sum + 0}' file1

output:

5
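
gsub counts substrings, so a name like praveenkumar would also be counted. If whole words are wanted, one possible sketch compares each whitespace-separated field with the word instead. The function name count_field_word is hypothetical.

```shell
# count_field_word WORD: print how many whitespace-separated fields on stdin
# are exactly WORD. Hypothetical helper name.
count_field_word() {
    awk -v w="$1" '
        { for (i = 1; i <= NF; i++) if ($i == w) n++ }
        END { print n + 0 }   # n + 0 prints 0 instead of an empty line
    '
}

# Same data as file1 above, plus one name that only contains the word.
printf 'praveen ajay\npraveen\najay monkey praveen\npraveen boy praveen\npraveenkumar\n' |
    count_field_word praveen    # prints 5
```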

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
