Count distinct values of a field in a file

I have a file with one million lines. Each line has a field called transactionid, which contains repeated values. I need to count the distinct values: no matter how many times a value is repeated, it should be counted only once.

Answers:


Method 1

OK, assuming that your file is a text file with fields separated by commas (','), you also need to know the position of the 'transactionid' field. Assuming that 'transactionid' is the 7th field:

awk -F ',' '{print $7}' text_file | sort | uniq -c

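This prints each distinct value in the 7th field together with the number of times it occurs. Each distinct value appears on exactly one output line, so the number of output lines is the number of distinct values.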
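A quick sketch on a small, hypothetical sample file shows both outputs: the per-value counts from the command above, and a single total obtained by appending wc -l. The file contents are made up for illustration, and transactionid is assumed to be the 7th comma-separated field, as in this answer.

```shell
#!/bin/sh
# Hypothetical sample data: transactionid is assumed to be the 7th comma-separated field.
sample=$(mktemp)
printf '%s\n' \
  'a,b,c,d,e,f,T1' \
  'a,b,c,d,e,f,T2' \
  'a,b,c,d,e,f,T1' \
  'a,b,c,d,e,f,T3' > "$sample"

# Method 1 as written: one line per distinct value, prefixed by its count.
awk -F ',' '{print $7}' "$sample" | sort | uniq -c

# One output line per distinct value, so counting lines gives the distinct count.
# tr removes the padding that some wc implementations put before the number.
distinct=$(awk -F ',' '{print $7}' "$sample" | sort | uniq -c | wc -l | tr -d ' ')
echo "distinct transactionids: $distinct"

rm -f "$sample"
```

If only the total is needed, the uniq -c step can be dropped in favour of sort -u, which is what the next method effectively does.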

Method 2

Maybe not the sleekest method, but this should work:

awk '{print $1}' your_file | sort | uniq | wc -l

where $1 selects the field to be parsed; replace 1 with the position of the transactionid field. Without -F, awk splits fields on whitespace.
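The same pipeline can be wrapped so the field number and separator are passed in rather than edited into the awk program. This is a minimal sketch: count_distinct is a hypothetical helper name, and the sample files are invented. It uses sort -u in place of sort | uniq, and sets LC_ALL=C so sort compares raw bytes, which is usually faster on a file with a million lines.

```shell
#!/bin/sh
# count_distinct is a hypothetical helper, not an existing command.
# Usage: count_distinct FILE FIELD_NUMBER [SEPARATOR]
# Without SEPARATOR, awk's default whitespace splitting is used (as in Method 2).
count_distinct() {
    if [ -n "${3-}" ]; then
        awk -F "$3" -v f="$2" '{ print $f }' "$1"
    else
        awk -v f="$2" '{ print $f }' "$1"
    fi | LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Hypothetical sample data for a quick check.
sample=$(mktemp)
printf '%s\n' 'T1 x' 'T2 y' 'T1 z' > "$sample"
count_distinct "$sample" 1        # whitespace-separated, first field
printf '%s\n' 'x,T1' 'y,T2' 'z,T3' 'w,T2' > "$sample"
count_distinct "$sample" 2 ','    # comma-separated, second field
rm -f "$sample"
```

The two calls print 2 and 3 respectively.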

Method 3

There is no need to sort the file (uniq requires sorted input, but awk does not): awk keeps one array entry per distinct value and counts them.
This awk script assumes the field is the first whitespace-delimited field.

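awk '!seen[$1]++ { n++ } END { print n + 0 }' file

Here !seen[$1]++ is true only the first time a value appears, so n ends up as the number of distinct values, without relying on length() applied to an array, which not every awk implementation supports. Printing n + 0 gives 0 rather than an empty line for an empty file.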
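If the file has a header line, the column can be looked up by the name transactionid instead of by position. This is a sketch under two assumptions not stated in the question: the file is comma-separated, and its first line is a header that names the columns. The sample data is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample: comma-separated, first line is assumed to be a header.
sample=$(mktemp)
printf '%s\n' \
  'date,amount,transactionid' \
  '2024-01-01,10,T1' \
  '2024-01-01,20,T2' \
  '2024-01-02,10,T1' > "$sample"

distinct=$(awk -F ',' '
    NR == 1 {
        # Find the column whose header is exactly "transactionid".
        for (i = 1; i <= NF; i++) if ($i == "transactionid") col = i
        if (!col) { bad = 1; exit }   # jump to END, which reports the error
        next
    }
    !seen[$col]++ { n++ }             # count each value the first time it appears
    END {
        if (bad) { print "no transactionid column in header" > "/dev/stderr"; exit 1 }
        print n + 0
    }
' "$sample")
echo "distinct transactionids: $distinct"

rm -f "$sample"
```

As with the one-liner above, memory use grows with the number of distinct values, not with the number of lines.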


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
