I’m trying to come up with a solution to this problem: I need to incrementally count, and then print, the occurrences of each unique value in column 1 of a tab-delimited text file. Here is an example:
Apple_1 1 300
Apple_2 1 500
Apple_2 500 1500
Apple_2 1500 2450
Apple_3 1 1250
Apple_3 1250 2000
And the desired output is:
Apple_1 1 300 1
Apple_2 1 500 1
Apple_2 500 1500 2
Apple_2 1500 2450 3
Apple_3 1 1250 1
Apple_3 1250 2000 2
I know that I can print the line number in awk with just print NR, but I don’t know how to reset it for each unique value of column 1.
Thanks for any help you can offer, I appreciate it.
Answers:
Method 1
The standard trick for this kind of problem in Awk is to use an associative counter array:
awk '{ print $0 "\t" ++count[$1] }'
This counts the number of times the first word in each line has been seen. It’s not quite what you’re asking for, since
Apple_1 1 300
Apple_2 1 500
Apple_1 500 1500
would produce
Apple_1 1 300 1
Apple_2 1 500 1
Apple_1 500 1500 2
(the count for Apple_1 isn’t reset when we see Apple_2), but if the input is sorted you’ll be OK.
Otherwise you’d need to track a counter and last-seen key:
awk '{ if (word == $1) { counter++ } else { counter = 1; word = $1 }; print $0 "\t" counter }'
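To make the difference between the two commands concrete, here is a minimal sketch that runs both on the unsorted sample from above. The function names count_seen and count_runs are hypothetical wrappers added for illustration; the awk logic is the same as in the two commands (with parentheses around the increment purely for readability).

```shell
# Hypothetical wrapper names, used only for this illustration.
# count_seen: running count per key over the whole input (associative array).
count_seen() { awk '{ print $0 "\t" (++count[$1]) }'; }
# count_runs: count restarts whenever the key differs from the previous line.
count_runs() { awk '{ if (word == $1) { counter++ } else { counter = 1; word = $1 }; print $0 "\t" counter }'; }

# Unsorted, tab-delimited sample: Apple_1 reappears after Apple_2.
sample=$(printf 'Apple_1\t1\t300\nApple_2\t1\t500\nApple_1\t500\t1500\n')

seen=$(printf '%s\n' "$sample" | count_seen)
runs=$(printf '%s\n' "$sample" | count_runs)

printf 'array counter:\n%s\n\nlast-seen counter:\n%s\n' "$seen" "$runs"
```

With the array version the second Apple_1 line gets 2; with the last-seen version it gets 1 again, because the run of Apple_1 lines was interrupted by Apple_2.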
Method 2
This answer doesn’t give the exact output you specified, but may be of even greater interest to other users.
If you don’t need incremental counts, but just counts of each unique value, you could use the simpler:
cut -f1 file.txt | sort | uniq -c
(Note that cut depends on tab delimiters, not just any whitespace.)
Actually, since your file is already sorted on the first field, you don’t need to sort it:
cut -f1 file.txt | uniq -c
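As a rough illustration of what this pipeline reports, the sketch below feeds the question's sample through it. The trailing awk step is an addition of mine, not part of the answer: uniq -c pads the count with leading spaces (the amount varies by implementation), so the sketch reorders the fields into a plain key, tab, count form.

```shell
# The question's sample, tab-delimited and already grouped by column 1.
sample=$(printf 'Apple_1\t1\t300\nApple_2\t1\t500\nApple_2\t500\t1500\nApple_2\t1500\t2450\nApple_3\t1\t1250\nApple_3\t1250\t2000\n')

# cut keeps column 1, uniq -c counts adjacent repeats.
# The final awk (an assumption for tidy output) swaps "count key" into "key<TAB>count".
counts=$(printf '%s\n' "$sample" | cut -f1 | uniq -c | awk '{ print $2 "\t" $1 }')

printf '%s\n' "$counts"
```

This gives one line per key with its total (1, 3 and 2 for the sample), rather than the incremental numbering asked for in the question.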
And if you want to include these in the original file as a new, fourth column, you can use join:
cut -f1 file.txt | uniq -c | join -2 2 file.txt -
(join depends on sorted input.)
Output on the input provided is:
Apple_1 1 300 1
Apple_2 1 500 3
Apple_2 500 1500 3
Apple_2 1500 2450 3
Apple_3 1 1250 2
Apple_3 1250 2000 2
Note that join reads whitespace delimiters in an intuitive manner, whether tabs or spaces, but outputs exactly one space as the delimiter. If you want your tabs back, pipe to tr ' ' '\t'.
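Putting the pieces of this method together, a sketch of the full pipeline with the tabs restored might look like the following. The temporary file stands in for file.txt, and the use of mktemp is an assumption about the environment (it is widely available but not guaranteed everywhere).

```shell
# Temporary stand-in for file.txt (assumes mktemp is available).
file=$(mktemp)
printf 'Apple_1\t1\t300\nApple_2\t1\t500\nApple_2\t500\t1500\nApple_2\t1500\t2450\nApple_3\t1\t1250\nApple_3\t1250\t2000\n' > "$file"

# Count each key, join the totals back onto every row by key,
# then convert join's single-space delimiters back into tabs.
joined=$(cut -f1 "$file" | uniq -c | join -2 2 "$file" - | tr ' ' '\t')

printf '%s\n' "$joined"
rm -f "$file"
```

Because tr replaces every space, this only works cleanly when the fields themselves contain no spaces, as in the sample.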
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.