How to print incremental count of occurrences of unique values in column 1

I’m trying to come up with a solution to this problem: I need to incrementally count the occurrences of each unique value in column 1 of a tab-delimited text file and print the running count on each line. Here is an example:

Apple_1   1      300
Apple_2   1      500
Apple_2   500    1500
Apple_2   1500   2450
Apple_3   1      1250
Apple_3   1250   2000

And the desired output is:

Apple_1   1      300     1
Apple_2   1      500     1
Apple_2   500    1500    2
Apple_2   1500   2450    3
Apple_3   1      1250    1
Apple_3   1250   2000    2

I know that I can print the line number in awk with just print NR, but I don’t know how to reset it for each unique value of column 1.

Thanks for any help you can offer, I appreciate it.

Answers:

Method 1

The standard trick for this kind of problem in Awk is to use an associative counter array:

awk '{ print $0 "\t" ++count[$1] }'

This counts the number of times the first word in each line has been seen. It’s not quite what you’re asking for, since

Apple_1   1      300
Apple_2   1      500
Apple_1   500    1500

would produce

Apple_1   1      300     1
Apple_2   1      500     1
Apple_1   500    1500    2

(the count for Apple_1 isn’t reset when we see Apple_2), but if the input is sorted you’ll be OK.
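As a quick check, here is a minimal sketch that runs the counter-array approach on the sample data from the question. The variable names `sample` and `result` are only illustrative, and the input is rebuilt with `printf` so the tab delimiters are explicit.

```shell
# Rebuild the question's sample input with explicit tab delimiters.
sample=$(printf 'Apple_1\t1\t300\nApple_2\t1\t500\nApple_2\t500\t1500\nApple_2\t1500\t2450\nApple_3\t1\t1250\nApple_3\t1250\t2000\n')

# Append a running per-key count as a fourth, tab-separated column.
# count[$1] starts out empty (treated as 0) the first time a key is seen.
result=$(printf '%s\n' "$sample" | awk '{ print $0 "\t" ++count[$1] }')

printf '%s\n' "$result"
```

Because this input is already grouped by column 1, the fourth column should match the desired output in the question.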

Otherwise you’d need to track a counter and last-seen key:

awk '{ if (word == $1) { counter++ } else { counter = 1; word = $1 }; print $0 "\t" counter }'
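The last-seen-key idea can also be written in awk's pattern-action style. The sketch below is an equivalent rewrite rather than the answer's exact code. It runs on the unsorted three-line example from above, and the names `prev`, `n`, `unsorted`, and `reset` are made up for illustration.

```shell
# Unsorted sample: Apple_1 reappears after Apple_2.
unsorted=$(printf 'Apple_1\t1\t300\nApple_2\t1\t500\nApple_1\t500\t1500\n')

# Restart the counter whenever column 1 differs from the previous line,
# then print the line with the incremented counter appended after a tab.
reset=$(printf '%s\n' "$unsorted" | awk '$1 != prev { n = 0; prev = $1 } { print $0 "\t" ++n }')

printf '%s\n' "$reset"
```

On this input every line gets a count of 1, because each key change restarts the counter. The counter-array version would instead give the second Apple_1 line a count of 2.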

Method 2

This answer doesn’t give the exact output you specified, but may be of even greater interest to other users.

If you don’t need incremental counts, but just counts of each unique value, you could use the simpler:

cut -f1 file.txt | sort | uniq -c

(Note that cut depends on tab delimiters, not just any whitespace.)

Actually, since your file is already sorted on the first field, you don’t need to sort it:

cut -f1 file.txt | uniq -c
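To see what the pipeline yields on the question's data, here is a small sketch that feeds the sample through `cut` and `uniq -c` on standard input instead of reading `file.txt`. The padding that `uniq -c` puts before each count varies between implementations, so the trailing `awk` step (an addition for this example, not part of the answer) normalizes each line to key, tab, count.

```shell
# Rebuild the question's sample input with explicit tab delimiters.
sample=$(printf 'Apple_1\t1\t300\nApple_2\t1\t500\nApple_2\t500\t1500\nApple_2\t1500\t2450\nApple_3\t1\t1250\nApple_3\t1250\t2000\n')

# Count consecutive repeats of column 1, then print "key<TAB>count".
totals=$(printf '%s\n' "$sample" | cut -f1 | uniq -c | awk '{ print $2 "\t" $1 }')

printf '%s\n' "$totals"
```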

And if you want to include these in the original file as a new, fourth column, you can use join:

cut -f1 file.txt | uniq -c | join -2 2 file.txt -

(join depends on sorted input.)

Output on the input provided is:

Apple_1 1 300 1
Apple_2 1 500 3
Apple_2 500 1500 3
Apple_2 1500 2450 3
Apple_3 1 1250 2
Apple_3 1250 2000 2

Note that join reads whitespace delimiters in an intuitive manner, whether tabs or spaces, but outputs exactly one space as the delimiter. If you want your tabs back, pipe the output to tr ' ' '\t'
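Putting the pieces together, the following sketch runs the whole pipeline, including the conversion back to tabs, on the question's sample. It assumes `mktemp` is available for the scratch file that stands in for `file.txt`. The variable names `tmp` and `joined` are illustrative.

```shell
# Write the sample to a scratch file, since join needs one real file operand.
tmp=$(mktemp)
printf 'Apple_1\t1\t300\nApple_2\t1\t500\nApple_2\t500\t1500\nApple_2\t1500\t2450\nApple_3\t1\t1250\nApple_3\t1250\t2000\n' > "$tmp"

# Count each key, join the counts back onto the original lines
# (field 1 of the file against field 2 of the uniq -c output),
# then turn join's single-space delimiters back into tabs.
joined=$(cut -f1 "$tmp" | uniq -c | join -2 2 "$tmp" - | tr ' ' '\t')

rm -f "$tmp"
printf '%s\n' "$joined"
```

Each line ends with the total number of lines sharing its key, not a running count, so this complements the awk approaches rather than replacing them.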

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
