Suppose you have a file like this:
NW_006521251.1 428 84134
NW_006521251.1 511 84135
NW_006521038.1 202 84155
NW_006521038.1 1743 84153
NW_006521038.1 1743 84154
NW_006520495.1 198 84159
NW_006520086.1 473 84178
NW_006520086.1 511 84180
I want to keep the unique rows based on columns 1 and 2 (i.e. not just column two as this number may repeat under a different label in column one).
The output should look like this (the second NW_006521038.1 1743 row is removed from the list):
NW_006521251.1 428 84134
NW_006521251.1 511 84135
NW_006521038.1 202 84155
NW_006521038.1 1743 84153
NW_006520495.1 198 84159
NW_006520086.1 473 84178
NW_006520086.1 511 84180
Is there a way to do this with awk?
Using uniq file doesn’t work.
Answers:
Method 1
There is a “famous” awk idiom for exactly this. You want to do:
awk '!seen[$1,$2]++' file
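That creates an associative array, seen, keyed on the two columns. The post-increment operator returns the old value, so seen[$1,$2]++ evaluates to zero the first time a key is encountered and to a non-zero count every time after that. The negation operator then turns this into a "true" result only for the first occurrence. Because a true pattern with no action prints the line, only the first row for each key is output, in the original order.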
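To make the idiom easier to follow, here is a longhand sketch of the same logic. The function name dedup_by_first_two is made up for illustration, and the explicit SUBSEP key is just what awk builds internally for seen[$1,$2]. It should behave like the one-liner, but treat it as an explanatory version rather than a replacement.

```shell
# Hypothetical helper name; longhand equivalent of: awk '!seen[$1,$2]++' file
dedup_by_first_two() {
    awk '{
        key = $1 SUBSEP $2      # the same key that seen[$1,$2] builds
        if (seen[key] == 0)     # an unset entry compares equal to 0
            print               # first occurrence of this key: print the line
        seen[key]++             # count it, so later lines with this key are skipped
    }' "$@"
}
```

Call it as dedup_by_first_two file, or pipe data into it with no arguments.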
Method 2
If you don’t mind that the output is sorted:
sort -u -k1,2 file
-u: unique
-k1,2: use fields 1 and 2 together as the key
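A small sketch of the sort approach on three of the sample rows. The wrapper name dedup_sorted is made up, and LC_ALL=C is an added assumption that makes the ordering byte-wise and independent of the current locale. It is not part of the original command.

```shell
# Hypothetical wrapper around the sort command above.
dedup_sorted() {
    # LC_ALL=C is an added assumption: byte-wise ordering regardless of locale.
    LC_ALL=C sort -u -k1,2 "$@"
}

# Two rows share the key "NW_006521038.1 1743"; only one of them is kept.
printf '%s\n' \
    'NW_006521038.1 1743 84153' \
    'NW_006521038.1 1743 84154' \
    'NW_006520495.1 198 84159' | dedup_sorted
```

This prints two lines, with the NW_006520495.1 row first because the output is sorted by key. Unlike the awk version, the original row order is not preserved.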
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.