Printing unique lines

Is there some better solution for printing unique lines other than a combination of sort and uniq?

Answers:

Method 1

To print each distinct line only once, in any order:

sort -u

To print only the unique lines (those that occur exactly once), in any order:

sort | uniq -u
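
As a rough illustration of how the two commands above differ, here is a small sketch. The function names and the sample data are made up for this example.

```shell
# Hypothetical sample data: "abc" occurs three times, "def" twice, "ghi" once.
sample() { printf '%s\n' abc def abc ghi abc def; }

# Each distinct line once, in sorted order.
distinct_sorted() { sort -u; }

# Only the lines that occur exactly once, in sorted order.
singletons_sorted() { sort | uniq -u; }

sample | distinct_sorted     # prints abc, def, ghi (one per line)
sample | singletons_sorted   # prints ghi
```

The first command collapses repeats; the second discards every line that has a repeat.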

To print each distinct line only once, in the order of first occurrence (for each line, print it if it hasn't been seen yet, then increment the seen counter either way):

awk '!seen[$0] {print}
     {++seen[$0]}'

To print only the unique lines, in the order of their first occurrence (count each line in seen, and append it to lines on its first occurrence; at the end of the input, walk lines by index, because for (i in lines) visits elements in an unspecified order, and print only the lines seen exactly once):

awk '!seen[$0]++ {lines[n++]=$0}
     END {for (i=0; i<n; i++) if (seen[lines[i]]==1) print lines[i]}'
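
If the input is a regular file rather than a pipe, a two-pass variant avoids keeping a separate list of lines in memory: the first pass counts, the second prints. This is only a sketch. The function name singletons_in_order is hypothetical, and it assumes the file can be read twice.

```shell
# Hypothetical helper: print the lines of file "$1" that occur exactly once,
# in input order. The file is passed to awk twice.
singletons_in_order() {
    # NR == FNR holds only while reading the first copy: count every line.
    # On the second copy, print the lines whose count is 1.
    awk 'NR == FNR { count[$0]++; next } count[$0] == 1' "$1" "$1"
}

tmp=$(mktemp)
printf '%s\n' ghi abc def abc jkl def > "$tmp"
singletons_in_order "$tmp"   # prints ghi, then jkl
rm -f "$tmp"
```

The trade-off is reading the input twice in exchange for not storing the lines array.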

Method 2

sort has a -u flag (it is specified by POSIX) that does the uniq part directly. Some implementations may restrict the line length, but you already had that restriction with plain sort | uniq.

Method 3

For the last part of @Gilles's answer to this question, I tried to eliminate the need for two hashes.

This solution prints only the unique lines. Note that for (line in counter) visits the array in an unspecified order, so the order of first occurrence is not guaranteed:

awk '{counter[$0]++} END {for (line in counter) if (counter[line]==1) print line}'

Here, “counter” stores the number of times each line occurs.
At the end, we print only those lines whose count is 1.
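
One possible way to keep a single hash and still recover the input order is to store the line number of the first occurrence instead of a count, mark repeated lines with 0, and sort the survivors by that number afterwards. The sketch below is an illustration, not part of the original answer. The names where and singletons_one_hash are made up.

```shell
# Hypothetical helper: lines that occur exactly once, in input order,
# using a single awk array.
singletons_one_hash() {
    awk '{
             if ($0 in where) { where[$0] = 0 }    # seen before: mark as duplicate
             else             { where[$0] = NR }   # first sighting: remember line number
         }
         END {
             # Visit order is unspecified here, so emit "line-number TAB line".
             for (line in where)
                 if (where[line]) print where[line] "\t" line
         }' |
    sort -n |     # restore input order by line number
    cut -f2-      # strip the line-number prefix
}

printf '%s\n' ghi abc def abc jkl def | singletons_one_hash   # prints ghi, then jkl
```

This trades the second hash for an extra sort over the surviving lines only.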

Method 4

Does Perl work for you? It can keep the lines in the original order, even if the duplicates are not adjacent. You could also code it in Python, or awk.

while (<>) {
    print if $lines{$_}++ == 0;
}

Which can be shortened to just

perl -ne 'print unless $lines{$_}++;'

Given input file:

abc
def
abc
ghi
abc
def
abc
ghi
jkl

It yields the output:

abc
def
ghi
jkl
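
The answer notes that the same thing could be coded in awk. A minimal sketch of that, run against the input shown above (the function name first_occurrences is hypothetical):

```shell
# Hypothetical helper: print each line the first time it appears.
# seen[$0]++ evaluates to 0 (false) on first sight, so !seen[$0]++ is true
# exactly once per distinct line, and awk's default action prints the line.
first_occurrences() { awk '!seen[$0]++'; }

printf '%s\n' abc def abc ghi abc def abc ghi jkl | first_occurrences
# prints abc, def, ghi, jkl (one per line)
```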

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
