How to get only the unique results without having to sort data?

$ cat data.txt 
aaaaaa
aaaaaa
cccccc
aaaaaa
aaaaaa
bbbbbb
$ cat data.txt | uniq
aaaaaa
cccccc
aaaaaa
bbbbbb
$ cat data.txt | sort | uniq
aaaaaa
bbbbbb
cccccc
$

The result that I need is to display all the lines from the original file with every duplicate removed (not just the consecutive ones), while maintaining the original order of the lines in the file.

In this example, the result I am actually looking for is:

aaaaaa
cccccc
bbbbbb

How can I perform this generalized uniq operation?

Answers:


Method 1

perl -ne 'print unless $seen{$_}++' data.txt

Or, if you must have a useless use of cat:

cat data.txt | perl -ne 'print unless $seen{$_}++'

Here’s an awk translation, for systems that lack Perl:

awk '!seen[$0]++' data.txt
cat data.txt | awk '!seen[$0]++'
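
For readers who find the awk one-liner too terse, here is a rough sketch of the same idea written out longhand. The function name dedup_keep_order is made up for this illustration; the logic just remembers every line it has printed and skips any line it has already seen.

```shell
# dedup_keep_order is a hypothetical helper name, not a standard tool.
# It prints each distinct line once, in the order of first appearance.
dedup_keep_order() {
    awk '{
        if (!($0 in seen)) {   # this exact line has not appeared before
            seen[$0] = 1       # remember it
            print              # and emit it
        }
    }' "$@"
}

# Same sample data as in the question, fed through stdin.
result=$(printf '%s\n' aaaaaa aaaaaa cccccc aaaaaa aaaaaa bbbbbb | dedup_keep_order)
printf '%s\n' "$result"
```

This prints aaaaaa, cccccc, bbbbbb. Note that the seen array grows with the number of distinct lines, so memory use depends on how many unique lines the input has.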

Method 2

john (the John the Ripper package) has a tool called unique:

usr@srv % cat data.txt | unique out
usr@srv % cat out
aaaaaa
cccccc
bbbbbb

To achieve the same without additional tools in a single command line is a bit more complex:

usr@srv % cat data.txt | nl | sort -k 2 | uniq -f 1 | sort -n | sed 's/\s*[0-9]\+\s\+//'
aaaaaa
cccccc
bbbbbb

nl prints line numbers in front of the lines, so if we sort and uniq on the text after the numbers, we can restore the original order of the lines with a final numeric sort. sed just deletes the line numbers afterwards 😉
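
The stages of that pipeline can be sketched one per line as below. This is a variation rather than the exact command above, and the differences are my own assumptions: nl -ba so that blank lines get numbered too, LC_ALL=C plus a numeric tie-break key so that the earliest copy of each line is the one uniq keeps, and cut -f2- in place of sed (nl separates the number from the text with a tab by default). The function name dedup_via_sort is made up.

```shell
# dedup_via_sort is a hypothetical helper name used only for this sketch.
dedup_via_sort() {
    nl -ba "$@" |                  # 1. prefix every line, blank ones included, with its number
    LC_ALL=C sort -k2 -k1,1n |     # 2. group identical text; lowest line number first in each group
    uniq -f 1 |                    # 3. ignore field 1 (the number) and keep the first line per group
    sort -n |                      # 4. put the survivors back in original line order
    cut -f2-                       # 5. drop the number, keeping everything after the first tab
}

result=$(printf '%s\n' aaaaaa aaaaaa cccccc aaaaaa aaaaaa bbbbbb | dedup_via_sort)
printf '%s\n' "$result"
```

On the sample data this prints aaaaaa, cccccc, bbbbbb. Unlike the awk approach it does not hold all distinct lines in memory at once, at the cost of two sorts.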

Method 3

I prefer to use this:

cat -n data.txt | sort --key=2.1 -b -u | sort -n | cut -c8-

cat -n adds line numbers,

sort --key=2.1 -b -u sorts on the second field (after the added line numbers), ignoring leading blanks, keeping unique lines

sort -n sorts in strict numeric order

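cut -c8- keeps all characters from column 8 to EOL (i.e., omits the line numbers we added; this relies on cat -n padding the number to six columns and following it with a tab, which holds for files of up to 999999 lines)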
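
One thing to be aware of: because of -b, two lines that differ only in leading whitespace are treated as duplicates by the command above. If that matters, a possible variant is sketched below. It is an assumption-laden sketch, not the original answer's method: it uses the tab that cat -n inserts as an explicit field separator, forces byte-wise comparison with LC_ALL=C, and relies on GNU sort's behaviour of keeping the first of several equal lines under -u. The name dedup_cat_n is made up.

```shell
# dedup_cat_n is a hypothetical helper name used only for this sketch.
tab=$(printf '\t')                      # the separator cat -n puts after the number

dedup_cat_n() {
    cat -n "$@" |                       # number every line: "     1<TAB>text"
    LC_ALL=C sort -t "$tab" -k2 -u |    # unique on the text after the first tab (assumes GNU sort -u)
    sort -n |                           # back to original line order
    cut -f2-                            # strip the number regardless of its width
}

result=$(printf '%s\n' 'foo' '  foo' 'foo' 'bar' | dedup_cat_n)
printf '%s\n' "$result"
```

Here "foo" and "  foo" are both kept, and only the repeated "foo" is dropped. Using cut -f2- also avoids depending on the number occupying exactly seven columns.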

Method 4

Perl has a module, List::MoreUtils, that includes a function called uniq. So if you have your data loaded in an array in Perl, you simply call the function like this to remove the duplicates while still maintaining the original order.

use List::MoreUtils qw(uniq);
@output = uniq(@output);

You can read more about this module here: List::MoreUtils


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
