$ cat data.txt
aaaaaa
aaaaaa
cccccc
aaaaaa
aaaaaa
bbbbbb
$ cat data.txt | uniq
aaaaaa
cccccc
aaaaaa
bbbbbb
$ cat data.txt | sort | uniq
aaaaaa
bbbbbb
cccccc
$
I need to display every line of the original file with all duplicates removed (not just the consecutive ones), while keeping the remaining lines in their original order.
In this example, the result I was actually looking for is:
aaaaaa
cccccc
bbbbbb
How can I perform this generalized uniq operation?
Answers:
Method 1
perl -ne 'print unless $seen{$_}++' data.txt
Or, if you must have a useless use of cat:
cat data.txt | perl -ne 'print unless $seen{$_}++'
Here’s an awk translation, for systems that lack Perl:
awk '!seen[$0]++' data.txt
cat data.txt | awk '!seen[$0]++'
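The one-liners above are terse, so here is a spelled-out sketch of what the awk version appears to do: keep a per-line counter and print a line only the first time it is seen. The function name dedupe_keep_order is a hypothetical wrapper added for illustration; it is not part of the original answer.

```shell
# dedupe_keep_order: hypothetical helper name (an assumption for illustration).
# Prints each distinct line once, in order of first appearance.
# Reads the files given as arguments, or standard input if there are none.
dedupe_keep_order() {
    awk '
        {
            # seen[] maps the text of a line ($0) to the number of times
            # it has appeared so far; unset entries compare equal to 0.
            if (seen[$0] == 0) {
                print          # first occurrence: emit the line
            }
            seen[$0]++         # count this occurrence
        }
    ' "$@"
}
```

It can be called as `dedupe_keep_order data.txt` or at the end of a pipeline. Note that this approach holds every distinct line in memory, which may matter for very large inputs.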
Method 2
john has a tool called unique:
usr@srv % cat data.txt | unique out
usr@srv % cat out
aaaaaa
cccccc
bbbbbb
To achieve the same without additional tools in a single command line is a bit more complex:
usr@srv % cat data.txt | nl | sort -k 2 | uniq -f 1 | sort -n | sed 's/\s*[0-9]\+\s\+//'
aaaaaa
cccccc
bbbbbb
nl prints line numbers in front of the lines, so after sort and uniq have worked on the text that follows the numbers, a numeric sort on those numbers restores the original order of the lines. sed just deletes the line numbers afterwards 😉
Method 3
I prefer to use this:
cat -n data.txt | sort --key=2.1 -b -u | sort -n | cut -c8-
cat -n adds line numbers,
sort --key=2.1 -b -u sorts on the second field (after the added line numbers), ignoring leading blanks, keeping unique lines
sort -n sorts in strict numeric order
cut -c8- keeps all characters from column 8 to EOL (i.e., omits the line numbers we included)
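The steps above follow a decorate, sort, undecorate pattern. As far as I can tell, cut -c8- depends on the fixed width of the number column that cat -n prints, so the sketch below uses an explicit tab separator instead and sorts on the line number as a secondary key, so that it does not depend on which duplicate sort -u happens to keep. The function name dedupe_dsu and the use of LC_ALL=C for byte-wise comparison are assumptions made for illustration, not part of the original answer.

```shell
# dedupe_dsu: hypothetical helper name; a sketch, not the answer's exact pipeline.
# Reads the files given as arguments, or standard input if there are none.
dedupe_dsu() {
    tab=$(printf '\t')
    # 1. Decorate: prefix each line with its line number and a tab.
    awk '{ print NR "\t" $0 }' "$@" |
        # 2. Sort by text (field 2 to end of line), then by line number,
        #    so the earliest copy of each text comes first in its group.
        LC_ALL=C sort -t "$tab" -k2 -k1,1n |
        # 3. Keep the first line of each group, skipping the number field.
        LC_ALL=C uniq -f 1 |
        # 4. Restore the original order by sorting on the line number.
        LC_ALL=C sort -n |
        # 5. Undecorate: drop the number and the tab that follows it.
        cut -f2-
}
```

Usage is the same as for the pipeline above, e.g. `dedupe_dsu data.txt`. Unlike the awk and Perl one-liners, this does not need to hold all distinct lines in memory at once, at the cost of two sorts.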
Method 4
Perl has a module that includes a function called uniq. So if you have your data loaded in an array in Perl, you simply call the function like this to make it unique while still maintaining the original order.
use List::MoreUtils qw(uniq);
@output = uniq(@output);
You can read more about this module here: List::MoreUtils
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.