"Ungrep" – which patterns aren't matched

I’m looking for a command or script to do the following – given:

file1.txt:

abcd
efgh
ijkl
mnop

file2.txt:

123abcd123
123efgh123
123mnop123

I want a command that does something like this:

ungrep file1.txt file2.txt

and returns the following:

ijkl

In other words, it gives me the lines in file1.txt that would not return any results if I grepped for them in file2.txt. I know that I can do this by iterating through file1.txt, grepping file2.txt for each line, and outputting any line for which the result is empty, but I was hoping for a more efficient way to do this.
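For reference, the line-by-line approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the scratch directory and the sample data are recreated here so the snippet runs on its own, and -F (treat each line as a fixed string rather than a regular expression) is an assumption about the intended matching.

```shell
# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' abcd efgh ijkl mnop > "$dir/file1.txt"
printf '%s\n' 123abcd123 123efgh123 123mnop123 > "$dir/file2.txt"

# One grep per line of file1.txt: -q reports via exit status only,
# -F matches a fixed string, -e protects patterns starting with a dash.
result=$(
  while IFS= read -r pattern; do
    grep -qF -e "$pattern" "$dir/file2.txt" || printf '%s\n' "$pattern"
  done < "$dir/file1.txt"
)
echo "$result"
rm -r "$dir"
```

This starts one grep process per line of file1.txt, which is the inefficiency the question is trying to avoid.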

Answers:

Method 1

With GNU grep, the following should work. Using the -f option, pass file1.txt as a “pattern file”, and also pass it a second time as a data file. Use -o to report only the matching parts and -h to suppress the file name prefixes. Finally, sort | uniq -u extracts the words that were matched only once; these correspond to the lines from file1.txt that do not find a match in file2.txt.

grep -h -o -f file1.txt file2.txt file1.txt | sort | uniq -u
ijkl
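To try this on the sample data from the question, a minimal self-contained sketch might look like the one below. The scratch directory is only scaffolding, and the -F flag is an addition that is not part of the original command: it makes grep treat the lines of file1.txt as fixed strings, which seems to be what the question wants. The counting trick assumes file1.txt contains no duplicate lines, since a duplicated word would be reported twice from file1.txt alone and then dropped by uniq -u. Words that are substrings of other words in file1.txt may also be miscounted.

```shell
# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' abcd efgh ijkl mnop > "$dir/file1.txt"
printf '%s\n' 123abcd123 123efgh123 123mnop123 > "$dir/file2.txt"

# Every word matches itself once in file1.txt; words that also occur in
# file2.txt are printed at least twice, so uniq -u keeps only the rest.
result=$(grep -h -o -F -f "$dir/file1.txt" "$dir/file2.txt" "$dir/file1.txt" | sort | uniq -u)
echo "$result"
rm -r "$dir"
```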

Method 2

You could do it with awk like:

awk '
  NR == FNR {w[$0]; next}
  {for (i in w) if (index($0,i)) delete w[i]}
  END {for (i in w) print i}' file1.txt file2.txt

By using index, we’re looking for substrings rather than matching regular expressions.

Because we delete the word from the array as soon as we find a match, we avoid unnecessary searches.
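The order in which awk's for (i in w) loop visits array elements is unspecified, so with several unmatched words the output order may differ from file1.txt. One possible variation, sketched here as an assumption rather than as part of the original answer, keeps a second array to remember the input order and wraps everything in a shell function. The name ungrep is hypothetical, borrowed from the question, and the order array and counter n are illustrative additions.

```shell
# ungrep is a hypothetical helper name taken from the question.
ungrep() {
  awk '
    # First file: remember each word and the order in which it appeared.
    NR == FNR {
      if (!($0 in w)) { w[$0]; order[++n] = $0 }
      next
    }
    # Second file: drop every word that occurs as a substring of this line.
    { for (p in w) if (index($0, p)) delete w[p] }
    # Print the surviving words in their original order.
    END { for (k = 1; k <= n; k++) if (order[k] in w) print order[k] }
  ' "$1" "$2"
}

# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' abcd efgh ijkl mnop > "$dir/file1.txt"
printf '%s\n' 123abcd123 123efgh123 123mnop123 > "$dir/file2.txt"

result=$(ungrep "$dir/file1.txt" "$dir/file2.txt")
echo "$result"
rm -r "$dir"
```

Like the original, this relies on NR == FNR to detect the first file, so it assumes file1.txt is not empty.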

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
