I have a directory with ~1M files and need to search for particular patterns. I know how to do it for all the files:
find /path/ -exec grep -H -m 1 'pattern' {} \;
I don't need the full output (producing it is too slow). The first few hits are enough, so I tried to limit the number of lines:
find /path/ -exec grep -H -m 1 'pattern' {} \; | head -n 5
This results in 5 lines followed by
find: `grep' terminated by signal 13
and find keeps running. This is well explained here. I tried the -quit action:
find /path/ -exec grep -H -m 1 'pattern' {} \; -quit
This outputs only the first match.
Is it possible to limit find's output to a specific number of results (for example, by giving -quit an argument similar to head -n)?
Answers:
Method 1
Since you're already using GNU extensions (-quit, -H, -m1), you might as well use GNU grep's -r option, together with --line-buffered so that it outputs the matches as soon as they are found. That makes it more likely to be killed by a SIGPIPE as soon as it writes the 6th line:
grep -rHm1 --line-buffered pattern /path | head -n 5
With find, you’d probably need to do something like:
find /path -type f -exec sh -c '
  grep -Hm1 --line-buffered pattern "$@"
  [ "$(kill -l "$?")" = PIPE ] && kill -s PIPE "$PPID"
' sh {} + | head -n 5
That is, wrap grep in sh (you still want to run as few grep invocations as possible, hence the {} +), and have sh kill its parent (find) when grep dies of a SIGPIPE.
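To see what the [ "$(kill -l "$?")" = PIPE ] test does on its own, here is a small sketch. The helper name died_of_sigpipe is made up for illustration. It assumes a shell such as bash or dash that reports death by signal N as exit status 128+N, and that SIGPIPE is signal 13, as the "terminated by signal 13" message in the question suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer): succeeds when the given
# exit status or signal number corresponds to SIGPIPE.
died_of_sigpipe() {
  # kill -l translates an exit status (or signal number) into a signal name
  [ "$(kill -l "$1" 2>/dev/null)" = PIPE ]
}

# Assumption: the shell encodes "killed by signal 13" as status 128+13 = 141.
if died_of_sigpipe 141; then
  echo "status 141: killed by SIGPIPE"
fi

# grep's ordinary "no match" status (1) must not be mistaken for a signal death.
if ! died_of_sigpipe 1; then
  echo "status 1: not SIGPIPE"
fi
```

In the wrapper above, "$?" takes the place of the literal 141, so find is only killed when grep really died of a SIGPIPE and not when it simply found no match.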
Another approach could be to use xargs as an alternative to -exec {} +. xargs exits straight away when a command it spawns dies of a signal so in:
find . -type f -print0 | xargs -r0 grep -Hm1 --line-buffered pattern | head -n 5
(-r and -0 being GNU extensions). As soon as grep writes to the broken pipe, both grep and xargs will exit and find will exit itself as well the next time it prints something after that. Running find under stdbuf -oL might make it happen sooner.
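The xargs pipeline can be tried safely on a throwaway tree first. The sketch below is only an illustration: the temporary directory, the file names and the word needle are invented, and GNU find, xargs and grep are assumed.

```shell
#!/bin/sh
# Build a throwaway tree: 20 files, each with two matching lines.
# The directory, file names and the word "needle" are made up for this demo.
dir=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do
  printf 'filler\nneedle in file %s\nneedle again\n' "$i" > "$dir/file$i.txt"
  i=$((i + 1))
done

# Same pipeline as above; any xargs complaint about grep being killed
# by a signal is hidden with 2>/dev/null.
matches=$(find "$dir" -type f -print0 |
  xargs -r0 grep -Hm1 --line-buffered needle 2>/dev/null |
  head -n 5)

printf '%s\n' "$matches"
count=$(( $(printf '%s\n' "$matches" | wc -l) ))
echo "lines printed: $count"

rm -rf "$dir"
```

Because of -m1, each file contributes at most one line, so the five lines come from five different files. On a tree this small grep may well finish before head exits, so the early termination only becomes noticeable on large directories.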
A POSIX version could be:
trap - PIPE # restore default SIGPIPE handler in case it was disabled
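RE=pattern find /path -type f -exec sh -c '
  for file do
    # awk reads stdin here, so FILENAME would not hold the file name; pass it via the environment
    FILE=$file awk '\''
      $0 ~ ENVIRON["RE"] {
        print ENVIRON["FILE"] ": " $0
        exit
      }'\'' < "$file"
    if [ "$(kill -l "$?")" = PIPE ]; then
      kill -s PIPE "$PPID"
      exit
    fi
  done' sh {} + | head -n 5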
Very inefficient as it runs several commands for each file.
Method 2
A solution to avoid the errors could be this:
find / -type f -print0 | xargs -0 -L 1 grep -H -m 1 --line-buffered 'pattern' 2>/dev/null | head -10
In this example, xargs stops as soon as a grep it started is killed by a signal (here SIGPIPE), so there will be just one pipe error, which is hidden by the stderr redirection.
Method 3
You grep one file at a time. With your -quit, you stop the find at the first successful grep.
[update] My first solution was to grep multiple files at once:
find /path/ -type f -exec grep -H -m 1 'pattern' {} + -quit | head -n 5
(the magic is in the + at the end of the -exec sub-command. Added -type f. You may want to remove the -H option to grep if you are certain that /path/ contains several files)
The problem here, as reported by @StéphaneChazelas, is that the -exec ... {} + command is executed asynchronously and always returns true, so find quits at the first file.
If we want find to stop when head has finished, find must also receive the SIGPIPE that grep is getting (signal 13). That means that find must send something through the pipe.
Here is a quick-and-dirty hack, enhanced with Stéphane’s suggestions:
find /path/ -type f -exec grep -H -m 1 --line-buffered 'pattern' {} + -printf '\r' | head -n 5
With -printf '\r' I force find to output a harmless character (a carriage return) that will (hopefully) not alter the output of grep. Once head has stopped, find will receive a SIGPIPE the next time it writes and stop too.
[update2] I warned you that this is a dirty hack. Here is a better solution:
find /path/ -type f -exec grep --quiet 'pattern' {} ";" -print | head -n 5
Here, it is no longer grep that prints the filename but find, so there is no more "grep terminated by signal 13" and find stops along with head. The problem is that the matched lines are no longer printed by grep.
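A quick way to check this behaviour is to run it on a small scratch directory. This is a sketch only: the directory, the file names and the word needle are invented for the demo, and GNU grep is assumed for --quiet.

```shell
#!/bin/sh
# Throwaway tree: 12 files, only the even-numbered ones contain "needle".
# All names here are made up for the demo.
dir=$(mktemp -d)
i=1
while [ "$i" -le 12 ]; do
  if [ $((i % 2)) -eq 0 ]; then
    echo "needle $i" > "$dir/file$i.txt"
  else
    echo "nothing here" > "$dir/file$i.txt"
  fi
  i=$((i + 1))
done

# grep --quiet only reports success or failure; -print runs for the
# files where grep succeeded, so find itself writes to the pipe.
names=$(find "$dir" -type f -exec grep --quiet needle {} \; -print | head -n 3)

printf '%s\n' "$names"
count=$(( $(printf '%s\n' "$names" | wc -l) ))
echo "files listed: $count"

rm -rf "$dir"
```

Only file names are printed, never the matching lines, which is the limitation mentioned above.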
[update3] Finally, as suggested by @Andrey, the shamelessly hideous command below would solve this last issue:
find /path/ -type f \
  -exec grep --quiet 'pattern' {} \; \
  -printf '%p:' \
  -exec grep -h -m 1 'pattern' {} \; \
  | head -n 5
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0