How can I get a list of the subdirectories which contain a file whose name matches a particular pattern?
More specifically, I am looking for directories which contain a file with the letter ‘f’ occurring somewhere in the file name.
Ideally, the list would contain no duplicates and only the directory path, without the file name.
Answers:
Method 1
find . -type f -name '*f*' | sed -r 's|/[^/]+$||' |sort |uniq
The above finds all files below the current directory (.) that are regular files (-type f) and have f somewhere in their name (-name '*f*'). Next, sed removes the file name, leaving just the directory name. Then, the list of directories is sorted (sort) and duplicates removed (uniq).
The sed command consists of a single substitution. It looks for matches to the regular expression /[^/]+$ and replaces anything matching it with nothing. The dollar sign means the end of the line, and [^/]+ means one or more characters that are not slashes. Thus, /[^/]+$ matches everything from the final slash to the end of the line; in other words, it matches the file name at the end of the full path. So the sed command removes the file name, leaving unchanged the name of the directory that the file was in.
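To see the substitution on its own, here is the same expression applied to two sample paths (made up for illustration). It uses -E, which both GNU sed and macOS sed accept for extended regular expressions. Note that a file sitting directly in the starting directory reduces to a single dot.

```bash
# Delete from the last slash to the end of each line,
# leaving only the directory part of each path.
printf '%s\n' './dir1/dir103/knife1' './file2' |
  sed -E 's|/[^/]+$||'
# prints:
# ./dir1/dir103
# .
```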
Simplifications
Many modern sort commands support a -u flag which makes uniq unnecessary. For GNU sed:
find . -type f -name '*f*' | sed -r 's|/[^/]+$||' |sort -u
And, for macOS sed:
find . -type f -name '*f*' | sed -E 's|/[^/]+$||' |sort -u
Also, if your find command supports it, it is possible to have find print the directory names directly. This avoids the need for sed:
find . -type f -name '*f*' -printf '%h\n' | sort -u
More robust version (Requires GNU tools)
The above versions will be confused by file names that include newlines. A more robust solution is to do the sorting on NUL-terminated strings:
find . -type f -name '*f*' -printf '%h\0' | sort -zu | sed -z 's/$/\n/'
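If the list is going to be consumed by a script rather than read on screen, the NUL-terminated records can be kept intact all the way into a bash array. A sketch, assuming GNU find and sort as above plus bash 4.4 or later (for mapfile -d ''); the function name dirs_with_match0 is made up for this example:

```bash
# Hypothetical helper: print each unique directory below $1 that contains a
# regular file matching the glob $2, as NUL-terminated records.
# Assumes GNU find (-printf) and GNU sort (-z).
dirs_with_match0() {
  find "$1" -type f -name "$2" -printf '%h\0' | sort -zu
}

# One array element per directory, even if a name contains a newline.
mapfile -d '' dirs < <(dirs_with_match0 . '*f*')
for d in "${dirs[@]}"; do
  printf 'found: %q\n' "$d"
done
```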
Simplified using dirname
Imagine needing the command in a script where the whole command has to go inside single quotes: escaping the sed expression is painful and error-prone, so replace sed with dirname.
The issues with special characters and newlines are also moot if you don't need to sort, or if your directory names don't contain such characters.
find . -type f -name "*f*" -exec dirname "{}" \; |sort -u
To take care of the newline issue (GNU dirname, sort and sed):
find . -type f -name "*f*" -exec dirname -z "{}" \; |sort -zu |sed -z 's/$/\n/'
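dirname and the sed substitution from the first version are not fully interchangeable. A quick comparison on a few sample paths (made up for illustration) shows that they agree on ordinary relative paths but differ for a file directly under /, which matters if you search from the root:

```bash
# Compare dirname with the sed substitution on a few sample paths.
for p in ./dir1/dir103/knife1 ./file2 /file3; do
  via_sed=$(printf '%s\n' "$p" | sed -E 's|/[^/]+$||')
  printf '%-22s dirname: %-16s sed: "%s"\n' "$p" "$(dirname "$p")" "$via_sed"
done
# For /file3, dirname prints "/" while sed leaves an empty string.
```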
Method 2
Why not try this:
find / -name '*f*' -printf "%h\n" | sort -u
Method 3
There are essentially two methods you can use to do this. One parses the path strings, while the other operates on each file. Parsing the strings using a tool such as grep, sed, or awk is going to be faster, but here’s an example showing both, as well as how you can “profile” the two methods.
Sample data
For the examples below we’ll use the following data
$ mkdir -p dir{1..3}/dir{100..112}
$ touch dir{1..3}/dir{100..112}/file{1..5}
$ touch dir{1..3}/dir{100..112}/nile{1..5}
$ touch dir{1..3}/dir{100..112}/knife{1..5}
Delete some of the *f* files from dir1/*:
$ rm dir1/dir10{0..2}/*f*
Approach #1 – Parsing via strings
Here we’re going to use the following tools: find, grep, and sort.
$ find . -type f -name '*f*' | grep -o "\(.*\)/" | sort -u | head -5
./dir1/dir103/
./dir1/dir104/
./dir1/dir105/
./dir1/dir106/
./dir1/dir107/
Approach #2 – Parsing using files
Same tool chain as before, except this time we’ll be using dirname instead of grep.
$ find . -type f -name '*f*' -exec dirname {} \; | sort -u | head -5
./dir1/dir103
./dir1/dir104
./dir1/dir105
./dir1/dir106
./dir1/dir107
NOTE: The above examples are using head -5 to merely limit the amount of output we’re dealing with for these examples. They’d normally be removed to get your full listing!
Comparing the results
We can use time to take a look at the 2 approaches.
dirname
real 0m0.372s
user 0m0.028s
sys 0m0.106s
grep
real 0m0.012s
user 0m0.009s
sys 0m0.007s
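So it’s best to deal with the strings where possible: the -exec dirname {} \; form starts a separate dirname process for every matching file, while the grep version runs a single grep over the whole list.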
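With GNU coreutils, much of that per-file overhead can be avoided while still using dirname, because GNU dirname accepts several path operands and find's {} + form passes many paths per invocation. A sketch; POSIX only guarantees a single operand for dirname, so this is not portable:

```bash
# Same result as Approach #2, but find hands many paths to each dirname
# process ("{} +") instead of starting one process per file ("{} \;").
# Assumes GNU dirname, which accepts more than one NAME operand.
find . -type f -name '*f*' -exec dirname {} + | sort -u
```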
Alternative string parsing methods
grep & PCRE
$ find . -type f -name '*f*' | grep -oP '^.*(?=/)' | sort -u
sed
$ find . -type f -name '*f*' | sed 's#/[^/]*$##' | sort -u
awk
$ find . -type f -name '*f*' | awk -F'/[^/]*$' '{print $1}' | sort -u
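All of these use sort -u, so the result comes out in sorted order. If you'd rather keep the directories in the order find visits them, a common alternative is to de-duplicate with awk instead of sort; a sketch:

```bash
# sed strips the file name; awk '!seen[$0]++' prints a line only the first
# time it appears, so directories come out in find's order, without sorting.
find . -type f -name '*f*' | sed 's#/[^/]*$##' | awk '!seen[$0]++'
```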
Method 4
Here’s one I find useful:
find . -type f -name "*somefile*" | xargs dirname | sort | uniq
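Because xargs splits its input on blanks and treats quotes specially, this pipeline breaks on names containing spaces or quotes. A NUL-delimited variant of the same pipeline, as a sketch assuming GNU tools (GNU dirname accepts several operands, and -r is GNU xargs's option to skip running dirname when nothing matched):

```bash
# -print0 / xargs -0 pass each path as a single argument, whatever it contains.
# -r (GNU xargs): do not run dirname at all if find printed nothing.
find . -type f -name '*somefile*' -print0 | xargs -0 -r dirname | sort -u
```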
Method 5
You can use the -exec switch to run dirname and get the directory name instead of the file name. This has the added benefit of being POSIX compatible.
find . -name "*file*" -exec dirname {} \;
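Running dirname once per file is what makes this form slow on large trees. A sketch that stays within POSIX but batches the work: find passes many paths to a single sh with {} +, and the shell's ${f%/*} expansion removes the file name without starting any extra process. It also adds sort -u to drop duplicates:

```bash
# ${f%/*} deletes the shortest suffix matching "/*", i.e. the last "/name".
# Every path find prints here starts with ".", so it always contains a slash.
find . -type f -name '*f*' -exec sh -c '
  for f do
    printf "%s\n" "${f%/*}"
  done
' sh {} + | sort -u
```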
Method 6
This answer is shamelessly based on slm’s answer. It’s an interesting approach, but it has a limitation if the file and/or directory names contain special characters (space, semicolon…). A good habit is to use find /somewhere -print0 | xargs -0 someprogram.
Sample data
For the examples below we’ll use the following data
mkdir -p dir{1..3}/dir\ {100..112}
touch dir{1..3}/dir\ {100..112}/nile{1..5}
touch dir{1..3}/dir\ {100..112}/file{1..5}
touch dir{1..3}/dir\ {100..112}/kni\ fe{1..5}
Delete some of the *f* files from dir1/*/:
rm dir1/dir\ 10{0..2}/*f*
Approach #1 – Parsing using files
$ find -type f -name '*f*' -print0 | sed -e 's#/[^/]*\x00#\x00#g' | sort -zu | xargs -0 -n1 echo | head -n5
./dir1/dir 103
./dir1/dir 104
./dir1/dir 105
./dir1/dir 106
./dir1/dir 107
NOTE: The above example uses head -n5 merely to limit the amount of output we’re dealing with; you’d normally remove it to get your full listing! Also, replace the echo with whatever command you want to use.
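With GNU sed, the -z option makes each NUL-terminated record a separate "line", so the ordinary end-of-line anchor works and the \x00 juggling isn't needed. A sketch of the same pipeline under that assumption, printing the final list one directory per line:

```bash
# GNU sed -z: records are NUL-terminated, so $ anchors at the end of each path.
# xargs -0 -r (GNU) hands the directories to printf, one per output line.
find . -type f -name '*f*' -print0 |
  sed -z 's#/[^/]*$##' |
  sort -zu |
  xargs -0 -r printf '%s\n'
```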
Method 7
With zsh:
typeset -aU dirs # array with unique values
dirs=(**/*f*(D:h))
printf '%s\n' $dirs
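For bash users, a rough counterpart of the zsh version, as a sketch assuming bash 4 or later (globstar and associative arrays). dotglob stands in for zsh's D qualifier, and unlike the zsh glob above, this version keeps only regular files. The function name list_f_dirs is made up for this example, and the shopt settings it turns on stay on in the calling shell:

```bash
# Hypothetical helper: print, sorted, the unique directories under the current
# directory that contain a regular file whose name contains "f".
list_f_dirs() {
  shopt -s globstar dotglob nullglob   # enable **, hidden names, empty matches
  local f d
  local -A seen=()
  for f in **/*f*; do
    [[ -f $f ]] || continue            # skip directories and other non-files
    if [[ $f == */* ]]; then d=${f%/*}; else d=.; fi
    seen[$d]=1                         # associative array keys are unique
  done
  if ((${#seen[@]})); then
    printf '%s\n' "${!seen[@]}" | sort
  fi
}
```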
Method 8
I’ve found this variation, which doesn’t use sort or uniq, useful:
find . -type d -print0 | xargs -0 -I{} find {} -maxdepth 1 -iname '*.log' -print -quit
The advantage is that you don’t have to wait for the whole tree to be traversed and sorted before you see any output.
- Find all directories: find . -type d -print0
- For each directory (| xargs -0 -I{}), find a file directly in that directory (-maxdepth 1) that matches the pattern -iname '*.log' (case-insensitive). If one is found, print its name (-print) and stop searching that directory (-quit).
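The same "one line per directory, stop at the first hit" idea can also be written without xargs, as a POSIX sketch: find hands batches of directories to sh, and a shell glob tests each directory's direct children. Unlike -iname, the glob is case-sensitive, and it does not match hidden files:

```bash
# For each directory, print it once if it directly holds a regular *.log file.
find . -type d -exec sh -c '
  for d do
    for f in "$d"/*.log; do
      if [ -f "$f" ]; then
        printf "%s\n" "$d"
        break                 # one match is enough; move on to the next directory
      fi
    done
  done
' sh {} +
```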
Alternatively
find . -type d -print0 | xargs -0 -I// find // -maxdepth 1 -iname '*.log' -exec dirname {} \; -quit
which prints just the parent directory’s name, inspired by Snowbuilder’s answer.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0