I have a file that may be in a different place on each user’s machine. Is there a way to implement a search for the file? A way that I can pass the file’s name and the directory tree to search in?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
os.walk is the answer, this will find the first match:
import os
def find(name, path):
for root, dirs, files in os.walk(path):
if name in files:
return os.path.join(root, name)
And this will find all matches:
def find_all(name, path):
result = []
for root, dirs, files in os.walk(path):
if name in files:
result.append(os.path.join(root, name))
return result
And this will match a pattern:
import os, fnmatch
def find(pattern, path):
result = []
for root, dirs, files in os.walk(path):
for name in files:
if fnmatch.fnmatch(name, pattern):
result.append(os.path.join(root, name))
return result
find('*.txt', '/path/to/dir')
Method 2
In Python 3.4 or newer you can use pathlib to do recursive globbing:
>>> import pathlib
>>> sorted(pathlib.Path('.').glob('**/*.py'))
[PosixPath('build/lib/pathlib.py'),
PosixPath('docs/conf.py'),
PosixPath('pathlib.py'),
PosixPath('setup.py'),
PosixPath('test_pathlib.py')]
Reference: https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob
In Python 3.5 or newer you can also do recursive globbing like this:
>>> import glob
>>> glob.glob('**/*.txt', recursive=True)
['2.txt', 'sub/3.txt']
Reference: https://docs.python.org/3/library/glob.html#glob.glob
Method 3
I used a version of os.walk and on a larger directory got times around 3.5 sec. I tried two random solutions with no great improvement, then just did:
paths = [line[2:] for line in subprocess.check_output("find . -iname '*.txt'", shell=True).splitlines()]
While it’s POSIX-only, I got 0.25 sec.
From this, I believe it’s entirely possible to optimise whole searching a lot in a platform-independent way, but this is where I stopped the research.
Method 4
If you are using Python on Ubuntu and you only want it to work on Ubuntu a substantially faster way is the use the terminal’s locate program like this.
import subprocess
def find_files(file_name):
command = ['locate', file_name]
output = subprocess.Popen(command, stdout=subprocess.PIPE).communicate()[0]
output = output.decode()
search_results = output.split('n')
return search_results
search_results is a list of the absolute file paths. This is 10,000’s of times faster than the methods above and for one search I’ve done it was ~72,000 times faster.
Method 5
For fast, OS-independent search, use scandir
https://github.com/benhoyt/scandir/#readme
Read http://bugs.python.org/issue11406 for details why.
Method 6
If you are working with Python 2 you have a problem with infinite recursion on windows caused by self-referring symlinks.
This script will avoid following those. Note that this is windows-specific!
import os
from scandir import scandir
import ctypes
def is_sym_link(path):
# http://stackoverflow.com/a/35915819
FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
return os.path.isdir(path) and (ctypes.windll.kernel32.GetFileAttributesW(unicode(path)) & FILE_ATTRIBUTE_REPARSE_POINT)
def find(base, filenames):
hits = []
def find_in_dir_subdir(direc):
content = scandir(direc)
for entry in content:
if entry.name in filenames:
hits.append(os.path.join(direc, entry.name))
elif entry.is_dir() and not is_sym_link(os.path.join(direc, entry.name)):
try:
find_in_dir_subdir(os.path.join(direc, entry.name))
except UnicodeDecodeError:
print "Could not resolve " + os.path.join(direc, entry.name)
continue
if not os.path.exists(base):
return
else:
find_in_dir_subdir(base)
return hits
It returns a list with all paths that point to files in the filenames list.
Usage:
find("C:\", ["file1.abc", "file2.abc", "file3.abc", "file4.abc", "file5.abc"])
Method 7
Below we use a boolean “first” argument to switch between first match and all matches (a default which is equivalent to “find . -name file”):
import os
def find(root, file, first=False):
for d, subD, f in os.walk(root):
if file in f:
print("{0} : {1}".format(file, d))
if first == True:
break
Method 8
The answer is very similar to existing ones, but slightly optimized.
So you can find any files or folders by pattern:
def iter_all(pattern, path):
return (
os.path.join(root, entry)
for root, dirs, files in os.walk(path)
for entry in dirs + files
if pattern.match(entry)
)
either by substring:
def iter_all(substring, path):
return (
os.path.join(root, entry)
for root, dirs, files in os.walk(path)
for entry in dirs + files
if substring in entry
)
or using a predicate:
def iter_all(predicate, path):
return (
os.path.join(root, entry)
for root, dirs, files in os.walk(path)
for entry in dirs + files
if predicate(entry)
)
to search only files or only folders – replace “dirs + files”, for example, with only “dirs” or only “files”, depending on what you need.
Regards.
Method 9
SARose’s answer worked for me until I updated from Ubuntu 20.04 LTS. The slight change I made to his code makes it work on the latest Ubuntu release.
import subprocess
def find_files(file_name):
command = ['locate'+ ' ' + file_name]
output = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).communicate()[0]
output = output.decode()
search_results = output.split('n')
return search_results
Method 10
I’m new to Stack Overflow so I’m unable to comment on someone’s else answer. I’m currently working with Python 3.7.4, on Windows.
@F.M.F’s answers has a few problems in this version, so I made a few adjustments to make it work.
import os
from os import scandir
import ctypes
def is_sym_link(path):
# http://stackoverflow.com/a/35915819
FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
return os.path.isdir(path) and (ctypes.windll.kernel32.GetFileAttributesW(str(path)) & FILE_ATTRIBUTE_REPARSE_POINT)
def find(base, filenames):
hits = []
def find_in_dir_subdir(direc):
content = scandir(direc)
for entry in content:
if entry.name in filenames:
hits.append(os.path.join(direc, entry.name))
elif entry.is_dir() and not is_sym_link(os.path.join(direc, entry.name)):
try:
find_in_dir_subdir(os.path.join(direc, entry.name))
except UnicodeDecodeError:
print("Could not resolve " + os.path.join(direc, entry.name))
continue
except PermissionError:
print("Skipped " + os.path.join(direc, entry.name) + ". I lacked permission to navigate")
continue
if not os.path.exists(base):
return
else:
find_in_dir_subdir(base)
return hits
unicode() was changed to str() in Python 3, so I made that adjustment (line 8)
I also added (in line 25) and exception to PermissionError. This way, the program won’t stop if it finds a folder it can’t access.
Finally, I would like to give a little warning. When running the program, even if you are looking for a single file/directory, make sure you pass it as a list. Otherwise, you will get a lot of answers that not necessarily match your search.
example of use:
find(“C:”, [“Python”, “Homework”])
or
find(“C:\”, [“Homework”])
but, for example: find(“C:\”, “Homework”) will give un-wanted answers.
I would be lying if I said I know why this happens. Again, this is not my code and I just made the adjustments I needed to make it work. All credit should go to @F.M.F.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0