I am trying to get a list of files in a directory using Python, but I do not want a list of ALL the files.
What I essentially want is the ability to do something like the following but using Python and not executing ls.
ls 145592*.jpg
If there is no built-in method for this, I am currently thinking of writing a for loop to iterate through the results of an os.listdir() and to append all the matching files to a new list.
However, there are a lot of files in that directory and therefore I am hoping there is a more efficient method (or a built-in method).
Answers:
Method 1
import glob
jpgFilenamesList = glob.glob('145592*.jpg')
See glob in the Python documentation.
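Method 1 looks in the current working directory. The sketch below shows the same call against another directory. The function name find_matching and the sample file names are hypothetical. glob.escape() (Python 3.4+) is applied to the directory part only, so a directory name containing characters such as [ is not treated as a wildcard. The results are sorted because glob returns them in arbitrary order.

```python
import glob
import os
import tempfile


def find_matching(directory, pattern):
    """Return sorted paths in `directory` whose names match the glob `pattern`."""
    # Escape only the directory part; the pattern part keeps its wildcards
    full_pattern = os.path.join(glob.escape(directory), pattern)
    # glob returns results in arbitrary order, so sort for stable output
    return sorted(glob.glob(full_pattern))


# Demo on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ["145592_a.jpg", "145592_b.jpg", "145592_c.png", "999999.jpg"]:
        open(os.path.join(tmp, name), "w").close()
    matches = [os.path.basename(p) for p in find_matching(tmp, "145592*.jpg")]

print(matches)  # ['145592_a.jpg', '145592_b.jpg']
```

For very large directories, glob.iglob() takes the same pattern and yields matches one at a time instead of building the whole list.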
Method 2
glob.glob() is definitely the way to do it (as per Ignacio). However, if you do need more complicated matching, you can do it with a list comprehension and re.match(), something like so:
import os, re
files = [f for f in os.listdir('.') if re.match(r'[0-9]+.*\.jpg$', f)]
More flexible, but as you note, less efficient.
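When the regular expression gets more involved, compiling it once and using re.fullmatch() (Python 3.4+) guarantees the whole file name matches, not just a prefix. This is a sketch; the pattern and the helper name regex_filter are hypothetical. The sample names show what the stricter pattern rejects: a trailing .bak and a missing dot before jpg.

```python
import os
import re
import tempfile

# Hypothetical pattern: digits, then anything, then a literal ".jpg" at the end.
# The escaped dot matches only a real "."; fullmatch anchors both ends.
JPG_WITH_NUMERIC_PREFIX = re.compile(r"[0-9]+.*\.jpg")


def regex_filter(directory, regex):
    # Keep only names that the compiled regex matches in full
    return sorted(f for f in os.listdir(directory) if regex.fullmatch(f))


with tempfile.TemporaryDirectory() as tmp:
    for name in ["145592_a.jpg", "145592.jpg.bak", "photo.jpg", "12xjpg"]:
        open(os.path.join(tmp, name), "w").close()
    result = regex_filter(tmp, JPG_WITH_NUMERIC_PREFIX)

print(result)  # ['145592_a.jpg']
```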
Method 3
Keep it simple:
import os
relevant_path = "[path to folder]"
included_extensions = ['jpg','jpeg', 'bmp', 'png', 'gif']
file_names = [fn for fn in os.listdir(relevant_path)
              if any(fn.endswith(ext) for ext in included_extensions)]
I prefer this form of list comprehensions because it reads well in English.
I read the fourth line as:
For each fn in os.listdir for my path, give me only the ones that match any one of my included extensions.
It may be hard for novice Python programmers to get used to using list comprehensions for filtering, and building the whole list at once has some memory overhead for very large data sets, but for listing a directory and other simple string-filtering tasks, list comprehensions lead to cleaner, more readable code.
The one weakness of this design is that it doesn't protect you from passing a string instead of a list. If included_extensions accidentally ends up as a string such as 'jpg', the any(...) loop iterates over its characters ('j', 'p', 'g'), so every name ending in any of those letters matches and you get a slew of false positives.
But it’s better to have a problem that’s easy to fix than a solution that’s hard to understand.
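One way to sidestep the string-versus-list pitfall is that str.endswith() accepts a tuple of suffixes directly, so no any(...) loop is needed. It raises TypeError if given a list, which makes the mistake loud rather than silent. In this sketch the dots are included in the suffixes so a name like "notajpg" does not match, and lower() makes the check case-insensitive; images_in is a hypothetical helper name.

```python
import os
import tempfile

# A tuple of suffixes, dots included, so "notajpg" is not a match
INCLUDED_EXTENSIONS = (".jpg", ".jpeg", ".bmp", ".png", ".gif")


def images_in(directory):
    # lower() lets "B.PNG" match ".png"
    return sorted(fn for fn in os.listdir(directory)
                  if fn.lower().endswith(INCLUDED_EXTENSIONS))


with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.jpg", "B.PNG", "notes.txt", "notajpg"]:
        open(os.path.join(tmp, name), "w").close()
    found = images_in(tmp)

print(found)  # ['B.PNG', 'a.jpg']
```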
Method 4
Another option:
>>> import os, fnmatch
>>> fnmatch.filter(os.listdir('.'), '*.py')
['manage.py']
https://docs.python.org/3/library/fnmatch.html
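A detail worth knowing about fnmatch.filter(): it normalises case with os.path.normcase(), so on Windows "*.py" also matches "README.PY", while on POSIX systems the match is case-sensitive. If you need the same behaviour on every platform, fnmatch.fnmatchcase() is always case-sensitive. The names below are made up for illustration.

```python
import fnmatch

names = ["manage.py", "README.PY", "setup.cfg"]

# Platform-dependent: includes "README.PY" on Windows, not on Linux/macOS
print(fnmatch.filter(names, "*.py"))

# Always case-sensitive, on every platform
case_sensitive = [n for n in names if fnmatch.fnmatchcase(n, "*.py")]
print(case_sensitive)  # ['manage.py']
```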
Method 5
Filter with the glob module.
Import glob:
import glob
Wildcards:
files=glob.glob("data/*")
print(files)
Out:
['data/ks_10000_0', 'data/ks_1000_0', 'data/ks_100_0', 'data/ks_100_1',
'data/ks_100_2', 'data/ks_106_0', 'data/ks_19_0', 'data/ks_200_0', 'data/ks_200_1',
'data/ks_300_0', 'data/ks_30_0', 'data/ks_400_0', 'data/ks_40_0', 'data/ks_45_0',
'data/ks_4_0', 'data/ks_500_0', 'data/ks_50_0', 'data/ks_50_1', 'data/ks_60_0',
'data/ks_82_0', 'data/ks_lecture_dp_1', 'data/ks_lecture_dp_2']
Filter by extension .txt (this pattern finds .txt files one directory level below /home/ach):
files = glob.glob("/home/ach/*/*.txt")
A single character
glob.glob("/home/ach/file?.txt")
Number Ranges
glob.glob("/home/ach/*[0-9]*")
Alphabet Ranges
glob.glob("/home/ach/[a-c]*")
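The wildcards above can be tried safely in a scratch directory. This sketch uses made-up file names and a hypothetical helper, names_matching, and also shows [!...], glob's negated character class, which matches any single character not in the set.

```python
import glob
import os
import tempfile


def names_matching(directory, pattern):
    # Escape the directory so only `pattern` is interpreted as a wildcard
    full = os.path.join(glob.escape(directory), pattern)
    return sorted(os.path.basename(p) for p in glob.glob(full))


with tempfile.TemporaryDirectory() as tmp:
    for name in ["file1.txt", "file2.txt", "file10.txt", "apple", "cherry", "zebra"]:
        open(os.path.join(tmp, name), "w").close()
    single_char = names_matching(tmp, "file?.txt")  # ? matches exactly one character
    char_range = names_matching(tmp, "[a-c]*")      # first character in a..c
    negated = names_matching(tmp, "[!a-f]*")        # first character NOT in a..f

print(single_char)  # ['file1.txt', 'file2.txt']
print(char_range)   # ['apple', 'cherry']
print(negated)      # ['zebra']
```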
Method 6
Preliminary code
import glob
import fnmatch
import pathlib
import os
pattern = '*.py'
path = '.'
Solution 1 – use “glob”
# lookup in current dir
glob.glob(pattern)
In [2]: glob.glob(pattern)
Out[2]: ['wsgi.py', 'manage.py', 'tasks.py']
Solution 2 – use “os” + “fnmatch”
Variant 2.1 – Lookup in current dir
# lookup in current dir
fnmatch.filter(os.listdir(path), pattern)
In [3]: fnmatch.filter(os.listdir(path), pattern)
Out[3]: ['wsgi.py', 'manage.py', 'tasks.py']
Variant 2.2 – Lookup recursive
# lookup recursive
for dirpath, dirnames, filenames in os.walk(path):
    if not filenames:
        continue
    pythonic_files = fnmatch.filter(filenames, pattern)
    if pythonic_files:
        for file in pythonic_files:
            print('{}/{}'.format(dirpath, file))
Result
./wsgi.py
./manage.py
./tasks.py
./temp/temp.py
./apps/diaries/urls.py
./apps/diaries/signals.py
./apps/diaries/actions.py
./apps/diaries/querysets.py
./apps/library/tests/test_forms.py
./apps/library/migrations/0001_initial.py
./apps/polls/views.py
./apps/polls/formsets.py
./apps/polls/reports.py
./apps/polls/admin.py
Solution 3 – use “pathlib”
# lookup in current dir
path_ = pathlib.Path('.')
tuple(path_.glob(pattern))
# lookup recursive
tuple(path_.rglob(pattern))
Notes:
- Tested on Python 3.4
- The pathlib module was added in Python 3.4
- Python 3.5 added recursive lookup to glob.glob (a "**" pattern together with recursive=True), see
https://docs.python.org/3.5/library/glob.html#glob.glob. Since my machine has Python 3.4 installed, I have not tested that.
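Recursive lookup with glob.glob can be sketched as follows (Python 3.5+). With recursive=True, "**" matches zero or more directory levels, so top-level files are found as well as nested ones. The directory layout here is made up for illustration, and paths are shown relative to the scratch directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one .py file at the top, one two levels down
    os.makedirs(os.path.join(tmp, "apps", "polls"))
    for rel in ["manage.py", os.path.join("apps", "polls", "views.py"), "notes.txt"]:
        open(os.path.join(tmp, rel), "w").close()

    # "**" only recurses when recursive=True is passed
    pattern = os.path.join(glob.escape(tmp), "**", "*.py")
    found = sorted(os.path.relpath(p, tmp) for p in glob.glob(pattern, recursive=True))

print(found)  # ['apps/polls/views.py', 'manage.py'] on POSIX systems
```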
Method 7
Use os.walk to recursively list your files:
import os
root = "/home"
pattern = "145592"
alist_filter = ['jpg', 'bmp', 'png', 'gif']
path = os.path.join(root, "mydir_to_scan")
for r, d, f in os.walk(path):
    for file in f:
        # file[-3:] is the last three characters, e.g. 'jpg'
        if file[-3:] in alist_filter and pattern in file:
            # join with r, the directory os.walk is currently visiting
            print(os.path.join(r, file))
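A variant of the os.walk approach, sketched as a generator so matches can be processed as they are found. Two choices differ from the answer and are assumptions about what you want: the prefix is checked with startswith() (as in ls 145592*.jpg) rather than "contained anywhere", and the extension is taken with os.path.splitext() and compared case-insensitively, so "jpeg" is not confused with a three-character suffix. The name walk_matching and the sample files are hypothetical.

```python
import os
import tempfile


def walk_matching(top, prefix, extensions):
    """Yield full paths under `top` whose names start with `prefix` and whose
    extension (lower-cased, without the dot) is in `extensions`."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            ext = os.path.splitext(name)[1][1:].lower()  # "a.PNG" -> "png"
            if name.startswith(prefix) and ext in extensions:
                yield os.path.join(dirpath, name)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ["145592_a.jpg", os.path.join("sub", "145592_b.PNG"), "145592.jpeg", "other.jpg"]:
        open(os.path.join(tmp, rel), "w").close()
    hits = sorted(os.path.relpath(p, tmp) for p in walk_matching(tmp, "145592", {"jpg", "png"}))

print(hits)
```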
Method 8
You can use pathlib, which is part of the Python standard library from version 3.4 onward.
from pathlib import Path
files = [f for f in Path.cwd().iterdir() if f.match("145592*.jpg")]
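pathlib can also do the matching itself: Path.glob() takes the same pattern and only returns entries of that directory (it recurses only if the pattern contains "**"). Since a directory can also match a pattern like "145592*.jpg", this sketch filters with is_file(). The helper name matching_files and the sample entries are hypothetical.

```python
import tempfile
from pathlib import Path


def matching_files(directory, pattern):
    # Path.glob() matches inside `directory`; is_file() drops matching directories
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "145592_a.jpg").touch()
    (root / "145592_dir.jpg").mkdir()  # a directory whose name also matches
    (root / "other.jpg").touch()
    names = [p.name for p in matching_files(root, "145592*.jpg")]

print(names)  # ['145592_a.jpg']
```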
Method 9
import os
dir="/path/to/dir"
[x[0]+"/"+f for x in os.walk(dir) for f in x[2] if f.endswith(".jpg")]
This will give you a list of jpg files with their full path. You can replace x[0]+"/"+f with f for just filenames. You can also replace f.endswith(".jpg") with whatever string condition you wish.
Method 10
You might also like a more high-level approach (which I have implemented and packaged as findtools):
from findtools.find_files import (find_files, Match)
# Recursively find all *.txt files in /home
txt_files_pattern = Match(filetype='f', name='*.txt')
found_files = find_files(path='/home', match=txt_files_pattern)
for found_file in found_files:
    print(found_file)
It can be installed with:
pip install findtools
Method 11
Filenames with “jpg” and “png” extensions in “path/to/images”:
import os
accepted_extensions = ["jpg", "png"]
filenames = [fn for fn in os.listdir("path/to/images") if fn.split(".")[-1] in accepted_extensions]
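Since the question asks about efficiency in a large directory, os.scandir() (Python 3.5+; usable as a context manager from 3.6) is another option. It yields DirEntry objects lazily, and entry.is_file() can often be answered from the directory listing itself without an extra stat() call. This sketch also swaps split(".") for os.path.splitext(), so a file literally named "jpg" (no dot) is not treated as a .jpg; images_via_scandir and the sample names are hypothetical.

```python
import os
import tempfile

ACCEPTED_EXTENSIONS = {".jpg", ".png"}


def images_via_scandir(directory):
    result = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # splitext("jpg") gives ("jpg", ""), so extensionless names are skipped
            ext = os.path.splitext(entry.name)[1].lower()
            if entry.is_file() and ext in ACCEPTED_EXTENSIONS:
                result.append(entry.name)
    return sorted(result)


with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.jpg", "b.PNG", "c.gif", "jpg"]:
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "folder.jpg"))  # a directory, excluded by is_file()
    images = images_via_scandir(tmp)

print(images)  # ['a.jpg', 'b.PNG']
```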
Method 12
You can define a pattern and check for it. Here I take both a start pattern and an end pattern and look for them in the file name. FILES contains the list of all the files in each directory that os.walk visits.
import os
PATTERN_START = "145592"
PATTERN_END = ".jpg"
# directory that contains this script
CURRENT_DIR = os.path.dirname(os.path.realpath(__file__))
for r, d, FILES in os.walk(CURRENT_DIR):
    for FILE in FILES:
        if FILE.startswith(PATTERN_START) and FILE.endswith(PATTERN_END):
            print(FILE)
Method 13
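You can use subprocess.check_output() like this:
import subprocess
# text=True (Python 3.7+) returns str instead of bytes; splitlines() gives one entry per file
list_files = subprocess.check_output("ls 145592*.jpg", shell=True, text=True).splitlines()
Of course, the string between quotes can be any shell command whose output you want to capture. Note that this still runs ls, which the question wanted to avoid, and that check_output() raises subprocess.CalledProcessError when nothing matches, because ls then exits with a non-zero status.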
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.