Is there a function to extract the extension from a filename?
Answers:
Method 1
Yes. Use os.path.splitext (see the Python 2.x documentation or the Python 3.x documentation):
>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'
Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:
>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')
Method 2
The pathlib module is new in Python 3.4.
import pathlib
print(pathlib.Path('yourPath.example').suffix) # '.example'
print(pathlib.Path("hello/foo.bar.tar.gz").suffixes) # ['.bar', '.tar', '.gz']
I’m surprised no one has mentioned pathlib yet. pathlib IS awesome!
Method 3
import os.path
extension = os.path.splitext(filename)[1]
Method 4
import os.path
extension = os.path.splitext(filename)[1][1:]
To get only the text of the extension, without the dot.
Method 5
For simple use cases one option may be splitting from dot:
>>> filename = "example.jpeg"
>>> filename.split(".")[-1]
'jpeg'
No error when file doesn’t have an extension:
>>> "filename".split(".")[-1]
'filename'
But you must be careful:
>>> "png".split(".")[-1]
'png' # But file doesn't have an extension
Also will not work with hidden files in Unix systems:
>>> ".bashrc".split(".")[-1]
'bashrc' # But this is not an extension
For general use, prefer os.path.splitext
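To make the difference concrete, here is a small sketch that runs both approaches over the edge cases mentioned above. The helper names naive_ext and splitext_ext are made up for this illustration.

```python
import os.path

def naive_ext(name):
    # Naive approach from this answer: take whatever follows the last dot.
    return name.split(".")[-1]

def splitext_ext(name):
    # os.path.splitext keeps the leading dot; drop it so the two are comparable.
    return os.path.splitext(name)[1][1:]

for name in ["example.jpeg", "png", ".bashrc", "/a/b.c/d"]:
    print(f"{name!r}: naive={naive_ext(name)!r}, splitext={splitext_ext(name)!r}")
```

Only the first name yields the same answer from both helpers. For the other three, the naive split reports an "extension" where os.path.splitext reports none.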
Method 6
It is worth adding a lower() in there so you don’t find yourself wondering why the JPGs aren’t showing up in your list.
os.path.splitext(filename)[1][1:].strip().lower()
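As a sketch of how that one-liner might be used, the following wraps it in a helper (normalized_ext is a hypothetical name) and filters a made-up list of filenames without caring about case.

```python
import os.path

def normalized_ext(filename):
    # Extension without the dot, lowercased, surrounding whitespace removed.
    return os.path.splitext(filename)[1][1:].strip().lower()

names = ["photo.JPG", "scan.jpeg", "notes.txt", "README"]
images = [n for n in names if normalized_ext(n) in {"jpg", "jpeg"}]
print(images)  # ['photo.JPG', 'scan.jpeg']
```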
Method 7
Any of the solutions above work, but on Linux I have found that there can be a newline at the end of the extension string, which prevents matches from succeeding. Add the strip() method to the end. For example:
import os.path
extension = os.path.splitext(filename)[1][1:].strip()
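The answer does not say where the newline comes from. One plausible source, assumed here purely for illustration, is reading filenames line by line from a text file, since each line keeps its trailing newline. The io.StringIO object below stands in for such a file.

```python
import io
import os.path

# Stand-in for a text file that lists one filename per line.
listing = io.StringIO("report.csv\nphoto.png\n")

for line in listing:
    raw = os.path.splitext(line)[1][1:]   # still carries the trailing newline
    clean = raw.strip()                   # newline removed
    print(repr(raw), repr(clean))
```

The first column shows the extension with the newline still attached, which is why a comparison such as raw == "csv" would fail.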
Method 8
With splitext there are problems with files that have a double extension (e.g. file.tar.gz, file.tar.bz2, etc.):
>>> fileName, fileExtension = os.path.splitext('/path/to/somefile.tar.gz')
>>> fileExtension
'.gz'
but should be: .tar.gz
The possible solutions are here
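The linked solutions are not reproduced here. As one possible approach, not necessarily the one the link describes, the sketch below checks for a short list of known compound suffixes before falling back to os.path.splitext. The COMPOUND_SUFFIXES tuple and the split_compound_ext name are assumptions made for this example.

```python
import os.path

# Assumed list of compound suffixes to recognise; extend as needed.
COMPOUND_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz")

def split_compound_ext(path):
    """Return (root, ext), keeping known compound suffixes together."""
    lowered = path.lower()
    for suffix in COMPOUND_SUFFIXES:
        if lowered.endswith(suffix):
            return path[:-len(suffix)], path[-len(suffix):]
    # Anything else is handled by the standard library.
    return os.path.splitext(path)

print(split_compound_ext("/path/to/somefile.tar.gz"))  # ('/path/to/somefile', '.tar.gz')
print(split_compound_ext("/path/to/report.v2.txt"))    # ('/path/to/report.v2', '.txt')
```

An explicit list avoids treating every dot in a name such as report.v2.txt as part of the extension, at the cost of having to maintain the list.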
Method 9
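You can find some great stuff in the pathlib module (available in Python 3.4 and later). For a Windows-style path, use a raw string so the backslashes are not read as escape sequences, and PureWindowsPath so they are treated as separators:
import pathlib
x = pathlib.PureWindowsPath(r"C:\Path\To\File\myfile.txt").suffix
print(x)  # .txt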
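A note on path flavours: the pure path classes parse a string by fixed rules regardless of the operating system you run on. The sketch below, using the same example path, shows that both flavours report the same suffix here, but only PureWindowsPath treats the backslashes as separators.

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r"C:\Path\To\File\myfile.txt"

win = PureWindowsPath(raw)
posix = PurePosixPath(raw)

print(win.name, win.suffix)              # myfile.txt .txt
print(posix.suffix)                      # .txt
print(posix.name == raw)                 # True: the whole string is one component
print(len(win.parts), len(posix.parts))  # 5 1
```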
Method 10
Although this is an old topic, I wonder why no one has mentioned a very simple Python string method, rpartition, for this case.
To get the extension from a file's absolute path, you can simply write:
filepath.rpartition('.')[-1]
Example:
path = '/home/jersey/remote/data/test.csv'
print(path.rpartition('.')[-1])
This gives you: 'csv'
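One thing to watch for: when the string contains no dot, rpartition puts the whole string in the last slot, so [-1] returns the full path. A minimal sketch that checks the separator slot instead (ext_via_rpartition is a hypothetical helper name):

```python
def ext_via_rpartition(filepath):
    # rpartition returns (head, sep, tail); sep is '' when no dot was found.
    head, sep, tail = filepath.rpartition(".")
    return tail if sep else ""

print(ext_via_rpartition("/home/jersey/remote/data/test.csv"))  # csv
print(repr(ext_via_rpartition("/home/jersey/remote/data/README")))  # ''
```

This still looks at the whole path, so a dot in a directory name (as in /a/b.c/d) would be mistaken for an extension separator.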
Method 11
Just join all pathlib suffixes.
>>> import pathlib
>>> x = 'file/path/archive.tar.gz'
>>> y = 'file/path/text.txt'
>>> ''.join(pathlib.Path(x).suffixes)
'.tar.gz'
>>> ''.join(pathlib.Path(y).suffixes)
'.txt'
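Joining all suffixes treats every dot after the first part of the name as the start of an extension, which may be more than you want. The sketch below wraps the idea in a hypothetical helper, full_suffix, and includes a made-up versioned filename to show the effect.

```python
import pathlib

def full_suffix(path):
    # Join every suffix pathlib reports for the final path component.
    return "".join(pathlib.PurePath(path).suffixes)

print(full_suffix("file/path/archive.tar.gz"))   # .tar.gz
print(full_suffix("file/path/report.v1.2.txt"))  # .v1.2.txt
```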
Method 12
Surprised this wasn’t mentioned yet:
import os
fn = '/some/path/a.tar.gz'
basename = os.path.basename(fn)  # OS independent; gives 'a.tar.gz'
base = basename.split('.')[0]  # 'a'
ext = '.'.join(basename.split('.')[1:])  # <-- main part; gives 'tar.gz'
# if you want a leading '.', and `None` when there is no extension:
ext = '.' + ext if ext else None  # '.tar.gz'
Benefits:
- Works as expected for anything I can think of
- No modules
- No regex
- Cross-platform
- Easily extendible (e.g. no leading dots for extension, only last part of extension)
As function:
def get_extension(filename):
    basename = os.path.basename(filename)  # OS independent
    ext = '.'.join(basename.split('.')[1:])
    return '.' + ext if ext else None
Method 13
You can use a split on a filename:
f_extns = filename.split(".")
print ("The extension of the file is : " + repr(f_extns[-1]))
This does not require an additional library.
Method 14
filename='ext.tar.gz'
extension = filename[filename.rfind('.'):]
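Note that str.rfind returns -1 when there is no dot, and slicing from -1 silently returns the last character of the name. A minimal sketch with a guard for that case (ext_via_rfind is a hypothetical helper name):

```python
def ext_via_rfind(filename):
    # str.rfind returns -1 when there is no dot; without this check,
    # filename[-1:] would return the last character instead of nothing.
    dot = filename.rfind(".")
    return filename[dot:] if dot != -1 else ""

print(ext_via_rfind("ext.tar.gz"))        # .gz
print(repr(ext_via_rfind("README")))      # ''
print("README"["README".rfind("."):])     # E  (the unguarded slice)
```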
Method 15
Extracting extension from filename in Python
Python os module splitext()
splitext() function splits the file path into a tuple having two values – root and extension.
import os
# unpacking the tuple
file_name, file_extension = os.path.splitext("/Users/Username/abc.txt")
print(file_name)
print(file_extension)
Get File Extension using Pathlib Module
The pathlib module can also return the file extension:
import pathlib
print(pathlib.Path("/Users/pankaj/abc.txt").suffix)  # .txt
Method 16
This is a direct string-manipulation technique.
I see a lot of solutions mentioned, but most of them use split.
However, split breaks the string at every occurrence of “.”.
What you are looking for instead is rpartition, which splits only at the last occurrence.
string = "folder/to_path/filename.ext"
extension = string.rpartition(".")[-1]
Method 17
Another solution with right split:
# to get extension only
s = 'test.ext'
if '.' in s: ext = s.rsplit('.', 1)[1]
# or, to get file name and extension
def split_filepath(s):
    """
    get filename and extension from filepath
    filepath -> (filename, extension)
    """
    if '.' not in s:
        return (s, '')
    r = s.rsplit('.', 1)
    return (r[0], r[1])
Method 18
Even though this question is already answered, I’d add a regex solution. The dot before the extension must be escaped so that it matches a literal “.”:
>>> import re
>>> file_suffix = r".*(\..*)"
>>> result = re.search(file_suffix, "somefile.ext")
>>> result.group(1)
'.ext'
Method 19
You can use the following code to split the file name and extension.
import os.path
filenamewithext = os.path.basename(filepath)
filename, ext = os.path.splitext(filenamewithext)
#print file name
print(filename)
#print file extension
print(ext)
Method 20
A true one-liner, if you like regex.
It works even if there are additional “.” characters in the middle of the name. The leading dot in the pattern is escaped so that it matches a literal “.”:
import re
file_ext = re.search(r"\.([^.]+)$", filename).group(1)
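As a sketch of a more defensive variant: re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError for a name without a dot. The helper below (ext_via_regex is a hypothetical name) escapes the dot and guards against that case.

```python
import re

# Escaped dot, then one or more non-dot characters up to the end of the string.
EXT_RE = re.compile(r"\.([^.]+)$")

def ext_via_regex(filename):
    match = EXT_RE.search(filename)
    # search returns None when nothing matches, so check before calling group.
    return match.group(1) if match else ""

print(ext_via_regex("archive.tar.gz"))   # gz
print(repr(ext_via_regex("README")))     # ''
```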
Method 21
Try this:
files = ['file.jpeg', 'file.tar.gz', 'file.png', 'file.foo.bar', 'file.etc']
pen_ext = ['foo', 'tar', 'bar', 'etc']
for file in files:  #1
    if file.split(".")[-2] in pen_ext:  #2
        ext = file.split(".")[-2] + "." + file.split(".")[-1]  #3
    else:
        ext = file.split(".")[-1]  #4
    print(ext)  #5
1. Get each file name in the list.
2. Split the file name and check whether the penultimate part is in the pen_ext list.
3. If it is, join it with the last part and use that as the file’s extension.
4. If it is not, use just the last part as the file’s extension.
5. Print the result to check it.
Method 22
You can use endswith to identify the file extension in Python, as in the example below:
import os
import pandas as pd

frames = []
for file in os.listdir():
    if file.endswith('.csv'):
        df1 = pd.read_csv(file)
        frames.append(df1)
result = pd.concat(frames)
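Two details of str.endswith are worth knowing: it accepts a tuple of suffixes, and it is case-sensitive. A small sketch on a made-up list of names, with no pandas required:

```python
names = ["a.csv", "b.CSV", "c.tsv", "notes.txt"]

# str.endswith accepts a tuple of suffixes; lowercasing the name first
# makes the check case-insensitive.
tabular = [n for n in names if n.lower().endswith((".csv", ".tsv"))]
print(tabular)  # ['a.csv', 'b.CSV', 'c.tsv']
```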
Method 23
For funsies… just collect the extensions in a dict, and track all of them in a folder. Then just pull the extensions you want.
import os
search = {}
for f in os.listdir(os.getcwd()):
    fn, fe = os.path.splitext(f)
    try:
        search[fe].append(f)
    except KeyError:
        # first file seen with this extension
        search[fe] = [f]
extensions = ('.png', '.jpg')
for ex in extensions:
    found = search.get(ex, '')
    if found:
        print(found)
Method 24
# try this, it works for anything, any length of extension
# e.g. www.google.com/downloads/file1.gz.rs -> .gz.rs
import os.path

class LinkChecker:
    @staticmethod
    def get_link_extension(link: str) -> str:
        if link is None or link == "":
            return ""
        else:
            paths = os.path.splitext(link)
            ext = paths[1]
            new_link = paths[0]
            if ext != "":
                return LinkChecker.get_link_extension(new_link) + ext
            else:
                return ""
Method 25
import os

def NewFileName(fichier):
    # Append -(1), -(2), ... before the extension until the name is unused.
    cpt = 0
    fic, *ext = fichier.split('.')
    ext = '.'.join(ext)
    while os.path.isfile(fichier):
        cpt += 1
        fichier = '{0}-({1}).{2}'.format(fic, cpt, ext)
    return fichier
Method 26
This is a simple way to get both the filename and the extension in a single line. Note that the unpacking only works when the name contains exactly one dot; otherwise it raises ValueError.
>>> fName, ext = 'C:/folder name/Flower.jpeg'.split('/')[-1].split('.')
>>> print(fName)
Flower
>>> print(ext)
jpeg
Unlike the os.path and pathlib solutions, you don’t need to import any package for this.
Method 27
a = ".bashrc"
b = "text.txt"
extension_a = a.split(".")
extension_b = b.split(".")
print(extension_a[-1]) # bashrc
print(extension_b[-1]) # txt
Method 28
name_only = filename[:filename.index(".")]
That will give you the file name up to the first “.”, which is the most common case. Note that index raises ValueError if the name contains no “.”.
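Because str.index raises ValueError when the name contains no dot, a variant based on str.partition may be more forgiving. This is only a sketch, and name_before_first_dot is a hypothetical helper name.

```python
def name_before_first_dot(filename):
    # partition never raises: with no dot it returns (filename, '', '').
    return filename.partition(".")[0]

print(name_before_first_dot("archive.tar.gz"))  # archive
print(name_before_first_dot("README"))          # README
```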
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.