I need to use Python to extract the date from filenames. The date is in the following format:
month-day-year.somefileextension
Examples:
10-12-2011.zip
somedatabase-10-04-2011.sql.tar.gz
Would regular expressions be the best way to extract it?
I have some code:
import re
m = re.search(r'(?<=-)\w+', 'derer-10-12-2001.zip')
print(m.group(0))
The code prints '10'. Any clue on how to print the whole date?
Answers:
Method 1
Assuming the date is always in the format: [MM]-[DD]-[YYYY].
re.search("([0-9]{2}-[0-9]{2}-[0-9]{4})", fileName)
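A minimal runnable sketch of this approach, assuming the question's two example filenames. As an addition not in the original answer, it passes the match to datetime.strptime with the format "%m-%d-%Y", which turns the string into a date object and rejects impossible values such as month 13. The helper name extract_date is hypothetical.

```python
import re
from datetime import datetime

# Same pattern as Method 1, written as a raw string.
DATE_RE = re.compile(r"([0-9]{2}-[0-9]{2}-[0-9]{4})")

def extract_date(file_name):
    """Return the MM-DD-YYYY date in file_name as a datetime.date, or None."""
    match = DATE_RE.search(file_name)
    if match is None:
        return None  # no date-shaped substring in the name
    # strptime raises ValueError for impossible dates such as 13-45-2011
    return datetime.strptime(match.group(1), "%m-%d-%Y").date()

for name in ["10-12-2011.zip", "somedatabase-10-04-2011.sql.tar.gz"]:
    print(name, "->", extract_date(name))
```

The regex only checks the shape of the string; the strptime step is what confirms that the digits form a real calendar date.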
Method 2
You want to use a capture group.
m = re.search(r'\b(\d{2}-\d{2}-\d{4})\.', 'derer-10-12-2001.zip')
print(m.group(1))
Should print 10-12-2001.
You could get away with a terser regex, but requiring the date to be preceded by a "-" and followed by a "." provides some minimal protection against double matches in funky filenames, or against malformed filenames that shouldn't match at all.
EDIT: I replaced the initial "-" with \b, which matches a word boundary: the border between a word character (letter, digit or underscore) and a non-word character, or the start or end of the string. That way it matches whether the date is preceded by a hyphen or by the beginning of the string.
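A quick check of the edited pattern, written as a raw string so that \b, \d and \. reach the regex engine intact. It assumes the question's example names plus one made-up name, backup-10-12-2011, which has no extension and therefore fails the trailing \. requirement.

```python
import re

# \b: word boundary (start of string or after '-'); \. requires a literal dot after the year
PATTERN = re.compile(r"\b(\d{2}-\d{2}-\d{4})\.")

for name in ["derer-10-12-2001.zip", "10-12-2011.zip", "backup-10-12-2011"]:
    m = PATTERN.search(name)
    print(name, "->", m.group(1) if m else None)
```

The first two names yield their dates; the last one prints None because nothing follows the year.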
Method 3
I think you can extract the date using re.split as follows:
$ ipython
In [1]: import re
In [2]: input_file = '10-12-2011.zip'
In [3]: file_split = re.split(r'(\d{2}-\d{2}-\d{4})', input_file, 1)
In [4]: file_split
Out[4]: ['', '10-12-2011', '.zip']
In [5]: file_split[1]
Out[5]: '10-12-2011'
In [6]: input_file = 'somedatabase-10-04-2011.sql.tar.gz'
In [7]: file_split = re.split(r'(\d{2}-\d{2}-\d{4})', input_file, 1)
In [8]: file_split
Out[8]: ['somedatabase-', '10-04-2011', '.sql.tar.gz']
In [9]: file_split[1]
Out[9]: '10-04-2011'
I ran the tests with Python 3.6.6 and IPython 5.3.0.
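Outside IPython, the same idea can be wrapped in a small function. This is a sketch; the name split_date is hypothetical. Because the pattern contains a capture group, re.split keeps the matched date as the middle element; when nothing matches, it returns the whole name as a one-element list, so the length check avoids an IndexError.

```python
import re

def split_date(file_name):
    """Return the first MM-DD-YYYY date in file_name, or None if there is none."""
    # maxsplit=1: split only at the first date; the capture group keeps it in the result
    parts = re.split(r"(\d{2}-\d{2}-\d{4})", file_name, maxsplit=1)
    # With one split the result is [before, date, after]; without a match it is [file_name]
    return parts[1] if len(parts) == 3 else None

print(split_date("somedatabase-10-04-2011.sql.tar.gz"))
print(split_date("readme.txt"))
```

Passing maxsplit as a keyword also avoids the DeprecationWarning that newer Python versions emit for a positional maxsplit.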
Method 4
Well, the \w+ you put in matches one or more word characters following a hyphen, and since the hyphen after "10" is not a word character, '10' is the expected result. What you want is a lookaround on either side, matching the digits and hyphens that occur between the first hyphen and a period:
re.search(r'(?<=-)[\d-]+(?=\.)', name).group(0)
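A small demonstration of the lookaround pattern, assuming the question's example names. Because (?<=-) requires a hyphen immediately before the match, a name that starts with the date, such as 10-12-2011.zip, yields only 12-2011; the answer's regex is used unchanged here to show that limitation.

```python
import re

# Lookbehind: preceded by '-'; [\d-]+: digits and hyphens; lookahead: followed by '.'
LOOKAROUND = re.compile(r"(?<=-)[\d-]+(?=\.)")

for name in ["derer-10-12-2001.zip", "somedatabase-10-04-2011.sql.tar.gz", "10-12-2011.zip"]:
    print(name, "->", LOOKAROUND.search(name).group(0))
```

So this pattern suits names that always carry a hyphenated prefix before the date.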
Method 5
**This is a simple method to find dates in a text file in Python**
import os
import re

filename = 'rain.txt'  # name of the file
if os.path.isfile(filename):  # check whether the file exists
    with open(filename, 'r') as f:
        for line in f:  # traverse the file line by line
            match = re.search(r'\d{2}-\d{2}-\d{4}', line)  # regular expression for the date
            if match:
                print(match.group())  # print the date if a match is found
else:
    print("file does not exist")
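Method 5 reads the contents of a text file and prints at most one date per line. A variant using re.findall collects every date on every line. The sketch assumes the text is already available as an iterable of lines; the io.StringIO sample below is hypothetical and stands in for open('rain.txt'), and the function name dates_in_lines is also illustrative.

```python
import io
import re

DATE_RE = re.compile(r"\d{2}-\d{2}-\d{4}")

def dates_in_lines(lines):
    """Return every MM-DD-YYYY string found in lines, in order of appearance."""
    found = []
    for line in lines:
        # findall returns all non-overlapping matches on the line (possibly none)
        found.extend(DATE_RE.findall(line))
    return found

# Hypothetical sample standing in for a file such as rain.txt
sample = io.StringIO("10-12-2011 light rain\nno date here\n10-04-2011 to 10-05-2011 storm\n")
print(dates_in_lines(sample))
```

Since any open file object is an iterable of lines, the same function can be called with the handle from the with-block above.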
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.