Python/Regex – How to extract date from filename using regular expression?

I need to use python to extract the date from filenames. The date is in the following format:

month-day-year.somefileextension

Examples:

10-12-2011.zip
somedatabase-10-04-2011.sql.tar.gz

Would regular expressions be the best way to extract it?

I have some code:

import re
m = re.search(r'(?<=-)\w+', 'derer-10-12-2001.zip')
print(m.group(0))

The code prints '10'. Any clue on how to print the whole date?

Best Regards,

Answers:


Method 1

Assuming the date is always in the format: [MM]-[DD]-[YYYY].

re.search("([0-9]{2}-[0-9]{2}-[0-9]{4})", fileName)
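A minimal sketch of how that call might be used on the filenames from the question. The helper name extract_date is hypothetical, not part of the answer. re.search returns None when nothing matches, so the result is checked before calling group().

```python
import re

# Same pattern as in Method 1, compiled once for reuse.
DATE_PATTERN = re.compile(r"[0-9]{2}-[0-9]{2}-[0-9]{4}")

def extract_date(file_name):
    """Return the first MM-DD-YYYY substring in file_name, or None."""
    match = DATE_PATTERN.search(file_name)
    return match.group(0) if match else None

print(extract_date("10-12-2011.zip"))                      # 10-12-2011
print(extract_date("somedatabase-10-04-2011.sql.tar.gz"))  # 10-04-2011
print(extract_date("no-date-here.txt"))                    # None
```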

Method 2

You want to use a capture group.

m = re.search(r'\b(\d{2}-\d{2}-\d{4})\.', 'derer-10-12-2001.zip')
print(m.group(1))

Should print 10-12-2001.

You could get away with a more terse regex, but ensuring that it is preceded by a - and followed by a . provides some minimal protection against double-matches with funky filenames, or malformed filenames that shouldn’t match at all.

EDIT: I replaced the initial - with a \b, which matches at any boundary between a word character (letter, digit, or underscore) and a non-word character, including the start of the string. That way it will match whether there is a hyphen or the beginning of the string preceding the date.
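Going one step beyond this answer, the captured string could also be checked against the calendar. The sketch below assumes the month-day-year order stated in the question and uses datetime.strptime for validation; the function name parse_date is hypothetical.

```python
import re
from datetime import datetime

# Word boundary before the date, literal dot after it, date in group 1.
PATTERN = re.compile(r"\b(\d{2}-\d{2}-\d{4})\.")

def parse_date(file_name):
    """Return a datetime.date for the MM-DD-YYYY part of file_name, or None."""
    match = PATTERN.search(file_name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%m-%d-%Y").date()
    except ValueError:
        # Matched the digit pattern but is not a real date, e.g. 99-99-2011.
        return None

print(parse_date("derer-10-12-2001.zip"))  # 2001-10-12
print(parse_date("10-12-2011.zip"))        # 2011-10-12
print(parse_date("99-99-2011.zip"))        # None
```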

Method 3

I think you can extract the date using re.split as follows

$ ipython

In [1]: import re

In [2]: input_file = '10-12-2011.zip'

In [3]: file_split = re.split(r'(\d{2}-\d{2}-\d{4})', input_file, 1)

In [4]: file_split
Out[4]: ['', '10-12-2011', '.zip']

In [5]: file_split[1]
Out[5]: '10-12-2011'

In [6]: input_file = 'somedatabase-10-04-2011.sql.tar.gz'

In [7]: file_split = re.split(r'(\d{2}-\d{2}-\d{4})', input_file, 1)

In [8]: file_split
Out[8]: ['somedatabase-', '10-04-2011', '.sql.tar.gz']

In [9]: file_split[1]
Out[9]: '10-04-2011'

I ran the tests with Python 3.6.6, IPython 5.3.0

Method 4

Well, the \w+ you put in matches one or more word characters following a hyphen, so that's the expected result. What you want to do is use a lookaround on either side, matching the digits and hyphens that occur between a hyphen and a period:

re.search(r'(?<=-)[\d-]+(?=\.)', name).group(0)
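A quick check of this lookaround approach, with the backslashes written out. The helper name lookaround_date is hypothetical. Because the lookbehind requires a hyphen immediately before the match, a filename that starts with the date loses its first field, which may or may not matter for your filenames.

```python
import re

# Lookbehind for a hyphen, then digits/hyphens, then lookahead for a literal dot.
LOOKAROUND = re.compile(r"(?<=-)[\d-]+(?=\.)")

def lookaround_date(name):
    """Return the lookaround match for name, or None if nothing matches."""
    match = LOOKAROUND.search(name)
    return match.group(0) if match else None

print(lookaround_date("derer-10-12-2001.zip"))  # 10-12-2001
print(lookaround_date("10-12-2011.zip"))        # 12-2011 (month is dropped)
```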

Method 5

This is a simple method to find dates in a text file in Python:
import os
import re
file = 'rain.txt'  # name of the file
if os.path.isfile(file):  # check whether the file exists
    with open(file, 'r') as i:
        for j in i:  # traverse the file line by line
            match = re.search(r'\d{2}-\d{2}-\d{4}', j)  # regular expression for the date
            if match:
                print(match.group())  # print the date if a match is found
else:
    print("file does not exist")
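If a line may contain more than one date, re.search only reports the first. A possible variant, sketched here with re.findall and an in-memory io.StringIO standing in for the file (the sample text and the name dates_in_lines are made up for illustration):

```python
import io
import re

DATE_RE = re.compile(r"\d{2}-\d{2}-\d{4}")

def dates_in_lines(lines):
    """Collect every MM-DD-YYYY style substring from an iterable of lines."""
    found = []
    for line in lines:
        # findall returns all non-overlapping matches in the line.
        found.extend(DATE_RE.findall(line))
    return found

# Stand-in for an open text file; any iterable of strings works.
sample = io.StringIO("rain on 10-12-2011 and 10-13-2011\nno date here\ndry on 11-01-2011\n")
print(dates_in_lines(sample))  # ['10-12-2011', '10-13-2011', '11-01-2011']
```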


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
