A faster strptime?

I have code that reads vast numbers of dates in ‘YYYY-MM-DD’ format. Parsing each date so that the code can add one, two, or three days and then write it back in the same format is slowing things down considerably.

 3214657   14.330    0.000  103.698    0.000 trade.py:56(effective)
 3218418   34.757    0.000   66.155    0.000 _strptime.py:295(_strptime)

 day = datetime.datetime.strptime(endofdaydate, "%Y-%m-%d").date()

Any suggestions how to speed it up a bit (or a lot)?

Answers:


Method 1

Is a factor of 7 enough?

datetime.datetime.strptime(a, '%Y-%m-%d').date()       # 8.87us

datetime.date(*map(int, a.split('-')))                 # 1.28us

EDIT: great idea with explicit slicing:

datetime.date(int(a[:4]), int(a[5:7]), int(a[8:10]))   # 1.06us

That makes it a factor of 8.
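Since the question also needs to add a few days and write the result back, the slicing parse can be paired with date.isoformat(), which emits 'YYYY-MM-DD' directly. A minimal sketch: the helper name add_days is hypothetical, and the input is assumed to be well-formed, because slicing does not validate the separators (the date constructor still rejects out-of-range values).

```python
import datetime

def add_days(s, n):
    """Parse a 'YYYY-MM-DD' string, add n days, return the same format."""
    # Fixed-position slices avoid the format-string machinery of strptime.
    day = datetime.date(int(s[:4]), int(s[5:7]), int(s[8:10]))
    # date.isoformat() writes the result back as 'YYYY-MM-DD'.
    return (day + datetime.timedelta(days=n)).isoformat()

print(add_days("2020-02-28", 2))  # 2020-03-01
```

Using isoformat() rather than strftime('%Y-%m-%d') keeps the write-back step free of format-string parsing as well.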

Method 2

Python 3.7+: fromisoformat()

Since Python 3.7, the datetime class has a fromisoformat method (the date class has one too), and it applies directly to the 'YYYY-MM-DD' strings in this question.
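For the date-only strings in the question, a sketch of what that might look like, assuming Python 3.7 or later. The sample value is made up for illustration; date.fromisoformat returns a date directly, so no .date() call is needed.

```python
from datetime import date, datetime

endofdaydate = "2020-01-31"  # example value, not from the original post

# date.fromisoformat parses 'YYYY-MM-DD' straight into a date object.
day = date.fromisoformat(endofdaydate)
print(day)  # 2020-01-31

# datetime.fromisoformat accepts the same string and fills in midnight.
stamp = datetime.fromisoformat(endofdaydate)
print(stamp)  # 2020-01-31 00:00:00
```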

Performance vs. strptime()

Explicit string slicing may give you about a 9x increase in performance compared to normal strptime, but you can get about a 90x increase with the built-in fromisoformat method!

%timeit isofmt(datelist)
569 µs ± 8.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit slice2int(datelist)
5.51 ms ± 48.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit normalstrptime(datelist)
52.1 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

from datetime import datetime, timedelta
base, n = datetime(2000, 1, 1, 1, 2, 3, 420001), 10000
datelist = [(base + timedelta(days=i)).strftime('%Y-%m-%d') for i in range(n)]

def isofmt(l):
    return list(map(datetime.fromisoformat, l))
    
def slice2int(l):   
    def slicer(t):
        return datetime(int(t[:4]), int(t[5:7]), int(t[8:10]))
    return list(map(slicer, l))

def normalstrptime(l):
    return [datetime.strptime(t, '%Y-%m-%d') for t in l]
    
print(isofmt(datelist[0:1]))
print(slice2int(datelist[0:1]))
print(normalstrptime(datelist[0:1]))

# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]

Python 3.8.3rc1 x64 / Win10

Method 3

For an ISO-formatted timezone-free string, eg.: "2021-01-04T14:30:03.123":

datetime.datetime(int(d[:4]), int(d[5:7]), int(d[8:10]), int(d[11:13]), int(d[14:16]), int(d[17:19]), int(d[20:26].ljust(6, '0')))   # pad the fraction to 6 digits: ".123" is 123000 microseconds, not 123

Seems to run faster than strptime() and fromisoformat().
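The speed claim can be checked on your own machine with the standard timeit module. Results vary by Python version and platform, so no numbers are quoted here. The sketch below (function names are illustrative) also compares the slicing parse against strptime and fromisoformat to confirm that all three give the same value. The fraction is padded to six digits because the last argument of datetime() is microseconds.

```python
import timeit
from datetime import datetime

d = "2021-01-04T14:30:03.123"

def by_slicing(s):
    # Pad the fractional part to 6 digits: datetime() expects microseconds.
    return datetime(int(s[:4]), int(s[5:7]), int(s[8:10]),
                    int(s[11:13]), int(s[14:16]), int(s[17:19]),
                    int(s[20:26].ljust(6, "0")))

def by_strptime(s):
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f")

# All three approaches should produce the same datetime.
assert by_slicing(d) == by_strptime(d) == datetime.fromisoformat(d)

for name, fn in [("slicing", by_slicing),
                 ("strptime", by_strptime),
                 ("fromisoformat", datetime.fromisoformat)]:
    seconds = timeit.timeit(lambda: fn(d), number=20_000)
    print(f"{name:14s} {seconds:.3f} s per 20,000 calls")
```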


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
