I have code which reads vast numbers of dates in ‘YYYY-MM-DD’ format. Parsing all these dates so that it can add one, two, or three days and then write them back in the same format is slowing things down considerably.
3214657 14.330 0.000 103.698 0.000 trade.py:56(effective)
3218418 34.757 0.000 66.155 0.000 _strptime.py:295(_strptime)

day = datetime.datetime.strptime(endofdaydate, "%Y-%m-%d").date()
Any suggestions how to speed it up a bit (or a lot)?
Answers:
Method 1
Is a factor of 7 "a lot" enough?
datetime.datetime.strptime(a, '%Y-%m-%d').date() # 8.87us
datetime.date(*map(int, a.split('-'))) # 1.28us
EDIT: great idea with explicit slicing:
datetime.date(int(a[:4]), int(a[5:7]), int(a[8:10])) # 1.06us
That makes it a factor of 8.
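To tie this back to the question (parse, add a few days, write back in the same format), here is a minimal sketch built on the slicing approach. The helper name add_days is made up for illustration and is not part of the original answer. It assumes the input is always a well-formed 'YYYY-MM-DD' string. date.isoformat() already produces that format, so no strftime call is needed on the way out.

```python
import datetime


def add_days(s, days):
    """Parse a 'YYYY-MM-DD' string by slicing, add days, format it back.

    add_days is a hypothetical helper name. It assumes s is well-formed;
    the only validation is what datetime.date() itself performs.
    """
    d = datetime.date(int(s[:4]), int(s[5:7]), int(s[8:10]))
    return (d + datetime.timedelta(days=days)).isoformat()


print(add_days("2012-12-30", 3))  # 2013-01-02
```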
Method 2
Python 3.7+: fromisoformat()
Since Python 3.7, the datetime class has a fromisoformat method (and so does date), which can be applied to this question as well.
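For the date-only strings in the question, a short sketch using date.fromisoformat might look like this. The sample date is arbitrary; the offsets of one, two, and three days come from the question.

```python
from datetime import date, timedelta

s = "2019-02-27"           # arbitrary sample input
d = date.fromisoformat(s)  # requires Python 3.7+

# Add one, two, and three days, then write back in the same format.
shifted = [(d + timedelta(days=n)).isoformat() for n in (1, 2, 3)]
print(shifted)  # ['2019-02-28', '2019-03-01', '2019-03-02']
```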
Performance vs. strptime()
Explicit string slicing may give you about a 9x increase in performance compared to normal strptime, but you can get about a 90x increase with the built-in fromisoformat method!
%timeit isofmt(datelist)
569 µs ± 8.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit slice2int(datelist)
5.51 ms ± 48.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit normalstrptime(datelist)
52.1 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
from datetime import datetime, timedelta
base, n = datetime(2000, 1, 1, 1, 2, 3, 420001), 10000
datelist = [(base + timedelta(days=i)).strftime('%Y-%m-%d') for i in range(n)]
def isofmt(l):
    return list(map(datetime.fromisoformat, l))

def slice2int(l):
    def slicer(t):
        return datetime(int(t[:4]), int(t[5:7]), int(t[8:10]))
    return list(map(slicer, l))

def normalstrptime(l):
    return [datetime.strptime(t, '%Y-%m-%d') for t in l]
print(isofmt(datelist[0:1]))
print(slice2int(datelist[0:1]))
print(normalstrptime(datelist[0:1]))
# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]
# [datetime.datetime(2000, 1, 1, 0, 0)]
Python 3.8.3rc1 x64 / Win10
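%timeit is an IPython magic, so the timings above cannot be reproduced in a plain script as written. A rough equivalent using the standard timeit module could look like the sketch below. The repeat and number values are arbitrary choices, and the absolute figures will differ by machine and Python version, so no expected output is shown.

```python
import timeit
from datetime import datetime, timedelta

base, n = datetime(2000, 1, 1), 10000
datelist = [(base + timedelta(days=i)).strftime('%Y-%m-%d') for i in range(n)]


def isofmt(l):
    return list(map(datetime.fromisoformat, l))


def slice2int(l):
    def slicer(t):
        return datetime(int(t[:4]), int(t[5:7]), int(t[8:10]))
    return list(map(slicer, l))


def normalstrptime(l):
    return [datetime.strptime(t, '%Y-%m-%d') for t in l]


calls = 5  # calls per timing run (arbitrary)
for func in (isofmt, slice2int, normalstrptime):
    # Take the best of three runs and report the time for one call over the list.
    best = min(timeit.repeat(lambda: func(datelist), number=calls, repeat=3))
    print(f"{func.__name__}: {best / calls * 1e3:.2f} ms per call")
```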
Method 3
For an ISO-formatted, timezone-free string, e.g. "2021-01-04T14:30:03.123":
datetime.datetime(int(d[:4]), int(d[5:7]), int(d[8:10]), int(d[11:13]), int(d[14:16]), int(d[17:19]), int(d[20:26].ljust(6, '0'))) # pad the fraction to microseconds: '.123' is 123000 µs, not 123
This seems to run faster than both strptime() and fromisoformat().
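One detail to watch with the slicing approach is the fractional part: the last argument of datetime() is microseconds, so a three-digit fraction such as ".123" has to be padded to six digits before conversion. The sketch below wraps the idea in a function (parse_naive_iso is a made-up name) and checks it against fromisoformat. It assumes a fixed-layout string with an optional fraction of up to six digits and no timezone suffix.

```python
from datetime import datetime


def parse_naive_iso(d):
    """Parse 'YYYY-MM-DDTHH:MM:SS[.ffffff]' by slicing (hypothetical helper)."""
    frac = d[20:26]
    # Right-pad so that '123' (milliseconds) becomes 123000 microseconds.
    micro = int(frac.ljust(6, "0")) if frac else 0
    return datetime(int(d[:4]), int(d[5:7]), int(d[8:10]),
                    int(d[11:13]), int(d[14:16]), int(d[17:19]), micro)


s = "2021-01-04T14:30:03.123"
assert parse_naive_iso(s) == datetime.fromisoformat(s)
print(parse_naive_iso(s))  # 2021-01-04 14:30:03.123000
```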
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.