I have two columns, fromdate and todate, in a dataframe.
import pandas as pd
data = {'todate': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
'fromdate': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
df = pd.DataFrame(data)
I add a new column, diff, to find the difference between the two dates using
df['diff'] = df['fromdate'] - df['todate']
I get the diff column, but it contains days when the difference is more than 24 hours.
todate fromdate diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000
How do I convert my results to only hours and minutes (i.e. days are converted to hours)?
Answers:
Method 1
Subtracting two pandas datetime columns returns a Series of timedelta64[ns] values (each element is a pandas Timedelta, which subclasses datetime.timedelta). This can be converted into whole hours by using the astype method, like so
import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr-df.to).astype('timedelta64[h]')
to yield,
0 58
1 3
2 8
dtype: float64
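Some pandas releases reject the 'timedelta64[h]' conversion (Method 2 below reports it failing). I believe this applies to pandas 2.0 and later, but treat that as an assumption. A minimal sketch of an alternative that gives the same whole-hour result, using floor division by a one-hour Timedelta; the variable names here are illustrative only:

```python
import pandas as pd

# Same timestamps as in the question
to = pd.Series(pd.to_datetime(['2014-01-24 13:03:12.050',
                               '2014-01-27 11:57:18.240',
                               '2014-01-23 10:07:47.660']))
fr = pd.Series(pd.to_datetime(['2014-01-26 23:41:21.870',
                               '2014-01-27 15:38:22.540',
                               '2014-01-23 18:50:41.420']))

# Floor-divide by one hour: days are folded into hours, minutes are dropped
whole_hours = (fr - to) // pd.Timedelta(hours=1)
print(whole_hours.tolist())  # [58, 3, 8]
```

Note that this returns integers rather than floats, and that floor division rounds toward negative infinity if a difference is negative.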
Method 2
This was driving me bonkers as the .astype() solution above didn’t work for me. But I found another way. Haven’t timed it or anything, but might work for others out there:
t1 = pd.to_datetime('1/1/2015 01:00')
t2 = pd.to_datetime('1/1/2015 03:30')
print(pd.Timedelta(t2 - t1).total_seconds() / 3600.0)
…if you want hours. Or:
print(pd.Timedelta(t2 - t1).total_seconds() / 60.0)
…if you want minutes.
UPDATE: There used to be a helpful comment here that mentioned using .total_seconds() for time periods spanning multiple days, because .seconds leaves out the days part of the difference. Since it’s gone, I’ve updated the answer.
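To see why .total_seconds() matters for multi-day periods, here is a small sketch with two made-up timestamps (not from the question) that are 2 days and 2.5 hours apart:

```python
import pandas as pd

# Hypothetical timestamps, chosen so the gap spans more than one day
t1 = pd.to_datetime('2015-01-01 01:00')
t2 = pd.to_datetime('2015-01-03 03:30')
delta = t2 - t1  # already a pandas Timedelta

# .seconds only covers the part of the difference that is less than a day
partial_hours = delta.seconds / 3600.0

# .total_seconds() includes the days as well
total_hours = delta.total_seconds() / 3600.0

print(partial_hours)  # 2.5
print(total_hours)    # 50.5
```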
Method 3
- How do I convert my results to only hours and minutes
  - The accepted answer only returns whole hours (days are folded into hours). Minutes are not included.
- To provide a column that has hours and minutes as hh:mm or x hours y minutes, would require additional calculations and string formatting.
- This answer shows how to get either total hours or total minutes as a float, using timedelta math, and is faster than using .astype('timedelta64[h]').
- Pandas Time Deltas User Guide
- Pandas Time series / date functionality User Guide
- python timedelta objects: See supported operations.
- The following sample data is already a datetime64[ns] dtype. It is required that all relevant columns are converted using pandas.to_datetime().
import pandas as pd
# test data from OP, with values already in a datetime format
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
# test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
df = pd.DataFrame(data)
# add a timedelta column if wanted. It's added here for information only
# df['time_delta_with_sub'] = df.from_date.sub(df.to_date) # also works
df['time_delta'] = (df.from_date - df.to_date)
# create a column with timedelta as total hours, as a float type
df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)
# create a column with timedelta as total minutes, as a float type
df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)
# display(df)
to_date from_date time_delta tot_hour_diff tot_mins_diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000 58.636061 3518.163667
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000 3.684528 221.071667
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000 8.714933 522.896000
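As noted at the top of this answer, an hh:mm column needs extra calculation and string formatting. One possible sketch, assuming non-negative differences and that leftover seconds should be truncated rather than rounded; the hh_mm column name is made up for this example:

```python
import pandas as pd

# Same test data as above
df = pd.DataFrame({
    'to_date': pd.to_datetime(['2014-01-24 13:03:12.050',
                               '2014-01-27 11:57:18.240',
                               '2014-01-23 10:07:47.660']),
    'from_date': pd.to_datetime(['2014-01-26 23:41:21.870',
                                 '2014-01-27 15:38:22.540',
                                 '2014-01-23 18:50:41.420']),
})

delta = df.from_date - df.to_date

# Whole minutes in each difference (seconds are truncated)
total_minutes = delta // pd.Timedelta(minutes=1)

# Split into hours and leftover minutes, then zero-pad to two digits
hours = total_minutes // 60
minutes = total_minutes % 60
df['hh_mm'] = hours.astype(str).str.zfill(2) + ':' + minutes.astype(str).str.zfill(2)

print(df['hh_mm'].tolist())  # ['58:38', '03:41', '08:42']
```

The result is a string column, so it is meant for display; keep tot_hour_diff or tot_mins_diff for further arithmetic.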
Other methods
- An item of note from the podcast in Other Resources, .total_seconds() was added and merged when the core developer was on vacation, and would not have been approved.
  - This is also why there aren’t other .total_xx methods.
# convert the entire timedelta to seconds
# this is the same as td / timedelta(seconds=1)
(df.from_date - df.to_date).dt.total_seconds()
[out]:
0 211089.82
1 13264.30
2 31373.76
dtype: float64
# get the number of days
(df.from_date - df.to_date).dt.days
[out]:
0 2
1 0
2 0
dtype: int64
# get the seconds for hours + minutes + seconds, but not days
# note the difference from total_seconds
(df.from_date - df.to_date).dt.seconds
[out]:
0 38289
1 13264
2 31373
dtype: int64
Other Resources
- Talk Python to Me: Episode #271: Unlock the mysteries of time, Python’s datetime that is!
  - Timedelta begins at 31 minutes
  - As per Python core developer and python dateutil maintainer Paul Ganssle:
    - Use (df.from_date - df.to_date) / pd.Timedelta(hours=1)
    - Don’t use (df.from_date - df.to_date).dt.total_seconds() / 3600
- Real Python: Using Python datetime to Work With Dates and Times
- The dateutil module provides powerful extensions to the standard datetime module.
%%timeit test
import pandas as pd
# dataframe with 2M rows
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
df = pd.DataFrame(data)
df = pd.concat([df] * 1000000).reset_index(drop=True)
%%timeit
(df.from_date - df.to_date) / pd.Timedelta(hours=1)
[out]:
43.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
(df.from_date - df.to_date).astype('timedelta64[h]')
[out]:
59.8 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.