Pandas Timedelta in Days

I have a dataframe in pandas called ‘munged_data’ with two columns ‘entry_date’ and ‘dob’ which i have converted to Timestamps using pd.to_timestamp.I am trying to figure out how to calculate ages of people based on the time difference between ‘entry_date’ and ‘dob’ and to do this i need to get the difference in days between the two columns ( so that i can then do somehting like round(days/365.25). I do not seem to be able to find a way to do this using a vectorized operation. When I do munged_data.entry_date-munged_data.dob i get the following :

internal_quote_id
2                    15685977 days, 23:54:30.457856
3                    11651985 days, 23:49:15.359744
4                     9491988 days, 23:39:55.621376
7                     11907004 days, 0:10:30.196224
9                    15282164 days, 23:30:30.196224
15                  15282227 days, 23:50:40.261632

However i do not seem to be able to extract the days as an integer so that i can continue with my calculation.
Any help appreciated.

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Using the Pandas type Timedelta available since v0.15.0 you also can do:

In[1]: import pandas as pd
In[2]: df = pd.DataFrame([ pd.Timestamp('20150111'), 
                           pd.Timestamp('20150301') ], columns=['date'])
In[3]: df['today'] = pd.Timestamp('20150315')
In[4]: df
Out[4]: 
        date      today
0 2015-01-11 2015-03-15
1 2015-03-01 2015-03-15

In[5]: (df['today'] - df['date']).dt.days
Out[5]: 
0    63
1    14
dtype: int64

Method 2

You need 0.11 for this (0.11rc1 is out, final prob next week)

In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])

In [10]: df
Out[10]: 
                    0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00

In [11]: df = DataFrame([ Timestamp('20010101'), 
                          Timestamp('20040601') ],columns=['age'])

In [12]: df
Out[12]: 
                  age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00

In [13]: df['today'] = Timestamp('20130419')

In [14]: df['diff'] = df['today']-df['age']

In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)

In [17]: df
Out[17]: 
                  age               today                diff      years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00  12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00   8.887671

You need this odd apply at the end because not yet full support for timedelta64[ns] scalars (e.g. like how we use Timestamps now for datetime64[ns], coming in 0.12)

Method 3

Not sure if you still need it, but in Pandas 0.14 i usually use .astype(‘timedelta64[X]’) method
http://pandas.pydata.org/pandas-docs/stable/timeseries.html (frequency conversion)

df = pd.DataFrame([ pd.Timestamp('20010101'), pd.Timestamp('20040605') ])
df.ix[0]-df.ix[1]

Returns:

0   -1251 days
dtype: timedelta64[ns]
(df.ix[0]-df.ix[1]).astype('timedelta64[Y]')

Returns:

  0   -4
 dtype: float64

Hope that will help

Method 4

Let’s specify that you have a pandas series named time_difference which has type
numpy.timedelta64[ns]

One way of extracting just the day (or whatever desired attribute) is the following:

just_day = time_difference.apply(lambda x: pd.tslib.Timedelta(x).days)

This function is used because the numpy.timedelta64 object does not have a ‘days’ attribute.

Method 5

To convert any type of data into days just use Timedelta().days:

pd.Timedelta(1985, unit='Y').days
84494


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x