Is there a way to look back at the previous row and calculate a new variable? As long as the previous row belongs to the same case, I want (previous change) - (current change), attributed to the previous 'ChangeEvent' in new columns.
Here is my DataFrame:
>>> df
  ChangeEvent StartEvent  case               change        open
0    Homeless   Homeless     1  2014-03-08 00:00:00  2014-02-08
1       other   Homeless     1  2014-04-08 00:00:00  2014-02-08
2    Homeless   Homeless     1  2014-05-08 00:00:00  2014-02-08
3        Jail   Homeless     1  2014-06-08 00:00:00  2014-02-08
4        Jail       Jail     2  2014-06-08 00:00:00  2014-02-08
I want to add columns like these:
Jail  Homeless  case
   0         6     1
   0        30     1
   0         0     1
... and so on.
Here is the code that builds the df:
import pandas as pd
import datetime as DT

d = {'case': pd.Series([1, 1, 1, 1, 2]),
     'open': pd.Series([DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2)]),
     'change': pd.Series([DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8), DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8), DT.datetime(2014, 6, 8)]),
     'StartEvent': pd.Series(['Homeless', 'Homeless', 'Homeless', 'Homeless', 'Jail']),
     'ChangeEvent': pd.Series(['Homeless', 'irrelivant', 'Homeless', 'Jail', 'Jail']),
     'close': pd.Series([DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2)])}
df = pd.DataFrame(d)
Answers:
Method 1
To get the previous row's value, use the shift method:
In [11]: df.change.shift(1)
Out[11]:
0          NaT
1   2014-03-08
2   2014-04-08
3   2014-05-08
4   2014-06-08
Name: change, dtype: datetime64[ns]
Now you can subtract these columns. Note: this is with pandas 0.13.1 (datetime handling has had a lot of work recently, so YMMV with older versions).
In [12]: df.change.shift(1) - df.change
Out[12]:
0        NaT
1   -31 days
2   -30 days
3   -31 days
4     0 days
Name: change, dtype: timedelta64[ns]
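As a side note, the same subtraction can be written with diff, which computes current minus previous, so the sign is flipped relative to shift(1) - x. The snippet below is only a small sketch on a standalone Series built from the dates in the question; the variable names are made up for illustration.

```python
import pandas as pd

# Change dates copied from the question's example frame.
change = pd.Series(pd.to_datetime(
    ['2014-03-08', '2014-04-08', '2014-05-08', '2014-06-08', '2014-06-08']))

# previous - current, as in the answer above (negative gaps).
back = change.shift(1) - change

# current - previous, the same gaps with the opposite sign.
forward = change.diff()

# .dt.days turns the timedeltas into plain day counts (NaN where there is no previous row).
back_days = back.dt.days
forward_days = forward.dt.days
```

Which sign you want depends on whether you read the gap as "time since the previous change" or the reverse; the question's expected output uses positive numbers.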
You can apply this to each case/group, so that the first row of every case has no previous row and comes out as NaT:
In [13]: df.groupby('case')['change'].apply(lambda x: x.shift(1) - x)
Out[13]:
0 NaT
1 -31 days
2 -30 days
3 -31 days
4 NaT
dtype: timedelta64[ns]
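To get from these per-case gaps to the extra columns the question asks for, one possible approach is sketched below. It is not part of the original answer: it assumes the gap should be counted in whole days, credited to the previous row's ChangeEvent, and set to 0 where a case has no previous row. It does not handle the first row's gap from 'open', which the question's expected output seems to include. The names prev_change, prev_event, days and wide are made up for illustration.

```python
import datetime as DT
import pandas as pd

# Trimmed copy of the question's frame (only the columns used here).
df = pd.DataFrame({
    'case': [1, 1, 1, 1, 2],
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
    'ChangeEvent': ['Homeless', 'irrelivant', 'Homeless', 'Jail', 'Jail'],
})

grouped = df.groupby('case')

# Previous row's values within the same case (NaT/NaN on each case's first row).
prev_change = grouped['change'].shift(1)
prev_event = grouped['ChangeEvent'].shift(1)

# Days elapsed since the previous change in the same case.
days = (df['change'] - prev_change).dt.days

# One column per previous event, holding the days credited to it.
# Rows without a previous event get all zeros (assumption: 0 is an acceptable fill).
wide = pd.get_dummies(prev_event, dtype=float).mul(days.fillna(0), axis=0)

result = df.join(wide)
```

With the example data, row 1 credits 31 days to Homeless, row 2 credits 30 days to irrelivant, and row 3 credits 31 days to Homeless. Jail never appears as a previous event within a case here, so it gets no column.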
Method 2
In addition to the previous responses, here is a link on solving the NaT / NaN problem, so that the series has no gaps:
How to fill NaT and NaN values separately
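The linked answer is not reproduced here, so the following is only a sketch of one way to fill the two kinds of missing values separately. It assumes a zero-length timedelta is an acceptable stand-in for NaT and 0.0 for NaN; the column names gap and score are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a timedelta column with NaT and a numeric column with NaN.
df = pd.DataFrame({
    'gap': pd.Series([pd.NaT, pd.Timedelta(days=31), pd.Timedelta(days=30)],
                     dtype='timedelta64[ns]'),
    'score': [1.0, np.nan, 3.0],
})

# A dict passed to fillna gives each column its own fill value,
# so the timedelta column keeps its dtype.
filled = df.fillna({'gap': pd.Timedelta(0), 'score': 0.0})
```

Filling each column with a value of its own type avoids mixing numbers into a timedelta column.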
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.