I have a datetime column, shown below:
>>> df['ACC_DATE'].head(2)
538   2006-04-07
550   2006-04-12
Name: ACC_DATE, dtype: datetime64[ns]
Now I want to subtract a year from each row of this column. How can I do that, and which library can I use?
The expected result:
      ACC_DATE    NEW_DATE
538 2006-04-07  2005-04-07
550 2006-04-12  2005-04-12
Answers:
Method 1
You can use DateOffset to achieve this:
In[88]:
df['NEW_DATE'] = df['ACC_DATE'] - pd.DateOffset(years=1)
df
Out[88]:
ACC_DATE NEW_DATE
index
538 2006-04-07 2005-04-07
550 2006-04-12 2005-04-12
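For anyone who wants to run this end to end, here is a minimal self-contained sketch. The first two rows mirror the question; the third row (index 551, a leap day) is a made-up addition, included only to show that DateOffset moves February 29 to February 28 when the target year is not a leap year.

```python
import pandas as pd

# Rows 538 and 550 come from the question; row 551 is a hypothetical
# leap-day value added purely for illustration.
df = pd.DataFrame(
    {"ACC_DATE": pd.to_datetime(["2006-04-07", "2006-04-12", "2008-02-29"])},
    index=[538, 550, 551],
)

# Calendar-aware shift: subtracts one year, not a fixed number of days
df["NEW_DATE"] = df["ACC_DATE"] - pd.DateOffset(years=1)

print(df)
```

The subtraction is applied to the whole column at once, so no apply is needed.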
Method 2
Use pd.offsets.DateOffset (the same class that pd.DateOffset refers to):
df["NEW_DATE"] = df["ACC_DATE"] - pd.offsets.DateOffset(years=1)
print(df)
ACC_DATE NEW_DATE
index
538 2006-04-07 2005-04-07
550 2006-04-12 2005-04-12
Method 3
You could use pd.Timedelta:
df["NEW_DATE"] = df["ACC_DATE"] - pd.Timedelta(days=365)
Or replace:
df["NEW_DATE"] = df["ACC_DATE"].apply(lambda x: x.replace(year=x.year - 1))
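But neither handles leap years correctly: a fixed 365-day offset ends up a day off whenever the span crosses February 29, and replace raises a ValueError for February 29 when the target year is not a leap year. You could use dateutil.relativedelta instead:
from dateutil.relativedelta import relativedelta
df["NEW_DATE"] = df["ACC_DATE"].apply(lambda x: x - relativedelta(years=1))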
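To see how the three approaches in this method differ, here is a small sketch that applies each one to a single leap-day timestamp. The date 2008-02-29 is an illustrative value chosen for this example, not something from the question.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

leap_day = pd.Timestamp("2008-02-29")  # illustrative date, not from the question

# Fixed 365-day shift: the date exactly 365 days earlier is 2007-03-01
by_timedelta = leap_day - pd.Timedelta(days=365)

# replace() asks for 2007-02-29, which does not exist, so it raises
try:
    by_replace = leap_day.replace(year=leap_day.year - 1)
except ValueError as exc:
    by_replace = None
    print("replace failed:", exc)

# relativedelta clips the day to the last valid day of the target month
by_relativedelta = leap_day - relativedelta(years=1)

print(by_timedelta)      # 2007-03-01 00:00:00
print(by_relativedelta)  # 2007-02-28 00:00:00
```

For dates like the ones in the question (April 2006), all three give the same answer; the differences only show up around February 29.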
Method 4
If you have a single pd.Timestamp object rather than a column:
- Using pd.DateOffset(years=n) is not ideal, as it produces:
UserWarning: Discarding nonzero nanoseconds in conversion
- pd.Timedelta() doesn't accept years.
The only approach that worked for me in this case is pd.Timestamp.replace:
t = pd.Timestamp.now()
t = t.replace(year=t.year - n)
This was hinted at in the answer by Padriac but it needed further clarity.
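One way to wrap the replace approach is sketched below. The helper name subtract_years is hypothetical, and falling back to February 28 when the target year has no February 29 is an assumption made here (it matches what relativedelta does), not something stated in the answer.

```python
import pandas as pd


def subtract_years(ts: pd.Timestamp, n: int) -> pd.Timestamp:
    """Hypothetical helper: move ts back n calendar years via Timestamp.replace."""
    try:
        return ts.replace(year=ts.year - n)
    except ValueError:
        # Raised for Feb 29 when the target year is not a leap year.
        # Assumption: fall back to Feb 28 of the target year.
        return ts.replace(year=ts.year - n, day=28)


# Illustrative timestamp with a nonzero nanosecond component
t = pd.Timestamp("2006-04-07 12:30:00.123456789")
shifted = subtract_years(t, 1)
print(shifted)
```

Because replace only swaps the year field, the time of day, including the nanosecond part, carries over unchanged.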
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.