I’m trying to figure out how to add 3 months to a date in a Pandas dataframe, while keeping it in the date format, so I can use it to lookup a range.
This is what I’ve tried:
#create dataframe
df = pd.DataFrame([pd.Timestamp('20161011'),
pd.Timestamp('20161101') ], columns=['date'])
#create a future month period
plus_month_period = 3
#calculate date + future period
df['future_date'] = plus_month_period.astype("timedelta64[M]")
However, I get the following error:
AttributeError: 'int' object has no attribute 'astype'
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
You could use pd.DateOffset
In [1756]: df.date + pd.DateOffset(months=plus_month_period) Out[1756]: 0 2017-01-11 1 2017-02-01 Name: date, dtype: datetime64[ns]
Another way using pd.offsets.MonthOffset
In [1785]: df.date + pd.offsets.MonthOffset(plus_month_period) Out[1785]: 0 2016-10-14 1 2016-11-04 Name: date, dtype: datetime64[ns]
Details
In [1757]: df
Out[1757]:
date
0 2016-10-11
1 2016-11-01
In [1758]: plus_month_period
Out[1758]: 3
Method 2
Suppose you have a dataframe of the following format, where you have to add integer months to a date column.
| Start_Date | Months_to_add |
|---|---|
| 2014-06-01 | 23 |
| 2014-06-01 | 4 |
| 2000-10-01 | 10 |
| 2016-07-01 | 3 |
| 2017-12-01 | 90 |
| 2019-01-01 | 2 |
In such a scenario, using Zero’s code or mattblack’s code won’t be useful. You have to use lambda function over the rows where the function takes 2 arguments –
- A date to which months need to be added to
- A month value in integer format
You can use the following function:
# Importing required modules from dateutil.relativedelta import relativedelta # Defining the function def add_months(start_date, delta_period): end_date = start_date + relativedelta(months=delta_period) return end_date
After this you can use the following code snippet to add months to the Start_Date column. Use progress_apply functionality of Pandas. Refer to this Stackoverflow answer on progress_apply : Progress indicator during pandas operations.
from tqdm import tqdm tqdm.pandas() df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)
Here’s the full code form dataset creation, for your reference:
import pandas as pd
from dateutil.relativedelta import relativedelta
from tqdm import tqdm
tqdm.pandas()
# Initilize a new dataframe
df = pd.DataFrame()
# Add Start Date column
df["Start_Date"] = ['2014-06-01T00:00:00.000000000',
'2014-06-01T00:00:00.000000000',
'2000-10-01T00:00:00.000000000',
'2016-07-01T00:00:00.000000000',
'2017-12-01T00:00:00.000000000',
'2019-01-01T00:00:00.000000000']
# To convert the date column to a datetime format
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
# Add months column
df["Months_to_add"] = [23, 4, 10, 3, 90, 2]
# Defining the Add Months function
def add_months(start_date, delta_period):
end_date = start_date + relativedelta(months=delta_period)
return end_date
# Apply function on the dataframe using lambda operation.
df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)
You will have the final output dataframe as follows.
| Start_Date | Months_to_add | End_Date |
|---|---|---|
| 2014-06-01 | 23 | 2016-05-01 |
| 2014-06-01 | 4 | 2014-10-01 |
| 2000-10-01 | 10 | 2001-08-01 |
| 2016-07-01 | 3 | 2016-10-01 |
| 2017-12-01 | 90 | 2025-06-01 |
| 2019-01-01 | 2 | 2019-03-01 |
Please add to comments if there are any issues with the above code.
All the best!
Method 3
I believe that the simplest and most efficient (faster) way to solve this is to transform the date to monthly periods with to_period(M), add the result with the values of the Months_to_add column and then retrieve the data as datetime with the .dt.to_timestamp() command.
Using the sample data created by @Aruparna Maity
| Start_Date | Months_to_add |
|---|---|
| 2014-06-01 | 23 |
| 2014-06-20 | 4 |
| 2000-10-01 | 10 |
| 2016-07-05 | 3 |
| 2017-12-15 | 90 |
| 2019-01-01 | 2 |
df['End_Date'] = ((df['Start_Date'].dt.to_period('M')) + df['Months_to_add']).dt.to_timestamp()
df.head(6)
#output
Start_Date Months_to_add End_Date
0 2014-06-01 23 2016-05-01
1 2014-06-20 4 2014-10-01
2 2000-10-01 10 2001-08-01
3 2016-07-05 3 2016-10-01
4 2017-12-15 90 2025-06-01
5 2019-01-01 2 2019-03-01
If the exact day is needed, just repeat the process, but changing the periods to days
df['End_Date'] = ((df['End_Date'].dt.to_period('D')) + df['Start_Date'].dt.day -1).dt.to_timestamp()
#output:
Start_Date Months_to_add End_Date
0 2014-06-01 23 2016-05-01
1 2014-06-20 4 2014-10-20
2 2000-10-01 10 2001-08-01
3 2016-07-05 3 2016-10-05
4 2017-12-15 90 2025-06-15
5 2019-01-01 2 2019-03-01
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0