I ended up figuring it out while writing out this question so I’ll just post anyway and answer my own question in case someone else needs a little help.
Problem
Suppose we have a DataFrame, df, containing this data.
import pandas as pd
from io import StringIO

data = StringIO(
    """
date        spendings  category
2014-03-25  10         A
2014-04-05  20         A
2014-04-15  10         A
2014-04-25  10         B
2014-05-05  10         B
2014-05-15  10         A
2014-05-25  10         A
"""
)
# Columns are whitespace-separated; parse the "date" column into a DatetimeIndex.
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")
Goal
For each row, sum the spendings over every row that is within one month of it, ideally using DataFrame.rolling as it’s a very clean syntax.
What I have tried
df = df.rolling("M").sum()
But this throws an exception
ValueError: <MonthEnd> is a non-fixed frequency
version: pandas==0.19.2
Answers:
Method 1
Use the "D" offset rather than "M"; specifically, use "30D" for a 30-day window, which is approximately one month.
df = df.rolling("30D").sum()
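Initially, I intuitively jumped to using "M" as I figured it stands for one month, but now it’s clear why that doesn’t work: "M" is the MonthEnd offset, which has no fixed length, and a time-based rolling window needs a fixed-length offset.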
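For anyone who wants to try this without parsing the text block, here is a minimal sketch that builds the same sample data directly. Selecting the spendings column before rolling is my own assumption, not part of the original answer: newer pandas versions may refuse to sum the string-valued category column. Note that a "30D" window excludes its left edge by default, so a row exactly 30 days earlier is not counted.

```python
import pandas as pd

# Same sample data as in the question, built directly instead of parsed from text.
df = pd.DataFrame(
    {
        "spendings": [10, 20, 10, 10, 10, 10, 10],
        "category": ["A", "A", "A", "B", "B", "A", "A"],
    },
    index=pd.DatetimeIndex(
        ["2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
         "2014-05-05", "2014-05-15", "2014-05-25"],
        name="date",
    ),
)

# Roll over the numeric column only (assumption: we do not want to touch "category").
# Time-based windows need a sorted DatetimeIndex, which this data already has.
rolling_30d = df["spendings"].rolling("30D").sum()
print(rolling_30d)
```

Each value is the sum of spendings in the 30 days up to and including that row's date.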
Method 2
As for why you cannot use aliases like “AS” or “Y”: the “Y” offset does not mean “a year”; it refers to YearEnd (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases), so the rolling function does not get a fixed window (e.g. the span to the year end is close to a full year if your index falls on Jan 1, but only about a day if it falls at the end of December).
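To make the "non-fixed" part concrete, here is a small sketch (the variable names are just for illustration) that measures how far YearEnd moves two different timestamps and compares that with a fixed 30-day Timedelta.

```python
import pandas as pd

year_end = pd.offsets.YearEnd()
early = pd.Timestamp("2014-01-01")
late = pd.Timestamp("2014-12-30")

# Adding YearEnd rolls each date forward to the next Dec 31,
# so the distance travelled depends on the starting date.
gap_early = (early + year_end) - early
gap_late = (late + year_end) - late
print(gap_early, gap_late)

# A Timedelta, by contrast, always spans the same amount of time.
fixed = pd.Timedelta("30D")
same_early = (early + fixed) - early
same_late = (late + fixed) - late
print(same_early, same_late)
```

The first pair of spans differs widely while the second pair is identical, which is the property a time-based rolling window relies on.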
The proposed solution (offset by 30D) works if you do not need strict calendar months. Alternatively, you can iterate over your date index and slice with a DateOffset to get more precise control over the sum.
If you have to do it in one line (separated for readability):
df['Sum'] = [
    df.loc[
        edt - pd.tseries.offsets.DateOffset(months=1):edt, 'spendings'
    ].sum() for edt in df.index
]
            spendings category  Sum
date
2014-03-25         10        A   10
2014-04-05         20        A   30
2014-04-15         10        A   40
2014-04-25         10        B   50
2014-05-05         10        B   50
2014-05-15         10        A   40
2014-05-25         10        A   40
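To see where the calendar-month slice and a fixed 30-day window actually disagree on this data, here is a rough comparison sketch. It assumes a pandas version newer than the 0.19.2 from the question, because it uses the closed parameter of rolling so that both window edges are included, as they are in a .loc slice. The names calendar_month, fixed_30d and differs are only illustrative.

```python
import pandas as pd

spendings = pd.Series(
    [10, 20, 10, 10, 10, 10, 10],
    index=pd.DatetimeIndex(
        ["2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
         "2014-05-05", "2014-05-15", "2014-05-25"],
        name="date",
    ),
    name="spendings",
)

# Calendar-month window: label slicing includes both endpoints.
calendar_month = pd.Series(
    [spendings.loc[ts - pd.DateOffset(months=1):ts].sum() for ts in spendings.index],
    index=spendings.index,
)

# Fixed 30-day window with both edges included (assumes `closed` is available).
fixed_30d = spendings.rolling("30D", closed="both").sum()

comparison = pd.DataFrame(
    {"calendar_month": calendar_month, "fixed_30d": fixed_30d}
)
# Keep only the rows where the two definitions of "one month" disagree.
differs = comparison[comparison["calendar_month"] != comparison["fixed_30d"]]
print(differs)
```

With both edges included, the only disagreement is on 2014-04-25: the row from 2014-03-25 is exactly one calendar month earlier but 31 days earlier, so the 30-day window leaves it out.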
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.