I have a data frame like this which is imported from a CSV.
              stock  pop
Date
2016-01-04  325.316   82
2016-01-11  320.036   83
2016-01-18  299.169   79
2016-01-25  296.579   84
2016-02-01  295.334   82
2016-02-08  309.777   81
2016-02-15  317.397   75
2016-02-22  328.005   80
2016-02-29  315.504   81
2016-03-07  328.802   81
2016-03-14  339.559   86
2016-03-21  352.160   82
2016-03-28  348.773   84
2016-04-04  346.482   83
2016-04-11  346.980   80
2016-04-18  357.140   75
2016-04-25  357.439   77
2016-05-02  356.443   78
2016-05-09  365.158   78
2016-05-16  352.160   72
2016-05-23  344.540   74
2016-05-30  354.998   81
2016-06-06  347.428   77
2016-06-13  341.053   78
2016-06-20  363.515   80
2016-06-27  349.669   80
2016-07-04  371.583   82
2016-07-11  358.335   81
2016-07-18  362.021   79
2016-07-25  368.844   77
...             ...  ...
I want to add a new column, MA, that holds the rolling mean of the column pop. I tried the following:
df['MA']=data.rolling(5,on='pop').mean()
I get an error
ValueError: Wrong number of items passed 2, placement implies 1
So I tried running it without assigning the result to a column, to see whether it works at all. I used
data.rolling(5,on='pop').mean()
I got the output
               stock  pop
Date
2016-01-04       NaN   82
2016-01-11       NaN   83
2016-01-18       NaN   79
2016-01-25       NaN   84
2016-02-01  307.2868   82
2016-02-08  304.1790   81
2016-02-15  303.6512   75
2016-02-22  309.4184   80
2016-02-29  313.2034   81
2016-03-07  319.8970   81
2016-03-14  325.8534   86
2016-03-21  332.8060   82
2016-03-28  336.9596   84
2016-04-04  343.1552   83
2016-04-11  346.7908   80
2016-04-18  350.3070   75
2016-04-25  351.3628   77
2016-05-02  352.8968   78
2016-05-09  356.6320   78
2016-05-16  357.6680   72
2016-05-23  355.1480   74
2016-05-30  354.6598   81
2016-06-06  352.8568   77
2016-06-13  348.0358   78
2016-06-20  350.3068   80
2016-06-27  351.3326   80
2016-07-04  354.6496   82
2016-07-11  356.8310   81
2016-07-18  361.0246   79
2016-07-25  362.0904   77
...              ...  ...
I can't seem to apply a rolling mean to the column pop. What am I doing wrong?
Answers:
Method 1
To assign a column, you can create a rolling object based on your Series:
df['new_col'] = data['column'].rolling(5).mean()
The answer posted by ac2001 (data.rolling(5).mean()['pop']) is not the most performant way of doing this. It calculates a rolling mean on every column in the dataframe and then selects the "pop" column to assign to "ma". The first of the following two statements is much more efficient:
%timeit df['ma'] = data['pop'].rolling(5).mean()
%timeit df['ma_2'] = data.rolling(5).mean()['pop']

1000 loops, best of 3: 497 µs per loop
100 loops, best of 3: 2.6 ms per loop
I would not recommend using the second method unless you also need the rolling means of the other columns.
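To check that the two approaches agree and to time them outside IPython, a minimal sketch using the standard-library timeit module could look like the following. The synthetic frame, its size, and the names data, fast, and slow are assumptions for illustration; timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: weekly dates, two numeric columns.
rng = np.random.default_rng(0)
n = 1000
data = pd.DataFrame(
    {
        "stock": rng.normal(340, 20, n),
        "pop": rng.integers(70, 90, n),
    },
    index=pd.date_range("2016-01-04", periods=n, freq="7D", name="Date"),
)

# Approach 1: roll over the single column only.
fast = data["pop"].rolling(5).mean()
# Approach 2: roll over every column, then pick out "pop".
slow = data.rolling(5).mean()["pop"]

# Both should give the same values; only the amount of work differs.
print(np.allclose(fast, slow, equal_nan=True))

# Time 100 repetitions of each approach.
t_fast = timeit.timeit(lambda: data["pop"].rolling(5).mean(), number=100)
t_slow = timeit.timeit(lambda: data.rolling(5).mean()["pop"], number=100)
print(f"single column: {t_fast:.4f} s, whole frame: {t_slow:.4f} s")
```

The gap between the two timings should grow with the number of columns in the frame, since the second approach computes a rolling mean for each of them.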
Method 2
Edit: pd.rolling_mean is deprecated and has been removed from recent pandas releases. Instead, use the .rolling() method:
df['MA'] = df['pop'].rolling(window=5,center=False).mean()
for a dataframe df:
         Date    stock  pop
0  2016-01-04  325.316   82
1  2016-01-11  320.036   83
2  2016-01-18  299.169   79
3  2016-01-25  296.579   84
4  2016-02-01  295.334   82
5  2016-02-08  309.777   81
6  2016-02-15  317.397   75
7  2016-02-22  328.005   80
8  2016-02-29  315.504   81
9  2016-03-07  328.802   81
To get:
         Date    stock  pop    MA
0  2016-01-04  325.316   82   NaN
1  2016-01-11  320.036   83   NaN
2  2016-01-18  299.169   79   NaN
3  2016-01-25  296.579   84   NaN
4  2016-02-01  295.334   82  82.0
5  2016-02-08  309.777   81  81.8
6  2016-02-15  317.397   75  80.2
7  2016-02-22  328.005   80  80.4
8  2016-02-29  315.504   81  79.8
9  2016-03-07  328.802   81  79.6
Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html
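For anyone who wants to reproduce the table above, here is a self-contained sketch that rebuilds the ten rows by hand (in the question they come from a CSV) and adds the MA column. The first four rows are NaN because, for an integer window, a value is only produced once the window holds a full five observations.

```python
import pandas as pd

# The ten sample rows from the answer, typed in by hand for illustration.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            [
                "2016-01-04", "2016-01-11", "2016-01-18", "2016-01-25",
                "2016-02-01", "2016-02-08", "2016-02-15", "2016-02-22",
                "2016-02-29", "2016-03-07",
            ]
        ),
        "stock": [
            325.316, 320.036, 299.169, 296.579, 295.334,
            309.777, 317.397, 328.005, 315.504, 328.802,
        ],
        "pop": [82, 83, 79, 84, 82, 81, 75, 80, 81, 81],
    }
)

# Select the column first, then roll over it: the result is a single Series,
# so it can be assigned to one new column.
df["MA"] = df["pop"].rolling(window=5).mean()
print(df)
```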
Old: on pandas versions that still include the deprecated function, you can use:
df['MA']=pd.rolling_mean(df['pop'], window=5)
to get:
         Date    stock  pop    MA
0  2016-01-04  325.316   82   NaN
1  2016-01-11  320.036   83   NaN
2  2016-01-18  299.169   79   NaN
3  2016-01-25  296.579   84   NaN
4  2016-02-01  295.334   82  82.0
5  2016-02-08  309.777   81  81.8
6  2016-02-15  317.397   75  80.2
7  2016-02-22  328.005   80  80.4
8  2016-02-29  315.504   81  79.8
9  2016-03-07  328.802   81  79.6
Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.rolling_mean.html
Method 3
This solution worked for me.
data['MA'] = data.rolling(5).mean()['pop']
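I think the issue is that on='pop' does not select the column to average. It only changes which column defines the rolling window, using pop in place of the index. The mean is still computed over the remaining columns, which is why the output in the question shows stock averaged and pop unchanged.
From the docstring: "For a DataFrame, column on which to calculate the rolling window, rather than the index."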
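The effect of on= can be seen on a small frame. The sketch below uses the first six rows from the question, typed in by hand; the name result is an assumption for illustration. It shows that rolling with on='pop' returns a two-column DataFrame, which is why it cannot be assigned to a single new column, while selecting the column first returns a single Series.

```python
import pandas as pd

# First six rows of the question's data, with Date as the index.
data = pd.DataFrame(
    {
        "stock": [325.316, 320.036, 299.169, 296.579, 295.334, 309.777],
        "pop": [82, 83, 79, 84, 82, 81],
    },
    index=pd.DatetimeIndex(
        [
            "2016-01-04", "2016-01-11", "2016-01-18",
            "2016-01-25", "2016-02-01", "2016-02-08",
        ],
        name="Date",
    ),
)

# on="pop" makes "pop" the window axis; the mean is taken over the other columns.
result = data.rolling(5, on="pop").mean()
print(result.columns.tolist())  # both columns come back, not just one
print(result)                   # "stock" is averaged, "pop" is passed through

# Selecting the column first yields one Series, which fits a single new column.
data["MA"] = data["pop"].rolling(5).mean()
print(data)
```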
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.