I have a DataFrame
>>> df = pd.DataFrame({'a':[1,1,1,2,2,2],
... 'b':[10,20,20,10,20,20],
... 'result':[100,200,300,400,500,600]})
...
>>> df
a b result
0 1 10 100
1 1 20 200
2 1 20 300
3 2 10 400
4 2 20 500
5 2 20 600
and want to create a new column that is the average result for the corresponding values for ‘a’ and ‘b’. I can get those values with a groupby:
>>> df.groupby(['a','b'])['result'].mean() a b 1 10 100 20 250 2 10 400 20 550 Name: result, dtype: int64
but can not figure out how to turn that into a new column in the original DataFrame. The final result should look like this,
>>> df a b result avg_result 0 1 10 100 100 1 1 20 200 250 2 1 20 300 250 3 2 10 400 400 4 2 20 500 550 5 2 20 600 550
I could do this by looping through the combinations of ‘a’ and ‘b’ but that would get really slow and unwieldy for larger sets of data. There is probably a much simpler and faster way to go.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
You need transform:
df['avg_result'] = df.groupby(['a', 'b'])['result'].transform('mean')
This generates a correctly indexed column of the groupby values for you:
a b result avg_result 0 1 10 100 100 1 1 20 200 250 2 1 20 300 250 3 2 10 400 400 4 2 20 500 550 5 2 20 600 550
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0