I am trying to calculate the mean of other values by excluding the focal company. I know it is a little bit complicated but let me explain:
Such as, assume the below code is my data:
d = {'col1': ["A", "A", "A", "B", "B", "B", "c", "c","c", "d", "d", "d", "e", "e", "e"],
'col2': [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017],
'col3': [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
'group':[10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20,20]}
df = pd.DataFrame(d)
I would like to get (B+C)’s mean for 2015 and add it into a new column in A.2016, by considering df.group. So, we need to take mean by df.group for the previous year, by excluding the focal item.
The result should correspond to this:
d = {'col1': ["A", "A", "A", "B", "B", "B", "c", "c", "c", "d", "d", "d", "e", "e", "e"],
'col2': [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017],
'col3': [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
'group':[10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20,20],
'operation':['0', '(B2015+C2015)/2', '(B2016+C2016)/2', '0', '(A2015+C2015)/2', '(A2016+C2016)/2', '0', '(A2015+B2015)/2', '(A2016+B2016)/2',"0", "E2015", "E2016", "0","D2015", "D2016" ],
'mean': [nan, 9, 10.5, nan, 9, 14.5, nan, 10, 16, nan, 40, 50, nan, 50, 60]}
output = pd.DataFrame(d)
>>> output
col1 col2 col3 group operation mean
0 A 2015 10 10 nan 0.0
1 A 2016 20 10 (B2015+C2015)/2 9.0
2 A 2017 25 10 (B2016+C2016)/2 10.5
3 B 2015 10 10 0 0.0
4 B 2016 12 10 (A2015+C2015)/2 9.0
5 B 2017 14 10 (A2016+C2016)/2 14.5
6 c 2015 8 10 0 0.0
7 c 2016 9 10 (A2015+B2015)/2 10.0
8 c 2017 10 10 (A2016+B2016)/2 16.0
9 d 2015 50 20 0 0.0
10 d 2016 60 20 E2015 40.0
11 d 2017 70 20 E2016 50.0
12 e 2015 40 20 0 0.0
13 e 2016 50 20 D2015 50.0
14 e 2017 60 20 D2016 60.0
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
- Compute the means of all other values within each group using a double
groupby:
sumall the values within the group- subtract the current (focal) value
- divide by one less than the number of items in the group
- Assign the
shift-ed means to a new column:
means = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(len(x["col1"].unique())-1)).droplevel(0)
df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)
>>> df
col1 col2 col3 group mean
0 A 2015 10 10 0.0
1 A 2016 20 10 9.0
2 A 2017 25 10 10.5
3 B 2015 10 10 0.0
4 B 2016 12 10 9.0
5 B 2017 14 10 14.5
6 c 2015 8 10 0.0
7 c 2016 9 10 10.0
8 c 2017 10 10 16.0
9 d 2015 50 20 0.0
10 d 2016 60 20 40.0
11 d 2017 70 20 50.0
12 e 2015 40 20 0.0
13 e 2016 50 20 50.0
14 e 2017 60 20 60.0
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0