Creating a new column based on the mean of other values in group

I am trying to calculate, for each company, the mean of the other companies' values, excluding the focal company itself. It is a bit complicated, so let me explain.

For example, assume the code below creates my data:

import pandas as pd

d = {'col1': ["A", "A", "A", "B", "B", "B", "c", "c","c", "d", "d", "d", "e", "e", "e"], 
     'col2': [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017], 
     'col3': [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
     'group':[10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20,20]} 
df = pd.DataFrame(d)

For example, for A in 2016 I would like the mean of B and C in 2015, stored in a new column, where A, B and C all belong to the same df.group. In other words, for each row we need the previous year's mean within df.group, excluding the focal company.

The result should correspond to this:

d = {'col1': ["A", "A", "A", "B", "B", "B", "c", "c", "c", "d", "d", "d", "e", "e", "e"], 
     'col2': [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017], 
     'col3': [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
     'group':[10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20,20],  
     'operation':['0', '(B2015+C2015)/2', '(B2016+C2016)/2', '0', '(A2015+C2015)/2', '(A2016+C2016)/2', '0', '(A2015+B2015)/2', '(A2016+B2016)/2',"0", "E2015", "E2016", "0","D2015", "D2016" ], 
     'mean': [0, 9, 10.5, 0, 9, 14.5, 0, 10, 16, 0, 40, 50, 0, 50, 60]}
output = pd.DataFrame(d)

>>> output 
   col1  col2  col3  group        operation  mean
0     A  2015    10     10                0   0.0
1     A  2016    20     10  (B2015+C2015)/2   9.0
2     A  2017    25     10  (B2016+C2016)/2  10.5
3     B  2015    10     10                0   0.0
4     B  2016    12     10  (A2015+C2015)/2   9.0
5     B  2017    14     10  (A2016+C2016)/2  14.5
6     c  2015     8     10                0   0.0
7     c  2016     9     10  (A2015+B2015)/2  10.0
8     c  2017    10     10  (A2016+B2016)/2  16.0
9     d  2015    50     20                0   0.0
10    d  2016    60     20            E2015  40.0
11    d  2017    70     20            E2016  50.0
12    e  2015    40     20                0   0.0
13    e  2016    50     20            D2015  50.0
14    e  2017    60     20            D2016  60.0
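
As a spot check of the table above, a single cell can be reproduced by hand. The sketch below is only an illustration of the definition, not a general solution: it filters the peers of A for 2016 (same group, previous year, every company except A) and averages their col3 values. The names peers and a_2016 are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B", "c", "c", "c",
             "d", "d", "d", "e", "e", "e"],
    "col2": [2015, 2016, 2017] * 5,
    "col3": [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
    "group": [10] * 9 + [20] * 6,
})

# Peers of A for the 2016 row: same group (10), previous year (2015),
# and any company other than A itself.
peers = df[(df["group"] == 10) & (df["col2"] == 2015) & (df["col1"] != "A")]

# (B2015 + c2015) / 2 = (10 + 8) / 2
a_2016 = peers["col3"].mean()
print(a_2016)  # 9.0
```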

Answers:

Method 1

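  1. Compute the mean of all other values within each group and year using a nested groupby:
  • sum all the values within the group for each year
  • subtract the current (focal) value
  • divide by one less than the number of companies in the group
  2. Shift the means down by one row so that each row receives the previous year's value, and assign them to a new column. The first row of each company is set to 0. This relies on the rows being sorted by col1 and then col2: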
means = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(len(x["col1"].unique())-1)).droplevel(0)

df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)

>>> df
   col1  col2  col3  group  mean
0     A  2015    10     10   0.0
1     A  2016    20     10   9.0
2     A  2017    25     10  10.5
3     B  2015    10     10   0.0
4     B  2016    12     10   9.0
5     B  2017    14     10  14.5
6     c  2015     8     10   0.0
7     c  2016     9     10  10.0
8     c  2017    10     10  16.0
9     d  2015    50     20   0.0
10    d  2016    60     20  40.0
11    d  2017    70     20  50.0
12    e  2015    40     20   0.0
13    e  2016    50     20  50.0
14    e  2017    60     20  60.0
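
The same idea can also be written without apply. The following is a sketch of an alternative, not the original answer's code: it uses transform on a combined (group, col2) key for the sum and count, and a per-company shift so the result does not depend on a company's rows being adjacent. It still assumes that each company's rows are ordered by year with no missing years, and that every group has at least two companies per year (otherwise the division is by zero). The names total, count and loo_mean are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B", "c", "c", "c",
             "d", "d", "d", "e", "e", "e"],
    "col2": [2015, 2016, 2017] * 5,
    "col3": [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
    "group": [10] * 9 + [20] * 6,
})

# Sum and row count of col3 over all companies sharing a group and year,
# broadcast back to the original rows.
by_group_year = df.groupby(["group", "col2"])["col3"]
total = by_group_year.transform("sum")
count = by_group_year.transform("count")

# Leave-one-out mean: drop the focal row from both the sum and the count.
loo_mean = (total - df["col3"]) / (count - 1)

# Move each company's value to its next year. The first year has no
# previous year, so it is filled with 0 to match the output above.
df["mean"] = loo_mean.groupby(df["col1"]).shift().fillna(0)

print(df)
```

On the sample data this gives the same mean column as the output shown above.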


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
