       Y1961      Y1962      Y1963      Y1964      Y1965 Region
0  82.567307  83.104757  83.183700  83.030338  82.831958     US
1   2.699372   2.610110   2.587919   2.696451   2.846247     US
2  14.131355  13.690028  13.599516  13.649176  13.649046     US
3   0.048589   0.046982   0.046583   0.046225   0.051750     US
4   0.553377   0.548123   0.582282   0.577811   0.620999     US
In the above dataframe, I would like to get the average of each row. Currently, I am doing this:
df.mean(axis=0)
However, this drops the Region column as well. How can I compute the mean and also retain the Region column?
Answers:
Method 1
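You can assign the result to a new column. You also need to compute the mean along the rows, so use axis=1. On pandas 2.0 and later, also pass numeric_only=True so that the string Region column is skipped rather than raising a TypeError.
df['mean'] = df.mean(axis=1, numeric_only=True)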
>>> df
       Y1961      Y1962      Y1963      Y1964      Y1965 Region       mean
0  82.567307  83.104757  83.183700  83.030338  82.831958     US  82.943612
1   2.699372   2.610110   2.587919   2.696451   2.846247     US   2.688020
2  14.131355  13.690028  13.599516  13.649176  13.649046     US  13.743824
3   0.048589   0.046982   0.046583   0.046225   0.051750     US   0.048026
4   0.553377   0.548123   0.582282   0.577811   0.620999     US   0.576518
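For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the dataframe from the question and adds the row mean. It assumes a recent pandas release, where non-numeric columns are no longer dropped silently, which is why numeric_only=True is passed explicitly.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Y1961": [82.567307, 2.699372, 14.131355, 0.048589, 0.553377],
    "Y1962": [83.104757, 2.610110, 13.690028, 0.046982, 0.548123],
    "Y1963": [83.183700, 2.587919, 13.599516, 0.046583, 0.582282],
    "Y1964": [83.030338, 2.696451, 13.649176, 0.046225, 0.577811],
    "Y1965": [82.831958, 2.846247, 13.649046, 0.051750, 0.620999],
    "Region": ["US"] * 5,
})

# axis=1 averages across the columns of each row;
# numeric_only=True leaves the string Region column out of the calculation
df["mean"] = df.mean(axis=1, numeric_only=True)

print(df[["Region", "mean"]])
```

The Region column is untouched because the result is only assigned to a new column; nothing is dropped from df.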
Method 2
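We can compute the mean of each row over a range of columns selected by position with iloc, i.e. in your case, from the Y1961 column to the Y1965 column. The stop position of an iloc slice is excluded, so 0:5 covers positions 0 through 4:
df['mean'] = df.iloc[:, 0:5].mean(axis=1)
And if you want to select individual columns:
df['mean'] = df.iloc[:, [0,1,2,3,4]].mean(axis=1)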
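Since the stop position of an iloc slice is easy to get wrong, here is a small sketch of how it behaves. The three-column toy dataframe is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Hypothetical toy data, chosen so the means are easy to check by hand
df = pd.DataFrame({
    "Y1961": [1.0, 10.0],
    "Y1962": [2.0, 20.0],
    "Y1963": [3.0, 30.0],
    "Region": ["US", "US"],
})

# iloc slices exclude the stop position: 0:2 covers positions 0 and 1 only
first_two = df.iloc[:, 0:2].mean(axis=1)

# 0:3 covers all three year columns and still stops before Region
all_years = df.iloc[:, 0:3].mean(axis=1)

print(first_two.tolist())
print(all_years.tolist())
```

With positional slicing, a stop value that is one too small silently leaves out the last year column instead of raising an error, so it is worth checking the selected columns once.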
Method 3
I think this is what you are looking for:
df.drop('Region', axis=1).apply(lambda x: x.mean(), axis=1)
Method 4
Taking the mean based on the column names
I am sharing this because it might be useful for those who want to average a few columns based on their names instead of counting column indices. This can be done simply with pandas’s loc instead of iloc. For instance, taking the odd-year average would be:
df["mean_odd_year"] = df.loc[:, ["Y1961","Y1963","Y1965"]].mean(axis = 1)
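Related to this, loc also accepts a slice of column labels, and unlike iloc the slice includes both endpoints. A minimal sketch, using a made-up toy dataframe and a hypothetical column name mean_61_63:

```python
import pandas as pd

# Hypothetical toy data, not the values from the question
df = pd.DataFrame({
    "Y1961": [1.0, 10.0],
    "Y1962": [2.0, 20.0],
    "Y1963": [3.0, 30.0],
    "Y1964": [4.0, 40.0],
    "Region": ["US", "US"],
})

# Label slices with loc include both ends, so this selects Y1961, Y1962 and Y1963
df["mean_61_63"] = df.loc[:, "Y1961":"Y1963"].mean(axis=1)

print(df[["Region", "mean_61_63"]])
```

Selecting by label keeps working if the columns are later reordered or a new column is inserted in front, which is the main reason to prefer it over positions.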
Method 5
If you are looking to average column-wise, try this:
df.drop('Region', axis=1).apply(lambda x: x.mean())
# this drops the Region column from df itself
df.drop('Region', axis=1,inplace=True)
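To show the difference between the two drop calls, here is a short sketch on a made-up toy dataframe. It also calls mean() directly instead of going through apply with a lambda, which gives the same column-wise result.

```python
import pandas as pd

# Hypothetical toy data, not the values from the question
df = pd.DataFrame({
    "Y1961": [1.0, 3.0],
    "Y1962": [2.0, 6.0],
    "Region": ["US", "US"],
})

# drop() without inplace=True returns a new dataframe, so df keeps Region;
# mean() with the default axis=0 then yields one value per remaining column
col_means = df.drop(columns="Region").mean()

print(col_means)
print(list(df.columns))
```

The result is a Series indexed by column name, so col_means["Y1961"] gives the average for that year. Only the inplace=True variant actually removes Region from df.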
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.