I have to write a function, column_means, that calculates the mean of each column of a DataFrame and returns a list of the means. I'm not allowed to use the built-in .mean() method, so I'm implementing the general formula of the mean: sum(x_i) / number of elements.
This is my code:
df = pd.DataFrame({'a':[1,2,3], 'b': [4,5,6]})
def column_means(df):
    means = []
    for i,n in zip(df.columns, df.shape[0]):
        means [n] = sum(df[i])/ df.shape[0]
    return means
It doesn't work as intended. Could you please help me and tell me what my mistakes are?
Thank you in advance.
Answers:
Method 1
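You are passing an int to zip: df.shape[0] returns a single integer, not an iterable, so zip raises a TypeError. The line means[n] = ... is also a problem, because you cannot assign to an index of an empty list; use append instead.
So you can simply do the following: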
def column_means(df):
    means = []
    for i in df.columns:
        means.append(sum(df[i]) / df.shape[0])
    return means
If you want the mean as an integer instead of a float, you can use floor division: sum(df[i]) // df.shape[0]. Note that this rounds down rather than to the nearest integer.
I hope this answers your question.
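The two mistakes described in this answer can be reproduced without pandas. The following is a minimal sketch; columns and n_rows are stand-ins I chose for df.columns and df.shape[0], not names from the question.

```python
# Stand-ins for df.columns and df.shape[0] from the question (assumed values).
columns = ["a", "b"]
n_rows = 3

# Mistake 1: zip() needs iterables, and an int is not one.
try:
    list(zip(columns, n_rows))
    zip_error = None
except TypeError as exc:
    zip_error = exc

# Mistake 2: assigning to an index of an empty list fails.
means = []
try:
    means[0] = 2.0
    index_error = None
except IndexError as exc:
    index_error = exc

# append() grows the list instead of writing to an existing slot.
means.append(2.0)

print(type(zip_error).__name__, type(index_error).__name__, means)
# prints: TypeError IndexError [2.0]
```

In the original function only the TypeError is ever seen, because the loop never starts.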
Method 2
Do you want the mean of each column? Be careful if the columns don't all hold the same number of values (for example, if some of them contain NaN):
import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b': [4,5,6]})
def column_means(df):
    means = []
    for n in df.columns:
        means.append(sum(df[n])/len(df[n]))
    return means
print(column_means(df))
If you are allowed to, you can also use the mean method of a pandas DataFrame:
df.mean()
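Even when .mean() is off limits for the solution itself, it can serve as a cross-check. The sketch below compares a hand-written version against df.mean() on the example data; the names manual and builtin are mine, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def column_means(df):
    # Sum each column and divide by its number of rows.
    return [sum(df[col]) / len(df[col]) for col in df.columns]

manual = column_means(df)
builtin = df.mean().tolist()  # used only to verify the manual result

# Convert to plain floats so the comparison does not depend on NumPy types.
print([float(m) for m in manual] == builtin)
```

One difference to keep in mind: df.mean() skips NaN values by default, while sum(...) / len(...) does not, so the two can disagree on columns with missing data.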
Method 3
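Change the first df.shape[0] to df.index, and change the assignment line to use append. This works for the example, but zip stops at the shorter of its inputs, so it only covers every column when there are at least as many rows as columns.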
def column_means(df):
    means = []
    for i,n in zip(df.columns, df.index):
        means.append(sum(df[i])/ df.shape[0])
    return means
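To see how zipping over df.index behaves when a DataFrame has more columns than rows, here is a small sketch. The frame wide and the function names column_means_zip and column_means_all are hypothetical, made up for this comparison.

```python
import pandas as pd

# Hypothetical wide frame: 3 columns but only 2 rows.
wide = pd.DataFrame({'a': [1, 3], 'b': [4, 6], 'c': [7, 9]})

def column_means_zip(df):
    means = []
    # zip() stops once df.index is exhausted, i.e. after 2 columns here.
    for col, _ in zip(df.columns, df.index):
        means.append(sum(df[col]) / df.shape[0])
    return means

def column_means_all(df):
    # Iterating over df.columns alone visits every column.
    return [sum(df[col]) / df.shape[0] for col in df.columns]

print(len(column_means_zip(wide)), len(column_means_all(wide)))
# prints: 2 3
```

Looping over df.columns directly avoids the issue, since the row index is not needed to compute a column mean.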
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.