For the following dataframe:
StationID  HoursAhead  BiasTemp
SS0279     0           10
SS0279     1           20
KEOPS      0           0
KEOPS      1           5
BB         0           5
BB         1           5
I’d like to get something like:
StationID  BiasTemp
SS0279     15
KEOPS      2.5
BB         5
I know I can script something like this to get the desired result:
import pandas

def transform_DF(old_df, col):
    # Unique station names
    list_stations = list(set(old_df['StationID'].values.tolist()))
    # Keep every column except the one being dropped (e.g. 'HoursAhead')
    header = list(old_df.columns.values)
    header.remove(col)
    header_new = header
    new_df = pandas.DataFrame(columns=header_new)
    for i, station in enumerate(list_stations):
        # Summary statistics for the rows of this station
        general_results = old_df[(old_df['StationID'] == station)].describe()
        new_row = []
        for column in header_new:
            if column in ['StationID']:
                new_row.append(station)
                continue
            new_row.append(general_results[column]['mean'])
        new_df.loc[i] = new_row
    return new_df
But I wonder if there is something more straightforward in pandas.
Answers:
Method 1
You could group by StationID and then take the mean() of BiasTemp. To get a DataFrame as output, pass as_index=False:
In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean()
Out[4]:
  StationID  BiasTemp
0        BB       5.0
1     KEOPS       2.5
2    SS0279      15.0
Without as_index=False, it returns a Series instead:
In [5]: df.groupby('StationID')['BiasTemp'].mean()
Out[5]:
StationID
BB         5.0
KEOPS      2.5
SS0279    15.0
Name: BiasTemp, dtype: float64
Read more about groupby in this pydata tutorial.
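For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's dataframe and applies the same groupby. The variable names (df, result, ordered) are just illustrative. Note that groupby sorts the group keys by default; passing sort=False keeps the stations in order of first appearance, which matches the order shown in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# One row per station, StationID kept as a regular column
result = df.groupby("StationID", as_index=False)["BiasTemp"].mean()
print(result)

# Same aggregation, but stations stay in order of first appearance
ordered = df.groupby("StationID", as_index=False, sort=False)["BiasTemp"].mean()
print(ordered)
```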
Method 2
This is what groupby is for:
In [117]:
df.groupby('StationID')['BiasTemp'].mean()
Out[117]:
StationID
BB         5.0
KEOPS      2.5
SS0279    15.0
Name: BiasTemp, dtype: float64
Here we group by the 'StationID' column, then select the 'BiasTemp' column and call mean() on it.
There is a section in the docs on this functionality.
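If a DataFrame is needed rather than a Series, one option is to call reset_index() on the result, which moves StationID from the index back into a column. A small sketch, assuming the sample data from the question (the names means and as_frame are only for illustration):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# Series of means, indexed by StationID
means = df.groupby("StationID")["BiasTemp"].mean()

# Turn the index back into a column to get a two-column DataFrame
as_frame = means.reset_index()
print(as_frame)
```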
Method 3
This can also be done as follows:
df.groupby('StationID').mean()
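Because no column is selected here, the call averages every remaining column, so the result also contains the mean of HoursAhead, and StationID becomes the index. The sketch below, using the question's sample data, shows that and one way to restrict the output to BiasTemp (the names all_means and only_bias are illustrative). In recent pandas versions, non-numeric columns may need to be excluded, for example with numeric_only=True; that is not an issue for this data.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# Averages all non-key columns: HoursAhead and BiasTemp
all_means = df.groupby("StationID").mean()
print(all_means)

# Selecting with a list keeps a DataFrame with only BiasTemp
only_bias = df.groupby("StationID")[["BiasTemp"]].mean()
print(only_bias)
```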
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.