I have a DataFrame like this one:
In [7]: frame.head()
Out[7]:
   Communications and Search  Business   General  Lifestyle
0                   0.745763  0.050847  0.118644   0.084746
0                   0.333333  0.000000  0.583333   0.083333
0                   0.617021  0.042553  0.297872   0.042553
0                   0.435897  0.000000  0.410256   0.153846
0                   0.358974  0.076923  0.410256   0.153846
How do I get the name of the column that holds the maximum value in each row? The desired output looks like this:
In [7]:
frame.head()
Out[7]:
Communications and Search Business General Lifestyle Max
0 0.745763 0.050847 0.118644 0.084746 Communications
0 0.333333 0.000000 0.583333 0.083333 Business
0 0.617021 0.042553 0.297872 0.042553 Communications
0 0.435897 0.000000 0.410256 0.153846 Communications
0 0.358974 0.076923 0.410256 0.153846 Business
Answers:
Method 1
You can use idxmax with axis=1 to find the column with the greatest value on each row:
>>> df.idxmax(axis=1)
0    Communications
1          Business
2    Communications
3    Communications
4          Business
dtype: object
To create the new column ‘Max’, use df['Max'] = df.idxmax(axis=1).
To find the row index at which the maximum value occurs in each column, use df.idxmax() (or equivalently df.idxmax(axis=0)).
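As a minimal sketch of both calls, the snippet below rebuilds the question's numbers into a small DataFrame. The default integer index 0..4 and the variable names row_max and col_max are assumptions made for illustration. The labels it produces follow from these particular numbers, so they need not match the illustrative output shown above.

```python
import pandas as pd

# Values copied from the question; the 0..4 index is an assumption
df = pd.DataFrame(
    {
        "Communications and Search": [0.745763, 0.333333, 0.617021, 0.435897, 0.358974],
        "Business": [0.050847, 0.000000, 0.042553, 0.000000, 0.076923],
        "General": [0.118644, 0.583333, 0.297872, 0.410256, 0.410256],
        "Lifestyle": [0.084746, 0.083333, 0.042553, 0.153846, 0.153846],
    }
)

# axis=1: for each row, the label of the column holding the largest value
row_max = df.idxmax(axis=1)

# axis=0 (the default): for each column, the index label of the row holding
# the largest value; ties resolve to the first occurrence
col_max = df.idxmax(axis=0)

# Add the new column last, so the string labels do not take part
# in the numeric comparisons above
df["Max"] = row_max

print(df)
print(col_max)
```

Computing both results before assigning df['Max'] keeps the string column out of the comparison; calling idxmax on a frame that mixes numbers and strings would otherwise need a column selection first.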
Method 2
If you want a column containing the name of the column with the maximum value, but considering only a subset of columns, use a variation of @ajcr’s answer:
df['Max'] = df[['Communications','Business']].idxmax(axis=1)
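A short sketch of the subset variant, assuming the first column is named simply 'Communications' as in the line above (the question's header reads 'Communications and Search'). The names subset and full_max are hypothetical. It shows that restricting the columns can change the answer for a row.

```python
import pandas as pd

# Question's values with a shortened first column name (assumption)
df = pd.DataFrame(
    {
        "Communications": [0.745763, 0.333333, 0.617021, 0.435897, 0.358974],
        "Business": [0.050847, 0.000000, 0.042553, 0.000000, 0.076923],
        "General": [0.118644, 0.583333, 0.297872, 0.410256, 0.410256],
        "Lifestyle": [0.084746, 0.083333, 0.042553, 0.153846, 0.153846],
    }
)

# Maximum over all four columns, kept for comparison
full_max = df.idxmax(axis=1)

# Only these columns compete for the maximum
subset = ["Communications", "Business"]
df["Max"] = df[subset].idxmax(axis=1)

# Row 1 peaks in 'General' overall, but 'General' is not in the subset,
# so the subset result for that row differs from the full result
print(pd.DataFrame({"all columns": full_max, "subset": df["Max"]}))
```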
Method 3
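You could also apply a function to the DataFrame row by row (axis=1) and take the argmax() of each row. Note that in current pandas versions Series.argmax() returns the integer position of the maximum rather than its label, so use x.idxmax() inside the lambda to get column names there.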
In [144]: df.apply(lambda x: x.argmax(), axis=1)
Out[144]:
0    Communications
1          Business
2    Communications
3    Communications
4          Business
dtype: object
Here’s a benchmark showing how much slower the apply approach is than idxmax() for len(df) ~ 20K (about ten times slower in this run):
In [146]: %timeit df.apply(lambda x: x.argmax(), axis=1)
1 loops, best of 3: 479 ms per loop

In [147]: %timeit df.idxmax(axis=1)
10 loops, best of 3: 47.3 ms per loop
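To repeat the comparison outside IPython, something like the following sketch could be used. The random 20,000 x 4 frame, the seed, and the repeat count are assumptions, and the lambda calls idxmax() rather than argmax() because recent pandas versions return an integer position from Series.argmax(). Absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data of roughly the size mentioned above (assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((20_000, 4)),
    columns=["Communications", "Business", "General", "Lifestyle"],
)

# Row-wise apply: one Python-level call per row
via_apply = df.apply(lambda row: row.idxmax(), axis=1)

# Built-in DataFrame method
via_idxmax = df.idxmax(axis=1)

# Both approaches should name the same column for every row
same = bool((via_apply == via_idxmax).all())

# Total seconds for 3 runs of each approach
t_apply = timeit.timeit(lambda: df.apply(lambda row: row.idxmax(), axis=1), number=3)
t_idxmax = timeit.timeit(lambda: df.idxmax(axis=1), number=3)

print(f"results identical: {same}")
print(f"apply:  {t_apply / 3:.4f} s per run")
print(f"idxmax: {t_idxmax / 3:.4f} s per run")
```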
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.