How to sort a dataFrame in python pandas by two or more columns?

Suppose I have a dataframe with columns a, b and c, I want to sort the dataframe by column b in ascending order, and by column c in descending order, how do I do this?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

As of the 0.17.0 release, the sort method was deprecated in favor of sort_values. sort was completely removed in the 0.20.0 release. The arguments (and results) remain the same:

df.sort_values(['a', 'b'], ascending=[True, False])

You can use the ascending argument of sort:

df.sort(['a', 'b'], ascending=[True, False])

For example:

In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])

In [12]: df1.sort(['a', 'b'], ascending=[True, False])
Out[12]:
   a  b
2  1  4
7  1  3
1  1  2
3  1  2
4  3  2
6  4  4
0  4  3
9  4  3
5  4  1
8  4  1

As commented by @renadeen

Sort isn’t in place by default! So you should assign result of the sort method to a variable or add inplace=True to method call.

that is, if you want to reuse df1 as a sorted DataFrame:

df1 = df1.sort(['a', 'b'], ascending=[True, False])

or

df1.sort(['a', 'b'], ascending=[True, False], inplace=True)

Method 2

As of pandas 0.17.0, DataFrame.sort() is deprecated, and set to be removed in a future version of pandas. The way to sort a dataframe by its values is now is DataFrame.sort_values

As such, the answer to your question would now be

df.sort_values(['b', 'c'], ascending=[True, False], inplace=True)

Method 3

For large dataframes of numeric data, you may see a significant performance improvement via numpy.lexsort, which performs an indirect sort using a sequence of keys:

import pandas as pd
import numpy as np

np.random.seed(0)

df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])
df1 = pd.concat([df1]*100000)

def pdsort(df1):
    return df1.sort_values(['a', 'b'], ascending=[True, False])

def lex(df1):
    arr = df1.values
    return pd.DataFrame(arr[np.lexsort((-arr[:, 1], arr[:, 0]))])

assert (pdsort(df1).values == lex(df1).values).all()

%timeit pdsort(df1)  # 193 ms per loop
%timeit lex(df1)     # 143 ms per loop

One peculiarity is that the defined sorting order with numpy.lexsort is reversed: (-'b', 'a') sorts by series a first. We negate series b to reflect we want this series in descending order.

Be aware that np.lexsort only sorts with numeric values, while pd.DataFrame.sort_values works with either string or numeric values. Using np.lexsort with strings will give: TypeError: bad operand type for unary -: 'str'.


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x