Fastest way to sort each row in a pandas dataframe

I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.

So something like this:

A   B   C   D
3   4   8   1
9   2   7   2

Needs to become:

A   B   C   D
8   4   3   1
9   7   2   2

Right now I’m applying sort to each row and building up a new dataframe row by row. I’m also doing a couple of extra, less important things to each row (which is why I’m using pandas and not NumPy). Would it be quicker to create a list of lists and then build the new dataframe in one go? Or do I need to go to Cython?

Answers:

Method 1

I think I would do this in numpy:

In [11]: a = df.values

In [12]: a.sort(axis=1)  # no ascending argument

In [13]: a = a[:, ::-1]  # so reverse

In [14]: a
Out[14]:
array([[8, 4, 3, 1],
       [9, 7, 2, 2]])

In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
   A  B  C  D
0  8  4  3  1
1  9  7  2  2
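
As a self-contained variant of the steps above, here is a minimal sketch that uses np.sort (which returns a sorted copy) instead of sorting df.values in place, so the original frame is not modified along the way. The frame is the one from the question; the names sorted_desc and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# np.sort returns a sorted copy, so df itself is left untouched.
# axis=1 sorts within each row; [:, ::-1] reverses each row to get descending order.
sorted_desc = np.sort(df.to_numpy(), axis=1)[:, ::-1]

# Rebuild a frame with the original index and column labels.
result = pd.DataFrame(sorted_desc, index=df.index, columns=df.columns)
print(result)
```

Because the whole array is sorted in one vectorised call, no Python-level loop over rows is involved.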

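I had thought this might work, but it reorders the columns by their labels rather than sorting the values within each row (DataFrame.sort has since been removed from pandas in favour of sort_values and sort_index):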

In [21]: df.sort(axis=1, ascending=False)
Out[21]:
   D  C  B  A
0  1  8  4  3
1  2  7  2  9

And passing the columns explicitly makes pandas raise:

In [22]: df.sort(df.columns, axis=1, ascending=False)

ValueError: When sorting by column, axis must be 0 (rows)
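
For readers on a pandas version where DataFrame.sort no longer exists, the label-sorting behaviour shown above can be reproduced with sort_index. This is only a sketch to illustrate why that route does not sort the row values; by_label is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# Reorders the columns by label (D, C, B, A). Each value stays with its
# column, so the rows are not sorted by value.
by_label = df.sort_index(axis=1, ascending=False)
print(by_label)
```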

Method 2

To add to the answer given by @Andy-Hayden: you can do this in place on the whole frame… I’m not really sure why this works, but it does. There seems to be no control over the sort order.

    In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

    In [98]: A
    Out[98]: 
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [99]: A.values.sort
    Out[99]: <function ndarray.sort>

    In [100]: A
    Out[100]: 
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [101]: A.values.sort()

    In [102]: A
    Out[102]: 
       one  two  three  four  five
    0   22   46     49    63    72
    1   25   30     33    43    69
    2   21   24     39    56    93
    3    3   11     52    57    74

    In [103]: A = A.iloc[:,::-1]

    In [104]: A
    Out[104]: 
       five  four  three  two  one
    0    72    63     49   46   22
    1    69    43     33   30   25
    2    93    56     39   24   21
    3    74    57     52   11    3

I hope someone can explain why this works; I’m just happy that it does 8)
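
One possible explanation, offered as a sketch rather than a definitive account: for a frame with a single dtype, .values has typically returned a view of the underlying array, and ndarray.sort() sorts in place along the last axis (that is, within each row) by default. Whether the frame is actually modified depends on the pandas version and settings (with Copy-on-Write enabled the array may be read-only), so the version below sorts an explicit copy and writes it back. That also keeps the column labels in their original order. The data are the first two rows of the example above.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(
    [[22, 63, 72, 46, 49],
     [43, 30, 69, 33, 25]],
    columns=["one", "two", "three", "four", "five"],
)

# Work on an explicit copy so the result does not depend on whether
# .values / .to_numpy() happens to return a view of the frame's data.
values = A.to_numpy(copy=True)
values.sort()             # default axis=-1: ascending within each row
values = values[:, ::-1]  # reverse each row for descending order

# Write the sorted values back; the column labels are unchanged.
A.iloc[:, :] = values
print(A)
```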

Method 3

You could use DataFrame.apply.

E.g.:

A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five']) 
print (A)

   one  two  three  four  five
0    2   75     44    53    46
1   18   51     73    80    66
2   35   91     86    44    25
3   60   97     57    33    79

A = A.apply(np.sort, axis = 1) 
print(A)

   one  two  three  four  five
0    2   44     46    53    75
1   18   51     66    73    80
2   25   35     44    86    91
3   33   57     60    79    97

Since you want it in descending order, you can simply multiply the dataframe by -1, sort it, and multiply by -1 again (this only works for numeric data).

A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
A = A * -1
A = A.apply(np.sort, axis = 1)
A = A * -1
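
A caveat, as far as I can tell: on more recent pandas versions, apply(np.sort, axis=1) without further arguments may return a Series of arrays rather than a DataFrame. The sketch below passes raw=True (as Method 5 does), which in my understanding returns a DataFrame with the original index and columns. The data are the first two rows of the example above; ascending and descending are illustrative names.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(
    [[2, 75, 44, 53, 46],
     [18, 51, 73, 80, 66]],
    columns=["one", "two", "three", "four", "five"],
)

# raw=True hands each row to np.sort as a plain ndarray.
ascending = A.apply(np.sort, axis=1, raw=True)

# Negate, sort ascending, negate again: descending order for numeric data.
descending = -((-A).apply(np.sort, axis=1, raw=True))
print(descending)
```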

Method 4

Instead of using the pd.DataFrame constructor, an easier way to assign the sorted values back is to index with a list of column names (double brackets):

Original dataframe:

A   B   C   D
3   4   8   1
9   2   7   2

df[['A', 'B', 'C', 'D']] = np.sort(df)[:, ::-1]

   A  B  C  D
0  8  4  3  1
1  9  7  2  2

This way you can also sort only some of the columns (starting again from the original dataframe):

df[['B', 'C']] = np.sort(df[['B', 'C']])[:, ::-1]

   A  B  C  D
0  3  8  4  1
1  9  7  2  2
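
The partial sort above can be written as a self-contained script. This is a minimal sketch using the question's frame; subset is an illustrative name, and to_numpy() is used only to make the conversion to an array explicit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# Only these columns take part in the row-wise sort.
subset = ["B", "C"]

# Sort each row of the subset ascending, reverse for descending,
# and assign the values back to the same columns.
df[subset] = np.sort(df[subset].to_numpy(), axis=1)[:, ::-1]
print(df)
```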

Method 5

One could try this approach to preserve the integrity of the df:

import pandas as pd 
import numpy as np

A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five']) 
print (A) 
print(type(A))

   one  two  three  four  five
0   85   27     64    50    55
1    3   90     65    22     8
2    0    7     64    66    82
3   58   21     42    27    30
<class 'pandas.core.frame.DataFrame'>

B = A.apply(np.sort, axis=1, raw=True)
print(B) 
print(type(B))

   one  two  three  four  five
0   27   50     55    64    85
1    3    8     22    65    90
2    0    7     64    66    82
3   21   27     30    42    58
<class 'pandas.core.frame.DataFrame'>
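
Since the question is about speed, it is probably worth timing the candidates on your own data rather than guessing. The sketch below compares a single vectorised np.sort call with a row-wise apply using raw=True. The frame size, the random data, and the function names sort_numpy and sort_apply_raw are all assumptions made for illustration, and the timings will vary with hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical test frame, far smaller than the millions of rows in the question.
df = pd.DataFrame(rng.integers(0, 100, size=(2_000, 100)))

def sort_numpy(frame):
    # One vectorised call sorts every row; [:, ::-1] makes each row descending.
    values = np.sort(frame.to_numpy(), axis=1)[:, ::-1]
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)

def sort_apply_raw(frame):
    # np.sort is called once per row on a raw ndarray, then each row is reversed.
    ascending = frame.apply(np.sort, axis=1, raw=True)
    return pd.DataFrame(
        ascending.to_numpy()[:, ::-1], index=frame.index, columns=frame.columns
    )

for func in (sort_numpy, sort_apply_raw):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f"{func.__name__}: {seconds / 3:.4f} s per call")
```

Both functions should produce the same descending rows, so whichever is faster on your data can be swapped in directly.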

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
