Sorting a 2D numpy array by multiple axes

I have a 2D numpy array of shape (N,2) which is holding N points (x and y coordinates). For example:

array([[3, 2],
       [6, 2],
       [3, 6],
       [3, 4],
       [5, 3]])

I’d like to sort it such that my points are ordered by x-coordinate, and then by y in cases where the x coordinate is the same. So the array above should look like this:

array([[3, 2],
       [3, 4],
       [3, 6],
       [5, 3],
       [6, 2]])

If this was a normal Python list, I would simply define a comparator to do what I want, but as far as I can tell, numpy’s sort function doesn’t accept user-defined comparators. Any ideas?


EDIT: Thanks for the ideas! I set up a quick test case with 1000000 random integer points, and benchmarked the ones that I could run (sorry, can’t upgrade numpy at the moment).

Mine:   4.078 secs 
mtrw:   7.046 secs
unutbu: 0.453 secs

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Using lexsort:

import numpy as np    
a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])

ind = np.lexsort((a[:,1],a[:,0]))    

a[ind]
# array([[3, 2],
#       [3, 4],
#       [3, 6],
#       [5, 3],
#       [6, 2]])

a.ravel() returns a view if a is C_CONTIGUOUS. If that is true,
@ars’s method, slightly modifed by using ravel instead of flatten, yields a nice way to sort a in-place:

a = np.array([(3, 2), (6, 2), (3, 6), (3, 4), (5, 3)])
dt = [('col1', a.dtype),('col2', a.dtype)]
assert a.flags['C_CONTIGUOUS']
b = a.ravel().view(dt)
b.sort(order=['col1','col2'])

Since b is a view of a, sorting b sorts a as well:

print(a)
# [[3 2]
#  [3 4]
#  [3 6]
#  [5 3]
#  [6 2]]

Method 2

The title says “sorting 2D arrays”. Although the questioner uses an (N,2)-shaped array, it’s possible to generalize unutbu’s solution to work with any (N,M) array, as that’s what people might actually be looking for.

One could transpose the array and use slice notation with negative step to pass all the columns to lexsort in reversed order:

>>> import numpy as np
>>> a = np.random.randint(1, 6, (10, 3))
>>> a
array([[4, 2, 3],
       [4, 2, 5],
       [3, 5, 5],
       [1, 5, 5],
       [3, 2, 1],
       [5, 2, 2],
       [3, 2, 3],
       [4, 3, 4],
       [3, 4, 1],
       [5, 3, 4]])

>>> a[np.lexsort(np.transpose(a)[::-1])]
array([[1, 5, 5],
       [3, 2, 1],
       [3, 2, 3],
       [3, 4, 1],
       [3, 5, 5],
       [4, 2, 3],
       [4, 2, 5],
       [4, 3, 4],
       [5, 2, 2],
       [5, 3, 4]])

Method 3

The numpy_indexed package (disclaimer: I am its author) can be used to solve these kind of processing-on-nd-array problems in an efficient fully vectorized manner:

import numpy_indexed as npi
npi.sort(a)  # by default along axis=0, but configurable

Method 4

You can use np.complex_sort. This has the side effect of changing your data to floating point, I hope that’s not a problem:

>>> a = np.array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])
>>> atmp = np.sort_complex(a[:,0] + a[:,1]*1j)
>>> b = np.array([[np.real(x), np.imag(x)] for x in atmp])
>>> b
array([[ 3.,  2.],
       [ 3.,  4.],
       [ 3.,  6.],
       [ 5.,  3.],
       [ 6.,  2.]])

Method 5

I was struggling with the same thing and just got help and solved the problem. It works smoothly if your array have column names (structured array) and I think this is a very simple way to sort using the same logic that excel does:

array_name[array_name[['colname1','colname2']].argsort()]

Note the double-brackets enclosing the sorting criteria. And off course, you can use more than 2 columns as sorting criteria.

Method 6

EDIT: removed bad answer.

Here’s one way to do it using an intermediate structured array:

from numpy import array

a = array([[3, 2], [6, 2], [3, 6], [3, 4], [5, 3]])

b = a.flatten()
b.dtype = [('x', '<i4'), ('y', '<i4')]
b.sort()
b.dtype = '<i4'
b.shape = a.shape

print b

which gives the desired output:

[[3 2]
 [3 4]
 [3 6]
 [5 3]
 [6 2]]

Not sure if this is quite the best way to go about it though.

Method 7

I found one way to do it:

from numpy import array
a = array([(3,2),(6,2),(3,6),(3,4),(5,3)])
array(sorted(sorted(a,key=lambda e:e[1]),key=lambda e:e[0]))

It’s pretty terrible to have to sort twice (and use the plain python sorted function instead of a faster numpy sort), but it does fit nicely on one line.


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x