# Vectorized searchsorted numpy

Assume that I have two arrays `A` and `B`, where both `A` and `B` are `m x n`. My goal is now, for each row of `A` and `B`, to find where I should insert the elements of row `i` of `A` in the corresponding row of `B`. That is, I wish to apply `np.digitize` or `np.searchsorted` to each row of `A` and `B`.

My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven’t managed to find?

### Method 1

We can add to each row an offset relative to the previous row, using the same offset for both arrays. The idea is to then run `np.searchsorted` on flattened versions of the input arrays, so that each row of `b` is restricted to finding sorted positions within the corresponding row of `a`. Additionally, to make it work for negative numbers too, we just need to account for the minimum values when sizing the offset.

So, we would have a vectorized implementation like so –

```
import numpy as np

def searchsorted2d(a, b):
    m, n = a.shape
    # Offset each row by more than the global value range, so that rows
    # cannot interleave once the arrays are flattened.
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num * np.arange(m)[:, None]
    p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(m, -1)
    # Subtract the flattened-index offset of each row.
    return p - n * (np.arange(m)[:, None])
```
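
To see the offset trick in action, here is a small hand-traceable example (the input values are made up for illustration): each row of `b` is searched within the corresponding sorted row of `a`, using only flattened-array operations.

```python
import numpy as np

# Hypothetical small inputs: rows of a are sorted, as searchsorted requires.
a = np.array([[1, 3, 5],
              [2, 4, 6]])
b = np.array([[2, 4],
              [1, 7]])

# Offset each row by more than the global value range (here 7), so rows
# cannot interleave after flattening.
max_num = max(a.max() - a.min(), b.max() - b.min()) + 1   # 7
r = max_num * np.arange(a.shape[0])[:, None]              # [[0], [7]]

flat = np.searchsorted((a + r).ravel(), (b + r).ravel())
out = flat.reshape(b.shape) - a.shape[1] * np.arange(a.shape[0])[:, None]
print(out)  # [[1 2]
            #  [0 3]]
```

Row 0 gives `np.searchsorted([1, 3, 5], [2, 4]) == [1, 2]` and row 1 gives `np.searchsorted([2, 4, 6], [1, 7]) == [0, 3]`, matching the per-row loop.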

Runtime test –

```
In : def searchsorted2d_loopy(a,b):
...:     out = np.zeros(a.shape,dtype=int)
...:     for i in range(len(a)):
...:         out[i] = np.searchsorted(a[i],b[i])
...:     return out
...:

In : # Setup input arrays
...: a = np.random.randint(11,99,(10000,20))
...: b = np.random.randint(11,99,(10000,20))
...: a = np.sort(a,1)
...: b = np.sort(b,1)
...:

In : np.allclose(searchsorted2d(a,b),searchsorted2d_loopy(a,b))
Out: True

In : %timeit searchsorted2d_loopy(a,b)
10 loops, best of 3: 28.6 ms per loop

In : %timeit searchsorted2d(a,b)
100 loops, best of 3: 13.7 ms per loop
```

### Method 2

The solution provided by @Divakar is ideal for integer data, but beware of precision issues for floating-point values, especially when they span several orders of magnitude (e.g. `[[1.0, 2.0, 3.0, 1.0e+20], ...]`). In some cases the offset `r` may be so large that applying `a+r` and `b+r` wipes out the original values you're trying to run `searchsorted` on, and you end up just comparing `r` to `r`.
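
A minimal illustration of this failure mode, with made-up values: in float64, the spacing between representable numbers near `1e20` is roughly `1.6e4`, so adding an offset of that magnitude discards small values entirely.

```python
import numpy as np

# Hypothetical row spanning many orders of magnitude.
row = np.array([1.0, 2.0, 3.0, 1.0e20])

# A per-row offset of the size Method 1 would generate for such data.
offset = 1.0e20

shifted = row + offset
# The 1.0, 2.0, 3.0 all collapse to exactly 1e20 -- the original values
# are gone, and searchsorted would effectively compare offset to offset.
print(shifted[0] == shifted[1] == shifted[2])  # True
```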

To make the approach more robust for floating-point data, you could embed the row information into the arrays as part of the values (using a structured dtype), and run `searchsorted` on those structured arrays instead.
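
This works because NumPy compares structured values lexicographically by field order — `'row'` first, then `'value'` — so a query tagged with row id `i` can only land among entries tagged with row `i`. A minimal sketch with made-up values:

```python
import numpy as np

dt = [('row', int), ('value', float)]

# Two sorted rows, [1.0, 3.0] and [2.0, 4.0], flattened into one
# structured array with each element tagged by its row id.
ai = np.array([(0, 1.0), (0, 3.0),
               (1, 2.0), (1, 4.0)], dtype=dt)

# Search for 2.5 within row 1 only, by tagging the query with row id 1.
q = np.array([(1, 2.5)], dtype=dt)
idx = np.searchsorted(ai, q)
print(idx)          # [3] in the flattened array
print(idx - 1 * 2)  # [1] == np.searchsorted([2.0, 4.0], 2.5)
```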

```
import numpy as np

def searchsorted_2d(a, v, side='left', sorter=None):
    # Make sure a and v are numpy arrays.
    a = np.asarray(a)
    v = np.asarray(v)

    # Augment a with its row id.
    ai = np.empty(a.shape, dtype=[('row', int), ('value', a.dtype)])
    ai['row'] = np.arange(a.shape[0]).reshape(-1, 1)
    ai['value'] = a

    # Augment v with its row id.
    vi = np.empty(v.shape, dtype=[('row', int), ('value', v.dtype)])
    vi['row'] = np.arange(v.shape[0]).reshape(-1, 1)
    vi['value'] = v

    # Perform searchsorted on the augmented arrays.  The row id is embedded
    # in the values, so only equivalent rows of a and v are compared.
    result = np.searchsorted(ai.flatten(), vi.flatten(), side=side, sorter=sorter)

    # Restore the original shape and decode the flattened indices so they
    # apply to the original per-row data.
    return result.reshape(vi.shape) - vi['row'] * a.shape[1]
```

Edit: The timing on this approach is abysmal!

```
In : %timeit searchsorted_2d(a,b)
10 loops, best of 3: 92.5 ms per loop
```

You would be better off just using `map` over the arrays:

```
In : %timeit np.array(list(map(np.searchsorted,a,b)))
100 loops, best of 3: 13.8 ms per loop
```

For integer data, @Divakar’s approach is still the fastest:

```
In : %timeit searchsorted2d(a,b)
100 loops, best of 3: 7.26 ms per loop
```