I have a NumPy array with integer values. Values of matrix range from 0 to max element in matrix(in other words, all numbers from 0 to max data element presented in it). I need to build effective( effective means fast fullyvectorized solution) for searching number of elements in each row and encode them according to matrix values.
I could not find a similar question, or a question that somehow helped to solve this.
So if i have this data
in input:
# shape is (N0=4, m0=4) 1 1 0 4 2 4 2 1 1 2 3 5 4 4 4 1
desired output is :
# shape(N=N0, m=data.max()+1): 1 2 0 0 1 0 0 1 2 0 1 0 0 1 1 1 0 1 0 1 0 0 3 0
I know how to solve this by simply counting unique values in each row of data
iterating one by one, and then combining results taking in account all possible values in data
array.
While using NumPy for vectorizing this the key problem is that searching each number one by one is slow and assuming that there are a lot of unique numbers presented, this can not be effective solution. Generally both N
and unique numbers count is rather large(by the way, N
seem to be larger than unique numbers count).
Has somebody have great ideas?)
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Well that’s basically what does np.bincount
does with 1D
arrays. But, we need to use it on each row iteratively (thinking of it simply). To make it vectorized, we could offset each row by that max number. The idea is to have different bins for each row such that they are not affected by other row elements with same numbers.
Hence, the implementation would be –
# Vectorized solution def bincount2D_vectorized(a): N = a.max()+1 a_offs = a + np.arange(a.shape[0])[:,None]*N return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(1,N)
Sample run –
In [189]: a Out[189]: array([[1, 1, 0, 4], [2, 4, 2, 1], [1, 2, 3, 5], [4, 4, 4, 1]]) In [190]: bincount2D_vectorized(a) Out[190]: array([[1, 2, 0, 0, 1, 0], [0, 1, 2, 0, 1, 0], [0, 1, 1, 1, 0, 1], [0, 1, 0, 0, 3, 0]])
Numba Tweaks
We can bring in numba
for further speedups. Now, numba
allows few tweaks.
 First off, it allows JIT compilation.

Also, recently they had introduced experimental
parallel
that automatically parallelizes operations in the function known to have parallel semantics. 
Final tweak would be to use
prange
as a subsititute forrange
. The docs state that this runs loops in parallel, similar to OpenMP parallel for loops and Cython’s prange.prange
performs well with larger datasets, which probably is because of the overhead needed to setup the parallel work.
So, with these new two tweaks along with the njit
for noPython mode, we would have three variants –
# Numba solutions def bincount2D_numba(a, use_parallel=False, use_prange=False): N = a.max()+1 m,n = a.shape out = np.zeros((m,N),dtype=int) # Choose fucntion based on args func = bincount2D_numba_func0 if use_parallel: if use_prange: func = bincount2D_numba_func2 else: func = bincount2D_numba_func1 # Run chosen function on input data and output func(a, out, m, n) return out @njit def bincount2D_numba_func0(a, out, m, n): for i in range(m): for j in range(n): out[i,a[i,j]] += 1 @njit(parallel=True) def bincount2D_numba_func1(a, out, m, n): for i in range(m): for j in range(n): out[i,a[i,j]] += 1 @njit(parallel=True) def bincount2D_numba_func2(a, out, m, n): for i in prange(m): for j in prange(n): out[i,a[i,j]] += 1
For completeness and testing out later on, the loopy version would be –
# Loopy solution def bincount2D_loopy(a): N = a.max()+1 m,n = a.shape out = np.zeros((m,N),dtype=int) for i in range(m): out[i] = np.bincount(a[i], minlength=N) return out
Runtime test
Case #1 :
In [312]: a = np.random.randint(0,100,(100,100)) In [313]: %timeit bincount2D_loopy(a) ...: %timeit bincount2D_vectorized(a) ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True) 10000 loops, best of 3: 115 µs per loop 10000 loops, best of 3: 36.7 µs per loop 10000 loops, best of 3: 22.6 µs per loop 10000 loops, best of 3: 22.7 µs per loop 10000 loops, best of 3: 39.9 µs per loop
Case #2 :
In [316]: a = np.random.randint(0,100,(1000,1000)) In [317]: %timeit bincount2D_loopy(a) ...: %timeit bincount2D_vectorized(a) ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True) 100 loops, best of 3: 2.97 ms per loop 100 loops, best of 3: 3.54 ms per loop 1000 loops, best of 3: 1.83 ms per loop 100 loops, best of 3: 1.78 ms per loop 1000 loops, best of 3: 1.4 ms per loop
Case #3 :
In [318]: a = np.random.randint(0,1000,(1000,1000)) In [319]: %timeit bincount2D_loopy(a) ...: %timeit bincount2D_vectorized(a) ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False) ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True) 100 loops, best of 3: 4.01 ms per loop 100 loops, best of 3: 4.86 ms per loop 100 loops, best of 3: 3.21 ms per loop 100 loops, best of 3: 3.18 ms per loop 100 loops, best of 3: 2.45 ms per loop
Seems like the numba
variants are performing very well. Choosing one out of the three variants would depend on the input array shape parameters and to some extent on the number of unique elements in it.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc bysa 2.5, cc bysa 3.0 and cc bysa 4.0