This question is based on this older question:
Given an array:
In [122]: arr = np.array([[1, 3, 7], [4, 9, 8]]); arr
Out[122]:
array([[1, 3, 7],
       [4, 9, 8]])

And given its indices:

In [127]: np.indices(arr.shape)
Out[127]:
array([[[0, 0, 0],
        [1, 1, 1]],

       [[0, 1, 2],
        [0, 1, 2]]])

How would I be able to stack them neatly one against the other to form a new 2D array? This is what I’d like:

array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])
This solution by Divakar is what I currently use for 2D arrays:
def indices_merged_arr(arr):
    m,n = arr.shape
    I,J = np.ogrid[:m,:n]
    out = np.empty((m,n,3), dtype=arr.dtype)
    out[...,0] = I
    out[...,1] = J
    out[...,2] = arr
    out.shape = (-1,3)
    return out
Now, if I wanted to pass a 3D array, I need to modify this function:
def indices_merged_arr(arr):
    m,n,k = arr.shape                           # here
    I,J,K = np.ogrid[:m,:n,:k]                  # here
    out = np.empty((m,n,k,4), dtype=arr.dtype)  # here
    out[...,0] = I
    out[...,1] = J
    out[...,2] = K                              # here
    out[...,3] = arr
    out.shape = (-1,4)                          # here
    return out
But this function now works for 3D arrays only – I can’t pass a 2D array to it.
Is there some sort of way I can generalise this to work for any dimension? Here’s my attempt:
def indices_merged_arr_general(arr):
    tup = arr.shape
    idx = np.ogrid[????]  # not sure what to do here....
    out = np.empty(tup + (len(tup) + 1, ), dtype=arr.dtype)
    for i, j in enumerate(idx):
        out[...,i] = j
    out[...,len(tup) - 1] = arr
    out.shape = (-1, len(tup)
    return out
I’m having trouble with this line:
idx = np.ogrid[????]
How can I get this working?
Answers:
Method 1
Here’s the extension to handle generic ndarrays –
def indices_merged_arr_generic(arr, arr_pos="last"):
    n = arr.ndim
    grid = np.ogrid[tuple(map(slice, arr.shape))]
    out = np.empty(arr.shape + (n+1,), dtype=np.result_type(arr.dtype, int))

    if arr_pos=="first":
        offset = 1
    elif arr_pos=="last":
        offset = 0
    else:
        raise Exception("Invalid arr_pos")

    for i in range(n):
        out[...,i+offset] = grid[i]
    out[...,-1+offset] = arr
    out.shape = (-1,n+1)

    return out
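A minimal sketch of the key step in the function above, the index handed to np.ogrid. The helper name open_grids is hypothetical and not part of the answer; it only isolates the tuple(map(slice, arr.shape)) expression, which builds one slice(n) per axis and so plays the role of the hand-written [:m, :n] or [:m, :n, :k]:

```python
import numpy as np

def open_grids(shape):
    """Hypothetical helper: open index grids for an array of the given shape."""
    # One slice(n) per axis, i.e. the programmatic form of [:m, :n, ...]
    index = tuple(map(slice, shape))
    return np.ogrid[index]

# Each grid varies along one axis and has length 1 along the others,
# so assigning it into out[..., i] broadcasts it over the full shape.
for g in open_grids((2, 3, 4)):
    print(g.shape)
```

Because the grids are open (not expanded to the full shape), the loop in the function relies on broadcasting to fill each index column.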
Sample runs
2D case :
In [252]: arr
Out[252]:
array([[37, 32, 73],
[95, 80, 97]])
In [253]: indices_merged_arr_generic(arr)
Out[253]:
array([[ 0, 0, 37],
[ 0, 1, 32],
[ 0, 2, 73],
[ 1, 0, 95],
[ 1, 1, 80],
[ 1, 2, 97]])
In [254]: indices_merged_arr_generic(arr, arr_pos='first')
Out[254]:
array([[37, 0, 0],
[32, 0, 1],
[73, 0, 2],
[95, 1, 0],
[80, 1, 1],
[97, 1, 2]])
3D case :
In [226]: arr
Out[226]:
array([[[35, 45, 33],
[48, 38, 20],
[69, 31, 90]],
[[73, 65, 73],
[27, 51, 45],
[89, 50, 74]]])
In [227]: indices_merged_arr_generic(arr)
Out[227]:
array([[ 0, 0, 0, 35],
[ 0, 0, 1, 45],
[ 0, 0, 2, 33],
[ 0, 1, 0, 48],
[ 0, 1, 1, 38],
[ 0, 1, 2, 20],
[ 0, 2, 0, 69],
[ 0, 2, 1, 31],
[ 0, 2, 2, 90],
[ 1, 0, 0, 73],
[ 1, 0, 1, 65],
[ 1, 0, 2, 73],
[ 1, 1, 0, 27],
[ 1, 1, 1, 51],
[ 1, 1, 2, 45],
[ 1, 2, 0, 89],
[ 1, 2, 1, 50],
[ 1, 2, 2, 74]])
Method 2
For large arrays, AFAIK, senderle’s cartesian_product is the fastest way [1] to generate cartesian products using NumPy:
In [372]: A = np.random.random((100,100,100))
In [373]: %timeit indices_merged_arr_generic_using_cp(A)
100 loops, best of 3: 16.8 ms per loop
In [374]: %timeit indices_merged_arr_generic(A)
10 loops, best of 3: 28.9 ms per loop
Here is the setup I used to benchmark.
Below, indices_merged_arr_generic_using_cp is a modification of senderle’s cartesian_product that places the flattened array alongside the cartesian product of the indices:
import numpy as np
import functools

def indices_merged_arr_generic_using_cp(arr):
    """
    Based on cartesian_product
    http://stackoverflow.com/a/11146645/190597 (senderle)
    """
    shape = arr.shape
    arrays = [np.arange(s, dtype='int') for s in shape]
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    rows, cols = functools.reduce(np.multiply, broadcasted[0].shape), len(broadcasted)+1
    out = np.empty(rows * cols, dtype=arr.dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    out[start:] = arr.flatten()
    return out.reshape(cols, rows).T

def indices_merged_arr_generic(arr):
    """
    https://stackoverflow.com/a/46135084/190597 (Divakar)
    """
    n = arr.ndim
    grid = np.ogrid[tuple(map(slice, arr.shape))]
    out = np.empty(arr.shape + (n+1,), dtype=arr.dtype)
    for i in range(n):
        out[...,i] = grid[i]
    out[...,-1] = arr
    out.shape = (-1,n+1)
    return out
[1] Note that above I actually used senderle’s cartesian_product_transpose. For me, this is the fastest version. For others, including senderle, cartesian_product is faster.
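To make the np.ix_ and np.broadcast_arrays steps in the function above more concrete, here is a small illustrative sketch for a (2, 3) shape. The variable names are my own, and np.stack is used here only for display; the function above fills a preallocated buffer instead:

```python
import numpy as np

shape = (2, 3)

# np.ix_ reshapes each 1D range so that it varies along its own axis only
open_mesh = np.ix_(*[np.arange(s) for s in shape])
print([g.shape for g in open_mesh])

# broadcast_arrays expands every piece to the full shape (as views, no copy yet)
full = np.broadcast_arrays(*open_mesh)
print([g.shape for g in full])

# Flattening each full grid gives one column of indices per dimension
index_columns = np.stack([g.reshape(-1) for g in full], axis=1)
print(index_columns)
```

The rows of index_columns come out in row-major order, which is why the flattened array can be written straight after them as the last column.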
Method 3
ndenumerate iterates over the elements, whereas the other solutions iterate over the dimensions, so I don’t expect it to win the speed tests. But here’s a way of using it:
In [588]: arr = np.array([[1, 3, 7], [4, 9, 8]])
In [589]: arr
Out[589]:
array([[1, 3, 7],
[4, 9, 8]])
In [590]: list(np.ndenumerate(arr))
Out[590]: [((0, 0), 1), ((0, 1), 3), ((0, 2), 7), ((1, 0), 4), ((1, 1), 9), ((1, 2), 8)]
In Python 3 (3.5+), * unpacking can be used inside a tuple, so the nested tuples can be flattened:
In [591]: [(*ij,v) for ij,v in np.ndenumerate(arr)]
Out[591]: [(0, 0, 1), (0, 1, 3), (0, 2, 7), (1, 0, 4), (1, 1, 9), (1, 2, 8)]
In [592]: np.array(_)
Out[592]:
array([[0, 0, 1],
[0, 1, 3],
[0, 2, 7],
[1, 0, 4],
[1, 1, 9],
[1, 2, 8]])
And it generalizes nicely to more dimensions:
In [593]: arr3 = np.arange(24).reshape(2,3,4)
In [594]: np.array([(*ij,v) for ij,v in np.ndenumerate(arr3)])
Out[594]:
array([[ 0, 0, 0, 0],
[ 0, 0, 1, 1],
[ 0, 0, 2, 2],
[ 0, 0, 3, 3],
[ 0, 1, 0, 4],
[ 0, 1, 1, 5],
....
[ 1, 2, 3, 23]])
With these small samples, it’s actually faster than @Divakar’s function. 🙂
In [598]: timeit indices_merged_arr_generic(arr)
52.8 µs ± 271 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [599]: timeit indices_merged_arr_generic(arr3)
66.9 µs ± 434 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [600]: timeit np.array([(*ij,v) for ij,v in np.ndenumerate(arr)])
21.2 µs ± 40.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [601]: timeit np.array([(*ij,v) for ij,v in np.ndenumerate(arr3)])
59.4 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
But for a large 3D array it is much slower:
In [602]: A = np.random.random((100,100,100))
In [603]: timeit indices_merged_arr_generic(A)
50.3 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [604]: timeit np.array([(*ij,v) for ij,v in np.ndenumerate(A)])
2.39 s ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
And with @unutbu’s function, it is slower for the small arrays and faster for the big one:
In [609]: timeit indices_merged_arr_generic_using_cp(arr)
104 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [610]: timeit indices_merged_arr_generic_using_cp(arr3)
141 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [611]: timeit indices_merged_arr_generic_using_cp(A)
31.1 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
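For reuse, the ndenumerate approach can be wrapped in a function. This is only a sketch: the name indices_merged_arr_ndenumerate is hypothetical, and, as the timings above suggest, it is best kept for small arrays:

```python
import numpy as np

def indices_merged_arr_ndenumerate(arr):
    """Hypothetical wrapper: one row per element, index columns then the value."""
    arr = np.asarray(arr)
    # ndenumerate yields ((i, j, ...), value); * flattens the index tuple
    rows = [(*idx, value) for idx, value in np.ndenumerate(arr)]
    # np.array picks a common dtype, so float values make the indices float too
    return np.array(rows)

arr = np.array([[1, 3, 7], [4, 9, 8]])
print(indices_merged_arr_ndenumerate(arr))
```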
Method 4
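We can use the following one-liner, where t is the input array:
from numpy import hstack, stack, meshgrid
hstack((
    stack(meshgrid(*map(range, t.shape), indexing='ij'), axis=-1).reshape(-1, t.ndim),
    t.flatten().reshape(-1, 1)
))
Here we first use map(range, t.shape) to construct an iterable of ranges, one per dimension. meshgrid(..., indexing='ij') turns them into one index grid per dimension, and stacking those grids along a new last axis and reshaping to (-1, t.ndim) gives the first part of the table: an n×m matrix, with n the number of elements of t and m the number of dimensions. We then add a flattened version of t as the rightmost column. The indexing='ij' argument matters: meshgrid’s default indexing='xy' swaps the first two axes, so the index rows would no longer line up with the row-major order of the flattened t.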
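Since the question starts from np.indices, here is a sketch of a variant built directly on it. The function name indices_merged_arr_indices is hypothetical; it assumes an array with at least one dimension and, like hstack, lets NumPy pick a common dtype for the indices and the values:

```python
import numpy as np

def indices_merged_arr_indices(arr):
    """Hypothetical variant: index columns from np.indices, values last."""
    arr = np.asarray(arr)
    # np.indices gives shape (ndim, *arr.shape); flatten each grid in row-major
    # order and transpose so that every row holds one full index
    idx = np.indices(arr.shape).reshape(arr.ndim, -1).T
    # column_stack turns the 1D raveled values into the last column
    return np.column_stack((idx, arr.ravel()))

arr = np.array([[1, 3, 7], [4, 9, 8]])
print(indices_merged_arr_indices(arr))
```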
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.