I have a large array where each row is a time series and thus needs to stay in order.
I want to select a random window of a given size for each row.
Example:
>>> import numpy as np
>>> arr = np.array(range(42)).reshape(6,7)
>>> arr
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41]])
>>> # What I want to do:
>>> select_random_windows(arr, window_size=3)
array([[ 1,  2,  3],
       [11, 12, 13],
       [14, 15, 16],
       [22, 23, 24],
       [38, 39, 40]])
What an ideal solution would look like to me:
def select_random_windows(arr, window_size):
    offsets = np.random.randint(0, arr.shape[0] - window_size, size = arr.shape[1])
    return arr[:, offsets: offsets + window_size]
But unfortunately this does not work, since a slice cannot take arrays as its bounds.
What I’m going with right now is terribly slow:
def select_random_windows(arr, window_size):
    result = []
    # One start offset per row; the upper bound keeps each window inside the row.
    offsets = np.random.randint(0, arr.shape[1] - window_size + 1, size=arr.shape[0])
    for row, offset in enumerate(offsets):
        result.append(arr[row][offset: offset + window_size])
    return np.array(result)
Sure, I could do the same with a list comprehension (and get a minimal speed boost), but I was wondering whether there is some super smart vectorized NumPy way to do this.
Answers:
Method 1
Here’s one leveraging np.lib.stride_tricks.as_strided –
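def random_windows_per_row_strided(arr, W=3):
    # One random start offset per row, in [0, n - W]
    idx = np.random.randint(0,arr.shape[1]-W+1, arr.shape[0])
    strided = np.lib.stride_tricks.as_strided
    m,n = arr.shape
    s0,s1 = arr.strides
    # View of every length-W window in every row, shape (m, n-W+1, W); no copy is made
    windows = strided(arr, shape=(m,n-W+1,W), strides=(s0,s1,s1))
    # Pick window idx[i] from row i
    return windows[np.arange(len(idx)), idx]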
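The same idea can be sketched with `np.lib.stride_tricks.sliding_window_view`, which computes the shape and strides for you and returns a read-only view. This is only a sketch, assuming NumPy 1.20 or newer. The function name `random_windows_per_row_view` and the `rng` parameter are hypothetical and not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def random_windows_per_row_view(arr, W=3, rng=None):
    """Pick one random length-W window from each row of a 2-D array."""
    # Hypothetical helper: `rng` lets the caller pass a seeded Generator.
    rng = np.random.default_rng() if rng is None else rng
    m, n = arr.shape
    # One start offset per row; the exclusive upper bound n - W + 1
    # keeps every window inside its row.
    idx = rng.integers(0, n - W + 1, size=m)
    # Read-only view of shape (m, n - W + 1, W); no data is copied here.
    windows = sliding_window_view(arr, W, axis=1)
    # Advanced indexing picks window idx[i] from row i and returns a copy.
    return windows[np.arange(m), idx]


arr = np.arange(42).reshape(6, 7)
out = random_windows_per_row_view(arr, W=3, rng=np.random.default_rng(0))
print(out.shape)  # (6, 3)
```

Because the view is read-only and its shape is derived from the array itself, this variant avoids the risk of reading out-of-bounds memory that comes with hand-written strides.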
Runtime test on a bigger array with 100,000 rows –
In [469]: arr = np.random.rand(100000,100)

# @Psidom's soln
In [470]: %timeit select_random_windows(arr, window_size=3)
100 loops, best of 3: 7.41 ms per loop

In [471]: %timeit random_windows_per_row_strided(arr, W=3)
100 loops, best of 3: 6.84 ms per loop

# @Psidom's soln
In [472]: %timeit select_random_windows(arr, window_size=30)
10 loops, best of 3: 26.8 ms per loop

In [473]: %timeit random_windows_per_row_strided(arr, W=30)
100 loops, best of 3: 9.65 ms per loop

# @Psidom's soln
In [474]: %timeit select_random_windows(arr, window_size=50)
10 loops, best of 3: 41.8 ms per loop

In [475]: %timeit random_windows_per_row_strided(arr, W=50)
100 loops, best of 3: 10 ms per loop
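To repeat a comparison like this outside IPython, a plain `timeit` script along the following lines should work. It is a rough sketch: the names `windows_strided`, `windows_fancy`, and `timings` are hypothetical, the array is smaller than in the session above so that it runs quickly, and the absolute numbers will depend on your machine and NumPy version.

```python
import timeit

import numpy as np


def windows_strided(arr, W):
    # Strided-view approach: build all windows as a view, then pick one per row.
    idx = np.random.randint(0, arr.shape[1] - W + 1, arr.shape[0])
    m, n = arr.shape
    s0, s1 = arr.strides
    windows = np.lib.stride_tricks.as_strided(
        arr, shape=(m, n - W + 1, W), strides=(s0, s1, s1)
    )
    return windows[np.arange(m), idx]


def windows_fancy(arr, W):
    # Advanced-indexing approach: broadcast row indices against column indices.
    offsets = np.random.randint(0, arr.shape[1] - W + 1, size=arr.shape[0])
    return arr[np.arange(arr.shape[0])[:, None], offsets[:, None] + np.arange(W)]


arr = np.random.rand(10_000, 100)  # assumption: smaller than the 100,000-row test
timings = {}
for W in (3, 30, 50):
    for name, fn in (("strided", windows_strided), ("fancy", windows_fancy)):
        # Best of 3 repeats, 5 calls each, reported as seconds per call.
        best = min(timeit.repeat(lambda: fn(arr, W), number=5, repeat=3)) / 5
        timings[(name, W)] = best
        print(f"{name:8s} W={W:2d}: {best * 1e3:.3f} ms per call")
```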
Method 2
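In the return statement, replace the slice with advanced indexing. The sampling code also needs a small fix: draw one offset per row (size=arr.shape[0]), with arr.shape[1]-window_size+1 as the exclusive upper bound so that every window fits inside its row: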
def select_random_windows(arr, window_size):
    offsets = np.random.randint(0, arr.shape[1]-window_size+1, size=arr.shape[0])
    return arr[np.arange(arr.shape[0])[:,None], offsets[:,None] + np.arange(window_size)]

select_random_windows(arr, 3)
#array([[ 4,  5,  6],
#       [ 7,  8,  9],
#       [17, 18, 19],
#       [25, 26, 27],
#       [31, 32, 33],
#       [39, 40, 41]])
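To see why the advanced-indexing expression works, it may help to look at the two index arrays separately. The sketch below replaces the random offsets with fixed ones (an assumption, chosen so that the result matches the sample output above) and compares the result with the plain loop. The names `rows`, `cols`, and `expected` are illustrative only.

```python
import numpy as np

arr = np.arange(42).reshape(6, 7)
window_size = 3
# Assumption: fixed offsets instead of random ones, to keep the result reproducible.
offsets = np.array([4, 0, 3, 4, 3, 4])

rows = np.arange(arr.shape[0])[:, None]           # shape (6, 1): one row index per row
cols = offsets[:, None] + np.arange(window_size)  # shape (6, 3): column indices per window
result = arr[rows, cols]                          # rows broadcasts against cols -> (6, 3)

# The same selection written as an explicit loop over rows.
expected = np.array([arr[r, o:o + window_size] for r, o in enumerate(offsets)])
print(np.array_equal(result, expected))  # True
```

Here `rows` is broadcast from shape (6, 1) to (6, 3), so element `[i, j]` of the result is `arr[i, offsets[i] + j]`.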
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.