Selecting Random Windows from Multidimensional Numpy Array Rows

I have a large array where each row is a time series and thus needs to stay in order.

I want to select a random window of a given size for each row.

Example:

>>> import numpy as np
>>> arr = np.array(range(42)).reshape(6,7)
>>> arr
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41]])
>>> # What I want to do:
>>> select_random_windows(arr, window_size=3)
array([[ 1,  2,  3],
       [11, 12, 13],
       [14, 15, 16],
       [22, 23, 24],
       [29, 30, 31],
       [38, 39, 40]])

Ideally, the solution would look something like this:

def select_random_windows(arr, window_size):
    offsets = np.random.randint(0, arr.shape[0] - window_size, size = arr.shape[1])
    return arr[:, offsets: offsets + window_size]

Unfortunately, this does not work, because a slice cannot take an array of start offsets.

What I’m going with right now is terribly slow:

import numpy as np

def select_random_windows(arr, window_size):
    result = []
    # one random start offset per row; randint excludes its upper bound, hence + 1
    offsets = np.random.randint(0, arr.shape[1] - window_size + 1, size=arr.shape[0])
    for row, offset in enumerate(offsets):
        result.append(arr[row, offset: offset + window_size])
    return np.array(result)

Sure, I could do the same with a list comprehension (and get a minimal speed boost), but I was wondering whether there is some super smart vectorized NumPy way to do this.

Answers:

Method 1

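Here’s one leveraging np.lib.stride_tricks.as_strided. It builds a view of every length-W window in each row (no data is copied) and then picks one window per row with advanced indexing: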

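import numpy as np

def random_windows_per_row_strided(arr, W=3):
    m, n = arr.shape
    # one random start index per row, in [0, n - W]
    idx = np.random.randint(0, n - W + 1, m)
    strided = np.lib.stride_tricks.as_strided
    s0, s1 = arr.strides
    # view of shape (m, n-W+1, W): windows[i, j] is arr[i, j:j+W]
    windows = strided(arr, shape=(m, n - W + 1, W), strides=(s0, s1, s1))
    # pick window idx[i] from each row i
    return windows[np.arange(m), idx]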
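On NumPy 1.20 or newer, np.lib.stride_tricks.sliding_window_view should build the same (rows, n - W + 1, W) window view without hand-computing strides, and it returns a read-only view by default, which sidesteps the memory-safety pitfalls of as_strided. The sketch below assumes NumPy >= 1.20 and swaps in the Generator API (np.random.default_rng) for the legacy np.random.randint; the function name random_windows_sliding and its rng parameter are hypothetical, not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def random_windows_sliding(arr, W=3, rng=None):
    """Pick one random contiguous length-W window from each row of a 2-D array.

    Hypothetical helper (not from the original answers); assumes NumPy >= 1.20.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = arr.shape
    if not 1 <= W <= n:
        raise ValueError("W must be between 1 and the number of columns")
    # windows[i, j] is arr[i, j:j+W]; shape (m, n - W + 1, W), read-only view
    windows = sliding_window_view(arr, W, axis=1)
    # one start index per row; integers() excludes the upper bound by default
    idx = rng.integers(0, n - W + 1, size=m)
    # advanced indexing copies the selected windows into a new (m, W) array
    return windows[np.arange(m), idx]


arr = np.arange(42).reshape(6, 7)
out = random_windows_sliding(arr, W=3, rng=np.random.default_rng(0))
print(out.shape)  # (6, 3)
```

Because the final advanced-indexing step copies the chosen windows, the returned array is writeable even though the intermediate view is not.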

Runtime test on a bigger array with 100,000 rows:

In [469]: arr = np.random.rand(100000,100)

# @Psidom's soln
In [470]: %timeit select_random_windows(arr, window_size=3)
100 loops, best of 3: 7.41 ms per loop

In [471]: %timeit random_windows_per_row_strided(arr, W=3)
100 loops, best of 3: 6.84 ms per loop

# @Psidom's soln
In [472]: %timeit select_random_windows(arr, window_size=30)
10 loops, best of 3: 26.8 ms per loop

In [473]: %timeit random_windows_per_row_strided(arr, W=30)
100 loops, best of 3: 9.65 ms per loop

# @Psidom's soln
In [474]: %timeit select_random_windows(arr, window_size=50)
10 loops, best of 3: 41.8 ms per loop

In [475]: %timeit random_windows_per_row_strided(arr, W=50)
100 loops, best of 3: 10 ms per loop
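The timings above are only a fair comparison if both functions return the same windows. Since random_windows_per_row_strided and @Psidom's select_random_windows draw their start offsets with identical np.random.randint(0, n - W + 1, m) calls, re-seeding the legacy global RNG before each call should make their outputs identical. A minimal cross-check, copying both functions from this page (with an import added); the seed value and array size are arbitrary choices for illustration:

```python
import numpy as np


def random_windows_per_row_strided(arr, W=3):
    # Method 1: strided view of all windows, then one window per row
    idx = np.random.randint(0, arr.shape[1] - W + 1, arr.shape[0])
    m, n = arr.shape
    s0, s1 = arr.strides
    windows = np.lib.stride_tricks.as_strided(
        arr, shape=(m, n - W + 1, W), strides=(s0, s1, s1))
    return windows[np.arange(len(idx)), idx]


def select_random_windows(arr, window_size):
    # Method 2: advanced indexing with a broadcast (rows, window_size) index grid
    offsets = np.random.randint(0, arr.shape[1] - window_size + 1, size=arr.shape[0])
    return arr[np.arange(arr.shape[0])[:, None], offsets[:, None] + np.arange(window_size)]


arr = np.random.rand(1000, 100)
for W in (3, 30, 50):
    np.random.seed(123)  # same seed -> same offsets for both methods
    a = random_windows_per_row_strided(arr, W=W)
    np.random.seed(123)
    b = select_random_windows(arr, window_size=W)
    assert np.array_equal(a, b), W
print("both methods agree")
```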

Method 2

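In the return statement, replace the slicing with advanced indexing. You also need to fix the sampling code a little: the offsets index columns, so the upper bound comes from arr.shape[1] (plus 1, because np.random.randint excludes its upper bound), and you need one offset per row, i.e. size=arr.shape[0]: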

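import numpy as np

def select_random_windows(arr, window_size):
    # one start offset per row; + 1 so the last possible window can be chosen
    offsets = np.random.randint(0, arr.shape[1]-window_size+1, size=arr.shape[0])
    # row indices of shape (m, 1) broadcast against column indices of shape (m, window_size)
    return arr[np.arange(arr.shape[0])[:,None], offsets[:,None] + np.arange(window_size)]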

select_random_windows(arr, 3)
#array([[ 4,  5,  6],
#       [ 7,  8,  9],
#       [17, 18, 19],
#       [25, 26, 27],
#       [31, 32, 33],
#       [39, 40, 41]])
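The same broadcast grid of column indices can also be handed to np.take_along_axis (NumPy 1.15+), which gathers along a single axis so the row-index array no longer has to be spelled out. This is a swapped-in alternative rather than part of the original answer; the name select_random_windows_take and the rng parameter are hypothetical, and it uses the Generator API instead of np.random.randint:

```python
import numpy as np


def select_random_windows_take(arr, window_size, rng=None):
    """Hypothetical variant of Method 2 using np.take_along_axis."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = arr.shape
    # one start offset per row, drawn from [0, n - window_size]
    offsets = rng.integers(0, n - window_size + 1, size=m)
    # (m, 1) + (window_size,) broadcasts to an (m, window_size) grid of column indices
    cols = offsets[:, None] + np.arange(window_size)
    # take_along_axis pairs row i of `cols` with row i of `arr`
    return np.take_along_axis(arr, cols, axis=1)


arr = np.arange(42).reshape(6, 7)
print(select_random_windows_take(arr, 3))
```

Like Method 2, this returns a fresh (rows, window_size) array; the difference is only in how the row pairing is expressed.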


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
