Currently, I have some code like this:

    import numpy as np

    ret = np.array([])
    for i in range(100000):
        tmp = get_input(i)
        ret = np.append(ret, np.zeros(len(tmp)))
        ret = np.append(ret, np.ones(fixed_length))
I think this code is inefficient, because np.append has to return a copy of the array instead of modifying ret in place.
I was wondering whether there is something like a list's extend for a numpy array, used like this:

    import numpy as np
    from somewhere import np_extend

    ret = np.array([])
    for i in range(100000):
        tmp = get_input(i)
        np_extend(ret, np.zeros(len(tmp)))
        np_extend(ret, np.ones(fixed_length))
That way, extending the array would be much more efficient.
Does anyone have ideas about this?
Thanks!
Answers:
Method 1
Imagine a numpy array as occupying one contiguous block of memory. Now imagine other objects, say other numpy arrays, which are occupying the memory just to the left and right of our numpy array. There would be no room to append to or extend our numpy array. The underlying data in a numpy array always occupies a contiguous block of memory.
So any request to append to or extend our numpy array can only be satisfied by allocating a whole new larger block of memory, copying the old data into the new block and then appending or extending.
So:
- It will not occur in-place.
- It will not be efficient.
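A small check makes this concrete. The sketch below (the variable names a and b are illustrative, not from the question) shows that np.append hands back a new array in its own memory block and leaves the original untouched:

```python
import numpy as np

a = np.arange(3)          # original array: [0, 1, 2]
b = np.append(a, [3, 4])  # np.append builds and returns a new array

# The original keeps its size, and the two arrays do not share memory.
print(a.size, b.size)
print(np.shares_memory(a, b))
```

Since every call copies all of the existing data, calling np.append inside a loop does work proportional to the square of the final length.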
Method 2
You can use the .resize() method of ndarrays, which resizes the array in place. It requires that the array's memory is not referenced by any other array or variable; otherwise numpy raises a ValueError.

    import numpy as np

    ret = np.array([])
    for i in range(100):
        tmp = np.random.rand(np.random.randint(1, 100))
        ret.resize(len(ret) + len(tmp))  # <- ret is not referred to by anything else,
                                         #    so this works
        ret[-len(tmp):] = tmp

The efficiency can be improved by using the usual array memory overallocation schemes, for example growing the capacity geometrically so that reallocations become rare.
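One possible overallocation scheme is sketched below. GrowableArray is a hypothetical helper, not part of numpy, and the initial capacity of 16 and the doubling factor are arbitrary assumptions. To sidestep the reference restriction of .resize(), this sketch allocates a larger buffer and copies explicitly instead of calling .resize().

```python
import numpy as np


class GrowableArray:
    """Hypothetical helper: a 1-D float buffer that doubles its capacity when full."""

    def __init__(self, capacity=16):
        self._buf = np.empty(capacity)  # allocated storage (float64)
        self._size = 0                  # number of slots actually in use

    def extend(self, values):
        values = np.asarray(values, dtype=self._buf.dtype)
        needed = self._size + len(values)
        if needed > len(self._buf):
            # Grow geometrically so that reallocation happens only rarely.
            new_capacity = max(needed, 2 * len(self._buf))
            new_buf = np.empty(new_capacity, dtype=self._buf.dtype)
            new_buf[:self._size] = self._buf[:self._size]
            self._buf = new_buf
        self._buf[self._size:needed] = values
        self._size = needed

    def view(self):
        # A view of the filled part of the buffer; no copy is made.
        return self._buf[:self._size]


g = GrowableArray()
for n in (3, 20, 5):
    g.extend(np.zeros(n))
    g.extend(np.ones(2))
result = g.view()
```

With doubling, the total amount of copying stays proportional to the final length, at the cost of holding up to about twice the memory actually needed.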
Method 3
The usual way to handle this is something like this:

    import numpy as np

    ret = []
    for i in range(100000):
        tmp = get_input(i)
        ret.append(np.zeros(len(tmp)))
        ret.append(np.ones(fixed_length))
    ret = np.concatenate(ret)

For reasons that other answers have gotten into, it is in general impossible to extend an array without copying the data.
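A runnable version of the same pattern might look as follows. The question does not define get_input or fixed_length, so both are hypothetical stand-ins here, and the loop is shortened to six iterations:

```python
import numpy as np

fixed_length = 2  # assumed value; the question does not give one


def get_input(i):
    # Hypothetical stand-in for the question's get_input:
    # returns a sequence of length i % 4.
    return range(i % 4)


chunks = []  # a plain Python list; appending to it is cheap
for i in range(6):
    tmp = get_input(i)
    chunks.append(np.zeros(len(tmp)))
    chunks.append(np.ones(fixed_length))

# One allocation and one copy for the whole result.
ret = np.concatenate(chunks)
```

The list only stores references to the small arrays, so the data is copied once, in the final np.concatenate call.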
Method 4
I came across this question while researching in-place numpy insertion methods.
While reading the answers given here, an alternative occurred to me (maybe a naive one, but still an idea): why not convert the numpy array back to a list, append whatever you want to it, and then convert it back to an array?
If you have too many insertions to make, you could create a kind of “list cache” where you collect all the insertions and then add them to the list in one step.
Of course, if you are trying to avoid a conversion to a list and back to a numpy array at all costs, this is not an option.
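A minimal sketch of this idea, with illustrative names (pending, as_list) and made-up values:

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0])

pending = []                # the "list cache": collect insertions here first
pending.extend([4.0, 5.0])
pending.extend([6.0])

as_list = arr.tolist()      # convert the array to a plain Python list
as_list.extend(pending)     # apply all cached insertions in one step
arr = np.array(as_list)     # convert back to an ndarray
```

Note that both conversions copy every element, so this only pays off when many insertions are batched between conversions.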
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.