I’m a complete rookie to Python, but it seems that a given string can be (effectively) arbitrary length, i.e. you can take a string str and keep adding to it: str += "some stuff...". Is there a way to make an array of such strings?
When I try this, each element only stores a single character:
strArr = numpy.empty(10, dtype='string')
for i in range(0, 10):
    strArr[i] = "test"
On the other hand, I know I can initialize an array of fixed-length strings, e.g.
strArr = numpy.empty(10, dtype='S256')
which can store 10 strings of up to 256 characters.
Answers:
Method 1
You can do so by creating an array of dtype=object. If you try to assign a long string to a normal numpy array, it truncates the string:
>>> a = numpy.array(['apples', 'foobar', 'cowboy'])
>>> a[2] = 'bananas'
>>> a
array(['apples', 'foobar', 'banana'],
dtype='|S6')
But when you use dtype=object, you get an array of python object references. So you can have all the behaviors of python strings:
>>> a = numpy.array(['apples', 'foobar', 'cowboy'], dtype=object)
>>> a
array([apples, foobar, cowboy], dtype=object)
>>> a[2] = 'bananas'
>>> a
array([apples, foobar, bananas], dtype=object)
Indeed, because it’s an array of objects, you can assign any kind of python object to the array:
>>> a[2] = {1:2, 3:4}
>>> a
array([apples, foobar, {1: 2, 3: 4}], dtype=object)
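For readers on Python 3, here is a minimal sketch of the same contrast as a standalone script. It assumes a current NumPy, where the inferred fixed-width dtype is a Unicode type such as `<U6` rather than the `|S6` shown above; the variable names `fixed` and `flexible` are illustrative only.

```python
import numpy as np

# Fixed-width array: NumPy infers a 6-character string dtype from the inputs.
fixed = np.array(['apples', 'foobar', 'cowboy'])
fixed[2] = 'bananas'        # does not fit in 6 characters, so it is truncated
print(fixed.dtype, fixed[2])

# Object array: each slot holds a reference to an ordinary Python object.
flexible = np.array(['apples', 'foobar', 'cowboy'], dtype=object)
flexible[2] = 'bananas'     # stored in full, no width limit
print(flexible.dtype, flexible[2])
```

The truncation happens silently, so the object array is the safer choice when string lengths are not known in advance.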
However, this undoes a lot of the benefits of using numpy, which is so fast because it works on large contiguous blocks of raw memory. Working with python objects adds a lot of overhead. A simple example:
>>> a = numpy.array(['abba' for _ in range(10000)])
>>> b = numpy.array(['abba' for _ in range(10000)], dtype=object)
>>> %timeit a.copy()
100000 loops, best of 3: 2.51 us per loop
>>> %timeit b.copy()
10000 loops, best of 3: 48.4 us per loop
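The `%timeit` magic only works inside IPython. As a rough sketch, the same comparison could be run as a plain script with the standard-library `timeit` module. The helper `time_copy` and its `repeats` parameter are hypothetical names introduced here, and the numbers will differ by machine, NumPy version, and Python version, so treat the output as indicative rather than as a reproduction of the figures above.

```python
import timeit

import numpy as np

fixed = np.array(['abba'] * 10000)                  # fixed-width string array
objects = np.array(['abba'] * 10000, dtype=object)  # array of object references


def time_copy(arr, repeats=1000):
    """Return the average number of seconds taken by one arr.copy() call."""
    return timeit.timeit(arr.copy, number=repeats) / repeats


fixed_time = time_copy(fixed)
object_time = time_copy(objects)
print(f"fixed-width copy: {fixed_time * 1e6:.2f} us per call")
print(f"object copy:      {object_time * 1e6:.2f} us per call")
```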
Method 2
You could use the object data type:
>>> import numpy
>>> s = numpy.array(['a', 'b', 'dude'], dtype='object')
>>> s[0] += 'bcdef'
>>> s
array([abcdef, b, dude], dtype=object)
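Applied to the loop from the question, this approach might look like the sketch below. It assumes Python 3 and relies on `numpy.empty` with `dtype=object` filling the array with `None`, so every slot is assigned before anything is appended to it; the name `str_arr` and the appended text are illustrative.

```python
import numpy as np

# An object array of length 10; every slot starts out as None.
str_arr = np.empty(10, dtype=object)

for i in range(10):
    str_arr[i] = "test"          # each element holds a full Python string
    str_arr[i] += " more" * i    # and can keep growing to any length

print(str_arr[0])
print(str_arr[3])
```

Each `+=` builds a new Python string and stores a reference to it in the array, so the elements are free to have different lengths.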
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.