I have a Python list that runs into the thousands of items. Something like:
data = ["I", "am", "a", "python", "programmer", .....]
where len(data) is, say, 1003.
I would now like to create subsets of this list (data) by splitting the original list into chunks of 100. So, at the end, I'd like to have something like:
data_chunk1 = [.....]   # first 100 items of list data
data_chunk2 = [.....]   # second 100 items of list data
.
.
.
data_chunk11 = [.....]  # remainder of the entries; len(data_chunk11) <= 100, here len(data_chunk11) == 3
Is there a Pythonic way to achieve this? Obviously I can use data[0:100] and so on, but I assume that is terribly un-Pythonic and very inefficient.
Many thanks.
Answers:
Method 1
I’d say
chunks = [data[x:x+100] for x in range(0, len(data), 100)]
If you are using Python 2.x instead of 3.x, you can be more memory-efficient by using xrange(), which produces the start indices lazily instead of building a full list. Change the above code to:
chunks = [data[x:x+100] for x in xrange(0, len(data), 100)]
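As a quick check against the numbers in the question, the Python 3 comprehension can be run on a 1003-item list. The placeholder contents of data below are an assumption standing in for the real words; chunks[0] plays the role of data_chunk1 and chunks[10] the role of data_chunk11.

```python
# Placeholder data standing in for the question's 1003-item list (assumption).
data = ["word%d" % i for i in range(1003)]

# Start offsets 0, 100, ..., 1000 give 11 slices; the last one is short.
chunks = [data[x:x + 100] for x in range(0, len(data), 100)]

print(len(chunks))      # 11
print(len(chunks[0]))   # 100
print(len(chunks[-1]))  # 3
```

Keeping the chunks in one list (indexed from 0) is usually easier to loop over than eleven separately named variables.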
Method 2
Actually I think using plain slices is the best solution in this case:
for i in range(0, len(data), 100):
    chunk = data[i:i + 100]
    ...  # process chunk here
If you want to avoid copying the slices, you could use itertools.islice(), but it doesn’t seem to be necessary here.
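One way itertools.islice() can be used here is to pull successive chunks from a single shared iterator, which also works for iterables that cannot be sliced (generators, file objects). This is a sketch; the helper name chunked is hypothetical, and note that each chunk is still materialized as a list, so it does not save copying compared with slicing a list.

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of up to `size` items (hypothetical helper)."""
    it = iter(iterable)  # one shared iterator, so each islice resumes where the last stopped
    while True:
        chunk = list(islice(it, size))  # take at most `size` items
        if not chunk:  # iterator exhausted
            return
        yield chunk

data = list(range(1003))
sizes = [len(c) for c in chunked(data, 100)]
print(sizes)  # [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 3]
```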
The itertools documentation also contains the famous “grouper” recipe:
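from itertools import zip_longest  # named izip_longest in Python 2

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    # n references to one iterator, so each tuple takes n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)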
Because grouper pads the last chunk with fillvalue (and yields tuples rather than lists), you would need to modify it to handle the final chunk correctly, so I think the straightforward solution using plain slices is preferable.
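A sketch of that modification, written for Python 3 (where izip_longest is named zip_longest): pad with a private sentinel object instead of None, then strip it out, so genuine None values in the data survive. The name grouper_trimmed and the sentinel are hypothetical. On Python 3.12 and later, the standard library's itertools.batched(iterable, n) does this directly, yielding tuples with a shorter final tuple.

```python
from itertools import zip_longest

_MISSING = object()  # unique sentinel that can never be a real list element

def grouper_trimmed(n, iterable):
    """Like the grouper recipe, but drops the padding from the final group."""
    args = [iter(iterable)] * n
    for group in zip_longest(*args, fillvalue=_MISSING):
        # Only the last group can contain the sentinel; strip it out.
        yield [x for x in group if x is not _MISSING]

data = list(range(35))
result = list(grouper_trimmed(10, data))
print(result[-1])  # [30, 31, 32, 33, 34]
```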
Method 3
chunks = [data[100*i:100*(i+1)] for i in range((len(data) + 99) // 100)]
This is equivalent to the accepted answer. Here (len(data) + 99) // 100 is the number of chunks rounded up, so no empty chunk is added when len(data) is an exact multiple of 100. For example, shortening to batches of 10 for readability:
data = list(range(35))
print([data[x:x+10] for x in range(0, len(data), 10)])
print([data[10*i:10*(i+1)] for i in range((len(data) + 9) // 10)])
Outputs:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34]]
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34]]
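To check the equivalence more broadly, the step-based form (Method 1) and the index-based form (Method 3) can be compared over several lengths, including exact multiples of the chunk size. The helper names below are hypothetical. The last lines show why the chunk count needs ceiling division: a naive len(data) // 100 + 1 adds a trailing empty chunk when the length divides evenly.

```python
def chunks_by_step(data, size):
    # Method 1: slice at start offsets 0, size, 2*size, ...
    return [data[x:x + size] for x in range(0, len(data), size)]

def chunks_by_index(data, size):
    # Method 3: number of chunks via ceiling division
    n_chunks = (len(data) + size - 1) // size
    return [data[size * i:size * (i + 1)] for i in range(n_chunks)]

for n in (0, 1, 35, 99, 100, 101, 200, 1003):
    data = list(range(n))
    assert chunks_by_step(data, 100) == chunks_by_index(data, 100)

# A naive count of len(data) // 100 + 1 chunks overshoots on exact multiples:
data = list(range(200))
naive = [data[100 * i:100 * (i + 1)] for i in range(len(data) // 100 + 1)]
print(naive[-1])  # []
```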
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.