Possible Duplicate:
How do you split a list into evenly sized chunks in Python?
I am surprised I could not find a “batch” function that would take as input an iterable and return an iterable of iterables.
For example:
for i in batch(range(0,10), 1): print i [0] [1] ... [9]
or:
for i in batch(range(0,10), 3): print i [0,1,2] [3,4,5] [6,7,8] [9]
Now, I wrote what I thought was a pretty simple generator:
def batch(iterable, n = 1):
current_batch = []
for item in iterable:
current_batch.append(item)
if len(current_batch) == n:
yield current_batch
current_batch = []
if current_batch:
yield current_batch
But the above does not give me what I would have expected:
for x in batch(range(0,10),3): print x [0] [0, 1] [0, 1, 2] [3] [3, 4] [3, 4, 5] [6] [6, 7] [6, 7, 8] [9]
So, I have missed something and this probably shows my complete lack of understanding of python generators. Anyone would care to point me in the right direction ?
[Edit: I eventually realized that the above behavior happens only when I run this within ipython rather than python itself]
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
This is probably more efficient (faster)
def batch(iterable, n=1):
l = len(iterable)
for ndx in range(0, l, n):
yield iterable[ndx:min(ndx + n, l)]
for x in batch(range(0, 10), 3):
print x
Example using list
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # list of data
for x in batch(data, 3):
print(x)
# Output
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]
It avoids building new lists.
Method 2
FWIW, the recipes in the itertools module provides this example:
def grouper(n, iterable, fillvalue=None):
"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
return zip_longest(fillvalue=fillvalue, *args)
It works like this:
>>> list(grouper(3, range(10))) [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
Method 3
More-itertools includes two functions that do what you need:
chunked(iterable, n)returns an iterable of lists, each of lengthn(except the last one, which may be shorter);ichunked(iterable, n)is similar, but returns an iterable of iterables instead.
Method 4
As others have noted, the code you have given does exactly what you want. For another approach using itertools.islice you could see an example of following recipe:
from itertools import islice, chain
def batch(iterable, size):
sourceiter = iter(iterable)
while True:
batchiter = islice(sourceiter, size)
yield chain([batchiter.next()], batchiter)
Method 5
Solution for Python 3.8 if you are working with iterables that don’t define a len function, and get exhausted:
from itertools import islice
def batcher(iterable, batch_size):
iterator = iter(iterable)
while batch := list(islice(iterator, batch_size)):
yield batch
Example usage:
def my_gen():
yield from range(10)
for batch in batcher(my_gen(), 3):
print(batch)
>>> [0, 1, 2]
>>> [3, 4, 5]
>>> [6, 7, 8]
>>> [9]
Could of course be implemented without the walrus operator as well.
Method 6
Weird, seems to work fine for me in Python 2.x
>>> def batch(iterable, n = 1): ... current_batch = [] ... for item in iterable: ... current_batch.append(item) ... if len(current_batch) == n: ... yield current_batch ... current_batch = [] ... if current_batch: ... yield current_batch ... >>> for x in batch(range(0, 10), 3): ... print x ... [0, 1, 2] [3, 4, 5] [6, 7, 8] [9]
Method 7
This is a very short code snippet I know that does not use len and works under both Python 2 and 3 (not my creation):
def chunks(iterable, size):
from itertools import chain, islice
iterator = iter(iterable)
for first in iterator:
yield list(chain([first], islice(iterator, size - 1)))
Method 8
A workable version without new features in python 3.8, adapted from @Atra Azami’s answer.
import itertools
def batch_generator(iterable, batch_size=1):
iterable = iter(iterable)
while True:
batch = list(itertools.islice(iterable, batch_size))
if len(batch) > 0:
yield batch
else:
break
for x in batch_generator(range(0, 10), 3):
print(x)
Output:
[0, 1, 2] [3, 4, 5] [6, 7, 8] [9]
Method 9
def batch(iterable, n):
iterable=iter(iterable)
while True:
chunk=[]
for i in range(n):
try:
chunk.append(next(iterable))
except StopIteration:
yield chunk
return
yield chunk
list(batch(range(10), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Method 10
Moving as much into CPython as possible, by leveraging islice and iter(callable) behavior:
from itertools import islice
def chunked(generator, size):
"""Read parts of the generator, pause each time after a chunk"""
# islice returns results until 'size',
# make_chunk gets repeatedly called by iter(callable).
gen = iter(generator)
make_chunk = lambda: list(islice(gen, size))
return iter(make_chunk, [])
Inspired by more-itertools, and shortened to the essence of that code.
Method 11
I like this one,
def batch(x, bs):
return [x[i:i+bs] for i in range(0, len(x), bs)]
This returns a list of batches of size bs, you can make it a generator by using a generator expression (i for i in iterable) of course.
Method 12
This is what I use in my project. It handles iterables or lists as efficiently as possible.
def chunker(iterable, size):
if not hasattr(iterable, "__len__"):
# generators don't have len, so fall back to slower
# method that works with generators
for chunk in chunker_gen(iterable, size):
yield chunk
return
it = iter(iterable)
for i in range(0, len(iterable), size):
yield [k for k in islice(it, size)]
def chunker_gen(generator, size):
iterator = iter(generator)
for first in iterator:
def chunk():
yield first
for more in islice(iterator, size - 1):
yield more
yield [k for k in chunk()]
Method 13
Here is an approach using reduce function.
Oneliner:
from functools import reduce reduce(lambda cumulator,item: cumulator[-1].append(item) or cumulator if len(cumulator[-1]) < batch_size else cumulator + [[item]], input_array, [[]])
Or more readable version:
from functools import reduce
def batch(input_list, batch_size):
def reducer(cumulator, item):
if len(cumulator[-1]) < batch_size:
cumulator[-1].append(item)
return cumulator
else:
cumulator.append([item])
return cumulator
return reduce(reducer, input_list, [[]])
Test:
>>> batch([1,2,3,4,5,6,7], 3) [[1, 2, 3], [4, 5, 6], [7]] >>> batch(a, 8) [[1, 2, 3, 4, 5, 6, 7]] >>> batch([1,2,3,None,4], 3) [[1, 2, 3], [None, 4]]
Method 14
This would work for any iterable.
from itertools import zip_longest, filterfalse
def batch_iterable(iterable, batch_size=2):
args = [iter(iterable)] * batch_size
return (tuple(filterfalse(lambda x: x is None, group)) for group in zip_longest(fillvalue=None, *args))
It would work like this:
>>>list(batch_iterable(range(0,5)), 2) [(0, 1), (2, 3), (4,)]
PS: It would not work if iterable has None values.
Method 15
You can just group iterable items by their batch index.
def batch(items: Iterable, batch_size: int) -> Iterable[Iterable]:
# enumerate items and group them by batch index
enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
# extract items from enumeration tuples
item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
return item_batches
It is often the case when you want to collect inner iterables so here is more advanced version.
def batch_advanced(items: Iterable, batch_size: int, batches_mapper: Callable[[Iterable], Any] = None) -> Iterable[Iterable]:
enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
if batches_mapper:
item_batches = (batches_mapper(t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
else:
item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
return item_batches
Examples:
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, tuple))) # [(1, 9, 3, 5), (2, 4, 2)] print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, list))) # [[1, 9, 3, 5], [2, 4, 2]]
Method 16
Related functionality you may need:
def batch(size, i):
""" Get the i'th batch of the given size """
return slice(size* i, size* i + size)
Usage:
>>> [1,2,3,4,5,6,7,8,9,10][batch(3, 1)] >>> [4, 5, 6]
It gets the i’th batch from the sequence and it can work with other data structures as well, like pandas dataframes (df.iloc[batch(100,0)]) or numpy array (array[batch(100,0)]).
Method 17
from itertools import *
class SENTINEL: pass
def batch(iterable, n):
return (tuple(filterfalse(lambda x: x is SENTINEL, group)) for group in zip_longest(fillvalue=SENTINEL, *[iter(iterable)] * n))
print(list(range(10), 3)))
# outputs: [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
print(list(batch([None]*10, 3)))
# outputs: [(None, None, None), (None, None, None), (None, None, None), (None,)]
Method 18
I use
def batchify(arr, batch_size):
num_batches = math.ceil(len(arr) / batch_size)
return [arr[i*batch_size:(i+1)*batch_size] for i in range(num_batches)]
Method 19
Keep taking (at most) n elements until it runs out.
def chop(n, iterable):
iterator = iter(iterable)
while chunk := list(take(n, iterator)):
yield chunk
def take(n, iterable):
iterator = iter(iterable)
for i in range(n):
try:
yield next(iterator)
except StopIteration:
return
Method 20
This code has the following features:
- Can take lists or generators (no len()) as input
- Does not require imports of other packages
- No padding added to last batch
def batch_generator(items, batch_size):
itemid=0 # Keeps track of current position in items generator/list
batch = [] # Empty batch
for item in items:
batch.append(item) # Append items to batch
if len(batch)==batch_size:
yield batch
itemid += batch_size # Increment the position in items
batch = []
yield batch # yield last bit
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0