how to split an iterable in constant-size chunks

Possible Duplicate:
How do you split a list into evenly sized chunks in Python?

I am surprised I could not find a “batch” function that would take as input an iterable and return an iterable of iterables.

For example:

for i in batch(range(0,10), 1): print i
[0]
[1]
...
[9]

or:

for i in batch(range(0,10), 3): print i
[0,1,2]
[3,4,5]
[6,7,8]
[9]

Now, I wrote what I thought was a pretty simple generator:

def batch(iterable, n = 1):
   current_batch = []
   for item in iterable:
       current_batch.append(item)
       if len(current_batch) == n:
           yield current_batch
           current_batch = []
   if current_batch:
       yield current_batch

But the above does not give me what I would have expected:

for x in   batch(range(0,10),3): print x
[0]
[0, 1]
[0, 1, 2]
[3]
[3, 4]
[3, 4, 5]
[6]
[6, 7]
[6, 7, 8]
[9]

So, I have missed something and this probably shows my complete lack of understanding of python generators. Anyone would care to point me in the right direction ?
[Edit: I eventually realized that the above behavior happens only when I run this within ipython rather than python itself]

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

This is probably more efficient (faster)

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print x

Example using list

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # list of data 

for x in batch(data, 3):
    print(x)

# Output

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]

It avoids building new lists.

Method 2

FWIW, the recipes in the itertools module provides this example:

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

It works like this:

>>> list(grouper(3, range(10)))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]

Method 3

More-itertools includes two functions that do what you need:

Method 4

As others have noted, the code you have given does exactly what you want. For another approach using itertools.islice you could see an example of following recipe:

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([batchiter.next()], batchiter)

Method 5

Solution for Python 3.8 if you are working with iterables that don’t define a len function, and get exhausted:

from itertools import islice

def batcher(iterable, batch_size):
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch

Example usage:

def my_gen():
    yield from range(10)
 
for batch in batcher(my_gen(), 3):
    print(batch)

>>> [0, 1, 2]
>>> [3, 4, 5]
>>> [6, 7, 8]
>>> [9]

Could of course be implemented without the walrus operator as well.

Method 6

Weird, seems to work fine for me in Python 2.x

>>> def batch(iterable, n = 1):
...    current_batch = []
...    for item in iterable:
...        current_batch.append(item)
...        if len(current_batch) == n:
...            yield current_batch
...            current_batch = []
...    if current_batch:
...        yield current_batch
...
>>> for x in batch(range(0, 10), 3):
...     print x
...
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

Method 7

This is a very short code snippet I know that does not use len and works under both Python 2 and 3 (not my creation):

def chunks(iterable, size):
    from itertools import chain, islice
    iterator = iter(iterable)
    for first in iterator:
        yield list(chain([first], islice(iterator, size - 1)))

Method 8

A workable version without new features in python 3.8, adapted from @Atra Azami’s answer.

import itertools    

def batch_generator(iterable, batch_size=1):
    iterable = iter(iterable)

    while True:
        batch = list(itertools.islice(iterable, batch_size))
        if len(batch) > 0:
            yield batch
        else:
            break

for x in batch_generator(range(0, 10), 3):
    print(x)

Output:

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

Method 9

def batch(iterable, n):
    iterable=iter(iterable)
    while True:
        chunk=[]
        for i in range(n):
            try:
                chunk.append(next(iterable))
            except StopIteration:
                yield chunk
                return
        yield chunk

list(batch(range(10), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Method 10

Moving as much into CPython as possible, by leveraging islice and iter(callable) behavior:

from itertools import islice

def chunked(generator, size):
    """Read parts of the generator, pause each time after a chunk"""
    # islice returns results until 'size',
    # make_chunk gets repeatedly called by iter(callable).
    gen = iter(generator)
    make_chunk = lambda: list(islice(gen, size))
    return iter(make_chunk, [])

Inspired by more-itertools, and shortened to the essence of that code.

Method 11

I like this one,

def batch(x, bs):
    return [x[i:i+bs] for i in range(0, len(x), bs)]

This returns a list of batches of size bs, you can make it a generator by using a generator expression (i for i in iterable) of course.

Method 12

This is what I use in my project. It handles iterables or lists as efficiently as possible.

def chunker(iterable, size):
    if not hasattr(iterable, "__len__"):
        # generators don't have len, so fall back to slower
        # method that works with generators
        for chunk in chunker_gen(iterable, size):
            yield chunk
        return

    it = iter(iterable)
    for i in range(0, len(iterable), size):
        yield [k for k in islice(it, size)]


def chunker_gen(generator, size):
    iterator = iter(generator)
    for first in iterator:

        def chunk():
            yield first
            for more in islice(iterator, size - 1):
                yield more

        yield [k for k in chunk()]

Method 13

Here is an approach using reduce function.

Oneliner:

from functools import reduce
reduce(lambda cumulator,item: cumulator[-1].append(item) or cumulator if len(cumulator[-1]) < batch_size else cumulator + [[item]], input_array, [[]])

Or more readable version:

from functools import reduce
def batch(input_list, batch_size):
  def reducer(cumulator, item):
    if len(cumulator[-1]) < batch_size:
      cumulator[-1].append(item)
      return cumulator
    else:
      cumulator.append([item])
    return cumulator
  return reduce(reducer, input_list, [[]])

Test:

>>> batch([1,2,3,4,5,6,7], 3)
[[1, 2, 3], [4, 5, 6], [7]]
>>> batch(a, 8)
[[1, 2, 3, 4, 5, 6, 7]]
>>> batch([1,2,3,None,4], 3)
[[1, 2, 3], [None, 4]]

Method 14

This would work for any iterable.

from itertools import zip_longest, filterfalse

def batch_iterable(iterable, batch_size=2): 
    args = [iter(iterable)] * batch_size 
    return (tuple(filterfalse(lambda x: x is None, group)) for group in zip_longest(fillvalue=None, *args))

It would work like this:

>>>list(batch_iterable(range(0,5)), 2)
[(0, 1), (2, 3), (4,)]

PS: It would not work if iterable has None values.

Method 15

You can just group iterable items by their batch index.

def batch(items: Iterable, batch_size: int) -> Iterable[Iterable]:
    # enumerate items and group them by batch index
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    # extract items from enumeration tuples
    item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches

It is often the case when you want to collect inner iterables so here is more advanced version.

def batch_advanced(items: Iterable, batch_size: int, batches_mapper: Callable[[Iterable], Any] = None) -> Iterable[Iterable]:
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    if batches_mapper:
        item_batches = (batches_mapper(t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    else:
        item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches

Examples:

print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, tuple)))
# [(1, 9, 3, 5), (2, 4, 2)]
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, list)))
# [[1, 9, 3, 5], [2, 4, 2]]

Method 16

Related functionality you may need:

def batch(size, i):
    """ Get the i'th batch of the given size """
    return slice(size* i, size* i + size)

Usage:

>>> [1,2,3,4,5,6,7,8,9,10][batch(3, 1)]
>>> [4, 5, 6]

It gets the i’th batch from the sequence and it can work with other data structures as well, like pandas dataframes (df.iloc[batch(100,0)]) or numpy array (array[batch(100,0)]).

Method 17

from itertools import *

class SENTINEL: pass

def batch(iterable, n):
    return (tuple(filterfalse(lambda x: x is SENTINEL, group)) for group in zip_longest(fillvalue=SENTINEL, *[iter(iterable)] * n))

print(list(range(10), 3)))
# outputs: [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
print(list(batch([None]*10, 3)))
# outputs: [(None, None, None), (None, None, None), (None, None, None), (None,)]

Method 18

I use

def batchify(arr, batch_size):
  num_batches = math.ceil(len(arr) / batch_size)
  return [arr[i*batch_size:(i+1)*batch_size] for i in range(num_batches)]
  

Method 19

Keep taking (at most) n elements until it runs out.

def chop(n, iterable):
    iterator = iter(iterable)
    while chunk := list(take(n, iterator)):
        yield chunk


def take(n, iterable):
    iterator = iter(iterable)
    for i in range(n):
        try:
            yield next(iterator)
        except StopIteration:
            return

Method 20

This code has the following features:

  • Can take lists or generators (no len()) as input
  • Does not require imports of other packages
  • No padding added to last batch
def batch_generator(items, batch_size):
    itemid=0 # Keeps track of current position in items generator/list
    batch = [] # Empty batch
    for item in items: 
      batch.append(item) # Append items to batch
      if len(batch)==batch_size:
        yield batch
        itemid += batch_size # Increment the position in items
        batch = []
    yield batch # yield last bit


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x