Split dataframe into relatively even chunks according to length

I have to create a function which would split provided dataframe into chunks of needed size. For instance if dataframe contains 1111 rows, I want to be able to specify chunk size of 400 rows, and get three smaller dataframes with sizes of 400, 400 and 311. Is there a convenience function to do the job? What would be the best way to store and iterate over sliced dataframe?

Example DataFrame

import numpy as np
import pandas as pd

test = pd.concat([pd.Series(np.random.rand(1111)), pd.Series(np.random.rand(1111))], axis = 1)

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You can take the floor division of a sequence up to the amount of rows in the dataframe, and use it to groupby splitting the dataframe into equally sized chunks:

n = 400
for g, df in test.groupby(np.arange(len(test)) // n):
    print(df.shape)
# (400, 2)
# (400, 2)
# (311, 2)

Method 2

A more pythonic way to break large dataframes into smaller chunks based on fixed number of rows is to use list comprehension:

n = 400  #chunk row size
list_df = [test[i:i+n] for i in range(0,test.shape[0],n)]

[i.shape for i in list_df]

Output:

[(400, 2), (400, 2), (311, 2)]


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x