Sample each group after pandas groupby

I know this must have been answered somewhere, but I could not find it.

Problem: Sample each group after groupby operation.

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                   'b': [1,1,1,0,0,0,0]})

grouped = df.groupby('b')

# now sample from each group, e.g., I want 30% of each group

Answers:


Method 1

Apply a lambda and call sample with param frac:

In [2]:
df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                   'b': [1,1,1,0,0,0,0]})

grouped = df.groupby('b')
grouped.apply(lambda x: x.sample(frac=0.3))

Out[2]:
     a  b
b        
0 6  7  0
1 2  3  1
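The output above has one row per group because sample rounds frac * len(group) to the nearest whole number. A minimal sketch that checks this, assuming a fixed random_state (the seed 0 is an arbitrary choice, not part of the original answer) so the draw is repeatable; recent pandas versions may emit a warning about apply operating on the grouping column:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# random_state makes the draw repeatable; the seed value is arbitrary.
sampled = df.groupby('b').apply(lambda x: x.sample(frac=0.3, random_state=0))

# sample() rounds frac * len(group): round(0.3 * 4) = 1 and round(0.3 * 3) = 1,
# so each group contributes exactly one row here.
rows_per_group = sampled.groupby(level='b').size()
print(rows_per_group)
```

With very small groups the rounding can also yield zero rows, for example frac=0.3 on a single-row group.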

Method 2

pandas >= 1.1: GroupBy.sample

GroupBy.sample draws from each group directly, with no apply needed:

# np.random.seed(0)
df.groupby('b').sample(frac=.3) 

   a  b
5  6  0
0  1  1
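GroupBy.sample also accepts n for a fixed number of rows per group and random_state for repeatable draws. A short sketch, assuming pandas >= 1.1; the values n=2 and random_state=0 are illustrative choices only:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# Draw a fixed number of rows from every group instead of a fraction.
two_each = df.groupby('b').sample(n=2, random_state=0)

# The same seed selects the same rows on a repeated call.
again = df.groupby('b').sample(n=2, random_state=0)

print(two_each)
```

Passing random_state is usually preferable to seeding NumPy globally, since it keeps the reproducibility local to this one call.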

pandas <= 1.0.X

You can use GroupBy.apply with sample. You do not need to use a lambda; apply accepts keyword arguments:

df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, frac=.3)

   a  b
5  6  0
0  1  1
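Because group_keys=False keeps the original row labels as the index, the sampled rows can be matched back to df. As a sketch of one possible use (the name remainder is hypothetical, and unique index labels are assumed), the rows that were not drawn can be recovered like this:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# group_keys=False leaves the original row labels as the index of the result.
sampled = df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, frac=.3)

# Use those labels to select the rows that were NOT drawn
# (this assumes the index labels of df are unique).
remainder = df.drop(sampled.index)

print(len(sampled), len(remainder))
```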


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
