I’m trying to select a random subset of rows from a pd.DataFrame and assign a value to one column for those rows. Here’s a toy example:
import pandas as pd
df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})
    species  name    group
0  platypus  mike  control
1    monkey  paul  control
2    possum  doug  control
I tried the following to randomly assign two people to the experimental group, but it won’t work:
df.sample(2)['group'] = 'experimental'
This won’t work either, in fact:
df.iloc[[0, 1]]['group'] = 'experimental'
Answers:
Method 1
You can use df.sample(2).index to get the index labels of the randomly sampled rows, then pass them to .loc to set the group column for those rows to 'experimental':
df.loc[df.sample(2).index, 'group'] = 'experimental'
Output (one possible result; the sampled rows vary between runs):
    species  name         group
0  platypus  mike  experimental
1    monkey  paul  experimental
2    possum  doug       control
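The same approach can be made repeatable by seeding the sampler. The following is a minimal sketch, assuming an arbitrary seed of 0 passed through the `random_state` parameter of `sample`; the variable name `chosen` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control'],
})

# Draw two distinct rows; the fixed seed (an arbitrary choice) makes the draw repeatable.
chosen = df.sample(n=2, random_state=0).index

# A single .loc call selects rows and column together, so the assignment
# is applied to df itself rather than to a temporary copy.
df.loc[chosen, 'group'] = 'experimental'

print(df)
```

Exactly two rows end up in the experimental group; which two depends on the seed.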
Method 2
Here is a function that picks random index labels a random number of times. Because the same label can be picked more than once, the number of rows that end up changed varies between runs.
import pandas as pd
import random

def custom_randomizer(df, col):
    # Number of draws: between 1 and the number of rows.
    total_randoms = random.randint(1, len(df))
    for _ in range(total_randoms):
        # Each draw may repeat a label that was already picked.
        df.loc[random.choice(df.index), col] = 'experimental'
    return df

df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})
df = custom_randomizer(df, 'group')
print(df)
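If a fixed number of distinct rows is wanted instead, the labels can be drawn without replacement. This is a sketch rather than part of the original answer: `assign_random_rows` and its parameters are hypothetical names, it relies on `random.sample` from the standard library, and it assumes the index labels are unique.

```python
import random

import pandas as pd


def assign_random_rows(df, col, value, k):
    """Set `col` to `value` on k distinct, randomly chosen rows of df (in place)."""
    # random.sample draws without replacement, so no label is picked twice.
    labels = random.sample(list(df.index), k)
    df.loc[labels, col] = value
    return df


df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control'],
})
df = assign_random_rows(df, 'group', 'experimental', k=2)
print(df)
```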
Method 3
df['group'].iloc[[0, 1]] = 'experimental'
Output:
    species  name         group
0  platypus  mike  experimental
1    monkey  paul  experimental
2    possum  doug       control
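Method 3 assigns through a chained selection (`df['group']` followed by `.iloc`). Depending on the pandas version, that form can emit a chained-assignment warning, and with Copy-on-Write enabled it leaves `df` unchanged. A minimal sketch of an equivalent single-step positional assignment, using `df.columns.get_loc` to turn the column name into a position (the variable name `group_pos` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control'],
})

# iloc is purely positional, so translate the column name into its integer position.
group_pos = df.columns.get_loc('group')

# Rows 0 and 1 and the target column are selected in one indexing call.
df.iloc[[0, 1], group_pos] = 'experimental'

print(df)
```

Like the original Method 3, this targets fixed positions rather than a random selection.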
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.