I would like to create a new column with a numerical value based on the following conditions:
a. if gender is male & pet1=pet2, points = 5
b. if gender is female & (pet1 is 'cat' or pet1='dog'), points = 5
c. all other combinations, points = 0
gender pet1 pet2 0 male dog dog 1 male cat cat 2 male dog cat 3 female cat squirrel 4 female dog dog 5 female squirrel cat 6 squirrel dog cat
I would like the end result to be as follows:
gender pet1 pet2 points 0 male dog dog 5 1 male cat cat 5 2 male dog cat 0 3 female cat squirrel 5 4 female dog dog 5 5 female squirrel cat 0 6 squirrel dog cat 0
How do I accomplish this?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
numpy.select
This is a perfect case for np.select where we can create a column based on multiple conditions and it’s a readable method when there are more conditions:
conditions = [
df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
df['gender'].eq('female') & df['pet1'].isin(['cat', 'dog'])
]
choices = [5,5]
df['points'] = np.select(conditions, choices, default=0)
print(df)
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
Method 2
You can do this using np.where, the conditions use bitwise & and | for and and or with parentheses around the multiple conditions due to operator precedence. So where the condition is true 5 is returned and 0 otherwise:
In [29]:
df['points'] = np.where( ( (df['gender'] == 'male') & (df['pet1'] == df['pet2'] ) ) | ( (df['gender'] == 'female') & (df['pet1'].isin(['cat','dog'] ) ) ), 5, 0)
df
Out[29]:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
Method 3
using apply.
def f(x): if x['gender'] == 'male' and x['pet1'] == x['pet2']: return 5 elif x['gender'] == 'female' and (x['pet1'] == 'cat' or x['pet1'] == 'dog'): return 5 else: return 0 data['points'] = data.apply(f, axis=1)
Method 4
The apply method described by @RuggeroTurra takes a lot longer for 500k rows. I ended up using something like
df['result'] = ((df.a == 0) & (df.b != 1)).astype(int) * 2 +
((df.a != 0) & (df.b != 1)).astype(int) * 3 +
((df.a == 0) & (df.b == 1)).astype(int) * 4 +
((df.a != 0) & (df.b == 1)).astype(int) * 5
where the apply method took 25 seconds and this method above took about 18ms.
Method 5
You can also use the apply function. For example:
def myfunc(gender, pet1, pet2):
if gender=='male' and pet1==pet2:
myvalue=5
elif gender=='female' and (pet1=='cat' or pet1=='dog'):
myvalue=5
else:
myvalue=0
return myvalue
And then using the apply function by setting axis=1
df['points'] = df.apply(lambda x: myfunc(x['gender'], x['pet1'], x['pet2']), axis=1)
We get:
gender pet1 pet2 points 0 male dog dog 5 1 male cat cat 5 2 male dog cat 0 3 female cat squirrel 5 4 female dog dog 5 5 female squirrel cat 0 6 squirrel dog cat 0
Method 6
One option is with case_when from pyjanitor; under the hood it uses pd.Series.mask.
The basic idea is a pairing of condition and expected value; you can pass as many pairings as required, followed by a default value and a target column name:
# pip install pyjanitor
import pandas as pd
import janitor
df.case_when(
# condition, value
df.gender.eq('male') & df.pet1.eq(df.pet2), 5,
df.gender.eq('female') & df.pet1.isin(['cat', 'dog']), 5,
0, # default
column_name = 'points')
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
You could use strings for the conditions, as long as they can be evaluated by pd.eval on the parent dataframe – note that speed wise, this can be slower for small datasets:
df.case_when(
"gender == 'male' and pet1 == pet2", 5,
"gender == 'female' and pet2 == ['cat', 'dog']", 5,
0,
column_name = 'points')
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 0
4 female dog dog 5
5 female squirrel cat 5
6 squirrel dog cat 0
Anonymous functions are also possible, which can be handy in chained operations:
df.case_when(
lambda df: df.gender.eq('male') & df.pet1.eq(df.pet2), 5,
lambda df: df.gender.eq('female') & df.pet1.isin(['cat', 'dog']), 5,
0, # default
column_name = 'points')
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0