Pandas: How do I assign values based on multiple conditions for existing columns?

I would like to create a new column with a numerical value based on the following conditions:

a. if gender is male & pet1=pet2, points = 5

b. if gender is female & (pet1 is 'cat' or pet1='dog'), points = 5

c. all other combinations, points = 0

    gender    pet1      pet2
0   male      dog       dog
1   male      cat       cat
2   male      dog       cat
3   female    cat       squirrel
4   female    dog       dog
5   female    squirrel  cat
6   squirrel  dog       cat

I would like the end result to be as follows:

    gender    pet1      pet2      points
0   male      dog       dog       5
1   male      cat       cat       5
2   male      dog       cat       0
3   female    cat       squirrel  5
4   female    dog       dog       5
5   female    squirrel  cat       0
6   squirrel  dog       cat       0

How do I accomplish this?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

numpy.select

This is a perfect case for np.select where we can create a column based on multiple conditions and it’s a readable method when there are more conditions:

conditions = [
    df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
    df['gender'].eq('female') & df['pet1'].isin(['cat', 'dog'])
]

choices = [5,5]

df['points'] = np.select(conditions, choices, default=0)

print(df)
     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0

Method 2

You can do this using np.where, the conditions use bitwise & and | for and and or with parentheses around the multiple conditions due to operator precedence. So where the condition is true 5 is returned and 0 otherwise:

In [29]:
df['points'] = np.where( ( (df['gender'] == 'male') & (df['pet1'] == df['pet2'] ) ) | ( (df['gender'] == 'female') & (df['pet1'].isin(['cat','dog'] ) ) ), 5, 0)
df

Out[29]:
     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0

Method 3

using apply.

def f(x):
  if x['gender'] == 'male' and x['pet1'] == x['pet2']: return 5
  elif x['gender'] == 'female' and (x['pet1'] == 'cat' or x['pet1'] == 'dog'): return 5
  else: return 0

data['points'] = data.apply(f, axis=1)

Method 4

The apply method described by @RuggeroTurra takes a lot longer for 500k rows. I ended up using something like

df['result'] = ((df.a == 0) & (df.b != 1)).astype(int) * 2 + 
               ((df.a != 0) & (df.b != 1)).astype(int) * 3 + 
               ((df.a == 0) & (df.b == 1)).astype(int) * 4 + 
               ((df.a != 0) & (df.b == 1)).astype(int) * 5

where the apply method took 25 seconds and this method above took about 18ms.

Method 5

You can also use the apply function. For example:

def myfunc(gender, pet1, pet2):
    if gender=='male' and pet1==pet2:
        myvalue=5
    elif gender=='female' and (pet1=='cat' or pet1=='dog'):
        myvalue=5
    else:
        myvalue=0
    return myvalue

And then using the apply function by setting axis=1

df['points'] = df.apply(lambda x: myfunc(x['gender'], x['pet1'], x['pet2']), axis=1)

We get:

     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0

Method 6

One option is with case_when from pyjanitor; under the hood it uses pd.Series.mask.

The basic idea is a pairing of condition and expected value; you can pass as many pairings as required, followed by a default value and a target column name:

# pip install pyjanitor
import pandas as pd
import janitor
df.case_when(
    # condition, value
    df.gender.eq('male') & df.pet1.eq(df.pet2), 5,
    df.gender.eq('female') & df.pet1.isin(['cat', 'dog']), 5,
    0, # default
    column_name = 'points')

     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0

You could use strings for the conditions, as long as they can be evaluated by pd.eval on the parent dataframe – note that speed wise, this can be slower for small datasets:

df.case_when(
   "gender == 'male' and pet1 == pet2", 5,
   "gender == 'female' and pet2 == ['cat', 'dog']", 5,
   0,
   column_name = 'points')

     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       0
4    female       dog       dog       5
5    female  squirrel       cat       5
6  squirrel       dog       cat       0

Anonymous functions are also possible, which can be handy in chained operations:

df.case_when(
    lambda df: df.gender.eq('male') & df.pet1.eq(df.pet2), 5,
    lambda df: df.gender.eq('female') & df.pet1.isin(['cat', 'dog']), 5,
    0, # default
    column_name = 'points')

     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x