Applying function with multiple arguments to create a new pandas column

I want to create a new column in a pandas data frame by applying a function to two existing columns. Following this answer I’ve been able to create a new column when I only need one column as an argument:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})

def fx(x):
    return x * x

print(df)
df['newcolumn'] = df.A.apply(fx)
print(df)

However, I cannot figure out how to do the same thing when the function requires multiple arguments. For example, how do I create a new column by passing column A and column B to the function below?

def fxy(x, y):
    return x * y

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You can go with @greenAfrican example, if it’s possible for you to rewrite your function. But if you don’t want to rewrite your function, you can wrap it into anonymous function inside apply, like this:

>>> def fxy(x, y):
...     return x * y

>>> df['newcolumn'] = df.apply(lambda x: fxy(x['A'], x['B']), axis=1)
>>> df
    A   B  newcolumn
0  10  20        200
1  20  30        600
2  30  10        300

Method 2

Alternatively, you can use numpy underlying function:

>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300

or vectorize arbitrary function in general case:

>>> def fx(x, y):
...     return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300

Method 3

This solves the problem:

df['newcolumn'] = df.A * df.B

You could also do:

def fab(row):
  return row['A'] * row['B']

df['newcolumn'] = df.apply(fab, axis=1)

Method 4

If you need to create multiple columns at once:

  1. Create the dataframe:
    import pandas as pd
    df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
  2. Create the function:
    def fab(row):                                                  
        return row['A'] * row['B'], row['A'] + row['B']
  3. Assign the new columns:
    df['newcolumn'], df['newcolumn2'] = zip(*df.apply(fab, axis=1))

Method 5

One more dict style clean syntax:

df["new_column"] = df.apply(lambda x: x["A"] * x["B"], axis = 1)

or,

df["new_column"] = df["A"] * df["B"]

Method 6

This will dynamically give you desired result. It works even if you have more than two arguments

df['anothercolumn'] = df[['A', 'B']].apply(lambda x: fxy(*x), axis=1)
print(df)


    A   B  newcolumn  anothercolumn
0  10  20        100            200
1  20  30        400            600
2  30  10        900            300


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x