My df looks as follows:
Index  Country    Val1  Val2  ...  Val10
1      Australia  1     3     ...  5
2      Bambua     12    33    ...  56
3      Tambua     14    34    ...  58
I’d like to subtract Val1 from Val10 for each country, so the output looks like:
Country    Val10-Val1
Australia  4
Bambua     44
Tambua     44
So far I’ve got:
def myDelta(row):
    data = row[['Val10', 'Val1']]
    return pd.Series({'Delta': np.subtract(data)})

def runDeltas():
    myDF = getDF() \
        .apply(myDelta, axis=1) \
        .sort_values(by=['Delta'], ascending=False)
    return myDF
runDeltas results in this error:
ValueError: ('invalid number of arguments', u'occurred at index 9')
What’s the proper way to fix this?
Answers:
Method 1
Given the following dataframe:
df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )
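It comes down to a simple vectorized (element-wise) subtraction of one column from another:
>>> df["Val1"] - df["Val10"]
0    -4
1   -44
2   -44
dtype: int64
Swap the operands (df["Val10"] - df["Val1"]) to get the positive differences shown in the question.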
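To connect this back to the question: the ValueError arises because np.subtract is a binary function and was called with a single argument. A minimal sketch of a corrected runDeltas could look like the following. It assumes the frame is passed in as a parameter instead of being fetched with getDF() (which the question does not show); the name run_deltas is illustrative only.

```python
import pandas as pd


def run_deltas(df):
    # Hypothetical replacement for runDeltas: subtract whole columns at once
    # instead of calling np.subtract on each row.
    out = pd.DataFrame({"Country": df["Country"],
                        "Delta": df["Val10"] - df["Val1"]})
    # Largest difference first, as in the question's sort_values call.
    return out.sort_values(by="Delta", ascending=False)


df = pd.DataFrame(
    [["Australia", 1, 3, 5], ["Bambua", 12, 33, 56], ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)
result = run_deltas(df)
print(result)
```

If a per-row function is really needed, calling np.subtract(row['Val10'], row['Val1']) with two arguments would also avoid the error.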
Method 2
Using this as the df:
df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )
You can also do the subtraction and store the result in a new column:
>>> df['Val_Diff'] = df['Val10'] - df['Val1']
>>> df
     Country  Val1  Val2  Val10  Val_Diff
0  Australia     1     3      5         4
1     Bambua    12    33     56        44
2     Tambua    14    34     58        44
Method 3
You can do this by applying a lambda function row by row and assigning the result to a new column:
df['Val10-Val1'] = df.apply(lambda x: x['Val10'] - x['Val1'], axis=1)
print(df)
Method 4
You can also use the pandas.DataFrame.assign function, e.g.:
import numpy as np
import pandas as pd
df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )
df = df.assign(Val10_minus_Val1=df['Val10'] - df['Val1'])
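The best part of assign is that you can add as many assignments as you wish in a single call, e.g. computing both the difference and its log. In pandas 0.23 and later, a callable such as the lambda below can refer to a column created earlier in the same call:
df = df.assign(Val10_minus_Val1=df['Val10'] - df['Val1'],
               log_result=lambda x: np.log(x.Val10_minus_Val1))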
Method 5
Although this is an old question, note that pandas also allows subtracting two DataFrames or Series with pandas.DataFrame.subtract (or pandas.Series.subtract, as in this example, which subtracts Val2 from Val1):
import pandas as pd
df = pd.DataFrame([["Australia", 1, 3, 5],
                   ["Bambua", 12, 33, 56],
                   ["Tambua", 14, 34, 58]
                  ], columns=["Country", "Val1", "Val2", "Val10"]
                 )
df["Val1"].subtract(df["Val2"])
Output:
0    -2
1   -21
2   -20
dtype: int64
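One reason to prefer the subtract method over the minus operator is its fill_value parameter, which substitutes a value for missing entries before the subtraction. The sketch below uses two made-up Series (a and b are illustrative names, and the NaN is invented for the example, not taken from the question's data).

```python
import numpy as np
import pandas as pd

# Illustrative data only: the middle value of `a` is missing.
a = pd.Series([5.0, np.nan, 58.0])
b = pd.Series([1.0, 12.0, 14.0])

# With the operator, the missing value propagates to the result.
plain = a - b

# With fill_value=0, the missing entry is treated as 0 before subtracting.
filled = a.subtract(b, fill_value=0)

print(plain)
print(filled)
```

Whether 0 is a sensible stand-in for missing data depends on what the columns mean, so treat the choice of fill_value as an assumption to check.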
Method 6
I ran into this today and wanted to share it. As mentioned above, you can simply use:
df['Val10-Val1'] = df['Val10'] - df['Val1']
but sometimes you might need to use the apply function, in which case use the following line (axis=1 is required so that the lambda receives rows rather than columns):
df['Val10-Val1'] = df.apply(lambda row: row['Val10'] - row['Val1'], axis=1)
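As a rough check of how the two approaches relate, the sketch below compares a row-wise apply with plain column arithmetic on the sample frame, and shows what happens when axis is left at its default. Variable names such as by_row and vectorized are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["Australia", 1, 3, 5], ["Bambua", 12, 33, 56], ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)

# axis=1 passes each row to the function as a Series keyed by column name.
by_row = df.apply(lambda row: row["Val10"] - row["Val1"], axis=1)

# Plain column arithmetic gives the same values without a Python-level loop.
vectorized = df["Val10"] - df["Val1"]
same = by_row.tolist() == vectorized.tolist()

# With the default axis=0 the function receives whole columns, which are
# indexed by row label (0, 1, 2), so looking up "Val10" fails.
try:
    df.apply(lambda row: row["Val10"] - row["Val1"])
    default_axis_failed = False
except KeyError:
    default_axis_failed = True

print(same, default_axis_failed)
```

The column arithmetic is usually the faster option; apply tends to be worth it only when the per-row logic cannot easily be expressed on whole columns.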
Method 7
You can also use eval here:
In [12]: df.eval('Val10_minus_Val1 = Val10-Val1', inplace=True)

In [13]: df
Out[13]:
     Country  Val1  Val2  Val10  Val10_minus_Val1
0  Australia     1     3      5                 4
1     Bambua    12    33     56                44
2     Tambua    14    34     58                44
Since inplace=True you don’t have to assign it back to df.
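For comparison, here is a small sketch of the same eval expression without inplace=True. In that case eval returns a new DataFrame and leaves the original untouched, so the result has to be captured (new_df is an illustrative name).

```python
import pandas as pd

df = pd.DataFrame(
    [["Australia", 1, 3, 5], ["Bambua", 12, 33, 56], ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)

# Without inplace=True, eval returns a copy that carries the new column.
new_df = df.eval("Val10_minus_Val1 = Val10 - Val1")

print(list(df.columns))      # original columns only
print(list(new_df.columns))  # original columns plus Val10_minus_Val1
```

This form may be preferable in method chains, where mutating the original frame is not wanted.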
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
