I need to divide every column except the first in a DataFrame by the first column.
Here’s what I’m doing, but I wonder if this isn’t the “right” pandas way:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10,3), columns=list('ABC'))
df[['B', 'C']] = (df.T.iloc[1:] / df.T.iloc[0]).T
Is there a way to do something like df[['B','C']] / df['A']? (That just gives a 10×12 dataframe of nan.)
Also, after reading some similar questions on SO, I tried df['A'].div(df[['B', 'C']]) but that gives a broadcast error.
Answers:
Method 1
Both df[['B','C']].div(df.A, axis=0) and df.iloc[:,1:].div(df.A, axis=0) work. Passing axis=0 makes pandas match the divisor against the row index instead of the column labels.
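As a quick check of the two expressions above, here is a minimal sketch. The seeded random generator and the variable names (by_label, by_position, expected) are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator, so the example is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 3)), columns=list("ABC"))

# axis=0 aligns the Series df["A"] with the row index of the frame.
by_label = df[["B", "C"]].div(df["A"], axis=0)
by_position = df.iloc[:, 1:].div(df["A"], axis=0)

# Reference result using plain NumPy broadcasting: (10, 2) / (10, 1).
expected = df[["B", "C"]].to_numpy() / df[["A"]].to_numpy()

print(by_label.shape)
print(np.allclose(by_label.to_numpy(), expected))
```

Selecting by label or by position should give the same frame here, since columns B and C are exactly the columns after the first.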
Method 2
Do: df.iloc[:,1:] = df.iloc[:,1:].div(df.A, axis=0)
This divides every column other than the first by column 'A' and writes the result back into df.
The first column is left unchanged, and each later column is replaced by its values divided by the divisor column.
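To make the in-place version concrete, here is a small sketch with hand-picked values. The numbers are made up for illustration, and float columns are assumed so the assignment does not change any dtype.

```python
import pandas as pd

# Hypothetical data chosen so the quotients are easy to verify by eye.
df = pd.DataFrame({
    "A": [2.0, 4.0, 5.0],
    "B": [4.0, 8.0, 10.0],
    "C": [1.0, 2.0, 10.0],
})

# Overwrite every column after the first with its value divided by column A.
df.iloc[:, 1:] = df.iloc[:, 1:].div(df["A"], axis=0)

print(df)
# Column A is unchanged; B becomes [2.0, 2.0, 2.0] and C becomes [0.5, 0.5, 2.0].
```

If the original values are still needed, one option is to work on df.copy() instead, because this assignment overwrites the frame.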
Method 3
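The "/" operator performs element-wise division, not matrix multiplication. When you divide a DataFrame by a Series, however, pandas first aligns the index of the Series with the column labels of the DataFrame, so the labels have to line up, not just the shapes.
For example: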
df['A'].shape –> (10,)
df[['B','C']].shape –> (10,2)
Transposing the DataFrame puts the row labels on the columns, so they line up with the index of df['A']:
df[['B','C']].T.shape, df['A'].shape –>((2, 10), (10,))
But then your resulting matrix is:
( df[['B','C']].T / df['A'] ).shape –> (2,10)
Therefore:
( df[['B','C']].T / df['A'] ).T
The shape is (10,2), which is the result you wanted.
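The sketch below illustrates why the plain "/" from the question yields only NaN while the transposed form works. It assumes a small 4-row frame instead of the 10-row one, and the names naive, transposed and explicit are introduced only for this example.

```python
import numpy as np
import pandas as pd

# Assumption: a small deterministic frame with 4 rows and columns A, B, C.
df = pd.DataFrame(np.arange(1.0, 13.0).reshape(4, 3), columns=list("ABC"))

# The Series index (0..3) is aligned with the column labels (B, C).
# No labels match, so the result has the union of both label sets and is all NaN.
naive = df[["B", "C"]] / df["A"]
print(naive.shape)

# After transposing, the columns are 0..3 and match the Series index.
transposed = (df[["B", "C"]].T / df["A"]).T

# Same result without transposing: align on the row index explicitly.
explicit = df[["B", "C"]].div(df["A"], axis=0)

print(np.allclose(transposed.to_numpy(), explicit.to_numpy()))
```

With 10 rows, as in the question, the same alignment produces 2 + 10 = 12 columns, which matches the 10×12 frame of NaN described there.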
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.