I have a dataframe, df, that looks like the following:
   A    B   C
1  foo  12  California
2  foo  22  California
3  bar   8  Rhode Island
4  bar  32  Rhode Island
5  baz  15  Ohio
6  baz  26  Ohio
I want to group by column A and then sum column B while keeping the value in column C. Something like this:
   A    B   C
1  foo  34  California
2  bar  40  Rhode Island
3  baz  41  Ohio
The issue is, when I say
df.groupby('A').sum()
column C gets removed, returning
      B
A
bar  40
baz  41
foo  34
How can I get around this and keep column C when I group and sum?
Answers:
Method 1
One way to do this is to include C in your groupby (the groupby function can accept a list of columns). This works here because each value of A always goes with the same value of C.
Give this a try:
df.groupby(['A','C'])['B'].sum()
One other thing to note: if you need to work with the result as a dataframe after the aggregation, you can pass as_index=False so the group keys come back as regular columns instead of the index. This one gave me problems when I was first working with pandas. Example:
df.groupby(['A','C'], as_index=False)['B'].sum()
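For reference, here is a minimal, self-contained sketch of Method 1 on the sample data. The construction of df is an assumption rebuilt from the table in the question; only the two groupby calls come from the answer.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed to match the asker's df).
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

# Without as_index=False the result is a Series indexed by (A, C).
as_series = df.groupby(["A", "C"])["B"].sum()

# With as_index=False the group keys stay as ordinary columns of a DataFrame.
as_frame = df.groupby(["A", "C"], as_index=False)["B"].sum()

print(as_series)
print(as_frame)
```

Note that groupby sorts the group keys by default, so the rows come back in the order bar, baz, foo. Passing sort=False keeps the groups in their order of first appearance.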
Method 2
If you don’t care which value of column C you keep and just want the nth value in each group (n is a zero-based position you choose), you could do this:
df.groupby('A').agg({'B': 'sum',
                     'C': lambda x: x.iloc[n]})
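A small runnable sketch of this approach, assuming n = 0 and the sample data from the question (both the value of n and the construction of df are illustrative assumptions, not part of the original answer):

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed).
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

n = 0  # hypothetical choice: keep the first value of C in each group

# Sum B per group and pick the value of C at position n within each group.
out = df.groupby("A").agg({"B": "sum", "C": lambda x: x.iloc[n]})

print(out)
```

Here A becomes the index of the result. Keep in mind that x.iloc[n] is out of bounds for any group with fewer than n + 1 rows, so n has to fit the smallest group.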
Method 3
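Another option is to use groupby.agg with the 'first' aggregation on column C. Passing sort=False keeps the groups in their order of first appearance, and as_index=False keeps A as a regular column.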
out = df.groupby('A', as_index=False, sort=False).agg({'B':'sum', 'C':'first'})
Output:
   A    B   C
0  foo  34  California
1  bar  40  Rhode Island
2  baz  41  Ohio
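Since 'first' silently keeps one value of C per group, it may be worth checking that C really is constant within each group before relying on it. The sketch below adds such a check and also shows the same aggregation written with named aggregation (available in pandas 0.25 and later). The check and the variable names are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed).
df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
    "B": [12, 22, 8, 32, 15, 26],
    "C": ["California", "California", "Rhode Island",
          "Rhode Island", "Ohio", "Ohio"],
})

# Sanity check (an addition): does every A group contain exactly one distinct C?
c_is_constant = df.groupby("A")["C"].nunique().eq(1).all()

# The aggregation from the answer, using a dict of column -> function.
out = df.groupby("A", as_index=False, sort=False).agg({"B": "sum", "C": "first"})

# The same aggregation spelled with named aggregation.
out_named = df.groupby("A", as_index=False, sort=False).agg(
    B=("B", "sum"),
    C=("C", "first"),
)

print(c_is_constant)
print(out)
```

If the check comes back False, grouping by both A and C (as in Method 1) avoids discarding the extra values of C.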
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.