Pandas: sum DataFrame rows for given columns

I have the following DataFrame:

In [1]:
df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [2, 3, 4],
                   'c': ['dd', 'ee', 'ff'],
                   'd': [5, 9, 1]})

df
Out [1]:
   a  b   c  d
0  1  2  dd  5
1  2  3  ee  9
2  3  4  ff  1

I would like to add a column 'e' which is the sum of columns 'a', 'b' and 'd'.

Going across forums, I thought something like this would work:

df['e'] = df[['a', 'b', 'd']].map(sum)

But it didn’t.

I would like to know the appropriate operation with the list of columns ['a', 'b', 'd'] and df as inputs.

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You can just sum and set param axis=1 to sum the rows, this will ignore none numeric columns:

In [91]:

df = pd.DataFrame({'a': [1,2,3], 'b': [2,3,4], 'c':['dd','ee','ff'], 'd':[5,9,1]})
df['e'] = df.sum(axis=1)
df
Out[91]:
   a  b   c  d   e
0  1  2  dd  5   8
1  2  3  ee  9  14
2  3  4  ff  1   8

If you want to just sum specific columns then you can create a list of the columns and remove the ones you are not interested in:

In [98]:

col_list= list(df)
col_list.remove('d')
col_list
Out[98]:
['a', 'b', 'c']
In [99]:

df['e'] = df[col_list].sum(axis=1)
df
Out[99]:
   a  b   c  d  e
0  1  2  dd  5  3
1  2  3  ee  9  5
2  3  4  ff  1  7

Method 2

If you have just a few columns to sum, you can write:

df['e'] = df['a'] + df['b'] + df['d']

This creates new column e with the values:

   a  b   c  d   e
0  1  2  dd  5   8
1  2  3  ee  9  14
2  3  4  ff  1   8

For longer lists of columns, EdChum’s answer is preferred.

Method 3

Create a list of column names you want to add up.

df['total']=df.loc[:,list_name].sum(axis=1)

If you want the sum for certain rows, specify the rows using ‘:’

Method 4

This is a simpler way using iloc to select which columns to sum:

df['f']=df.iloc[:,0:2].sum(axis=1)
df['g']=df.iloc[:,[0,1]].sum(axis=1)
df['h']=df.iloc[:,[0,3]].sum(axis=1)

Produces:

   a  b   c  d   e  f  g   h
0  1  2  dd  5   8  3  3   6
1  2  3  ee  9  14  5  5  11
2  3  4  ff  1   8  7  7   4

I can’t find a way to combine a range and specific columns that works e.g. something like:

df['i']=df.iloc[:,[[0:2],3]].sum(axis=1)
df['i']=df.iloc[:,[0:2,3]].sum(axis=1)

Method 5

You can simply pass your dataframe into the following function:

def sum_frame_by_column(frame, new_col_name, list_of_cols_to_sum):
    frame[new_col_name] = frame[list_of_cols_to_sum].astype(float).sum(axis=1)
    return(frame)

Example:

I have a dataframe (awards_frame) as follows:

enter image description here

…and I want to create a new column that shows the sum of awards for each row:

Usage:

I simply pass my awards_frame into the function, also specifying the name of the new column, and a list of column names that are to be summed:

sum_frame_by_column(awards_frame, 'award_sum', ['award_1','award_2','award_3'])

Result:

enter image description here

Method 6

Following syntax helped me when I have columns in sequence

awards_frame.values[:,1:4].sum(axis =1)

Method 7

The shortest and simplest way here is to use

df.eval('e = a + b + d')

Method 8

You can use the function aggragate or agg:

df[['a','b','d']].agg('sum', axis=1)

The advantage of agg is that you can use multiple aggregation functions:

df[['a','b','d']].agg(['sum', 'prod', 'min', 'max'], axis=1)

Output:

   sum  prod  min  max
0    8    10    1    5
1   14    54    2    9
2    8    12    1    4


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x