I have a data frame with 9 columns (my real data is much larger). I want to take the value columns in groups of 4 and build a
new dataframe with 2 columns, each holding the row-wise sum of one group of 4 columns.
I also want to keep the id column. Here is a simple example:
import pandas as pd
df = pd.DataFrame()
df['id'] = [1, 2, 3, 4]
df['a'] = [10, 0, 1, 3]
df['b'] = [-10, 0, 2, 2]
df['c'] = [0, 1, 3, 3]
df['d'] = [0, 0, 4, 4]
df['e'] = [10, 0, 1, 3]
df['f'] = [10, 0, 2, 2]
df['g'] = [0, -1, 0, 0]
df['h'] = [0, 0, 0, 0]
df
Answers:
Method 1
You can use the underlying numpy array for an easy way to reshape:
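# Work on the value columns only, as a plain NumPy array
a = df.drop(columns='id').to_numpy()
N = 4  # number of columns to group
# Reshape to (rows, groups, N), then sum over the last axis: one total per group
df2 = pd.DataFrame(a.reshape((len(df), -1, N)).sum(2),
                   columns=[f'value{x+1}' for x in range(a.shape[1]//N)],
                   index=df['id']).reset_index()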
output:
   id  value1  value2
0   1       0      20
1   2       1      -1
2   3      10       3
3   4      12       5
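The reshape above needs the number of value columns to be an exact multiple of N, otherwise NumPy raises an error. As an alternative sketch (not part of the original answer), the same sums can be computed in pandas alone by giving every column an integer group label and grouping on it. The names `values` and `groups` are only illustrative, and with this approach a final group with fewer than N columns would simply be summed as-is.

```python
import numpy as np
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'a': [10, 0, 1, 3], 'b': [-10, 0, 2, 2], 'c': [0, 1, 3, 3], 'd': [0, 0, 4, 4],
    'e': [10, 0, 1, 3], 'f': [10, 0, 2, 2], 'g': [0, -1, 0, 0], 'h': [0, 0, 0, 0],
})

N = 4  # number of columns per group
values = df.drop(columns='id')

# One integer label per value column: 0, 0, 0, 0, 1, 1, 1, 1
groups = np.arange(values.shape[1]) // N

# Transpose so the columns become rows, sum within each label, transpose back
df2 = values.T.groupby(groups).sum().T
df2.columns = [f'value{g + 1}' for g in df2.columns]

# Put the id column back in front (aligned on the original row index)
df2.insert(0, 'id', df['id'])
print(df2)
```

For this example it should give the same table as the reshape version; which one is faster on very large data would need to be measured.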
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
