I have a pandas data frame like this:
  Column1  Column2  Column3  Column4  Column5
0       a        1        2        3        4
1       a        3        4        5
2       b        6        7        8
3       c        7        7
What I want now is a new dataframe containing Column1 and a new column, ColumnA. ColumnA should contain all values from Column2 through Column n (where n is the last column that has a value in that row), like this:
  Column1  ColumnA
0       a  1,2,3,4
1       a    3,4,5
2       b    6,7,8
3       c      7,7
How could I best approach this issue? Any advice would be helpful. Thanks in advance!
Answers:
Method 1
You can call apply with axis=1 to work row-wise, then cast each row's values to str and join them:
In [153]:
df['ColumnA'] = df[df.columns[1:]].apply(
    lambda x: ','.join(x.dropna().astype(int).astype(str)),
    axis=1
)
df
Out[153]:
Column1 Column2 Column3 Column4 Column5 ColumnA
0 a 1 2 3 4 1,2,3,4
1 a 3 4 5 NaN 3,4,5
2 b 6 7 8 NaN 6,7,8
3 c 7 7 NaN NaN 7,7
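Here I call dropna to get rid of the NaN values. Because the NaN entries force the data to float, we also need to cast back to int before casting to str, so we don't end up with floats such as 3.0 in the joined string.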
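For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The frame construction is my assumption about the question's data (empty cells taken to be NaN), and the helper name join_row is hypothetical. It also assumes the values are whole numbers, since astype(int) would truncate anything else.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; missing cells are NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

def join_row(row):
    # Hypothetical helper: drop NaN, cast float -> int -> str, join with commas.
    return ",".join(row.dropna().astype(int).astype(str))

# Keep only Column1 and add the joined values as ColumnA.
result = df[["Column1"]].copy()
result["ColumnA"] = df[df.columns[1:]].apply(join_row, axis=1)
print(result)
```

Building result from a copy of Column1 gives the two-column frame the question asks for, rather than appending ColumnA to the original.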
Method 2
I propose using .assign:
df2 = df.assign(ColumnA = df.Column2.astype(str) + ', ' + df.Column3.astype(str) + ', ' + df.Column4.astype(str) + ', ' + df.Column5.astype(str))
It's simple, if a bit long, and it worked for me.
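One caveat with plain string concatenation: if the missing cells are NaN, astype(str) turns them into the literal text "nan" and the affected columns print as floats. If that matters, a different technique (not the one in this answer) is to reshape with stack and join per row. This is only a sketch, assuming the frame below matches the question's data and that the values are whole numbers.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; missing cells are NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

joined = (
    df.loc[:, "Column2":]   # every column from Column2 to the end
      .stack()              # long format: one value per (row, column)
      .dropna()             # explicit, in case stack() keeps NaN entries
      .astype(int)          # undo the float upcast caused by NaN
      .astype(str)
      .groupby(level=0)     # regroup by the original row label
      .agg(",".join)
)

# assign aligns on the index, so each row gets its own joined string.
df2 = df[["Column1"]].assign(ColumnA=joined)
print(df2)
```

This avoids naming each column by hand, at the cost of being less obvious to read than the concatenation.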
Method 3
If you have a lot of columns, say 1000 columns in the dataframe, and you want to merge a few of them starting from a particular column name (e.g. Column2 in the question) plus an arbitrary number of columns after it (here, Column2 and the three columns after it, as the OP asked), you can select them by position.
We can get the position of a column using .get_loc():
source_col_loc = df.columns.get_loc('Column2')  # column positions start from 0
# Slice from Column2 itself through the three columns after it
df['ColumnA'] = df.iloc[:, source_col_loc:source_col_loc + 4].apply(
    lambda x: ",".join(x.dropna().astype(int).astype(str)), axis=1)
df
Column1 Column2 Column3 Column4 Column5 ColumnA
0 a 1 2 3 4 1,2,3,4
1 a 3 4 5 NaN 3,4,5
2 b 6 7 8 NaN 6,7,8
3 c 7 7 NaN NaN 7,7
The NaN values are removed here with .dropna(); use .fillna() instead if you would rather substitute a placeholder value.
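The positional approach can be wrapped in a small function so the start column and the number of columns become parameters. This is a sketch only: join_from and its arguments are hypothetical names, the frame is my assumed reconstruction of the question's data, and it assumes column labels are unique (so get_loc returns a single integer) and that the values are whole numbers.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame; missing cells are NaN.
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "c"],
    "Column2": [1, 3, 6, 7],
    "Column3": [2, 4, 7, 7],
    "Column4": [3, 5, 8, np.nan],
    "Column5": [4, np.nan, np.nan, np.nan],
})

def join_from(frame, start_col, n_cols, sep=","):
    """Join n_cols columns starting at start_col (inclusive), skipping NaN."""
    start = frame.columns.get_loc(start_col)     # 0-based position
    block = frame.iloc[:, start:start + n_cols]  # positional slice
    return block.apply(
        lambda row: sep.join(row.dropna().astype(int).astype(str)),
        axis=1,
    )

print(join_from(df, "Column2", 4))  # Column2 through Column5
print(join_from(df, "Column2", 2))  # Column2 and Column3 only
```

Because the slice is positional, asking for more columns than exist simply stops at the last column instead of raising.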
Hope it helps!
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.