I have a pandas.DataFrame with the following columns:
a_1  ab_1  ac_1  a_2  ab_2  ac_2
  2     3     4    5     6     7
How do I convert it into the following?
a  ab  ac
2   3   4
5   6   7
I was trying to use pandas melt to convert from wide to long format, but I am not sure of the syntax.
Answers:
Method 1
You can replace the columns by a multi-index and stack:
df.columns = pd.MultiIndex.from_tuples(df.columns.str.split('_').map(tuple))
df = df.stack()
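As a rough end-to-end sketch of this approach, the snippet below rebuilds the one-row frame from the question and applies the same two steps. The MultiIndex is built here with a list comprehension, which is assumed to be equivalent to the `str.split(...).map(tuple)` call above, and the name `long_df` plus the final `reset_index(drop=True)` are illustrative additions to reach the flat layout asked for in the question.

```python
import pandas as pd

# Sample frame from the question: one row, six wide columns.
df = pd.DataFrame(
    [[2, 3, 4, 5, 6, 7]],
    columns=["a_1", "ab_1", "ac_1", "a_2", "ab_2", "ac_2"],
)

# Split "a_1" into ("a", "1") so the suffix becomes its own column level.
df.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split("_")) for c in df.columns]
)

# stack() moves the innermost column level (the suffix) into the row index.
long_df = df.stack()

# Illustrative extra step: discard the (row, suffix) index for a plain one.
long_df = long_df.reset_index(drop=True)
print(long_df)
```

Without the last step, the result keeps a two-level row index made of the original row label and the suffix.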
Method 2
Here is one way to do that:
Code:
df.columns = pd.MultiIndex.from_tuples(
    [c.split('_') for c in df.columns], names=['col', 'row'])
df.melt().pivot(index='row', columns='col', values='value')
How?
- Create a pandas.MultiIndex for the columns by splitting on _.
- melt the data frame and then pivot on the elements from the original column names.
Test Code:
import pandas as pd

df = pd.DataFrame(
    data=[range(2, 8)],
    columns='a_1 ab_1 ac_1 a_2 ab_2 ac_2'.split()
)
print(df)
df.columns = pd.MultiIndex.from_tuples(
    [c.split('_') for c in df.columns], names=['col', 'row'])
print(df.melt().pivot(index='row', columns='col', values='value'))
Results:
   a_1  ab_1  ac_1  a_2  ab_2  ac_2
0    2     3     4    5     6     7
col  a  ab  ac
row
1    2   3   4
2    5   6   7
pandas < 0.20.0
If using pandas prior to 0.20.0, DataFrame.melt() is not available, so call the top-level pd.melt() instead:
print(pd.melt(df).pivot(index='row', columns='col', values='value'))
Method 3
You can use split with expand=True to create a MultiIndex on the columns, reshape with stack, and finally use reset_index to remove the MultiIndex from the rows:
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(drop=True)
print(df)
   a  ab  ac
0  2   3   4
1  5   6   7
To keep the column suffix as the index instead, drop only the first index level in the second step:
df = df.stack().reset_index(level=0, drop=True)
print(df)
   a  ab  ac
1  2   3   4
2  5   6   7
Method 4
You can split the dataframe into two, then rename the columns, and finally concatenate them:
cols = ['a', 'ab', 'ac']
df1 = df[["a_1", "ab_1", "ac_1"]]
df2 = df[["a_2", "ab_2", "ac_2"]]
df1.columns = cols
df2.columns = cols
df3 = pd.concat([df1, df2], ignore_index=True)
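If there are more than two suffixes, hard-coding each column list gets tedious. The following is a minimal sketch of the same split, rename, and concatenate idea that derives the groups from the column names; it assumes every column is named `<prefix>_<suffix>`, and the names `groups` and `parts` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [[2, 3, 4, 5, 6, 7]],
    columns=["a_1", "ab_1", "ac_1", "a_2", "ab_2", "ac_2"],
)

# Group column names by suffix: {"1": ["a_1", ...], "2": ["a_2", ...]}.
groups = {}
for col in df.columns:
    suffix = col.rsplit("_", 1)[1]
    groups.setdefault(suffix, []).append(col)

# For each suffix, select its columns and strip the suffix from the names.
parts = [
    df[cols].rename(columns=lambda c: c.rsplit("_", 1)[0])
    for cols in groups.values()
]

# Stack the per-suffix frames on top of each other.
df3 = pd.concat(parts, ignore_index=True)
print(df3)
```

Using `rename` returns a new frame for each group, so the column names of a slice of `df` are never assigned in place.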
Method 5
If you want to use pnd.melt you should probably use the value_vars and value_name parameters:
import pandas as pnd

df_a = pnd.melt(df, value_vars=['a_1', 'a_2'], value_name='a')[['a']]
df_ab = pnd.melt(df, value_vars=['ab_1', 'ab_2'], value_name='ab')[['ab']]
df_ac = pnd.melt(df, value_vars=['ac_1', 'ac_2'], value_name='ac')[['ac']]
# Join the three single-column frames on their shared index.
df_final = df_a.join(df_ab).join(df_ac)
Alternatively, taking a more functional approach:
from functools import reduce  # reduce is no longer a builtin in Python 3

col_prefixes = ['a', 'ab', 'ac']
df_cuts = map(lambda x: pnd.melt(df, value_vars=['%s_1' % x, '%s_2' % x], value_name=x)[[x]], col_prefixes)
df_final = reduce(lambda x, y: x.join(y), df_cuts)
Method 6
There is a built-in function, wide_to_long, for this; see the documentation for more details:
In [115]: df
Out[115]:
   a_1  ab_1  ac_1  a_2  ab_2  ac_2
0    2     3     4    5     6     7
In [116]: df['id'] = df.index
In [117]: df
Out[117]:
   a_1  ab_1  ac_1  a_2  ab_2  ac_2  id
0    2     3     4    5     6     7   0
In [118]: pd.wide_to_long(df, ['a', 'ab', 'ac'], i='id', j='num', sep='_')
Out[118]:
        a  ab  ac
id num
0  1    2   3   4
   2    5   6   7
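The session above leaves `id` and `num` in the row index. As a small sketch of how one might get from there to the flat layout asked for in the question, the script below repeats the same call and then drops the index; the names `long_df` and `flat` are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[2, 3, 4, 5, 6, 7]],
    columns=["a_1", "ab_1", "ac_1", "a_2", "ab_2", "ac_2"],
)

# wide_to_long needs a column that identifies each original row.
df["id"] = df.index

# Stub names are the column prefixes; the part after sep becomes "num".
long_df = pd.wide_to_long(
    df, stubnames=["a", "ab", "ac"], i="id", j="num", sep="_"
)

# Drop the (id, num) index to leave only the a, ab, ac columns.
flat = long_df.reset_index(drop=True)
print(flat)
```

If the suffix is needed later, `long_df.reset_index()` without `drop=True` keeps `id` and `num` as ordinary columns instead.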
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.