Convert pandas dataframe from wide to long

I have a pandas.Dataframe with the following columns:

a_1  ab_1  ac_1    a_2   ab_2   ac_2
2      3     4      5     6      7

How do I convert it into the following?

a    ab    ac 
2     3     4
5     6     7

I was trying to use pandas melt to convert from wide to long format, but not sure of the syntax.

Contents hide

Answers:

Method 1

Method 2

Code:

How?

Test Code:

Results:

pandas < 0.20.0

Method 3

Method 4

Method 5

Method 6

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You can replace the columns by a multi-index and stack:

df.columns = pd.MultiIndex.from_tuples(df.columns.str.split('_').map(tuple))
df = df.stack()

Method 2

Here is one way to do that:

Code:

df.columns = pd.MultiIndex.from_tuples(
    [c.split('_') for c in df.columns], names=['col', 'row'])

df.melt().pivot(index='row', columns='col', values='value')

How?

Create a pandas.MultiIndex for the columns by splitting on _.
melt the data frame and then pivot on the elements from the original column names.

Test Code:

df = pd.DataFrame(
    data=[range(2, 8)],
    columns='a_1 ab_1 ac_1 a_2 ab_2 ac_2'.split()
)
print(df)

df.columns = pd.MultiIndex.from_tuples(
    [c.split('_') for c in df.columns], names=['col', 'row'])

print(df.melt().pivot(index='row', columns='col', values='value'))

Results:

   a_1  ab_1  ac_1  a_2  ab_2  ac_2
0    2     3     4    5     6     7

col  a  ab  ac
row           
1    2   3   4
2    5   6   7

pandas < 0.20.0

If using pandas prior to 0.20.0, melt() like:

print(pd.melt(df).pivot(index='row', columns='col', values='value'))

Method 3

You can use split for MultiIndex and then reshape by stack and last use reset_index for remove MultiIndex:

df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(drop=True)
print (df)
   a  ab  ac
0  2   3   4
1  5   6   7

df = df.stack().reset_index(level=0, drop=True)
print (df)
   a  ab  ac
1  2   3   4
2  5   6   7

Method 4

You can split the dataframe into two, then rename the columns, and finally concatenate them:

cols = ['a', 'ab', 'ac']
df1 = df[["a_1", "ab_1", "ac_1"]]
df2 = df[["a_2", "ab_2", "ac_2"]]
df1.columns = cols
df2.columns = cols
df3 = pd.concat([df1, df2], ignore_index=True)

Method 5

If you want to use pnd.melt you should probably use the value_vars and value_name parameters:

df_a = pnd.melt(df, value_vars=['a_1', 'a_2'], value_name='a')[['a']]
df_ab = pnd.melt(df, value_vars=['ab_1', 'ab_2'], value_name='ab')[['ab']]
df_ac = pnd.melt(df, value_vars=['ac_1', 'ac_2'], value_name='ac')[['ac']]
df_final = df_a.join(df_b).join(df_c)

Alternatively, taking a more functional approach:

col_prefixes = ['a', 'ab', 'ac']
df_cuts = map(lambda x: pnd.melt(df, value_vars=['%s_1' % x, '%s_2' % x], value_name=x)[[x]], col_prefixes)
df_final = reduce(lambda x, y: x.join(y), df_cuts)

Method 6

There is builtin function wide_to_long for more see in documentation:

In [115]: df
Out[115]: 
   a_1  ab_1  ac_1  a_2  ab_2  ac_2
0    2     3     4    5     6     7

In [116]: df['id'] = df.index

In [117]: df
Out[117]: 
   a_1  ab_1  ac_1  a_2  ab_2  ac_2  id
0    2     3     4    5     6     7   0
In [118]: pd.wide_to_long(df, ['a','ab','ac'],i='id',j='num',sep='_')
Out[118]: 
        a  ab  ac
id num           
0  1    2   3   4
   2    5   6   7

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating