I cannot figure out how to do a “reverse melt” using pandas in Python.
This is my starting data:
import pandas as pd
from io import StringIO  # Python 3 (Python 2 used: from StringIO import StringIO)
origin = pd.read_table(StringIO('''label type value
x a 1
x b 2
x c 3
y a 4
y b 5
y c 6
z a 7
z b 8
z c 9'''), sep=r'\s+')  # columns are separated by whitespace
origin
Out[5]:
label type value
0 x a 1
1 x b 2
2 x c 3
3 y a 4
4 y b 5
5 y c 6
6 z a 7
7 z b 8
8 z c 9
This is the output I would like to have:
label a b c
x 1 2 3
y 4 5 6
z 7 8 9
I’m sure there is an easy way to do this, but I don’t know how.
Answers:
Method 1
There are a few ways.
Using .pivot:
>>> origin.pivot(index='label', columns='type')['value']
type   a  b  c
label
x      1  2  3
y      4  5  6
z      7  8  9

[3 rows x 3 columns]
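A self-contained sketch of the same .pivot call, assuming the question's table is built directly with pd.DataFrame instead of being parsed from text (the name wide is only illustrative). Passing values='value' gives flat a, b, c columns, so the trailing ['value'] selection is not needed:

```python
import pandas as pd

# The question's long-format data, built directly (assumed equivalent to the parsed table)
origin = pd.DataFrame({
    "label": list("xxxyyyzzz"),
    "type": list("abcabcabc"),
    "value": list(range(1, 10)),
})

# One row per label, one column per type, cells taken from "value"
wide = origin.pivot(index="label", columns="type", values="value")
print(wide)
```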
Using pivot_table:
>>> origin.pivot_table(values='value', index='label', columns='type')
value
type a b c
label
x 1 2 3
y 4 5 6
z 7 8 9
[3 rows x 3 columns]
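The practical difference between pivot and pivot_table shows up when a (label, type) pair occurs more than once. A minimal sketch with a small hypothetical frame dup in which ("x", "a") appears twice: pivot raises a ValueError, while pivot_table aggregates the duplicates with its default aggfunc (the mean) and leaves missing cells as NaN:

```python
import pandas as pd

# Hypothetical long-format data with one duplicated (label, type) pair: ("x", "a")
dup = pd.DataFrame({
    "label": ["x", "x", "x", "y"],
    "type": ["a", "a", "b", "a"],
    "value": [1, 3, 2, 4],
})

# pivot refuses to reshape when an index/column pair is not unique
try:
    dup.pivot(index="label", columns="type", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table averages the duplicates; ("y", "b") has no data and becomes NaN
agg = dup.pivot_table(values="value", index="label", columns="type", aggfunc="mean")
print(pivot_failed)
print(agg)
```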
Or .groupby followed by .unstack:
>>> origin.groupby(['label', 'type'])['value'].aggregate('mean').unstack()
type a b c
label
x 1 2 3
y 4 5 6
z 7 8 9
[3 rows x 3 columns]
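A sketch comparing aggregations for the groupby route (via_mean, via_first and via_pivot are illustrative names). With unique (label, type) pairs any aggregation just passes the single value through, but 'mean' turns the integer values into floats, whereas 'first' keeps the integer dtype and reproduces .pivot exactly:

```python
import pandas as pd

origin = pd.DataFrame({
    "label": list("xxxyyyzzz"),
    "type": list("abcabcabc"),
    "value": list(range(1, 10)),
})

pairs = origin.groupby(["label", "type"])["value"]

# unstack() moves the innermost index level ("type") into the columns
via_mean = pairs.aggregate("mean").unstack()    # float64 cells
via_first = pairs.aggregate("first").unstack()  # keeps int64 cells
via_pivot = origin.pivot(index="label", columns="type", values="value")

print(via_mean)
print(via_first.equals(via_pivot))
```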
Method 2
DataFrame.set_index + DataFrame.unstack (df here is the origin frame from the question):
df.set_index(['label','type'])['value'].unstack()
type   a  b  c
label
x      1  2  3
y      4  5  6
z      7  8  9
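A sketch of the set_index + unstack route on the question's data (s, by_type and by_label are illustrative names). unstack pivots one level of the MultiIndex into the columns, the innermost one by default, so choosing the level decides which way round the table comes out:

```python
import pandas as pd

origin = pd.DataFrame({
    "label": list("xxxyyyzzz"),
    "type": list("abcabcabc"),
    "value": list(range(1, 10)),
})

# Both key columns become a (label, type) MultiIndex; keep only the "value" Series
s = origin.set_index(["label", "type"])["value"]

by_type = s.unstack("type")    # rows: label, columns: type (same as unstack())
by_label = s.unstack("label")  # rows: type, columns: label (the transpose)

print(by_type)
print(by_label)
```

Like pivot, unstack raises a ValueError if the index contains duplicate (label, type) pairs.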
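Simplifying how the arguments are passed to pivot (this relies on positional arguments, which pivot stopped accepting in pandas 2.0, so on current versions it raises a TypeError):
df.pivot(*df)
type   a  b  c
label
x      1  2  3
y      4  5  6
z      7  8  9
This works because unpacking a DataFrame yields its column names in order:
[*df]  # ['label', 'type', 'value']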
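Because pandas 2.0 and later only accepts keyword arguments for pivot, df.pivot(*df) no longer works there. A sketch that keeps the same column-order shortcut by zipping the column names onto the keyword names; it assumes, as the original trick does, that the columns are ordered index, columns, values (kwargs and wide are illustrative names):

```python
import pandas as pd

origin = pd.DataFrame({
    "label": list("xxxyyyzzz"),
    "type": list("abcabcabc"),
    "value": list(range(1, 10)),
})

# Iterating or unpacking a DataFrame yields its column labels, in order
cols = [*origin]
print(cols)  # ['label', 'type', 'value']

# Pair the first three column names with pivot's keyword arguments
kwargs = dict(zip(["index", "columns", "values"], origin.columns))
wide = origin.pivot(**kwargs)
print(wide)
```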
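To get exactly the expected output (label as an ordinary column and no name on the column axis), add DataFrame.rename_axis and DataFrame.reset_index:
df.pivot(index='label', columns='type', values='value').rename_axis(columns=None).reset_index()
  label  a  b  c
0     x  1  2  3
1     y  4  5  6
2     z  7  8  9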
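A self-contained sketch of that last step, written with keyword arguments to pivot so it does not depend on the column order (result is an illustrative name):

```python
import pandas as pd

origin = pd.DataFrame({
    "label": list("xxxyyyzzz"),
    "type": list("abcabcabc"),
    "value": list(range(1, 10)),
})

result = (
    origin.pivot(index="label", columns="type", values="value")
    .rename_axis(columns=None)  # drop the "type" name from the column axis
    .reset_index()              # turn the "label" index back into a regular column
)
print(result)
```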
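If some (label, type) pairs are duplicated, pivot raises an error and pivot_table silently aggregates them, so information is lost. GroupBy.cumcount numbers the repeated pairs so that every row keeps its own cell. Example where every row appears twice:
print(df)
  label type  value
0     x    a      1
1     x    b      2
2     x    c      3
3     y    a      4
4     y    b      5
5     y    c      6
6     z    a      7
7     z    b      8
8     z    c      9
0     x    a      1
1     x    b      2
2     x    c      3
3     y    a      4
4     y    b      5
5     y    c      6
6     z    a      7
7     z    b      8
8     z    c      9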
df.pivot_table(index = ['label',
df.groupby(['label','type']).cumcount()],
columns = 'type',
values = 'value')
type a b c
label
x 0 1 2 3
1 1 2 3
y 0 4 5 6
1 4 5 6
z 0 7 8 9
1 7 8 9
Or:
(df.assign(type_2 = df.groupby(['label','type']).cumcount())
.set_index(['label','type','type_2'])['value']
.unstack('type'))
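A self-contained sketch of both duplicate-safe variants, assuming the duplicated df is the question's data concatenated with itself (ignore_index=True is an addition that keeps the row labels unique). It then melts the wide result back to long format to check that no rows were lost; occurrence, via_pivot_table, via_unstack, long_again and rows are illustrative names:

```python
import pandas as pd

origin = pd.DataFrame({
    "label": list("xxxyyyzzz"),
    "type": list("abcabcabc"),
    "value": list(range(1, 10)),
})
# Every (label, type) pair now appears twice
df = pd.concat([origin, origin], ignore_index=True)

# Number repeated (label, type) pairs 0, 1, ... so each row gets a unique key
occurrence = df.groupby(["label", "type"]).cumcount()

# Variant 1: pivot_table with the counter as an extra index level
via_pivot_table = df.pivot_table(index=["label", occurrence], columns="type", values="value")

# Variant 2: assign + set_index + unstack (no aggregation involved)
via_unstack = (df.assign(type_2=occurrence)
                 .set_index(["label", "type", "type_2"])["value"]
                 .unstack("type"))

# Round trip: melt the wide table back to long format and drop the helper counter
long_again = (via_unstack.reset_index()
                .melt(id_vars=["label", "type_2"], var_name="type", value_name="value")
                .drop(columns="type_2"))


def rows(frame):
    # Sorted (label, type, value) tuples, so row order does not matter
    return sorted(frame[["label", "type", "value"]].itertuples(index=False, name=None))


lossless = rows(long_again) == rows(df)
print(via_unstack)
print(lossless)
```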
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0