I currently have a DataFrame laid out as:
Jan Feb Mar Apr ... 2001 1 12 12 19 2002 9 ... 2003 ...
and I would like to “unpivot” the data to look like:
Date Value Jan 2001 1 Feb 2001 1 Mar 2001 12 ... Jan 2002 9
What is the best way to accomplish this using pandas/NumPy?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
You just have to do df.unstack() and that will create a MultiIndexed Series with month as a first level and the year as the second level index. If you want them to be columns then just call reset_index() after that.
>>> df
Jan Feb
2001 3 4
2002 2 7
>>> df.unstack()
Jan 2001 3
2002 2
Feb 2001 4
2002 7
>>> df = df.unstack().reset_index(name='value')
>>> df
level_0 level_1 value
0 Jan 2001 3
1 Jan 2002 2
2 Feb 2001 4
3 Feb 2002 7
>>> df.rename(columns={'level_0': 'month', 'level_1': 'year'}, inplace=True)
>>> df
month year value
0 Jan 2001 3
1 Jan 2002 2
2 Feb 2001 4
3 Feb 2002 7
Method 2
Another solution would be to use pandas.melt to avoid unnecessary creation of a MultiIndex, though this isn’t that expensive if your frame is small and with my solution you still have to create a temporary for the “molten” data. The guts of melt suggest that both id_vars and value are copied since id_vars creation uses tile and value creation uses df.values.ravel('F') which I believe makes a copy if your data are not in Fortran order.
EDIT: I’m not exactly sure when a copy is made when ravel is called since the order parameter only indicates how you want your data read and the docstring says a copy is made only when needed.
In [99]: mons
Out[99]:
['Jan',
'Feb',
'Mar',
'Apr',
'May',
'Jun',
'Jul',
'Aug',
'Sep',
'Oct',
'Nov',
'Dec']
In [100]: df = DataFrame(randn(201, len(mons)), columns=mons, index=map(str, arange(1901, 2102)))
In [101]: df.head()
Out[101]:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
1901 1.141 -0.270 0.329 0.214 -1.030 0.324 -1.448 2.003 -0.061 0.477
1902 0.136 0.151 0.447 -0.493 1.329 1.410 0.020 -0.705 0.870 0.478
1903 -0.000 0.689 1.768 -0.057 -1.471 0.515 -0.315 0.703 2.511 0.592
1904 1.199 1.246 -0.255 0.182 -0.454 -0.452 1.074 0.178 2.495 -0.543
1905 1.073 1.375 -1.837 1.048 -0.139 -0.273 -0.958 -1.164 -1.012 0.950
Nov Dec
1901 0.102 0.122
1902 2.941 0.654
1903 0.347 -1.636
1904 -0.047 0.457
1905 1.277 -0.284
In [102]: df.reset_index(inplace=True)
In [103]: df.head()
Out[103]:
index Jan Feb Mar Apr May Jun Jul Aug Sep Oct
0 1901 1.141 -0.270 0.329 0.214 -1.030 0.324 -1.448 2.003 -0.061 0.477
1 1902 0.136 0.151 0.447 -0.493 1.329 1.410 0.020 -0.705 0.870 0.478
2 1903 -0.000 0.689 1.768 -0.057 -1.471 0.515 -0.315 0.703 2.511 0.592
3 1904 1.199 1.246 -0.255 0.182 -0.454 -0.452 1.074 0.178 2.495 -0.543
4 1905 1.073 1.375 -1.837 1.048 -0.139 -0.273 -0.958 -1.164 -1.012 0.950
Nov Dec
0 0.102 0.122
1 2.941 0.654
2 0.347 -1.636
3 -0.047 0.457
4 1.277 -0.284
In [104]: res = pd.melt(df, id_vars=['index'], var_name=['months'])
In [105]: res['date'] = res['months'] + ' ' + res['index']
In [106]: res.head()
Out[106]:
index months value date
0 1901 Jan 1.141 Jan 1901
1 1902 Jan 0.136 Jan 1902
2 1903 Jan -0.000 Jan 1903
3 1904 Jan 1.199 Jan 1904
4 1905 Jan 1.073 Jan 1905
Method 3
df.reset_index().melt(...) should unpivot it (credit):
>>> df = pd.DataFrame({
... 'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
... 'baz': [1, 2, 3, 4, 5, 6],
... })
>>> df
foo bar baz
0 one A 1
1 one B 2
2 one C 3
3 two A 4
4 two B 5
5 two C 6
>>> df_piv = df.pivot(index='foo', columns='bar', values='baz')
>>> df_piv
bar A B C
foo
one 1 2 3
two 4 5 6
>>> def_piv_unpiv = df_piv.reset_index().melt(id_vars='foo', var_name='bar', value_name='baz')
>>> def_piv_unpiv
foo bar baz
0 one A 1
1 two A 4
2 one B 2
3 two B 5
4 one C 3
5 two C 6
Edit: I later realized this is basically a more concise version of Phillip Cloud’s answer.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0