Let’s assume that I have the following dataframe in pandas:
AA BB CC date 05/03 1 2 3 06/03 4 5 6 07/03 7 8 9 08/03 5 7 1
and I want to transform it to the following:
AA 05/03 1 AA 06/03 4 AA 07/03 7 AA 08/03 5 BB 05/03 2 BB 06/03 5 BB 07/03 8 BB 08/03 7 CC 05/03 3 CC 06/03 6 CC 07/03 9 CC 08/03 1
How can I do it?
The reason of the transformation from wide to long is that, in the next stage, I would like to merge this dataframe with another one, based on dates and the initial column names (AA, BB, CC).
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Use pandas.melt to transform from wide to long:
df = pd.DataFrame({
'date' : ['05/03', '06/03', '07/03', '08/03'],
'AA' : [1, 4, 7, 5],
'BB' : [2, 5, 8, 7],
'CC' : [3, 6, 9, 1]
}).set_index('date')
df
AA BB CC
date
05/03 1 2 3
06/03 4 5 6
07/03 7 8 9
08/03 5 7 1
To convert, we just need to reset the index and then melt:
df = df.reset_index() pd.melt(df, id_vars='date', value_vars=['AA', 'BB', 'CC'])
this is the final result:
date variable value 0 05/03 AA 1 1 06/03 AA 4 2 07/03 AA 7 3 08/03 AA 5 4 05/03 BB 2 5 06/03 BB 5 6 07/03 BB 8 7 08/03 BB 7 8 05/03 CC 3 9 06/03 CC 6 10 07/03 CC 9 11 08/03 CC 1
Method 2
Update
As George Liu has shown in another answer, pd.melt is the idiomatic, flexible and fast solution to this problem. Do not use unstack for this.
unstack returns a series with a multiindex:
In [38]: df.unstack()
Out[38]:
date
AA 05/03 1
06/03 4
07/03 7
08/03 5
BB 05/03 2
06/03 5
07/03 8
08/03 7
CC 05/03 3
06/03 6
07/03 9
08/03 1
dtype: int64
You can call reset_index on the returning series:
In [39]: df.unstack().reset_index()
Out[39]:
level_0 date 0
0 AA 05-03 1
1 AA 06-03 4
2 AA 07-03 7
3 AA 08-03 5
4 BB 05-03 2
5 BB 06-03 5
6 BB 07-03 8
7 BB 08-03 7
8 CC 05-03 3
9 CC 06-03 6
10 CC 07-03 9
11 CC 08-03 1
Or construct a dataframe with a multiindex:
In [40]: pd.DataFrame(df.unstack())
Out[40]:
0
date
AA 05-03 1
06-03 4
07-03 7
08-03 5
BB 05-03 2
06-03 5
07-03 8
08-03 7
CC 05-03 3
06-03 6
07-03 9
08-03 1
Method 3
For my dummy test dfs (42 cols, 1k/100k/1M rows) .melt was 8 times faster than .unstack.reset_index()
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0