I am using to_csv to write a MultiIndex DataFrame to CSV files. The CSV file has one column that contains the index entries as tuples, like:
('a', 'x')
('a', 'y')
('a', 'z')
('b', 'x')
('b', 'y')
('b', 'z')
However, I want to be able to output the Multiindex to two columns instead of one column of tuples, such as:
a, x
 , y
 , z
b, x
 , y
 , z
It looks like tupleize_cols can achieve this for columns, but there is no such option for the rows. Is there a way to achieve this?
Answers:
Method 1
I think this will do it:
In [3]: df = pd.DataFrame(dict(A='foo', B='bar', value=1), index=range(5)).set_index(['A', 'B'])
In [4]: df
Out[4]:
         value
A   B
foo bar      1
    bar      1
    bar      1
    bar      1
    bar      1
In [5]: df.to_csv('test.csv')
In [6]: !cat test.csv
A,B,value
foo,bar,1
foo,bar,1
foo,bar,1
foo,bar,1
foo,bar,1
In [7]: pd.read_csv('test.csv',index_col=[0,1])
Out[7]:
         value
A   B
foo bar      1
    bar      1
    bar      1
    bar      1
    bar      1
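For the index from the question, the same round trip can be sketched as follows. This is a minimal example, not the original poster's data: the level names A and B, the value column, and the in-memory buffer (used in place of test.csv) are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical data that reuses the labels from the question.
index = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y", "z"]], names=["A", "B"]
)
df = pd.DataFrame({"value": range(6)}, index=index)

# to_csv writes each index level as its own column.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)

# index_col=[0, 1] rebuilds the two-level index when reading back.
restored = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
print(restored)
```

If the file still shows tuples in a single column, it may be worth checking that the index really is a MultiIndex rather than a plain Index of tuples.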
To write the file with the repeated index labels blanked out (kind of a hack though):
In [27]: x = df.reset_index()
In [28]: mask = df.index.to_series().duplicated()
In [29]: mask
Out[29]:
A    B
foo  bar    False
     bar     True
     bar     True
     bar     True
     bar     True
dtype: bool
In [30]: x.loc[mask.values,['A','B']] = ''
In [31]: x
Out[31]:
     A    B  value
0  foo  bar      1
1                1
2                1
3                1
4                1
In [32]: x.to_csv('test.csv')
In [33]: !cat test.csv
,A,B,value
0,foo,bar,1
1,,,1
2,,,1
3,,,1
4,,,1
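Note that duplicated() on the full index flags only rows whose whole (A, B) pair repeats. In the question's data every pair is unique, so that mask would blank nothing. A possible variation, sketched here with assumed level names A and B and a hypothetical value column, compares the outer level with the row above instead:

```python
import io

import pandas as pd

# Hypothetical data that reuses the labels from the question.
index = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y", "z"]], names=["A", "B"]
)
df = pd.DataFrame({"value": range(6)}, index=index)

flat = df.reset_index()
# True where the outer label equals the one in the previous row.
repeated = flat["A"].eq(flat["A"].shift())
flat.loc[repeated, "A"] = ""

# index=False keeps the integer row labels out of the file.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
sparse_csv = buffer.getvalue()
print(sparse_csv)
```

This assumes the rows are already grouped by the outer level; unsorted data would need a sort first.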
Reading it back is a bit tricky, actually, because the blanked labels have to be forward-filled before the index is rebuilt:
In [37]: pd.read_csv('test.csv',index_col=0).ffill().set_index(['A','B'])
Out[37]:
         value
A   B
foo bar      1
    bar      1
    bar      1
    bar      1
    bar      1
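Calling ffill() on the whole frame would also fill genuine gaps in the data columns. A more cautious sketch fills only the key column that was blanked. The CSV text below is hypothetical and stands in for a file where only the outer level A was blanked:

```python
import io

import pandas as pd

# Hypothetical file contents with repeated outer labels left blank.
sparse_csv = "A,B,value\na,x,0\n,y,1\n,z,2\nb,x,3\n,y,4\n,z,5\n"

raw = pd.read_csv(io.StringIO(sparse_csv))
# Blank cells are parsed as NaN; forward-fill only the key column
# so that missing values in the data columns are left untouched.
raw["A"] = raw["A"].ffill()
restored = raw.set_index(["A", "B"])
print(restored)
```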
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.