I have a dataframe, grouped
, with multiindex columns as below:
import pandas as pd codes = ["one","two","three"]; colours = ["black", "white"]; textures = ["soft", "hard"]; N= 100 # length of the dataframe df = pd.DataFrame({ 'id' : range(1,N+1), 'weeks_elapsed' : [random.choice(range(1,25)) for i in range(1,N+1)], 'code' : [random.choice(codes) for i in range(1,N+1)], 'colour': [random.choice(colours) for i in range(1,N+1)], 'texture': [random.choice(textures) for i in range(1,N+1)], 'size': [random.randint(1,100) for i in range(1,N+1)], 'scaled_size': [random.randint(100,1000) for i in range(1,N+1)] }, columns= ['id', 'weeks_elapsed', 'code','colour', 'texture', 'size', 'scaled_size']) grouped = df.groupby(['code', 'colour']).agg( {'size': [np.sum, np.average, np.size, pd.Series.idxmax],'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax]}).reset_index() >> grouped code colour size scaled_size sum average size idxmax sum average size idxmax 0 one black 1031 60.647059 17 81 185.153944 10.891408 17 47 1 one white 481 37.000000 13 53 204.139249 15.703019 13 53 2 three black 822 48.352941 17 6 123.269405 7.251141 17 31 3 three white 1614 57.642857 28 50 285.638337 10.201369 28 37 4 two black 523 58.111111 9 85 80.908912 8.989879 9 88 5 two white 669 41.812500 16 78 82.098870 5.131179 16 78 [6 rows x 10 columns]
How can I flatten/merge the column index levels as: “Level1|Level2”, e.g. size|sum
, scaled_size|sum
. etc? If this is not possible, is there a way to groupby()
as I did above without creating multi-index columns?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
There is potentially a better way, more pythonic way to flatten MultiIndex columns.
1. Use map and join with string column headers:
grouped.columns = grouped.columns.map('|'.join).str.strip('|') print(grouped)
Output:
code colour size|sum size|average size|size size|idxmax 0 one black 862 53.875000 16 14 1 one white 554 46.166667 12 18 2 three black 842 49.529412 17 90 3 three white 740 56.923077 13 97 4 two black 1541 61.640000 25 50 scaled_size|sum scaled_size|average scaled_size|size scaled_size|idxmax 0 6980 436.250000 16 77 1 6101 508.416667 12 13 2 7889 464.058824 17 64 3 6329 486.846154 13 73 4 12809 512.360000 25 23
2. Use map with format for column headers that have numeric data types.
grouped.columns = grouped.columns.map('{0[0]}|{0[1]}'.format)
Output:
code| colour| size|sum size|average size|size size|idxmax 0 one black 734 52.428571 14 30 1 one white 1110 65.294118 17 88 2 three black 930 51.666667 18 3 3 three white 1140 51.818182 22 20 4 two black 656 38.588235 17 77 5 two white 704 58.666667 12 17 scaled_size|sum scaled_size|average scaled_size|size scaled_size|idxmax 0 8229 587.785714 14 57 1 8781 516.529412 17 73 2 10743 596.833333 18 21 3 10240 465.454545 22 26 4 9982 587.176471 17 16 5 6537 544.750000 12 49
3. Use list comprehension with f-string for Python 3.6+:
grouped.columns = [f'{i}|{j}' if j != '' else f'{i}' for i,j in grouped.columns]
Output:
code colour size|sum size|average size|size size|idxmax 0 one black 1003 43.608696 23 76 1 one white 1255 59.761905 21 66 2 three black 777 45.705882 17 39 3 three white 630 52.500000 12 23 4 two black 823 54.866667 15 33 5 two white 491 40.916667 12 64 scaled_size|sum scaled_size|average scaled_size|size scaled_size|idxmax 0 12532 544.869565 23 27 1 13223 629.666667 21 13 2 8615 506.764706 17 92 3 6101 508.416667 12 43 4 7661 510.733333 15 42 5 6143 511.916667 12 49
Method 2
you could always change the columns:
grouped.columns = ['%s%s' % (a, '|%s' % b if b else '') for a, b in grouped.columns]
Method 3
Based on Scott Boston’s answer,
little update(it will be work for 2 or more levels column):
temp.columns.map(lambda x: '|'.join([str(i) for i in x]))
Thank you, Boston!
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0