Pandas dataframe with multiindex column – merge levels

I have a dataframe, grouped, with multiindex columns as below:

import pandas as pd
codes = ["one","two","three"];
colours = ["black", "white"];
textures = ["soft", "hard"];
N= 100 # length of the dataframe
df = pd.DataFrame({ 'id' : range(1,N+1),
                    'weeks_elapsed' : [random.choice(range(1,25)) for i in range(1,N+1)],
                    'code' : [random.choice(codes) for i in range(1,N+1)],
                    'colour': [random.choice(colours) for i in range(1,N+1)],
                    'texture': [random.choice(textures) for i in range(1,N+1)],
                    'size': [random.randint(1,100) for i in range(1,N+1)],
                    'scaled_size': [random.randint(100,1000) for i in range(1,N+1)]
                   },  columns= ['id', 'weeks_elapsed', 'code','colour', 'texture', 'size', 'scaled_size'])
grouped = df.groupby(['code', 'colour']).agg( {'size': [np.sum, np.average, np.size, pd.Series.idxmax],'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax]}).reset_index()

>> grouped
    code colour     size                           scaled_size                         
                    sum    average  size  idxmax            sum    average  size  idxmax
0    one  black    1031  60.647059    17      81     185.153944  10.891408    17      47
1    one  white     481  37.000000    13      53     204.139249  15.703019    13      53
2  three  black     822  48.352941    17       6     123.269405   7.251141    17      31
3  three  white    1614  57.642857    28      50     285.638337  10.201369    28      37
4    two  black     523  58.111111     9      85      80.908912   8.989879     9      88
5    two  white     669  41.812500    16      78      82.098870   5.131179    16      78
[6 rows x 10 columns]

How can I flatten/merge the column index levels as: “Level1|Level2”, e.g. size|sum, scaled_size|sum. etc? If this is not possible, is there a way to groupby() as I did above without creating multi-index columns?

Contents hide

Answers:

Method 1

There is potentially a better way, more pythonic way to flatten MultiIndex columns.

1. Use map and join with string column headers:

2. Use map with format for column headers that have numeric data types.

3. Use list comprehension with f-string for Python 3.6+:

Method 2

Method 3

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

There is potentially a better way, more pythonic way to flatten MultiIndex columns.

1. Use map and join with string column headers:

grouped.columns = grouped.columns.map('|'.join).str.strip('|')

print(grouped)

Output:

   code  colour   size|sum  size|average  size|size  size|idxmax  
0    one   black       862     53.875000         16           14   
1    one   white       554     46.166667         12           18   
2  three   black       842     49.529412         17           90   
3  three   white       740     56.923077         13           97   
4    two   black      1541     61.640000         25           50   

   scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax  
0             6980           436.250000                16                  77  
1             6101           508.416667                12                  13  
2             7889           464.058824                17                  64  
3             6329           486.846154                13                  73  
4            12809           512.360000                25                  23

2. Use map with format for column headers that have numeric data types.

grouped.columns = grouped.columns.map('{0[0]}|{0[1]}'.format)

Output:

   code| colour|  size|sum  size|average  size|size  size|idxmax  
0    one   black       734     52.428571         14           30   
1    one   white      1110     65.294118         17           88   
2  three   black       930     51.666667         18            3   
3  three   white      1140     51.818182         22           20   
4    two   black       656     38.588235         17           77   
5    two   white       704     58.666667         12           17   

   scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax  
0             8229           587.785714                14                  57  
1             8781           516.529412                17                  73  
2            10743           596.833333                18                  21  
3            10240           465.454545                22                  26  
4             9982           587.176471                17                  16  
5             6537           544.750000                12                  49

3. Use list comprehension with f-string for Python 3.6+:

grouped.columns = [f'{i}|{j}' if j != '' else f'{i}' for i,j in grouped.columns]

Output:

    code colour  size|sum  size|average  size|size  size|idxmax  
0    one  black      1003     43.608696         23           76   
1    one  white      1255     59.761905         21           66   
2  three  black       777     45.705882         17           39   
3  three  white       630     52.500000         12           23   
4    two  black       823     54.866667         15           33   
5    two  white       491     40.916667         12           64   

   scaled_size|sum  scaled_size|average  scaled_size|size  scaled_size|idxmax  
0            12532           544.869565                23                  27  
1            13223           629.666667                21                  13  
2             8615           506.764706                17                  92  
3             6101           508.416667                12                  43  
4             7661           510.733333                15                  42  
5             6143           511.916667                12                  49

Method 2

you could always change the columns:

grouped.columns = ['%s%s' % (a, '|%s' % b if b else '') for a, b in grouped.columns]

Method 3

Based on Scott Boston’s answer,
little update(it will be work for 2 or more levels column):

temp.columns.map(lambda x: '|'.join([str(i) for i in x]))

Thank you, Boston!

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating