Nested dictionary to multiindex dataframe where dictionary keys are column labels

Say I have a dictionary that looks like this:

dictionary = {'A' : {'a': [1,2,3,4,5],
                     'b': [6,7,8,9,1]},

              'B' : {'a': [2,3,4,5,6],
                     'b': [7,8,9,1,2]}}

and I want a dataframe that looks something like this:

     A   B
     a b a b
  0  1 6 2 7
  1  2 7 3 8
  2  3 8 4 9
  3  4 9 5 1
  4  5 1 6 2

Is there a convenient way to do this? If I try:

In [99]:

DataFrame(dictionary)

Out[99]:
     A               B
a   [1, 2, 3, 4, 5] [2, 3, 4, 5, 6]
b   [6, 7, 8, 9, 1] [7, 8, 9, 1, 2]

I get a dataframe where each element is a list. What I need is MultiIndex columns where each level corresponds to the keys in the nested dict, with one row per element of each list, as shown above. I think I can work out a very crude solution, but I’m hoping there might be something a bit simpler.

Answers:


Method 1

Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest thing is to convert your dictionary to the right format before trying to pass it to DataFrame:

>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
 ('A', 'b'): [6, 7, 8, 9, 1],
 ('B', 'a'): [2, 3, 4, 5, 6],
 ('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
   A     B   
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  2

[5 rows x 4 columns]
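
For reference, here is a self-contained Python 3 sketch of the same idea, followed by a few ways to select from the resulting column MultiIndex. The names `by_outer`, `one_column`, and `inner_a` are illustrative and not part of the original answer.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# Flatten to {(outer, inner): values}; tuple keys become MultiIndex column labels.
reform = {(outer_key, inner_key): values
          for outer_key, inner_dict in dictionary.items()
          for inner_key, values in inner_dict.items()}
df = pd.DataFrame(reform)

by_outer = df['A']                      # sub-frame with columns 'a' and 'b'
one_column = df[('B', 'b')]             # a single Series
inner_a = df.xs('a', axis=1, level=1)   # the 'a' column under every outer key
```

Selecting with a tuple addresses one leaf column, while selecting with only the outer key returns everything nested under it.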

Method 2

This answer is a little late to the game, but…

You’re looking for the functionality in .stack:

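df = pd.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
# break the lists out into columns; the (outer, inner) keys end up on the row index
df = pd.DataFrame(df[0].values.tolist(), index=df.index)
# transpose so the keys become MultiIndex column labels, as in the question
df = df.T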
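
A self-contained sketch of the stack approach, assuming `import pandas as pd`, may make the intermediate shapes easier to follow. The rebuilt frame carries the (outer, inner) keys on its rows, so the sketch transposes it at the end to get the column layout from the question. The names `cells`, `stacked`, `long`, and `wide` are illustrative.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# Rows 'A'/'B', columns 'a'/'b', each cell holding a list.
cells = pd.DataFrame.from_dict(dictionary, orient="index")
# One row per (outer, inner) pair, still with a list in each cell.
stacked = cells.stack().to_frame()
# Expand each list into its own columns, keeping the two-level row index.
long = pd.DataFrame(stacked[0].values.tolist(), index=stacked.index)
# Transpose so the (outer, inner) pairs become column labels.
wide = long.T
```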

Method 3

dict_of_df = {k: pd.DataFrame(v) for k,v in dictionary.items()}
df = pd.concat(dict_of_df, axis=1)

Note that the order of columns is lost for Python < 3.6, where plain dicts do not preserve insertion order.
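
As a quick check, the sketch below runs the concat approach on the dictionary from the question and converts the result back to a tuple-keyed dict, which should match the `reform` dictionary from Method 1. It assumes `import pandas as pd`; `flat` is an illustrative name.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6], 'b': [7, 8, 9, 1, 2]}}

# One small DataFrame per outer key; the dict keys become the top column level.
dict_of_df = {k: pd.DataFrame(v) for k, v in dictionary.items()}
df = pd.concat(dict_of_df, axis=1)

# Round-trip back to {(outer, inner): list} to inspect the column labels.
flat = df.to_dict(orient="list")
```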

Method 4

If the lists in the dictionary are not all the same length, you can adapt BrenBarn's method (Method 1).

>>> dictionary = {'A' : {'a': [1,2,3,4,5],
                         'b': [6,7,8,9,1]},
                 'B' : {'a': [2,3,4,5,6],
                        'b': [7,8,9,1]}}

>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
 {('A', 'a'): [1, 2, 3, 4, 5],
  ('A', 'b'): [6, 7, 8, 9, 1],
  ('B', 'a'): [2, 3, 4, 5, 6],
  ('B', 'b'): [7, 8, 9, 1]}

>>> df = pd.DataFrame.from_dict(reform, orient='index').transpose()
>>> df.columns = pd.MultiIndex.from_tuples(df.columns)
>>> df
   A     B   
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  NaN
[5 rows x 4 columns]
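
Another way to pad ragged lists, which is not taken from the answer above, is to wrap each list in a `pd.Series` before building the frame. This is a minimal sketch assuming `import pandas as pd` and the same `reform` dictionary.

```python
import pandas as pd

reform = {('A', 'a'): [1, 2, 3, 4, 5],
          ('A', 'b'): [6, 7, 8, 9, 1],
          ('B', 'a'): [2, 3, 4, 5, 6],
          ('B', 'b'): [7, 8, 9, 1]}

# Series are aligned on their integer index, so shorter lists are padded with NaN.
df = pd.DataFrame({key: pd.Series(values) for key, values in reform.items()})
```

The padded column is stored as floating point, since NaN is not an integer value.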

Method 5

This recursive function should work:

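def reform_dict(dictionary, t=tuple(), reform=None):
    # create a fresh result dict per top-level call; a mutable default
    # argument would be shared between calls
    if reform is None:
        reform = {}
    for key, val in dictionary.items():
        t = t + (key,)
        if isinstance(val, dict):
            reform_dict(val, t, reform)
        else:
            reform.update({t: val})
        t = t[:-1]
    return reform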
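
The same recursion can also be written as a generator, which avoids passing an accumulator dict around. This is only a sketch: `flatten_nested` and `three_level` are hypothetical names, and the three-level input is made up for illustration.

```python
def flatten_nested(nested, prefix=()):
    """Yield (key_path_tuple, leaf_value) pairs from an arbitrarily nested dict."""
    for key, val in nested.items():
        path = prefix + (key,)
        if isinstance(val, dict):
            # Descend, carrying the keys seen so far.
            yield from flatten_nested(val, path)
        else:
            yield path, val


three_level = {'A': {'x': {'a': [1, 2], 'b': [3, 4]}},
               'B': {'y': {'a': [5, 6], 'b': [7, 8]}}}
flat = dict(flatten_nested(three_level))
```

Every key in `flat` is a tuple with one entry per nesting level, so the dict can be passed to `pandas.DataFrame` in the same way as `reform` in Method 1.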

Method 6

This solution applies to an existing, larger DataFrame (df) with flat column labels; it relabels the columns with a two-level MultiIndex:

cols = df.columns
int_cols = len(cols)
# cols[0] is the index-like column; split the remaining columns into two groups
col_subset_1 = [cols[x] for x in range(1, int_cols // 2 + 1)]
col_subset_2 = [cols[x] for x in range(int_cols // 2 + 1, int_cols)]

# pair each column with its top-level label ('A' or 'B')
col_subset_1_label = list(zip(['A'] * len(col_subset_1), col_subset_1))
col_subset_2_label = list(zip(['B'] * len(col_subset_2), col_subset_2))
df.columns = pd.MultiIndex.from_tuples([('', 'myIndex'), *col_subset_1_label, *col_subset_2_label])

OUTPUT

                    A                   B
    myIndex         a         b         c         d
0  0.159710  1.472925  0.619508 -0.476738  0.866238
1 -0.665062  0.609273 -0.089719  0.730012  0.751615
2  0.215350 -0.403239  1.801829 -2.052797 -1.026114
3 -0.609692  1.163072 -1.007984 -0.324902 -1.624007
4  0.791321 -0.060026 -1.328531 -0.498092  0.559837
5  0.247412 -0.841714  0.354314  0.506985  0.425254
6  0.443535  1.037502 -0.433115  0.601754 -1.405284
7 -0.433744  1.514892  1.963495 -2.353169  1.285580
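
To try the relabeling without the original data, here is a small sketch with a made-up frame. It assumes `import pandas as pd`, that the first column is the index-like one, and that the remaining columns split evenly between 'A' and 'B'; `value_cols`, `half`, and `labels` are illustrative names.

```python
import pandas as pd

# Hypothetical flat frame: one index-like column followed by four value columns.
df = pd.DataFrame({
    'myIndex': [0.1, 0.2],
    'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8],
})

value_cols = list(df.columns[1:])
half = len(value_cols) // 2

# Build one (top, bottom) tuple per existing column, in column order.
labels = (
    [('', 'myIndex')]
    + [('A', c) for c in value_cols[:half]]
    + [('B', c) for c in value_cols[half:]]
)
df.columns = pd.MultiIndex.from_tuples(labels)
```

Only the labels change here; the data stays in place, so the tuples must be listed in the same order as the existing columns.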


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
