I'm looking for a generic way of turning a DataFrame into a nested dictionary.
This is a sample data frame:
  name  v1   v2  v3
0    A  A1  A11   1
1    A  A2  A12   2
2    B  B1  B12   3
3    C  C1  C11   4
4    B  B2  B21   5
5    A  A2  A21   6
The number of columns may differ, and so may the column names.
I want to turn it into something like this:
{
    'A': {
        'A1': {'A11': 1},
        'A2': {'A12': 2, 'A21': 6}},
    'B': {
        'B1': {'B12': 3}},
    'C': {
        'C1': {'C11': 4}}
}
What is the best way to achieve this?
The closest I got was with the zip function, but I haven't managed to make it work for more than one level (two columns).
Answers:
Method 1
I don’t understand why there isn’t a B2 in your dict. I’m also not sure what you want to happen in the case of repeated column values (every one except the last, I mean.) Assuming the first is an oversight, we could use recursion:
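def recur_dictify(frame):
    # Base case: only the value column is left
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    # Group on the first column and recurse on the remaining columns
    grouped = frame.groupby(frame.columns[0])
    # .iloc replaces the .ix indexer, which was removed in pandas 1.0
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d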
which produces
>>> df
name v1 v2 v3
0 A A1 A11 1
1 A A2 A12 2
2 B B1 B12 3
3 C C1 C11 4
4 B B2 B21 5
5 A A2 A21 6
>>> pprint.pprint(recur_dictify(df))
{'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}},
'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}},
'C': {'C1': {'C11': 4}}}
It might be simpler to use a non-pandas approach, though:
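def retro_dictify(frame):
    d = {}
    for row in frame.values:
        here = d
        # Walk (and create) one level per key column, except the innermost key
        for elem in row[:-2]:
            if elem not in here:
                here[elem] = {}
            here = here[elem]
        # The innermost key maps to the value in the last column
        here[row[-2]] = row[-1]
    return d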
Method 2
You can reconstruct your dictionary as easily as this:
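result = {}
for lst in df.values:
    leaf = result
    # Descend one level per key column, creating empty dicts as needed
    for path in lst[:-2]:
        leaf = leaf.setdefault(path, {})
    # Collect values in a list so that repeated leaves are not lost
    leaf.setdefault(lst[-2], list()).append(lst[-1])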
>>> result
{'A': {'A1': {'A11': [1]}, 'A2': {'A21': [6], 'A12': [2]}}, 'C': {'C1': {'C11': [4]}}, 'B': {'B1': {'B12': [3]}, 'B2': {'B21': [5]}}}
If you're sure your leaves won't overlap, replace the last line
leaf.setdefault(lst[-2], list()).append(lst[-1])
with
leaf[lst[-2]] = lst[-1]
to get the output you wanted:
>>> result
{'A': {'A1': {'A11': 1}, 'A2': {'A21': 6, 'A12': 2}}, 'C': {'C1': {'C11': 4}}, 'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}}}
Sample data used for tests:
import pandas as pd

data = {'name': ['A', 'A', 'B', 'C', 'B', 'A'],
        'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
        'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
        'v3': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame.from_dict(data)
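To make the difference between the setdefault(...).append(...) line and plain assignment concrete, here is a small sketch on a hypothetical two-row frame in which the same key path occurs twice. The helper name build and its collect flag are made up for this illustration; the body is the Method 2 loop with both endings side by side.

```python
import pandas as pd

# Hypothetical frame in which the path ('A', 'A1', 'A11') appears twice
df = pd.DataFrame({'name': ['A', 'A'],
                   'v1': ['A1', 'A1'],
                   'v2': ['A11', 'A11'],
                   'v3': [1, 7]})


def build(frame, collect):
    """Hypothetical helper: the Method 2 loop, with either leaf strategy."""
    result = {}
    for row in frame.values.tolist():
        leaf = result
        for key in row[:-2]:
            leaf = leaf.setdefault(key, {})
        if collect:
            # Every value for a repeated leaf is kept, in row order
            leaf.setdefault(row[-2], []).append(row[-1])
        else:
            # Later rows silently overwrite earlier ones
            leaf[row[-2]] = row[-1]
    return result


collected = build(df, collect=True)     # 'A11' maps to a list holding both values
overwritten = build(df, collect=False)  # 'A11' maps only to the last value
```

With plain assignment the first value is lost without any warning, which is why it is only safe when you know the leaf paths are unique.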
Method 3
See the DataFrame.to_dict documentation: there are several options you can pass to get the output in different forms.
In [5]: df
Out[5]:
name v1 v2 v3
0 A A1 A11 1
1 A A2 A12 2
2 B B1 B12 3
3 C C1 C11 4
4 B B2 B21 5
5 A A2 A21 6
In [6]: df.to_dict()
Out[6]:
{'name': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'B', 5: 'A'},
'v1': {0: 'A1', 1: 'A2', 2: 'B1', 3: 'C1', 4: 'B2', 5: 'A2'},
'v2': {0: 'A11', 1: 'A12', 2: 'B12', 3: 'C11', 4: 'B21', 5: 'A21'},
'v3': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}}
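The orient parameter of to_dict controls the shape. A short sketch of three of its documented values on the sample data; none of them produces the multi-level nesting from the question by itself, but 'records' or 'list' can be a convenient starting point for it.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'A', 'B', 'C', 'B', 'A'],
                   'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
                   'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
                   'v3': [1, 2, 3, 4, 5, 6]})

# orient='list': column name -> list of that column's values
as_lists = df.to_dict(orient='list')

# orient='records': a list with one {column: value} dict per row
as_records = df.to_dict(orient='records')

# orient='index': row label -> {column: value}
as_index = df.to_dict(orient='index')
```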
Here is a way to produce JSON and then evaluate it with ast.literal_eval into an actual Python object; with orient='values' the result is a list of rows rather than a dict:
In [11]: import ast
In [15]: ast.literal_eval(df.to_json(orient='values'))
Out[15]:
[['A', 'A1', 'A11', 1],
 ['A', 'A2', 'A12', 2],
 ['B', 'B1', 'B12', 3],
 ['C', 'C1', 'C11', 4],
 ['B', 'B2', 'B21', 5],
 ['A', 'A2', 'A21', 6]]
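One caveat with the literal_eval route: JSON is not Python syntax, so a missing value serialized as null (or a boolean as true/false) makes ast.literal_eval raise ValueError. A sketch of two alternatives, assuming you only need the list of rows: parse the JSON with json.loads, or skip the text round trip and call tolist() on the values array.

```python
import ast
import json

import pandas as pd

df = pd.DataFrame({'name': ['A', 'A', 'B'],
                   'v1': ['A1', 'A2', 'B1'],
                   'v2': ['A11', 'A12', 'B12'],
                   'v3': [1, 2, 3]})

# json.loads is the parser that matches to_json's output format
rows_from_json = json.loads(df.to_json(orient='values'))

# The same list of rows, without serializing to text first
rows_direct = df.values.tolist()

# ast.literal_eval only understands Python literals, so JSON's null breaks it
try:
    ast.literal_eval('[1, null]')
    literal_eval_ok = True
except ValueError:
    literal_eval_ok = False
```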
Method 4
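df.groupby(by='name', sort=False).apply(lambda x: x.to_dict(orient='records'))
This is the simplest option: it groups the rows by name and returns, for each name, a list of row dicts. Note that it only gives one level of nesting, not the fully nested dict from the question.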
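If you want to stay with groupby and still reach the requested shape, a sketch for this particular four-column layout is to nest two groupby calls and finish with zip on the last two columns. It hard-codes the column names 'name', 'v1', 'v2' and 'v3', so unlike Methods 1 and 2 it is not generic in the number of columns.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'A', 'B', 'C', 'B', 'A'],
                   'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
                   'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
                   'v3': [1, 2, 3, 4, 5, 6]})

# Outer groupby on 'name', inner groupby on 'v1', then pair up 'v2' -> 'v3'
nested = {
    name: {
        v1: dict(zip(inner['v2'], inner['v3']))
        for v1, inner in outer.groupby('v1')
    }
    for name, outer in df.groupby('name')
}
```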
Method 5
Here is another solution using defaultdict:
from collections import defaultdict
import pandas as pd

df = pd.DataFrame({'name': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'B', 5: 'A'},
                   'v1': {0: 'A1', 1: 'A2', 2: 'B1', 3: 'C1', 4: 'B2', 5: 'A2'},
                   'v2': {0: 'A11', 1: 'A12', 2: 'B12', 3: 'C11', 4: 'B21', 5: 'A21'},
                   'v3': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}})
output = defaultdict(dict)
for lst in df.values:
    try:
        # Add to the existing inner dict for this (name, v1) pair
        output[lst[0]][lst[1]].update({lst[2]: lst[3]})
    except KeyError:
        # First time this (name, v1) pair is seen: create its inner dict
        output[lst[0]][lst[1]] = {lst[2]: lst[3]}
output
or:
output = defaultdict(dict)
for row in df.values:
    item1, item2 = row[0:2]
    # Create the inner dict the first time this (item1, item2) pair appears
    if output.get(item1, {}).get(item2) is None:
        output[item1][item2] = {}
    output[item1][item2].update({row[2]: row[3]})
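Both variants above assume exactly four columns. A sketch of a defaultdict version that works for any number of key columns uses a recursive defaultdict (sometimes called autovivification), so every missing key creates another nested level on demand. The helper names tree and to_plain are hypothetical; to_plain converts the result back to ordinary dicts so it prints and compares like the output in the question.

```python
from collections import defaultdict

import pandas as pd


def tree():
    """Hypothetical helper: a defaultdict whose missing keys create further trees."""
    return defaultdict(tree)


def to_plain(d):
    """Hypothetical helper: recursively turn nested defaultdicts into plain dicts."""
    if isinstance(d, defaultdict):
        return {k: to_plain(v) for k, v in d.items()}
    return d


df = pd.DataFrame({'name': ['A', 'A', 'B', 'C', 'B', 'A'],
                   'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
                   'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
                   'v3': [1, 2, 3, 4, 5, 6]})

root = tree()
for row in df.values.tolist():
    node = root
    for key in row[:-2]:     # every key column except the innermost one
        node = node[key]     # missing levels are created on the fly
    node[row[-2]] = row[-1]  # innermost key -> value from the last column

result = to_plain(root)
```

As with plain assignment in Method 2, a repeated key path keeps only the last value.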
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.