This question is specific to using flatten_json from GitHub Repo: flatten
- The package is on PyPI as flatten-json 0.1.7 and can be installed with pip install flatten-json
- This question is specific to the following component of the package:
def flatten_json(nested_json: dict, exclude: list=[''], sep: str='_') -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()

    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            # recurse into each key that is not excluded
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            # list positions become part of the flattened key
            i = 0
            for a in x:
                flatten(a, f'{name}{i}{sep}')
                i += 1
        else:
            # strip the trailing separator (assumes sep is one character)
            out[name[:-1]] = x

    flatten(nested_json)
    return out
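To make the sep and exclude parameters concrete, here is a small sketch. The function is repeated so the block runs on its own, and the sample dict and its keys are made up for illustration.

```python
def flatten_json(nested_json: dict, exclude: list = [''], sep: str = '_') -> dict:
    """Copy of the function above so this block is self-contained."""
    out = dict()

    def flatten(x, name: str = ''):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}{sep}')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Illustrative data, not taken from the question
sample = {'id': 1, 'meta': {'tags': ['a', 'b'], 'owner': {'name': 'x'}}}

flat = flatten_json(sample)                        # default separator '_'
dotted = flatten_json(sample, sep='.')             # single-character separator
trimmed = flatten_json(sample, exclude=['owner'])  # skip the 'owner' key at any depth

print(flat)
print(dotted)
print(trimmed)
```

Because the trailing separator is removed with name[:-1], a separator longer than one character would leave part of it at the end of every key, so a single character is the safe choice here.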
Use recursion to flatten nested dicts
How nested can data be?:
- flatten_json has been used to unpack a file that ended up being over 100000 columns
Can the flattened JSON be unflattened?:
- Yes, this question doesn’t cover that. However, if you install the flatten package, there is an unflatten method, but I haven’t tested it.
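For readers curious what unflattening involves, here is a minimal hand-rolled sketch. It is not the package's unflatten method, and the helper name unflatten_dict is hypothetical. It assumes that no original key contains the separator and that only dicts (not lists) need to be rebuilt.

```python
def unflatten_dict(flat: dict, sep: str = '_') -> dict:
    """Rebuild nested dicts from separator-joined keys (hypothetical helper)."""
    out = {}
    for compound_key, value in flat.items():
        parts = compound_key.split(sep)
        node = out
        # walk (or create) the intermediate dicts for every part but the last
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return out


# Illustrative flattened data using '.' so that values such as 'm1_1' stay unambiguous
flat = {'id': 1, 'metadata.m1.value': 'm1_1', 'metadata.m1.timestamp': 'd1'}
nested = unflatten_dict(flat, sep='.')
print(nested)
```

The round trip is lossy in general: list positions come back as string keys such as '0', and empty dicts or lists never reached the flattened output in the first place.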
Answers:
Method 1
How to flatten a JSON or dict is a common question, to which there are many answers.
- This answer focuses on using flatten_json to recursively flatten a nested dict or JSON.
Assumptions:
- This answer assumes you already have the JSON or dict loaded into some variable (e.g. file, api, etc.)
  - In this case we will use data
How is data loaded into flatten_json:
- It accepts a dict, as shown by the function type hint.
The most common forms of data:
- Just a dict: {}
  - flatten_json(data)
- List of dicts: [{}, {}, {}]
  - [flatten_json(x) for x in data]
- JSON with top level keys, where the values repeat: {1: {}, 2: {}, 3: {}}
  - [flatten_json(data[key]) for key in data]
- Other: {'key': [{}, {}, {}]}
  - [flatten_json(x) for x in data['key']]
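The four shapes above can be tried side by side with the sketch below. The tiny inputs and the variable names are made up for illustration, and the function is repeated so the block runs on its own.

```python
def flatten_json(nested_json: dict, exclude: list = [''], sep: str = '_') -> dict:
    """Copy of the function from the question so this block is self-contained."""
    out = dict()

    def flatten(x, name: str = ''):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}{sep}')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Illustrative inputs, one per shape
just_a_dict = {'a': {'b': 1}}
list_of_dicts = [{'a': {'b': 1}}, {'a': {'b': 2}}]
keyed = {1: {'a': {'b': 1}}, 2: {'a': {'b': 2}}}
wrapped = {'key': [{'a': {'b': 1}}, {'a': {'b': 2}}]}

# Each result is a list of flat dicts, i.e. one row per record
rows_dict = [flatten_json(just_a_dict)]
rows_list = [flatten_json(x) for x in list_of_dicts]
rows_keyed = [flatten_json(keyed[key]) for key in keyed]
rows_wrapped = [flatten_json(x) for x in wrapped['key']]

for rows in (rows_dict, rows_list, rows_keyed, rows_wrapped):
    print(rows)
```

In every case the goal is the same: end up with a list of flat dicts, which is the form pandas.DataFrame accepts directly.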
Practical Examples:
- I typically flatten data into a pandas.DataFrame for further analysis.
  - Load pandas with import pandas as pd
- flatten_json returns a dict, which can be saved directly using the csv package.
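As a sketch of the csv route, the block below writes flattened records with csv.DictWriter from the standard library. The records are made up, and an in-memory buffer stands in for a real file. Since records can flatten to different key sets, the header is built from the union of all keys.

```python
import csv
import io


def flatten_json(nested_json: dict, exclude: list = [''], sep: str = '_') -> dict:
    """Copy of the function from the question so this block is self-contained."""
    out = dict()

    def flatten(x, name: str = ''):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}{sep}')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Illustrative records whose lists differ in length
records = [{'id': 1, 'tags': ['a']}, {'id': 2, 'tags': ['a', 'b']}]
rows = [flatten_json(r) for r in records]

# Union of keys across all rows, in first-seen order
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()  # swap for open('out.csv', 'w', newline='') to write a file
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The restval argument fills in cells for keys that a given row lacks; without a complete fieldnames list, DictWriter would raise on the unexpected keys.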
Data 1:
{
    "id": 1,
    "class": "c1",
    "owner": "myself",
    "metadata": {
        "m1": {"value": "m1_1", "timestamp": "d1"},
        "m2": {"value": "m1_2", "timestamp": "d2"},
        "m3": {"value": "m1_3", "timestamp": "d3"},
        "m4": {"value": "m1_4", "timestamp": "d4"}
    },
    "a1": {"a11": []},
    "m1": {},
    "comm1": "COMM1",
    "comm2": "COMM21529089656387",
    "share": "xxx",
    "share1": "yyy",
    "hub1": "h1",
    "hub2": "h2",
    "context": []
}
Flatten 1:
df = pd.DataFrame([flatten_json(data)])
id class owner metadata_m1_value metadata_m1_timestamp metadata_m2_value metadata_m2_timestamp metadata_m3_value metadata_m3_timestamp metadata_m4_value metadata_m4_timestamp comm1 comm2 share share1 hub1 hub2
1 c1 myself m1_1 d1 m1_2 d2 m1_3 d3 m1_4 d4 COMM1 COMM21529089656387 xxx yyy h1 h2
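Note that a1, m1 and context from Data 1 do not appear as columns. A value is only written when the recursion reaches a non-container leaf, so empty dicts and lists vanish. The sketch below shows this on a trimmed-down copy of Data 1; the function is repeated so the block runs on its own.

```python
def flatten_json(nested_json: dict, exclude: list = [''], sep: str = '_') -> dict:
    """Copy of the function from the question so this block is self-contained."""
    out = dict()

    def flatten(x, name: str = ''):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}{sep}')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Trimmed-down copy of Data 1
data = {
    "id": 1,
    "metadata": {"m1": {"value": "m1_1", "timestamp": "d1"}},
    "a1": {"a11": []},
    "m1": {},
    "context": [],
}

flat = flatten_json(data)

# Top-level keys that left no trace in the flattened result
dropped = [
    k for k in data
    if not any(col == k or col.startswith(f'{k}_') for col in flat)
]

print(flat)
print(dropped)
```

If those keys matter downstream, they have to be added back separately, since nothing in the flattened dict records that they existed.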
Data 2:
[{
    'accuracy': 17,
    'activity': [{
        'activity': [{'confidence': 100, 'type': 'STILL'}],
        'timestampMs': '1542652'
    }],
    'altitude': -10,
    'latitudeE7': 3777321,
    'longitudeE7': -122423125,
    'timestampMs': '1542654',
    'verticalAccuracy': 2
}, {
    'accuracy': 17,
    'activity': [{
        'activity': [{'confidence': 100, 'type': 'STILL'}],
        'timestampMs': '1542652'
    }],
    'altitude': -10,
    'latitudeE7': 3777321,
    'longitudeE7': -122423125,
    'timestampMs': '1542654',
    'verticalAccuracy': 2
}, {
    'accuracy': 17,
    'activity': [{
        'activity': [{'confidence': 100, 'type': 'STILL'}],
        'timestampMs': '1542652'
    }],
    'altitude': -10,
    'latitudeE7': 3777321,
    'longitudeE7': -122423125,
    'timestampMs': '1542654',
    'verticalAccuracy': 2
}]
Flatten 2:
df = pd.DataFrame([flatten_json(x) for x in data])
accuracy activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs altitude latitudeE7 longitudeE7 timestampMs verticalAccuracy
17 100 STILL 1542652 -10 3777321 -122423125 1542654 2
17 100 STILL 1542652 -10 3777321 -122423125 1542654 2
17 100 STILL 1542652 -10 3777321 -122423125 1542654 2
Data 3:
{
    "1": {
        "VENUE": "JOEBURG",
        "COUNTRY": "HAE",
        "ITW": "XAD",
        "RACES": {
            "1": {"NO": 1, "TIME": "12:35"},
            "2": {"NO": 2, "TIME": "13:10"},
            "3": {"NO": 3, "TIME": "13:40"},
            "4": {"NO": 4, "TIME": "14:10"},
            "5": {"NO": 5, "TIME": "14:55"},
            "6": {"NO": 6, "TIME": "15:30"},
            "7": {"NO": 7, "TIME": "16:05"},
            "8": {"NO": 8, "TIME": "16:40"}
        }
    },
    "2": {
        "VENUE": "FOOBURG",
        "COUNTRY": "ABA",
        "ITW": "XAD",
        "RACES": {
            "1": {"NO": 1, "TIME": "12:35"},
            "2": {"NO": 2, "TIME": "13:10"},
            "3": {"NO": 3, "TIME": "13:40"},
            "4": {"NO": 4, "TIME": "14:10"},
            "5": {"NO": 5, "TIME": "14:55"},
            "6": {"NO": 6, "TIME": "15:30"},
            "7": {"NO": 7, "TIME": "16:05"},
            "8": {"NO": 8, "TIME": "16:40"}
        }
    }
}
Flatten 3:
df = pd.DataFrame([flatten_json(data[key]) for key in data])
VENUE COUNTRY ITW RACES_1_NO RACES_1_TIME RACES_2_NO RACES_2_TIME RACES_3_NO RACES_3_TIME RACES_4_NO RACES_4_TIME RACES_5_NO RACES_5_TIME RACES_6_NO RACES_6_TIME RACES_7_NO RACES_7_TIME RACES_8_NO RACES_8_TIME
JOEBURG HAE XAD 1 12:35 2 13:10 3 13:40 4 14:10 5 14:55 6 15:30 7 16:05 8 16:40
FOOBURG ABA XAD 1 12:35 2 13:10 3 13:40 4 14:10 5 14:55 6 15:30 7 16:05 8 16:40
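The comprehension in Flatten 3 discards the top-level keys "1" and "2". If they carry meaning, one option is to store each key in its own field before flattening the value, as sketched below on a shortened copy of Data 3. The column name MEETING is a made-up label, and the function is repeated so the block runs on its own.

```python
def flatten_json(nested_json: dict, exclude: list = [''], sep: str = '_') -> dict:
    """Copy of the function from the question so this block is self-contained."""
    out = dict()

    def flatten(x, name: str = ''):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}{sep}')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Shortened copy of Data 3 (one race per venue)
data = {
    "1": {"VENUE": "JOEBURG", "RACES": {"1": {"NO": 1, "TIME": "12:35"}}},
    "2": {"VENUE": "FOOBURG", "RACES": {"1": {"NO": 1, "TIME": "12:35"}}},
}

# 'MEETING' is a hypothetical column name for the top-level key
rows = [{'MEETING': key, **flatten_json(value)} for key, value in data.items()]

for row in rows:
    print(row)
```

The resulting rows can be passed to pd.DataFrame(rows) in the same way as the list built in Flatten 3.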
Other Examples:
- Python Pandas – Flatten Nested JSON
- handling nested json in pandas
- How to flatten a nested JSON from the NASA Weather Insight API in Python
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.