I am a beginner in Python. What I want to do is load a json file of forex historical price data by Pandas and do statistic with the data. I have go through many topics on Pandas and parsing json file.
I want to pass a json file with extra value and nested list to a pandas data frame. I got a problem stuck here.
I got a json file ‘EUR_JPY_H8.json’
First I import the lib that required,
import pandas as pd import json from pandas.io.json import json_normalize
Then load the json file,
with open('EUR_JPY_H8.json') as data_file:
data = json.load(data_file)
I got a list below:
[{u'complete': True,
u'mid': {u'c': u'119.743',
u'h': u'119.891',
u'l': u'119.249',
u'o': u'119.341'},
u'time': u'1488319200.000000000',
u'volume': 14651},
{u'complete': True,
u'mid': {u'c': u'119.893',
u'h': u'119.954',
u'l': u'119.552',
u'o': u'119.738'},
u'time': u'1488348000.000000000',
u'volume': 10738},
{u'complete': True,
u'mid': {u'c': u'119.946',
u'h': u'120.221',
u'l': u'119.840',
u'o': u'119.888'},
u'time': u'1488376800.000000000',
u'volume': 10041}]
Then I pass the list to json_normalize.
Try to get price which is in the nested list under ‘mid’
result = json_normalize(data,'time',['time','volume','complete',['mid','h'],['mid','l'],['mid','c'],['mid','o']])
But I got such result,
json_normalize output
The ‘time’ data got breakdown into each integer row by row.
I have checked related document. I have to pass a string or list object to the 2nd parameter of json_normalize. How can I pass the timestamp there without breaking down.
My expected output is:
column = index | time | volumn | completed | mid.h | mid.l | mid.c | mid.o
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
You could just pass data without any extra params.
df = pd.io.json.json_normalize(data) df complete mid.c mid.h mid.l mid.o time volume 0 True 119.743 119.891 119.249 119.341 1488319200.000000000 14651 1 True 119.893 119.954 119.552 119.738 1488348000.000000000 10738 2 True 119.946 120.221 119.840 119.888 1488376800.000000000 10041
If you want to change the column order, use df.reindex:
df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df
time volume complete mid.h mid.l mid.c mid.o
0 1488319200.000000000 14651 True 119.891 119.249 119.743 119.341
1 1488348000.000000000 10738 True 119.954 119.552 119.893 119.738
2 1488376800.000000000 10041 True 120.221 119.840 119.946 119.888
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0