How to normalize json correctly by Python Pandas

I am a beginner in Python. What I want to do is load a json file of forex historical price data by Pandas and do statistic with the data. I have go through many topics on Pandas and parsing json file.
I want to pass a json file with extra value and nested list to a pandas data frame. I got a problem stuck here.

I got a json file ‘EUR_JPY_H8.json’

First I import the lib that required,

import pandas as pd
import json
from pandas.io.json import json_normalize

Then load the json file,

with open('EUR_JPY_H8.json') as data_file:    
data = json.load(data_file)

I got a list below:

[{u'complete': True,
u'mid': {u'c': u'119.743',
  u'h': u'119.891',
  u'l': u'119.249',
  u'o': u'119.341'},
u'time': u'1488319200.000000000',
u'volume': 14651},
{u'complete': True,
u'mid': {u'c': u'119.893',
  u'h': u'119.954',
  u'l': u'119.552',
  u'o': u'119.738'},
u'time': u'1488348000.000000000',
u'volume': 10738},
{u'complete': True,
u'mid': {u'c': u'119.946',
  u'h': u'120.221',
  u'l': u'119.840',
  u'o': u'119.888'},
u'time': u'1488376800.000000000',
u'volume': 10041}]

Then I pass the list to json_normalize.
Try to get price which is in the nested list under ‘mid’

result = json_normalize(data,'time',['time','volume','complete',['mid','h'],['mid','l'],['mid','c'],['mid','o']])

But I got such result,
json_normalize output

The ‘time’ data got breakdown into each integer row by row.
I have checked related document. I have to pass a string or list object to the 2nd parameter of json_normalize. How can I pass the timestamp there without breaking down.

My expected output is:

column = 
  index  |  time  | volumn  |  completed  |  mid.h  |  mid.l  |  mid.c  |  mid.o

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You could just pass data without any extra params.

df = pd.io.json.json_normalize(data)
df

   complete    mid.c    mid.h    mid.l    mid.o                  time  volume
0      True  119.743  119.891  119.249  119.341  1488319200.000000000   14651
1      True  119.893  119.954  119.552  119.738  1488348000.000000000   10738
2      True  119.946  120.221  119.840  119.888  1488376800.000000000   10041

If you want to change the column order, use df.reindex:

df = df.reindex(columns=['time', 'volume', 'complete', 'mid.h', 'mid.l', 'mid.c', 'mid.o'])
df

                   time  volume  complete    mid.h    mid.l    mid.c    mid.o
0  1488319200.000000000   14651      True  119.891  119.249  119.743  119.341
1  1488348000.000000000   10738      True  119.954  119.552  119.893  119.738
2  1488376800.000000000   10041      True  120.221  119.840  119.946  119.888


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x