I needed to parse files generated by another tool, which unconditionally outputs JSON files with a UTF-8 BOM header (EF BB BF). I soon found that the BOM was the problem, as the Python 2.7 json module can't seem to parse it:
>>> import json
>>> data = json.load(open('sample.json'))
ValueError: No JSON object could be decoded
Removing the BOM solves it, but is there another way to parse a JSON file with a BOM header?
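The same failure can be reproduced in Python 3, where the error message is more explicit. This is a minimal sketch that uses an in-memory byte string as a stand-in for sample.json (an assumption; the question itself targets Python 2.7, where the message reads "No JSON object could be decoded"):

```python
import json

# UTF-8 encoded JSON with a leading BOM (EF BB BF), as the other tool writes it
raw = b'\xef\xbb\xbf{"name": "sample"}'

# Decoding with plain 'utf-8' keeps the BOM as the character U+FEFF
text = raw.decode('utf-8')
print(text[0] == '\ufeff')  # True

try:
    json.loads(text)
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    print('failed:', exc)
```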
Answers:
Method 1
You can open the file with codecs:
import json
import codecs
json.load(codecs.open('sample.json', 'r', 'utf-8-sig'))
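or read the raw bytes, decode them with utf-8-sig yourself and pass the result to loads (opening in binary mode makes this work on Python 3 as well):
json.loads(open('sample.json', 'rb').read().decode('utf-8-sig'))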
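utf-8-sig is safe here because the codec strips a leading BOM when one is present and otherwise decodes exactly like plain utf-8, so the code keeps working if the tool ever stops emitting the BOM. A small Python 3 sketch with assumed in-memory sample data:

```python
import json

with_bom = b'\xef\xbb\xbf{"a": 1}'
without_bom = b'{"a": 1}'

# 'utf-8-sig' drops a leading BOM if present...
assert with_bom.decode('utf-8-sig') == '{"a": 1}'
# ...and decodes BOM-less input exactly like plain 'utf-8'
assert without_bom.decode('utf-8-sig') == '{"a": 1}'

print(json.loads(with_bom.decode('utf-8-sig')))  # {'a': 1}
```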
Method 2
Simple! In Python 3 you don't even need to import codecs, because the built-in open() accepts an encoding argument:
import json
with open('sample.json', encoding='utf-8-sig') as f:
    data = json.load(f)
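To check this end to end without the other tool, the sketch below writes a temporary BOM-prefixed file (a hypothetical stand-in for sample.json created with tempfile) and loads it in Python 3:

```python
import json
import os
import tempfile

# Hypothetical sample file: JSON prefixed with a UTF-8 BOM
fd, path = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'wb') as f:
    f.write(b'\xef\xbb\xbf{"ok": true}')

try:
    # encoding='utf-8-sig' strips the BOM while reading the text
    with open(path, encoding='utf-8-sig') as f:
        data = json.load(f)
    print(data)  # {'ok': True}
finally:
    os.remove(path)  # clean up the temporary file
```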
Method 3
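Since json.load(stream) calls json.loads(stream.read()) under the hood, it isn't hard to write a small helper function that strips a leading BOM. Read the file in binary mode so that the content and codecs.BOM_UTF8 are both bytes (json.loads accepts bytes on Python 3.6+):
import json
from codecs import BOM_UTF8
def lstrip_bom(str_, bom=BOM_UTF8):
    # Drop the BOM only if the content actually starts with it
    if str_.startswith(bom):
        return str_[len(bom):]
    else:
        return str_
json.loads(lstrip_bom(open('sample.json', 'rb').read()))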
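If the content has already been decoded to text (for example with plain utf-8), the BOM survives as the single character U+FEFF instead of three bytes. A text-level variant of the helper might look like this; lstrip_bom_text is a hypothetical name, and the sketch assumes Python 3:

```python
import json

BOM_CHAR = '\ufeff'  # the BOM as a decoded character (U+FEFF)

def lstrip_bom_text(text):
    """Hypothetical helper: drop one leading U+FEFF from already-decoded text."""
    return text[1:] if text.startswith(BOM_CHAR) else text

decoded = b'\xef\xbb\xbf[1, 2, 3]'.decode('utf-8')  # BOM survives as U+FEFF
print(json.loads(lstrip_bom_text(decoded)))  # [1, 2, 3]
```

On Python 3.9+, text.removeprefix('\ufeff') does the same in a single call.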
In other situations where you need to wrap a stream and fix it up somehow, you could look at subclassing codecs.StreamReader.
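Before subclassing anything, note that the standard library already ships a StreamReader for utf-8-sig, available through codecs.getreader. This sketch wraps an in-memory byte stream (io.BytesIO is an assumption standing in for a stream you did not open yourself, such as a network response):

```python
import codecs
import io
import json

# Stand-in for a binary stream you did not open yourself
stream = io.BytesIO(b'\xef\xbb\xbf{"wrapped": [1, 2]}')

# codecs.getreader returns the codec's StreamReader class; wrapping the
# byte stream with it decodes on read() and drops the leading BOM
reader = codecs.getreader('utf-8-sig')(stream)
data = json.load(reader)
print(data)  # {'wrapped': [1, 2]}
```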
Method 4
You can also do it with a with statement:
import codecs
import json
with codecs.open('samples.json', 'r', 'utf-8-sig') as json_file:
    data = json.load(json_file)
or, better, with io.open, which works the same on Python 2.6+ and Python 3 (where it is the built-in open):
import io
with io.open('samples.json', 'r', encoding='utf-8-sig') as json_file:
    data = json.load(json_file)
Method 5
If this is a one-off, here is a very simple, super high-tech solution that worked for me:
- Open the JSON file in your favorite text editor.
- Select all
- Create a new file
- Paste
- Save.
BOOM, BOM header gone!
Method 6
I removed the BOM manually with Linux commands.
First I check whether the file starts with the bytes ef bb bf, using head i_have_BOM.json | xxd.
Then I run dd bs=1 skip=3 if=i_have_BOM.json of=I_dont_have_BOM.json.
Here bs=1 makes dd process one byte at a time, and skip=3 skips the first 3 bytes.
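The same check-then-skip steps can be sketched in Python if you prefer not to rely on dd. strip_bom_file is a hypothetical helper, and the temporary files stand in for i_have_BOM.json and I_dont_have_BOM.json:

```python
import codecs
import os
import tempfile

def strip_bom_file(src, dst):
    """Hypothetical helper: copy src to dst without a leading UTF-8 BOM.

    Returns True if a BOM was found and skipped, else False.
    """
    with open(src, 'rb') as f:
        content = f.read()
    has_bom = content.startswith(codecs.BOM_UTF8)  # the ef bb bf check
    with open(dst, 'wb') as f:
        f.write(content[len(codecs.BOM_UTF8):] if has_bom else content)
    return has_bom

# Demo with temporary files standing in for the ones named above
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'i_have_BOM.json')
dst = os.path.join(tmp, 'I_dont_have_BOM.json')
with open(src, 'wb') as f:
    f.write(b'\xef\xbb\xbf{"x": 1}')

print(strip_bom_file(src, dst))  # True
with open(dst, 'rb') as f:
    print(f.read())  # b'{"x": 1}'
```

Unlike dd skip=3, which always drops the first three bytes, this only skips them when they really are a BOM, so running it on a BOM-less file copies it unchanged.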
Method 7
I'm using utf-8-sig with just import json (Python 3):
import json
with open('estados.json', encoding='utf-8-sig') as json_file:
    data = json.load(json_file)
    print(data)
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.