I am retrieving Twitter data with a Python tool and dumping it to disk in JSON format. I noticed that the entire data string for each tweet is unintentionally enclosed in double quotes. Furthermore, all double quotes belonging to the actual JSON structure are escaped with a backslash.
They look like this:
"{\"created_at\":\"Fri Aug 08 11:04:40 +0000 2014\",\"id\":497699913925292032,
How do I avoid that? It should be:
{"created_at":"Fri Aug 08 11:04:40 +0000 2014" ...
My file-out code looks like this:
with io.open('data'+self.timestamp+'.txt', 'a', encoding='utf-8') as f:
    f.write(unicode(json.dumps(data, ensure_ascii=False)))
    f.write(unicode('\n'))
The unintended escaping causes problems when reading in the JSON file in a later processing step.
Answers:
Method 1
You are double-encoding your JSON strings. data is already a JSON string and doesn't need to be encoded again:
>>> import json
>>> not_encoded = {"created_at":"Fri Aug 08 11:04:40 +0000 2014"}
>>> encoded_data = json.dumps(not_encoded)
>>> print(encoded_data)
{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}
>>> double_encode = json.dumps(encoded_data)
>>> print(double_encode)
"{\"created_at\": \"Fri Aug 08 11:04:40 +0000 2014\"}"
Just write these directly to your file:
with open('data{}.txt'.format(self.timestamp), 'a') as f:
    f.write(data + '\n')
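If some files were already written with the double encoding, they can still be read back. The sketch below is written for Python 3 and assumes one tweet per line, as the question's code writes; load_line is a hypothetical helper name. Decoding a double-encoded line once yields a str, so the helper decodes that str a second time to get the dict.

```python
import json

def load_line(line):
    """Decode one line of a dump file (hypothetical helper).

    Assumes each line holds one tweet. A double-encoded line decodes to a
    str on the first json.loads, so that str is decoded a second time.
    """
    value = json.loads(line)
    if isinstance(value, str):
        value = json.loads(value)
    return value

tweet = {"created_at": "Fri Aug 08 11:04:40 +0000 2014", "id": 497699913925292032}
good_line = json.dumps(tweet)             # what the file should contain
bad_line = json.dumps(json.dumps(tweet))  # what a double-encoded file contains

print(load_line(good_line) == load_line(bad_line) == tweet)  # True
```

This is a stopgap for existing data; the real fix is still to encode only once when writing.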
Method 2
Another situation where this unwanted escaping can happen is when you call json.dump() on data that already contains the output of json.dumps(). For example,
import json, sys
json.dump({"foo": json.dumps([{"bar": 1}, {"baz": 2}])},sys.stdout)
will result in
{"foo": "[{\"bar\": 1}, {\"baz\": 2}]"}
To avoid this, pass the Python objects themselves (dicts, lists) rather than the output of json.dumps(), e.g.
json.dump({"foo": [{"bar": 1}, {"baz": 2}]},sys.stdout)
which outputs the desired
{"foo": [{"bar": 1}, {"baz": 2}]}
(Why would you pre-process the inner list with json.dumps(), you ask? Well, I had another function that was creating that inner list out of other stuff, and I thought it would make sense to return a json object from that function… Wrong.)
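The fix for that case is to have the helper return plain Python objects and serialize exactly once, at the outermost level. A minimal sketch, where build_inner_list is a hypothetical stand-in for "another function that was creating that inner list":

```python
import json

def build_inner_list():
    # Hypothetical helper: returns plain Python objects, not a JSON string,
    # so the caller decides when (and how) to serialize.
    return [{"bar": 1}, {"baz": 2}]

payload = {"foo": build_inner_list()}

# Serialize once, at the outermost level.
text = json.dumps(payload)
print(text)  # {"foo": [{"bar": 1}, {"baz": 2}]}
```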
Method 3
For others with a similar issue: I used the following to dump JSON-formatted data to a file, where the data came from an API call. This is only an indicative example; adapt it to your requirements.
import json
# below is an example; for me this came from an API call
json_string = '{"address":{"city":"NY", "country":"USA"}}'
# write the JSON string to the file as-is (don't use json.dump on it, as explained in other answers)
with open('direct_json.json','w') as direct_json:
    direct_json.write(json_string)
    direct_json.write("\n")
# load as dict
json_dict = json.loads(json_string)
# pretty print
print(json.dumps(json_dict, indent = 1))
# write pretty JSON to file
with open('formatted.json','w') as formatted_file:
    json.dump(json_dict, formatted_file, indent=4)
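When writing the API string as-is, it can help to check that it really is valid JSON first, and to confirm that reading the file back gives the original object. A sketch under these assumptions: Python 3, write_raw_json is a hypothetical helper, and the file goes to a temporary directory only for illustration.

```python
import json
import os
import tempfile

def write_raw_json(json_string, path):
    # Hypothetical helper: parse first, so a malformed API response raises
    # json.JSONDecodeError (a subclass of ValueError) instead of being written.
    json.loads(json_string)
    with open(path, "w", encoding="utf-8") as f:
        f.write(json_string)
        f.write("\n")

# Example API response, same as in the example above.
json_string = '{"address":{"city":"NY", "country":"USA"}}'
# Temporary location chosen for this sketch; use your own path.
path = os.path.join(tempfile.gettempdir(), "direct_json.json")
write_raw_json(json_string, path)

# Reading the file back gives the original object, with no extra escaping.
with open(path, encoding="utf-8") as f:
    round_trip = json.load(f)
print(round_trip == json.loads(json_string))  # True
```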
Method 4
A simple workaround that worked for me is to decode the already-encoded part with json.loads() before dumping, like this:
import json
inner = json.dumps([{"bar": 1}, {"baz": 2}])  # an already-encoded JSON string
data = {"foo": json.loads(inner)}  # decode it back to Python objects before dumping
with open('output.json','w') as f:
    json.dump(data, f, indent=4)
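The same idea can be generalized when you don't know which values were pre-encoded. The sketch below uses a hypothetical helper, as_python, that decodes any string value that parses as JSON and leaves other values alone. It is a heuristic: a plain string that happens to be valid JSON (for example "123") would also be decoded.

```python
import json

def as_python(value):
    # Hypothetical helper: decode str values that hold JSON; return
    # everything else (including non-JSON strings) unchanged.
    if isinstance(value, str):
        try:
            return json.loads(value)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            return value
    return value

data = {"foo": json.dumps([{"bar": 1}, {"baz": 2}]), "name": "plain text"}
clean = {key: as_python(val) for key, val in data.items()}
print(json.dumps(clean))  # {"foo": [{"bar": 1}, {"baz": 2}], "name": "plain text"}
```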
Method 5
If you are using ujson, set escape_forward_slashes=False to prevent / characters from being escaped.
Solved:
ujson.dumps({"a":"aa//a/dfdf"}, escape_forward_slashes=False )
'{"a":"aa//a/dfdf"}'
Default:
ujson.dumps({"a":"aa//a/dfdf"}, escape_forward_slashes=True )
'{"a":"aa\/\/a\/dfdf"}'
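For comparison, Python's standard-library json module does not escape forward slashes at all, so no extra option is needed there:

```python
import json

# The stdlib encoder escapes quotes, backslashes and control characters,
# but leaves "/" untouched.
text = json.dumps({"a": "aa//a/dfdf"})
print(text)  # {"a": "aa//a/dfdf"}
```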
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.