How to remove whitespaces and newlines from every value in a JSON file?

I have a JSON file that has the following structure:

{
    "name":[
        {
            "someKey": "\n\n   some Value   "
        },
        {
            "someKey": "another value    "
        }
    ],
    "anotherName":[
        {
            "anArray": [
                {
                    "key": "    value\n\n",
                    "anotherKey": "  value"
                },
                {
                    "key": "    value\n",
                    "anotherKey": "value"
                }
            ]
        }
    ]
}

Now I want to strip all the whitespace and newlines from every value in the JSON file. Is there a way to iterate over every element of the dictionary, including the nested dictionaries and lists?

Answers:

Method 1

Now I want to strip all the whitespace and newlines from every value in the JSON file

Using pkgutil.simplegeneric() to create a helper function get_items():

import json
import sys
from pkgutil import simplegeneric

@simplegeneric
def get_items(obj):
    while False: # no items, a scalar object
        yield None

@get_items.register(dict)
def _(obj):
    return obj.items() # json object. Edit: iteritems() was removed in Python 3

@get_items.register(list)
def _(obj):
    return enumerate(obj) # json array

def strip_whitespace(json_data):
    for key, value in get_items(json_data):
        if hasattr(value, 'strip'): # json string
            json_data[key] = value.strip()
        else:
            strip_whitespace(value) # recursive call


data = json.load(sys.stdin) # read json data from standard input
strip_whitespace(data)
json.dump(data, sys.stdout, indent=2)

Note: functools.singledispatch() (Python 3.4+) would allow registering the MutableMapping/MutableSequence abstract base classes from collections.abc instead of dict/list here.
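
As a rough illustration of that note, here is how the same helper might look with functools.singledispatch() and the abstract base classes. This is only a sketch, assuming Python 3; the isinstance(value, str) test is a substitution of this sketch for the hasattr() check used above.

```python
from collections.abc import MutableMapping, MutableSequence
from functools import singledispatch

@singledispatch
def get_items(obj):
    return iter(())  # scalar object: nothing to iterate over

@get_items.register(MutableMapping)
def _(obj):
    return list(obj.items())  # any dict-like object (json object)

@get_items.register(MutableSequence)
def _(obj):
    return enumerate(obj)  # any list-like object (json array)

def strip_whitespace(json_data):
    for key, value in get_items(json_data):
        if isinstance(value, str):  # json string
            json_data[key] = value.strip()
        else:
            strip_whitespace(value)  # recursive call
```

Because dispatch goes through the abstract base classes, containers such as collections.OrderedDict or collections.UserList should be handled as well as plain dict and list.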

Output

{
  "anotherName": [
    {
      "anArray": [
        {
          "anotherKey": "value", 
          "key": "value"
        }, 
        {
          "anotherKey": "value", 
          "key": "value"
        }
      ]
    }
  ], 
  "name": [
    {
      "someKey": "some Value"
    }, 
    {
      "someKey": "another value"
    }
  ]
}

Method 2

Parse the file using the json module:

import json
# `file` is assumed to already hold the raw JSON text, e.g. read from disk
file = file.replace('\n', '')    # do your cleanup here
data = json.loads(file)

then walk through the resulting data structure.
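
The walk itself is not spelled out above. One possible way to do it is a small recursive function that returns a cleaned copy. This is a minimal sketch, assuming Python 3 and that only string values should be trimmed; strip_strings is a hypothetical helper name, not something the json module provides.

```python
import json

def strip_strings(node):
    """Return a copy of node with every string value stripped."""
    if isinstance(node, dict):
        return {key: strip_strings(value) for key, value in node.items()}
    if isinstance(node, list):
        return [strip_strings(item) for item in node]
    if isinstance(node, str):
        return node.strip()
    return node  # numbers, booleans and None pass through unchanged

raw = '{"key": "    value\\n\\n", "count": 3}'
cleaned = strip_strings(json.loads(raw))
print(cleaned)  # {'key': 'value', 'count': 3}
```

Returning a new structure leaves the parsed data untouched, which can be handy if the original values are still needed for comparison.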

Method 3

This may not be the most efficient process, but it works. I copied that sample into a file named json.txt, then read and deserialized it with json.load(), and used a pair of functions to recursively clean it and everything inside it.

import json

def clean_dict(d):
    for key, value in d.items():  # iteritems() was removed in Python 3
        if isinstance(value, list):
            clean_list(value)
        elif isinstance(value, dict):
            clean_dict(value)
        elif isinstance(value, str):  # leave numbers, booleans and null alone
            d[key] = value.strip()

def clean_list(l):
    for index, item in enumerate(l):
        if isinstance(item, dict):
            clean_dict(item)
        elif isinstance(item, list):
            clean_list(item)
        elif isinstance(item, str):
            l[index] = item.strip()

# Read the file and send it to the dict cleaner
with open("json.txt") as f:
    data = json.load(f)

print("before...")
print(data, "\n")

clean_dict(data)

print("after...")
print(data)

The result (Python 3.7+, where dicts keep insertion order):

before...
{'name': [{'someKey': '\n\n   some Value   '}, {'someKey': 'another value    '}], 'anotherName': [{'anArray': [{'key': '    value\n\n', 'anotherKey': '  value'}, {'key': '    value\n', 'anotherKey': 'value'}]}]}

after...
{'name': [{'someKey': 'some Value'}, {'someKey': 'another value'}], 'anotherName': [{'anArray': [{'key': 'value', 'anotherKey': 'value'}, {'key': 'value', 'anotherKey': 'value'}]}]}
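
The snippet above only prints the cleaned data. If the result should end up back in a file, something along these lines might do. It is a sketch, assuming Python 3; clean_in_place, clean_file and the path arguments are hypothetical names. It walks the structure with an explicit stack instead of recursion, which is a choice made for this sketch and not part of the answer above.

```python
import json

def clean_in_place(root):
    """Strip every string value inside nested dicts/lists, without recursion."""
    if not isinstance(root, (dict, list)):
        return  # a bare scalar has nothing to walk
    stack = [root]
    while stack:
        container = stack.pop()
        # dicts yield (key, value) pairs, lists yield (index, item) pairs
        if isinstance(container, dict):
            pairs = list(container.items())
        else:
            pairs = list(enumerate(container))
        for key, value in pairs:
            if isinstance(value, str):
                container[key] = value.strip()
            elif isinstance(value, (dict, list)):
                stack.append(value)  # visit nested containers later

def clean_file(src_path, dst_path):
    """Load JSON from src_path, strip its string values, write to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    clean_in_place(data)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(data, dst, indent=2)
```

Calling clean_file("json.txt", "cleaned.json") would then leave the input file as it is and write the stripped version next to it (both file names are just examples).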


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
