Parse JSON using Python?

I have a JSON file members.json as below.

{
   "took": 670,
   "timed_out": false,
   "_shards": {
      "total": 8,
      "successful": 8,
      "failed": 0
   },
   "hits": {
      "total": 74,
      "max_score": 1,
      "hits": [
         {
            "_index": "2000_270_0",
            "_type": "Medical",
            "_id": "02:17447847049147026174478:174159",
            "_score": 1,
            "_source": {
               "memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
               "memberFirstName": "Uri",
               "memberMiddleName": "Prayag",
               "memberLastName": "Dubofsky"
            }
         }, 
         {
            "_index": "2000_270_0",
            "_type": "Medical",
            "_id": "02:17447847049147026174478:174159",
            "_score": 1,
            "_source": {
               "memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
               "memberFirstName": "Uri",
               "memberMiddleName": "Prayag",
               "memberLastName": "Dubofsky"
            }
         }
      ]
   }
}

I want to parse it using bash script get only the list of field memberId.

The expected output is:

memberIds
----------- 
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG

I tried adding following bash+python code to .bashrc:

function getJsonVal() {
   if [ ( $# -ne 1 ) -o ( -t 0 ) ]; then
       echo "Usage: getJsonVal 'key' < /tmp/file";
       echo "   -- or -- ";
       echo " cat /tmp/input | getJsonVal 'key'";
       return;
   fi;
   cat | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["'$1'"]';
}

And then called:

$ cat members.json | getJsonVal "memberId"

But it throws:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyError: 'memberId'

Reference

https://stackoverflow.com/a/21595107/432903

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

If you would use:

 $ cat members.json | 
     python -c 'import json,sys;obj=json.load(sys.stdin);print obj;'

you can inspect the structure of the nested dictonary obj and see that your original line should read:

$ cat members.json | 
    python -c 'import json,sys;obj=json.load(sys.stdin);print obj["hits"]["hits"][0]["_source"]["'$1'"]';

to the to that “memberId” element. This way you can keep the Python as a oneliner.

If there are multiple elements in the nested “hits” element, then you can do something like:

$ cat members.json | 
python -c '
import json, sys
obj=json.load(sys.stdin)
for y in [x["_source"]["'$1'"] for x in obj["hits"]["hits"]]:
    print y
'

Chris Down’s solution is better for finding a single value to (unique) keys at any level.

With my second example that prints out multiple values, you are hitting the limits of what you should try with a one liner, at that point I see little reason why to do half of the processing in bash, and would move to a complete Python solution.

Method 2

Another way to do this in bash is using jshon. Here is a solution to your problem using jshon:

$ jshon -e hits -e hits -a -e _source -e memberId -u < foo.json
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG

The -e options extract values from the json. The -a iterates over the array and the -u decodes the final string.

Method 3

Well, your key is quite clearly not at the root of the object. Try something like this:

json_key() {
    python -c '
import json
import sys

data = json.load(sys.stdin)

for key in sys.argv[1:]:
    try:
        data = data[key]
    except TypeError:  # This is a list index
        data = data[int(key)]

print(data)' "<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="e7c3a7">[email protected]</a>"
}

This has the benefit of not just simply injecting syntax into Python, which could cause breakage (or worse, arbitrary code execution).

You can then call it like this:

json_key hits hits 0 _source memberId < members.json

Method 4

Another alternative is jq:

$ cat members.json | jq -r '.hits|.hits|.[]|._source|.memberId'
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG

Method 5

Try this:

$ cat json.txt | python -c 'import sys; import simplejson as json; 
print "n".join( [i["_source"]["memberId"] for i in json.loads( sys.stdin.read() )["hits"]["hits"]] )'

If you already have pretty printed json, why don’t you just grep it?

$ cat json.txt | grep memberId
               "memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
               "memberId": "0x7b93910446f91928e23e1043dfdf5bcG",

You can always get a pretty printed format with simplejson python to grep it.

# cat json_raw.txt
{"hits": {"hits": [{"_score": 1, "_type": "Medical", "_id": "02:17447847049147026174478:174159", "_source": {"memberLastName": "Dubofsky", "memberMiddleName": "Prayag", "memberId": "0x7b93910446f91928e23e1043dfdf5bcf", "memberFirstName": "Uri"}, "_index": "2000_270_0"}, {"_score": 1, "_type": "Medical", "_id": "02:17447847049147026174478:174159", "_source": {"memberLastName": "Dubofsky", "memberMiddleName": "Prayag", "memberId": "0x7b93910446f91928e23e1043dfdf5bcG", "memberFirstName": "Uri"}, "_index": "2000_270_0"}], "total": 74, "max_score": 1}, "_shards": {"successful": 8, "failed": 0, "total": 8}, "took": 670, "timed_out": false}

Use dumps:

# cat json_raw.txt | python -c 'import sys; import simplejson as json; 
print json.dumps( json.loads( sys.stdin.read() ), sort_keys=True, indent=4); '

{
    "_shards": {
        "failed": 0,
        "successful": 8,
        "total": 8
    },
    "hits": {
        "hits": [
            {
                "_id": "02:17447847049147026174478:174159",
                "_index": "2000_270_0",
                "_score": 1,
                "_source": {
                    "memberFirstName": "Uri",
                    "memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
                    "memberLastName": "Dubofsky",
                    "memberMiddleName": "Prayag"
                },
                "_type": "Medical"
            },
            {
                "_id": "02:17447847049147026174478:174159",
                "_index": "2000_270_0",
                "_score": 1,
                "_source": {
                    "memberFirstName": "Uri",
                    "memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
                    "memberLastName": "Dubofsky",
                    "memberMiddleName": "Prayag"
                },
                "_type": "Medical"
            }
        ],
        "max_score": 1,
        "total": 74
    },
    "timed_out": false,
    "took": 670
}

Thereafter, simply grep result with ‘memberId’ pattern.

To be completely precise:

#!/bin/bash

filename="$1"
cat $filename | python -c 'import sys; import simplejson as json; 
print json.dumps( json.loads( sys.stdin.read() ), sort_keys=True, indent=4)' | 
grep memberId | awk '{print $2}' | sed -e 's/^"//g' | sed -e 's/",$//g'

Usage:

$ bash bash.sh json_raw.txt 
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG

Method 6

Following this thread I’d use json.tool in python:

python -m json.tool members.json | awk -F'"' '/memberId/{print $4}'

Method 7

Using deepdiff you do not need to know the exact keys:

import json
from deepdiff import DeepSearch
DeepSearch(json.load(open("members.json", "r")), 'memberId', verbose_level=2)['matched_paths'].values()

Method 8

Here’s a bash solution.

  1. create file find_members.sh
  2. add the following line to file + save
    #!/bin/bash
    
    echo -e "nmemberIdsn---------"
    cat members.json | grep -E 'memberId'|awk '{print$2}' | cut -d '"' -f2
  3. chmod +x find_members.sh

Now run it:

$ ./find_members.sh

memberIds
----------------
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x