I have a JSON file members.json as below.
{
"took": 670,
"timed_out": false,
"_shards": {
"total": 8,
"successful": 8,
"failed": 0
},
"hits": {
"total": 74,
"max_score": 1,
"hits": [
{
"_index": "2000_270_0",
"_type": "Medical",
"_id": "02:17447847049147026174478:174159",
"_score": 1,
"_source": {
"memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
"memberFirstName": "Uri",
"memberMiddleName": "Prayag",
"memberLastName": "Dubofsky"
}
},
{
"_index": "2000_270_0",
"_type": "Medical",
"_id": "02:17447847049147026174478:174159",
"_score": 1,
"_source": {
"memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
"memberFirstName": "Uri",
"memberMiddleName": "Prayag",
"memberLastName": "Dubofsky"
}
}
]
}
}
I want to parse it with a bash script and get only the list of memberId values.
The expected output is:
memberIds
-----------
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
I tried adding following bash+python code to .bashrc:
function getJsonVal() {
if [ \( $# -ne 1 \) -o \( -t 0 \) ]; then
echo "Usage: getJsonVal 'key' < /tmp/file";
echo " -- or -- ";
echo " cat /tmp/input | getJsonVal 'key'";
return;
fi;
cat | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["'$1'"]';
}
And then called:
$ cat members.json | getJsonVal "memberId"
But it throws:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyError: 'memberId'
Reference
https://stackoverflow.com/a/21595107/432903
Answers:
Method 1
If you would use:
$ cat members.json |
python -c 'import json,sys;obj=json.load(sys.stdin);print(obj)'
you can inspect the structure of the nested dictionary obj and see that your original line should read:
$ cat members.json |
python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["hits"]["hits"][0]["_source"]["'$1'"])'
to get to that “memberId” element. This way you can keep the Python as a one-liner.
If there are multiple elements in the nested “hits” element, then you can do something like:
$ cat members.json |
python -c '
import json, sys
obj = json.load(sys.stdin)
for y in [x["_source"]["'$1'"] for x in obj["hits"]["hits"]]:
    print(y)
'
Chris Down’s solution is better for finding a single value for a (unique) key at any level.
With my second example, which prints multiple values, you are hitting the limits of what you should try with a one-liner. At that point I see little reason to do half of the processing in bash, and would move to a complete Python solution.
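A complete Python version, as suggested above, might look like the following minimal sketch. It is only an illustration: the function names member_ids and format_report are made up here, and the JSON is a trimmed copy of members.json inlined so the example is self-contained. In practice you would load the file with json.load instead.

```python
import json

# Trimmed copy of members.json from the question, inlined for the example.
SAMPLE = """
{"hits": {"total": 74, "hits": [
  {"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcf"}},
  {"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcG"}}
]}}
"""


def member_ids(document, field="memberId"):
    """Return the given _source field from every entry in hits.hits."""
    return [hit["_source"][field] for hit in document["hits"]["hits"]]


def format_report(ids):
    """Build the header plus one ID per line, as in the expected output."""
    return "\n".join(["memberIds", "-----------"] + ids)


ids = member_ids(json.loads(SAMPLE))
print(format_report(ids))
```

To run it against the real file, replace json.loads(SAMPLE) with json.load(open("members.json")).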
Method 2
Another way to do this in bash is using jshon. Here is a solution to your problem using jshon:
$ jshon -e hits -e hits -a -e _source -e memberId -u < foo.json
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
The -e options extract values from the json. The -a iterates over the array and the -u decodes the final string.
Method 3
Well, your key is quite clearly not at the root of the object. Try something like this:
json_key() {
    python -c '
import json
import sys
data = json.load(sys.stdin)
for key in sys.argv[1:]:
    try:
        data = data[key]
    except TypeError:  # This is a list index
        data = data[int(key)]
print(data)' "$@"
}
This has the benefit of not just simply injecting syntax into Python, which could cause breakage (or worse, arbitrary code execution).
You can then call it like this:
json_key hits hits 0 _source memberId < members.json
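The same path-walking idea can be tried as a plain Python function, outside the shell wrapper. This is a rough sketch rather than the answer's exact code: it checks the container type with isinstance instead of catching TypeError, and the tiny inlined document is a stand-in for members.json.

```python
import json


def json_key(data, path):
    """Follow a sequence of keys or list positions into nested JSON data."""
    for key in path:
        if isinstance(data, list):
            # Path items arrive as strings (e.g. from sys.argv), so convert.
            data = data[int(key)]
        else:
            data = data[key]
    return data


# Stand-in for members.json, reduced to the part the path touches.
doc = json.loads('{"hits": {"hits": [{"_source": {"memberId": "abc"}}]}}')
print(json_key(doc, ["hits", "hits", "0", "_source", "memberId"]))
```

Because the path is passed as data rather than pasted into the Python source, a strange key can at worst raise KeyError or ValueError; it cannot run code.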
Method 4
Another alternative is jq:
$ cat members.json | jq -r '.hits|.hits|.[]|._source|.memberId'
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
Method 5
Try this:
$ cat json.txt | python -c 'import sys; import simplejson as json; print("\n".join( [i["_source"]["memberId"] for i in json.loads( sys.stdin.read() )["hits"]["hits"]] ))'
If you already have pretty printed json, why don’t you just grep it?
$ cat json.txt | grep memberId
"memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
"memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
If the JSON is not pretty printed, you can always reformat it with Python's simplejson first and then grep the result.
# cat json_raw.txt
{"hits": {"hits": [{"_score": 1, "_type": "Medical", "_id": "02:17447847049147026174478:174159", "_source": {"memberLastName": "Dubofsky", "memberMiddleName": "Prayag", "memberId": "0x7b93910446f91928e23e1043dfdf5bcf", "memberFirstName": "Uri"}, "_index": "2000_270_0"}, {"_score": 1, "_type": "Medical", "_id": "02:17447847049147026174478:174159", "_source": {"memberLastName": "Dubofsky", "memberMiddleName": "Prayag", "memberId": "0x7b93910446f91928e23e1043dfdf5bcG", "memberFirstName": "Uri"}, "_index": "2000_270_0"}], "total": 74, "max_score": 1}, "_shards": {"successful": 8, "failed": 0, "total": 8}, "took": 670, "timed_out": false}
Use dumps:
# cat json_raw.txt | python -c 'import sys; import simplejson as json;
print(json.dumps( json.loads( sys.stdin.read() ), sort_keys=True, indent=4))'
{
"_shards": {
"failed": 0,
"successful": 8,
"total": 8
},
"hits": {
"hits": [
{
"_id": "02:17447847049147026174478:174159",
"_index": "2000_270_0",
"_score": 1,
"_source": {
"memberFirstName": "Uri",
"memberId": "0x7b93910446f91928e23e1043dfdf5bcf",
"memberLastName": "Dubofsky",
"memberMiddleName": "Prayag"
},
"_type": "Medical"
},
{
"_id": "02:17447847049147026174478:174159",
"_index": "2000_270_0",
"_score": 1,
"_source": {
"memberFirstName": "Uri",
"memberId": "0x7b93910446f91928e23e1043dfdf5bcG",
"memberLastName": "Dubofsky",
"memberMiddleName": "Prayag"
},
"_type": "Medical"
}
],
"max_score": 1,
"total": 74
},
"timed_out": false,
"took": 670
}
Thereafter, simply grep result with ‘memberId’ pattern.
To be completely precise:
#!/bin/bash
filename="$1"
cat "$filename" | python -c 'import sys; import simplejson as json;
print(json.dumps( json.loads( sys.stdin.read() ), sort_keys=True, indent=4))' |
grep memberId | awk '{print $2}' | sed -e 's/^"//g' | sed -e 's/",$//g'
Usage:
$ bash bash.sh json_raw.txt
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
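For comparison, the pretty-print-then-grep pipeline can be imitated in Python alone. The sketch below assumes the standard library json module is an acceptable stand-in for simplejson (both provide loads and dumps with these arguments). The helper grep_values is a made-up name, and the raw input is a shortened version of json_raw.txt.

```python
import json

# Shortened stand-in for json_raw.txt (compact, single-line JSON).
RAW = ('{"hits": {"hits": ['
       '{"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcf", "memberFirstName": "Uri"}}, '
       '{"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcG", "memberFirstName": "Uri"}}'
       ']}}')

# Same formatting options as the dumps call above: one key per line.
pretty = json.dumps(json.loads(RAW), sort_keys=True, indent=4)


def grep_values(pretty_text, key):
    """Mimic grep + awk: keep lines holding the key, take the quoted value."""
    needle = '"%s":' % key
    # Splitting on double quotes puts the value in the fourth field.
    return [line.split('"')[3] for line in pretty_text.splitlines() if needle in line]


found = grep_values(pretty, "memberId")
print("\n".join(found))
```

Like the grep approach itself, this relies on the one-key-per-line layout and on string values without embedded quotes, so it is a convenience rather than a real JSON query.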
Method 6
Following this thread I’d use json.tool in python:
python -m json.tool members.json | awk -F'"' '/memberId/{print $4}'
Method 7
Using deepdiff you do not need to know the exact keys:
import json
from deepdiff import DeepSearch
DeepSearch(json.load(open("members.json", "r")), 'memberId', verbose_level=2)['matched_paths'].values()
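If installing deepdiff is not an option, a similar "search at any depth" can be sketched with the standard library only. This is not how deepdiff works internally; find_key is a hypothetical helper written for illustration, and the inlined JSON is a trimmed stand-in for members.json.

```python
import json


def find_key(node, wanted):
    """Yield every value stored under the key `wanted`, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == wanted:
                yield value
            # Keep descending: the key may also appear deeper inside the value.
            for found in find_key(value, wanted):
                yield found
    elif isinstance(node, list):
        for item in node:
            for found in find_key(item, wanted):
                yield found


# Trimmed stand-in for members.json.
doc = json.loads("""
{"took": 670, "hits": {"total": 74, "hits": [
  {"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcf"}},
  {"_source": {"memberId": "0x7b93910446f91928e23e1043dfdf5bcG"}}
]}}
""")
print(list(find_key(doc, "memberId")))
```

Unlike the path-based approaches, this returns every match, so it suits keys whose position in the document is unknown or may change.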
Method 8
Here’s a bash solution.
- create file find_members.sh
- add the following lines to the file and save:
#!/bin/bash
echo -e "\nmemberIds\n---------"
cat members.json | grep -E 'memberId' | awk '{print $2}' | cut -d '"' -f2
- chmod +x find_members.sh
Now run it:
$ ./find_members.sh
memberIds
----------------
0x7b93910446f91928e23e1043dfdf5bcf
0x7b93910446f91928e23e1043dfdf5bcG
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.