Bash regex capture group
I’m trying to match multiple alphanumeric values (this number could vary) from a string and save them to a bash capture group array. However, I’m only getting the first match:
mystring1='<link rel="self" href="/api/clouds/1/instances/1BBBBBB" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/>' regex='/instances/([A-Z0-9]+)' [[ $mystring1 =~ $regex ]] echo ${BASH_REMATCH[1]} 1BBBBBB echo ${BASH_REMATCH[2]}
As you can see- it matches the first value I’m looking for, but not the second.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
It’s a shame that you can’t do global matching in bash. You can do this:
global_rematch() { local s=$1 regex=$2 while [[ $s =~ $regex ]]; do echo "${BASH_REMATCH[1]}" s=${s#*"${BASH_REMATCH[1]}"} done } global_rematch "$mystring1" "$regex"
1BBBBBB 2AAAAAAA
This works by chopping the matched prefix off the string so the next part can be matched. It destroys the string, but in the function it’s a local variable, so who cares.
I would actually use that function to populate an array:
$ mapfile -t matches < <( global_rematch "$mystring1" "$regex" ) $ printf "%sn" "${matches[@]}" 1BBBBBB 2AAAAAAA
Method 2
To get the second array value, you need to have a second set of parentheses in the regex:
mystring1='<link rel="self" href="/api/clouds/1/instances/1BBBBBB" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/>' regex='/instances/([A-Z0-9]+).*/instances/([A-Z0-9]+)' [[ $mystring1 =~ $regex ]] $ echo ${BASH_REMATCH[1]} 1BBBBBB $ echo ${BASH_REMATCH[2]} 2AAAAAAA
Method 3
Python implementation :
import re def getall(mysentence): regex = re.compile(r'.*?/instances/([0-9A-Z]+)') result = regex.findall(mysentence) return result print(getall('<link rel="self" href="/api/clouds/1/instances/1BBBBBB"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA"/>'))
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0