Bash regex capture group

I’m trying to match multiple alphanumeric values (this number could vary) from a string and save them to a bash capture group array. However, I’m only getting the first match:

mystring1='<link rel="self" href="/api/clouds/1/instances/1BBBBBB" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/>'

regex='/instances/([A-Z0-9]+)'

[[ $mystring1 =~ $regex ]]

echo ${BASH_REMATCH[1]}
1BBBBBB

echo ${BASH_REMATCH[2]}

As you can see- it matches the first value I’m looking for, but not the second.

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

It’s a shame that you can’t do global matching in bash. You can do this:

global_rematch() { 
    local s=$1 regex=$2 
    while [[ $s =~ $regex ]]; do 
        echo "${BASH_REMATCH[1]}"
        s=${s#*"${BASH_REMATCH[1]}"}
    done
}
global_rematch "$mystring1" "$regex"

1BBBBBB
2AAAAAAA

This works by chopping the matched prefix off the string so the next part can be matched. It destroys the string, but in the function it’s a local variable, so who cares.

I would actually use that function to populate an array:

$ mapfile -t matches < <( global_rematch "$mystring1" "$regex" )
$ printf "%sn" "${matches[@]}"
1BBBBBB
2AAAAAAA

Method 2

To get the second array value, you need to have a second set of parentheses in the regex:

mystring1='<link rel="self" href="/api/clouds/1/instances/1BBBBBB" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/>'

regex='/instances/([A-Z0-9]+).*/instances/([A-Z0-9]+)'

[[ $mystring1 =~ $regex ]]

$ echo ${BASH_REMATCH[1]}
1BBBBBB
$ echo ${BASH_REMATCH[2]}
2AAAAAAA

Method 3

Python implementation :

import re

def getall(mysentence):
    regex = re.compile(r'.*?/instances/([0-9A-Z]+)')
    result = regex.findall(mysentence)
    return result

print(getall('<link rel="self" href="/api/clouds/1/instances/1BBBBBB"/> dsf <link rel="self" href="/api/clouds/1/instances/2AAAAAAA"/>'))


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments