Python Regex: Select everything between two brackets ignore brackets in between

I want to capture the entire sub-query irrespective of whether there’s a concat or substring function in between (i.e. ignore another bracket opening and closing within sub-query. (a) We don’t want to capture “join” as a word (b) “alias2” will not always be followed by “join” it can be anything (a word boundary, space, or “join” word).

Case 1: No concat or sub-string function in select

  • In: (select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2 join
  • Out: (select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2

Case 2: Concat function in select

  • In: (select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2 join
  • Out: (select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2

What I have tried:

Approach 1: re.findall('(select.*?)s[a-zA-Z0-9_]+', input statement)

Approach 2: As suggested by @TheFourthBird

import re
pat1 = '(select.*?)s[a-zA-Z0-9_]+'
pat2 = "(select [^()]*(?:(((?>[^()]+|(?1))*)))?[^()]*)[^()n]+"

string1 = "(select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2"
string2 = "(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2"

print(re.findall(pat1, string1))
print(re.findall(pat1, string2))

import regex as re
print(re.findall(pat2, string1))
print(re.findall(pat2, string2))

pattern = re.compile(pat2, re.UNICODE)
print([match.group(0) for match in pattern.finditer(string2)])

Output:

["(select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2"]
['(select concat(t1.col1, t2.col1, t3.col1) as']
['']
['(t1.col1, t2.col1, t3.col1)']
["(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2 join "]

What’s wrong with the above approach:

  • Approach 1: Works very well for case 1 but doesn’t for case 2.
  • Approach 2: Still doesn’t work! However, ["(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2 join "] is the most closet to what’s expected. However, it shouldn’t capture what’s next to alias2.

Please help me!

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You could first match (select and optionally match balanced parenthesis using the PyPi regex module.

At the end of the pattern, match a whitespace char and using your character class.

(select [^()]*(?:(((?>[^()]+|(?1))*)))?[^()]*)s[a-zA-Z0-9_]+

In parts, the pattern matches:

  • (select Match (select
  • [^()]* optionally match any char except ( and )
  • (?: Non capture group
    • ( Capture group 1
      • ((?>[^()]+|(?1))*) Match ( and use a recursive pattern recursing the first subgroup (capture group 1) and finally match the )
    • ) Close group 1
  • )? Close non capture group and make it optional
  • [^()]*) Optionally match any char except parenthesis, then match )
  • s[a-zA-Z0-9_]+ Match a whitespace char and 1+ of the listed in the character class

See a regex demo and a Python demo

For example, using re.finditer (as re.findall returns the capture group values):

import regex as re

pattern = r"(select [^()]*(?:(((?>[^()]+|(?1))*)))?[^()]*)s[a-zA-Z0-9_]+"

s = ("In: (select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2nn"
    "Out: (select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2nn"
    "In: (select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2nn"
    "Out: (select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2n")

matches = re.finditer(pattern, s)

for matchNum, match in enumerate(matches, start=1):
    
    print(match.group())

Output

["(select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2"]
["(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2"]

Note that matching SQL is error prone.


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x