Regex, select closest match

Assume the following word sequence

BLA text text text  text text text BLA text text text text LOOK text text text BLA text text BLA

What I would like to do is extract the text from BLA to LOOK, starting from the BLA that is closest to LOOK. I.e. I would like to get

BLA text text text text LOOK

How should I do that using regular expressions? I have one solution that works, but it is extremely inefficient.

BLA(?!.*?BLA.*?LOOK).*?LOOK

Is there a better and more performant way to achieve matching this pattern?

What I would like to do is match BLA, then scan forward until either a positive lookahead finds LOOK or a negative lookahead finds BLA. But I don't know how to put this into a regular expression.

As the engine I use re in Python.

Answers:


Method 1

(?s)BLA(?:(?!BLA).)*?LOOK

Try this.

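Alternatively, use

BLA(?:(?!BLA|LOOK)[\s\S])*LOOK

to be safer: [\s\S] matches newlines without needing the (?s) flag, and excluding LOOK in the lookahead stops the match at the first LOOK.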
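As a rough sketch of how both patterns from this method might be run with Python's re module on the sample from the question (the variable names are only illustrative, and re.DOTALL is used here in place of the inline (?s) flag):

import re

text = (
    "BLA text text text  text text text BLA text text text text "
    "LOOK text text text BLA text text BLA"
)

# Each consumed character must not be the start of another "BLA",
# so a match can never span an earlier BLA. re.DOTALL stands in for (?s).
lazy = re.compile(r"BLA(?:(?!BLA).)*?LOOK", re.DOTALL)

# Variant that also refuses to step over "LOOK"; [\s\S] matches newlines
# without needing any flag.
strict = re.compile(r"BLA(?:(?!BLA|LOOK)[\s\S])*LOOK")

lazy_result = lazy.search(text).group()
strict_result = strict.search(text).group()
print(lazy_result)
print(strict_result)

Both patterns fail at the first BLA, because the lookahead blocks them from stepping over the second BLA, and the search then restarts from the BLA nearest to LOOK.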

Method 2

Another way to extract the desired text is to use the tempered greedy token technique, which matches a series of individual characters that do not begin an unwanted string.

r'\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b'

\bBLA\b        : match 'BLA' with word boundaries
(?:            : begin non-capture group
  (?!\bBLA\b)  : negative lookahead asserts following characters are not
                 'BLA' with word boundaries
  .            : match any character
)              : end non-capture group
*              : execute non-capture group 0+ times
\bLOOK\b       : match 'LOOK' with word boundaries

Word boundaries are included to avoid matching words such as BLACK and TRAILBLAZER.
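
To illustrate the effect of the word boundaries, here is a small sketch on a made-up sample string (not from the question) in which BLACK appears between the markers. The names bounded and unbounded are only for this example:

import re

# Hypothetical sample: "BLACK" sits between the real BLA and LOOK.
sample = "BLA text BLACK text LOOK"

bounded = re.compile(r"\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b")
unbounded = re.compile(r"BLA(?:(?!BLA).)*LOOK")

# With boundaries, BLACK is ordinary text and the match starts at the real BLA.
bounded_result = bounded.search(sample).group()

# Without boundaries, the BLA inside BLACK is treated as a start marker.
unbounded_result = unbounded.search(sample).group()

print(bounded_result)
print(unbounded_result)

Without \b, the lookahead refuses to step over the BLA inside BLACK, so the match ends up starting in the middle of that word.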

Method 3

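Simply find the text between BLA and LOOK that does not contain BLA. Note that [^(BLA)] is a character class excluding the single characters (, B, L, A and ), not the string BLA, so this only works when the text in between contains none of those characters.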

In : re.search(r'BLA [^(BLA)]+ LOOK', 'BLA text text text  text text text BLA text text text text LOOK text text text BLA text text BLA').group()
Out: 'BLA text text text text LOOK'
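
A quick check of how the character class behaves on a different, made-up input (the sample string and variable names are assumptions for illustration), compared with the lookahead pattern from Method 1:

import re

# Hypothetical input: the text between the markers contains the letter "A".
sample = "BLA text BLA some DATA here LOOK"

# [^(BLA)] excludes the characters (, B, L, A and ), so "DATA" blocks the match.
char_class = re.search(r"BLA [^(BLA)]+ LOOK", sample)

# The lookahead form only rejects the full string "BLA".
tempered = re.search(r"BLA(?:(?!BLA).)*?LOOK", sample)

print(char_class)
print(tempered.group())

On this input the character-class pattern finds nothing, while the lookahead pattern still returns the span starting at the closest BLA.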

🙂


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
