Assume the following word sequence
BLA text text text text text text BLA text text text text LOOK text text text BLA text text BLA
What I would like to do is extract the text from BLA to LOOK, starting at the BLA that is closest to LOOK. I.e., I would like to get
BLA text text text text LOOK
How should I do that using regular expressions? I have one solution that works, but it is extremely inefficient:
BLA(?!.*?BLA.*?LOOK).*?LOOK
Is there a better and more performant way to achieve matching this pattern?
What I would like to do is match BLA, then scan forward until either a positive lookahead for LOOK succeeds or a negative lookahead for BLA fails. But I don't know how to put this into a regular expression.
As an engine I use re in Python.
Answers:
Method 1
(?s)BLA(?:(?!BLA).)*?LOOK
Try this.
Alternatively, use
BLA(?:(?!BLA|LOOK)[\s\S])*LOOK
To be safer.
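As a rough sketch, the two patterns above could be tried from Python's re module like this. The sample string is the one from the question; the variable names are illustrative only and not part of the original answer.

```python
import re

# Sample sentence from the question.
text = ("BLA text text text text text text BLA text text text text "
        "LOOK text text text BLA text text BLA")

# Lazy tempered token: each consumed character must not start another "BLA".
# (?s) makes '.' match newlines as well.
lazy = re.compile(r"(?s)BLA(?:(?!BLA).)*?LOOK")

# Stricter variant: the token may start neither "BLA" nor "LOOK",
# and [\s\S] matches any character without needing (?s).
strict = re.compile(r"BLA(?:(?!BLA|LOOK)[\s\S])*LOOK")

lazy_match = lazy.search(text).group()
strict_match = strict.search(text).group()

print(lazy_match)    # BLA text text text text LOOK
print(strict_match)  # BLA text text text text LOOK
```

Both patterns skip the first BLA because the tempered token refuses to step over the second one, so the only successful match starts at the BLA nearest to LOOK.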
Method 2
Another way to extract the desired text is to use the tempered greedy token technique, which matches a series of individual characters that do not begin an unwanted string.
r'\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b'
\bBLA\b : match 'BLA' with word boundaries
(?: : begin non-capture group
(?!\bBLA\b) : negative lookahead asserts following characters are not
'BLA' with word boundaries
. : match any character
) : end non-capture group
* : execute non-capture group 0+ times
\bLOOK\b : match 'LOOK' with word boundaries
Word boundaries are included to avoid matching words such as BLACK and TRAILBLAZER.
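To illustrate the effect of the word boundaries, here is a small sketch. The sample sentence is made up for this example and is not from the original answer; it embeds BLACK and TRAILBLAZER to show that they are neither treated as a start marker nor as a stop condition.

```python
import re

# Tempered greedy token with word boundaries, as described above.
pattern = re.compile(r"\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b")

# Hypothetical sample: "BLACK" and "TRAILBLAZER" contain "BLA" but are
# not the standalone word "BLA".
sample = "BLA one BLACK two BLA three TRAILBLAZER four LOOK five BLA"

result = pattern.search(sample).group()
print(result)  # BLA three TRAILBLAZER four LOOK
```

Note that the quantifier here is greedy, so if several LOOK words follow the same BLA with no BLA in between, the match would extend to the last of them. Also, without re.DOTALL the dot does not match newlines.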
Method 3
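Simply find the text between BLA and LOOK that does not contain BLA. Note that [^(BLA)] is a character class, not a negated word: it excludes the individual characters (, B, L, A and ), so this only works when the text in between contains none of those characters.
In : re.search(r'BLA [^(BLA)]+ LOOK', 'BLA text text text text text text BLA text text text text LOOK text text text BLA text text BLA').group()
Out: 'BLA text text text text LOOK'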
🙂
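One caveat with Method 3 is worth sketching: [^(BLA)] is a negated character class, so it rejects any single (, B, L, A or ) rather than the word BLA. The sample string below is invented for illustration, and the comparison pattern is the lazy tempered token from Method 1.

```python
import re

# Method 3 as posted: a negated character class, not a negated word.
char_class = re.compile(r"BLA [^(BLA)]+ LOOK")

# Lazy tempered token from Method 1, for comparison.
tempered = re.compile(r"BLA(?:(?!BLA).)*?LOOK")

# Hypothetical sample whose in-between text contains capital A and L.
sample = "BLA first BLA Alpha Lima LOOK"

class_match = char_class.search(sample)
tempered_match = tempered.search(sample)

print(class_match)             # None
print(tempered_match.group())  # BLA Alpha Lima LOOK
```

On the question's own sample the character class happens to work because the filler text contains none of the excluded characters.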
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.