Escaping regex string

I want to use input from a user as a regex pattern for a search over some text. It works, but how I can handle cases where user puts characters that have meaning in regex?

For example, the user wants to search for Word (s): regex engine will take the (s) as a group. I want it to treat it like a string "(s)" . I can run replace on user input and replace the ( with ( and the ) with ) but the problem is I will need to do replace for every possible regex symbol.

Do you know some better way ?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Use the re.escape() function for this:

4.2.3 re Module Contents

escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

A simplistic example, search any occurence of the provided string optionally followed by ‘s’, and return the match object.

def simplistic_plural(word, text):
    word_or_plural = re.escape(word) + 's?'
    return re.match(word_or_plural, text)

Method 2

You can use re.escape():

re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

>>> import re
>>> re.escape('^a.*$')
'\^a\.\*\$'

If you are using a Python version < 3.7, this will escape non-alphanumerics that are not part of regular expression syntax as well.

If you are using a Python version < 3.7 but >= 3.3, this will escape non-alphanumerics that are not part of regular expression syntax, except for specifically underscore (_).

Method 3

Unfortunately, re.escape() is not suited for the replacement string:

>>> re.sub('a', re.escape('_'), 'aa')
'\_\_'

A solution is to put the replacement in a lambda:

>>> re.sub('a', lambda _: '_', 'aa')
'__'

because the return value of the lambda is treated by re.sub() as a literal string.

Method 4

Please give a try:

Q and E as anchors

Put an Or condition to match either a full word or regex.

Ref Link : How to match a whole word that includes special characters in regex


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x