For one-off string searches, is it faster to simply use str.find/rfind than to use re.match/search?
That is, for a given string, s, should I use:
if s.find('lookforme') > -1:
    do something
or
if re.match('lookforme',s):
    do something else
?
Answers:
Method 1
The question of which is faster is best answered by measuring with timeit.
from timeit import timeit
import re

def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text):
    # note: re.match only matches at the start of the string
    if re.match(text, string):
        pass

def best_find(string, text):
    if text in string:
        pass

print(timeit("find(string, text)", "from __main__ import find; string='lookforme'; text='look'"))
print(timeit("re_find(string, text)", "from __main__ import re_find; string='lookforme'; text='look'"))
print(timeit("best_find(string, text)", "from __main__ import best_find; string='lookforme'; text='look'"))
The output is:
0.441393852234
2.12302494049
0.251421928406
So you should use the in operator not only because it is easier to read, but also because it is faster.
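For anyone re-running this on Python 3, here is a minimal sketch of the same comparison. The function names and the benchmark helper are my own (not from the answer above), and I have swapped re.match for re.search so that all three variants scan the whole string. The lambda wrapper adds the same small overhead to each variant, so only the relative numbers are meaningful, and they will differ by machine and Python version.

```python
import re
from timeit import timeit

HAYSTACK = "lookforme"  # sample values taken from the answer above
NEEDLE = "look"


def with_find(string, text):
    return string.find(text) > -1


def with_search(string, text):
    # re.search scans the whole string, like find and `in` do;
    # re.escape makes the text match literally
    return re.search(re.escape(text), string) is not None


def with_in(string, text):
    return text in string


def benchmark(number=100_000):
    """Return the seconds taken by `number` calls of each variant."""
    variants = {"find": with_find, "re.search": with_search, "in": with_in}
    return {
        name: timeit(lambda fn=fn: fn(HAYSTACK, NEEDLE), number=number)
        for name, fn in variants.items()
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:>9}: {seconds:.4f} s")
```

All three variants return the same True/False answer for plain text, so the choice comes down to readability and speed.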
Method 2
Use this:
if 'lookforme' in s:
    do something
Regexes need to be compiled first, which adds some overhead. Python’s normal string search is very efficient anyway.
If you search for the same term a lot, or need to do something more complex, then regexes become more useful.
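As an illustration of the "something more complex" case, here is a small sketch where a plain substring test cannot do the job. The date pattern and the function name are hypothetical, picked only for the example. The pattern is compiled once and reused for every line.

```python
import re

# Hypothetical pattern: something shaped like YYYY-MM-DD, e.g. 2024-01-31.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def lines_with_date(lines):
    """Return the lines that contain something shaped like a date."""
    return [line for line in lines if DATE_RE.search(line)]


sample = ["released 2024-01-31", "no date here", "lookforme", "due 1999-12-01!"]
print(lines_with_date(sample))
```

For a fixed literal such as 'lookforme' none of this is needed, and in remains the simpler choice.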
Method 3
To address the concern about regex compilation time raised in the most up-voted answer, here is a version with a precompiled pattern:
from timeit import timeit
import re

def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text_re):
    # text_re is a pattern object already built with re.compile
    if text_re.match(string):
        pass

def best_find(string, text):
    if text in string:
        pass

print(timeit("find(string, text)", "from __main__ import find; string='lookforme'; text='look'"))
print(timeit("re_find(string, text_re)", "from __main__ import re_find; string='lookforme'; import re; text_re=re.compile('look')"))
print(timeit("best_find(string, text)", "from __main__ import best_find; string='lookforme'; text='look'"))
And my numbers:
0.189274072647
0.239935874939
0.0820939540863
The precompiled pattern improves the numbers, but in is still the fastest.
Method 4
re.compile speeds up regexes a lot if you are searching for the same thing over and over. But I just got a huge speedup by using “in” to cull out bad cases before I match. Anecdotal, I know. ~Ben
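A rough sketch of that culling idea, assuming the regex can only match when some fixed literal is present. The pattern, the sample lines and the function name are made up for illustration.

```python
import re

# Hypothetical pattern: the literal "error" followed by a numeric code.
ERROR_RE = re.compile(r"error\s+(\d+)")


def error_codes(lines):
    """Collect numeric codes, running the regex only on plausible lines."""
    codes = []
    for line in lines:
        if "error" not in line:  # cheap substring check culls most lines
            continue
        match = ERROR_RE.search(line)
        if match:
            codes.append(int(match.group(1)))
    return codes


log = ["ok", "error 404 not found", "errors are rare", "warn", "error  7"]
print(error_codes(log))
```

The result is the same as running the regex on every line. Whether the pre-check pays off depends on how many lines it lets you skip, so it is worth timing on your own data.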
Method 5
I’ve had the same problem. I used Jupyter’s %timeit to check:
import re
sent = "a sentence for measuring a find function"
sent_list = sent.split()
print("x in sentence")
%timeit "function" in sent
print("x in token list")
%timeit "function" in sent_list
print("regex search")
%timeit bool(re.match(".*function.*", sent))
print("compiled regex search")
regex = re.compile(".*function.*")
%timeit bool(regex.match(sent))
x in sentence 61.3 ns ± 3 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
x in token list 93.3 ns ± 1.26 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
regex search 772 ns ± 8.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
compiled regex search 420 ns ± 7.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Compiling the regex helps, but the simple in is still much faster.
Method 6
Maybe someone is still interested.
The given answers seem fine but only look at a very short string.
In fact if you take a long string and the pattern you are looking for is roughly at the end then the performance changes in favor of regex!
import re
def find(string, text):
    if string.find(text) > -1:
        pass

def re_find(string, text):
    if re.match(text, string):
        pass

def best_find(string, text):
    if text in string:
        pass

very_long_string = 'sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df 
adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa 
svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa 
dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf 
lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd sasgda;dlaskjgasdlj sadlf;jsaf lkjvasdfa dsadkfldsfhsa svnsa;df adsfkj;ljkasdf asdf;lkjafd'
pattern = 'look'
print('pattern at the end of string')
print('find:', end=' ')
%timeit find(very_long_string + pattern, pattern)
print('regex:', end=' ')
%timeit re_find(very_long_string + pattern, pattern)
print('in:', end=' ')
%timeit best_find(very_long_string + pattern, pattern)
print('pattern in front of string')
print('find:', end=' ')
%timeit find(pattern + very_long_string, pattern)
print('regex:', end=' ')
%timeit re_find(pattern + very_long_string, pattern)
print('in:', end=' ')
%timeit best_find(pattern + very_long_string, pattern)
which gives the output:
pattern at the end of string
find: 3.41 µs ± 74.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
regex: 1.93 µs ± 23.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
in: 3.32 µs ± 74.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
pattern in front of string
find: 748 ns ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
regex: 2.03 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
in: 589 ns ± 6.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
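Summary: the time taken by find and in depends on the string length and on where the pattern sits in the string, while the regex timing here looks independent of string length and is faster for very long strings with the pattern at the end. Note, however, that re.match only tries to match at the beginning of the string, so in the "pattern at the end" case it returns None without scanning the string at all. re.search is the call that is comparable to find and in.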
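When reading these numbers it helps to check what each call actually returns for the two test strings. Below is a small sketch using a shorter stand-in string. The name padding and its length are my assumptions, not the original filler text.

```python
import re

pattern = "look"
padding = "x" * 10_000  # stand-in for the long filler string
at_end = padding + pattern
at_front = pattern + padding

# re.match only tries position 0, so it finds nothing when the pattern is at the end.
print(re.match(pattern, at_end))

# re.search scans forward and reports where the pattern starts.
print(re.search(pattern, at_end).start())

# find and `in` also scan the whole string.
print(at_end.find(pattern), pattern in at_end)

# With the pattern in front, re.match does succeed.
print(re.match(pattern, at_front) is not None)
```

So a like-for-like timing against find and in would use re.search rather than re.match.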
Method 7
In addition to the above answers, re.search() and re.match() take about the same runtime (keeping in mind that re.match only matches at the start of the string).
if(re.search(rf"\b{re.escape(some_keyword)}\b",some_sentence))
takes the same runtime as
if(re.match(rf"\b{re.escape(some_keyword)}\b",some_sentence))
If your regex requires a particular word to be present, it is better to guard the regex with a plain “in” check first. For example, the following is faster than the above two and gives the same result as the re.search version:
if(some_keyword.lower() in some_sentence.lower()):
    if(re.search(rf"\b{re.escape(some_keyword)}\b",some_sentence)):
        pass  # whole-word match found
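The regex is still needed after the in check because in matches substrings inside longer words, while the \b anchors only accept whole words. Here is a minimal case-sensitive sketch of the difference. The function name contains_word is hypothetical.

```python
import re


def contains_word(keyword, sentence):
    """Whole-word test with a cheap substring pre-check (case-sensitive)."""
    if keyword not in sentence:
        return False  # the regex cannot match if the literal is absent
    return re.search(rf"\b{re.escape(keyword)}\b", sentence) is not None


# A plain substring test matches inside a longer word...
print("cat" in "concatenate strings")

# ...while the word-boundary regex does not.
print(contains_word("cat", "concatenate strings"))
print(contains_word("cat", "the cat sat"))
```

The pre-check never changes the answer. It only skips the regex when the keyword cannot possibly be there.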
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.