I want to find all consecutive, repeated character blocks in a string. For example, consider the following:
s = r'http://www.google.com/search=ooo-jjj'
I want to find these: www, ooo and jjj.
I tried to do it like this:
m = re.search(r'(\w)\1\1', s)
But it doesn’t seem to work as I expect. Any ideas?
Also, how can I do it in Bash?
Answers:
Method 1
((\w)\2{2,}) matches 3 or more consecutive identical characters:
In [71]: import re
In [72]: s = r'http://www.google.com/search=ooo-jjjj'
In [73]: re.findall(r'((\w)\2{2,})', s)
Out[73]: [('www', 'w'), ('ooo', 'o'), ('jjjj', 'j')]
In [78]: [match[0] for match in re.findall(r'((\w)\2{2,})', s)]
Out[78]: ['www', 'ooo', 'jjjj']
(\w) matches any word character (a letter, a digit, or an underscore).
((\w)\2) matches any word character followed by the same character, since \2 matches the contents of group number 2.
Since I nested the parentheses, group number 2 refers to the character matched by \w, while group number 1 is the whole repeated block.
Then putting it all together,
((\w)\2{2,}) matches any word character, followed by the same character repeated 2 or more additional times.
In total, that means the regex requires the character to be repeated 3 or more times.
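As a minimal sketch of the same idea, assuming Python 3 and a hypothetical helper name find_repeated_blocks (not part of the original answer), the minimum run length can be made a parameter. Using re.finditer with match.group(0) sidesteps the tuples that re.findall returns when a pattern contains more than one group, so the outer group is no longer needed:

```python
import re

def find_repeated_blocks(text, min_len=3):
    """Return every run of one word character repeated at least min_len times.

    Hypothetical helper for illustration only; assumes min_len >= 2.
    """
    # (\w) captures one character; \1{n,} requires n or more further copies of it.
    pattern = re.compile(r'(\w)\1{%d,}' % (min_len - 1))
    # group(0) is the whole match, i.e. the full repeated block.
    return [m.group(0) for m in pattern.finditer(text)]

s = r'http://www.google.com/search=ooo-jjjj'
print(find_repeated_blocks(s))     # ['www', 'ooo', 'jjjj']
print(find_repeated_blocks(s, 4))  # ['jjjj']
```

Note that shorter runs such as the "tt" in "http" or the "oo" in "google" are skipped with the default min_len of 3.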
Method 2
The following code should solve your problem:
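import re
s = "abc def aaa bbb ccc def hhh"
# finditer yields a match object for every non-overlapping run of three identical characters
for match in re.finditer(r"(\w)\1\1", s):
    print(s[match.start():match.end()])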
Method 3
Your attempt is almost right: just replace search with finditer. Note that it returns an iterator of match objects, not a single match, so you have to loop over it:
m = [(x.start(),x.end()) for x in re.finditer(r'(\w)\1\1', s)]
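To make the difference concrete, here is a short sketch (Python 3 assumed, variable names are illustrative) comparing re.search, which stops at the first match, with re.finditer, which yields every non-overlapping match. The collected spans can then be used to slice the blocks back out of the string:

```python
import re

s = r'http://www.google.com/search=ooo-jjj'
pattern = r'(\w)\1\1'

# re.search returns only the first match object (or None if nothing matches).
first = re.search(pattern, s)
print(first.group(0))  # www

# re.finditer yields one match object per non-overlapping match.
spans = [(m.start(), m.end()) for m in re.finditer(pattern, s)]
print(spans)

# Slicing with each (start, end) span recovers the matched text.
blocks = [s[start:end] for start, end in spans]
print(blocks)  # ['www', 'ooo', 'jjj']
```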
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.