I’m trying to find every 10 digit series of numbers within a larger series of numbers using re in Python 2.6.
I’m easily able to grab no overlapping matches, but I want every match in the number series. Eg.
in “123456789123456789”
I should get the following list:
[1234567891,2345678912,3456789123,4567891234,5678912345,6789123456,7891234567,8912345678,9123456789]
I’ve found references to a “lookahead”, but the examples I’ve seen only show pairs of numbers rather than larger groupings and I haven’t been able to convert them beyond the two digits.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Use a capturing group inside a lookahead. The lookahead captures the text you’re interested in, but the actual match is technically the zero-width substring before the lookahead, so the matches are technically non-overlapping:
import re
s = "123456789123456789"
matches = re.finditer(r'(?=(d{10}))',s)
results = [int(match.group(1)) for match in matches]
# results:
# [1234567891,
# 2345678912,
# 3456789123,
# 4567891234,
# 5678912345,
# 6789123456,
# 7891234567,
# 8912345678,
# 9123456789]
Method 2
You can also try using the third-party regex module (not re), which supports overlapping matches.
>>> import regex as re
>>> s = "123456789123456789"
>>> matches = re.findall(r'd{10}', s, overlapped=True)
>>> for match in matches: print(match) # print match
...
1234567891
2345678912
3456789123
4567891234
5678912345
6789123456
7891234567
8912345678
9123456789
Method 3
I’m fond of regexes, but they are not needed here.
Simply
s = "123456789123456789" n = 10 li = [ s[i:i+n] for i in xrange(len(s)-n+1) ] print 'n'.join(li)
result
1234567891 2345678912 3456789123 4567891234 5678912345 6789123456 7891234567 8912345678 9123456789
Method 4
Piggybacking on the accepted answer, the following currently works as well
import re
s = "123456789123456789"
matches = re.findall(r'(?=(d{10}))',s)
results = [int(match) for match in matches]
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0