I want to extract exactly 5 consecutive digits from a string.
Here is the code I have written:
re.findall(r"((\D|^)*)\d\d\d\d\d((\D|$)*)", s)
but it does not work for the string
"Helpdesk-Agenten (m/w) Kennziffer: 12966"
The expected result is:
12966
Example 2:
#input "Helpdesk-Agenten (m/w) Kennziffer: 12966abc" # expected 12966
Example 3:
#input "Helpdesk-Agenten (m/w) Kennziffer: 12966345" # expected "" (because the length of continuous digits is longer than 5)
Answers:
Method 1
Your current regex, ((\D|^)*)\d\d\d\d\d((\D|$)*), used with re.findall won't return the digit chunks because they are not captured: when a pattern contains capturing groups, re.findall returns the groups rather than the whole match. Moreover, the (\D|^)* and
(\D|$)* parts are optional, so they do not enforce the boundaries they are meant to: the regex will still find 5-digit chunks inside longer digit runs.
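Both problems can be reproduced with a small sketch. The pattern below is a simplified stand-in for the one in the question (an assumption made to keep the output easy to read), with optional non-digit groups on either side of five digits:

```python
import re

# Simplified stand-in for the question's pattern (assumption):
# optional, captured non-digit runs around five uncaptured digits.
pattern = r"(\D*)\d{5}(\D*)"

# findall returns the captured groups, so the digits never appear.
groups_only = re.findall(pattern, "ab12966cd")
print(groups_only)  # [('ab', 'cd')]

# The optional groups may match nothing, so a match is still found
# inside an 8-digit run.
inside_longer_run = re.findall(pattern, "x12966345")
print(inside_longer_run)  # [('x', '')]
```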
If you must find a 5-digit chunk not enclosed by other digits, use
re.findall(r"(?<!\d)\d{5}(?!\d)", s)
Details:
(?<!\d) - no digit is allowed before the current location
\d{5} - 5 digits
(?!\d) - no digit is allowed after the current location.
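As a quick check, here is the lookaround pattern run against the three strings from the question. The names FIVE_DIGITS, samples and results are only illustrative:

```python
import re

# Exactly five digits, with no digit directly before or after.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

samples = [
    "Helpdesk-Agenten (m/w) Kennziffer: 12966",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966abc",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966345",
]

results = [FIVE_DIGITS.findall(s) for s in samples]
print(results)  # [['12966'], ['12966'], []]
```

The third string gives an empty list because every 5-digit window in the 8-digit run has a digit on at least one side.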
Method 2
Using a word boundary (\b), which matches at the beginning or end of a word:
>>> re.findall(r"\b\d\d\d\d\d\b", "Helpdesk-Agenten (m/w) Kennziffer: 12966")
['12966']
\d\d\d\d\d can be replaced with \d{5}:
>>> re.findall(r"\b\d{5}\b", "Helpdesk-Agenten (m/w) Kennziffer: 12966")
['12966']
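One limitation of \b is that letters and underscores count as word characters, so there is no boundary between a digit and a neighbouring letter. A short sketch, with the "id_12966" input made up for illustration:

```python
import re

word_bounded = r"\b\d{5}\b"

# Digits followed by whitespace, punctuation or the end of the string work.
at_end = re.findall(word_bounded, "Kennziffer: 12966")
print(at_end)  # ['12966']

# No boundary between '6' and 'a', so nothing is found.
before_letters = re.findall(word_bounded, "Kennziffer: 12966abc")
print(before_letters)  # []

# '_' is also a word character (hypothetical input).
after_underscore = re.findall(word_bounded, "id_12966")
print(after_underscore)  # []
```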
UPDATE: If you need to get 12966 out of 12966abc, see Wiktor Stribiżew's answer (Method 1 above), which uses negative lookaround assertions,
or
>>> [match.group(2) for match in re.finditer(r'(\D|^)(\d{5})(\D|$)', '12345abc')]
['12345']
or combine a simple regular expression with a list comprehension:
>>> [match for match in re.findall(r'\d+', '12345abc') if len(match) == 5]
['12345']
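These two alternatives can behave differently when two 5-digit runs are separated by a single character. This is an observation added here rather than something stated in the original answer. The finditer pattern consumes the separator as part of the first match, so the second run has no non-digit left in front of it:

```python
import re

text = "12345 67890"  # hypothetical input: two runs, one separator

# The trailing (\D|$) consumes the space, so 67890 is skipped.
consuming = [m.group(2) for m in
             re.finditer(r"(\D|^)(\d{5})(\D|$)", text)]
print(consuming)  # ['12345']

# Collect every digit run, then keep only those of length 5.
filtered = [run for run in re.findall(r"\d+", text) if len(run) == 5]
print(filtered)  # ['12345', '67890']
```

If adjacent runs like this can occur in the input, the filter approach or the lookaround pattern is probably the safer choice.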
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.