Regex matching 5-digit substrings not enclosed with digits

I want to extract exactly 5 consecutive digits from a string.

This is the code I have written:

re.findall(r"((\D|^)*)\d\d\d\d\d((\D|$)*)", s)

but it does not work for the string

"Helpdesk-Agenten (m/w) Kennziffer: 12966"

The expected result is:

Example 2:

#input
"Helpdesk-Agenten (m/w) Kennziffer: 12966abc"
# expected
12966

Example 3:

#input
"Helpdesk-Agenten (m/w) Kennziffer: 12966345"
# expected
"" (because the length of continuous digits is longer than 5)

Answers:


Method 1

Your current regex, ((\D|^)*)\d\d\d\d\d((\D|$)*), used with re.findall won't return the digit chunks because they are not captured. Moreover, the (\D|^)* and (\D|$)* parts are optional, so they do not enforce anything: the regex will still find 5-digit chunks inside longer digit runs.
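
Both problems can be seen in a small sketch. The variable names here are illustrative and not from the question; the pattern is the one the question uses, written with its backslashes.

```python
import re

# The pattern from the question.
pattern = r"((\D|^)*)\d\d\d\d\d((\D|$)*)"

# With groups in the pattern, findall returns one tuple of group contents
# per match. The digits are not inside any group, so they never appear.
groups = re.findall(pattern, "Helpdesk-Agenten (m/w) Kennziffer: 12966")
print(groups)

# Both outer groups may match nothing, so 5 digits inside an 8-digit run
# still produce a match.
inside_longer_run = re.search(pattern, "Kennziffer: 12966345")
print(inside_longer_run is not None)
```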

To find a 5-digit chunk that is not enclosed by other digits, use

re.findall(r"(?<!\d)\d{5}(?!\d)", s)


Details:

(?<!\d) – no digit is allowed before the current location
\d{5} – 5 digits
(?!\d) – no digit is allowed after the current location.
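
As a quick check, the lookaround pattern can be run against the three inputs from the question. This is a minimal sketch; the helper name find_five_digit_chunks and the constant FIVE_DIGITS are hypothetical, chosen only for this example.

```python
import re

# Hypothetical names; the pattern itself is the one given above.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

def find_five_digit_chunks(text):
    """Return every run of exactly five digits in text."""
    return FIVE_DIGITS.findall(text)

samples = [
    "Helpdesk-Agenten (m/w) Kennziffer: 12966",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966abc",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966345",
]
for sample in samples:
    # Prints ['12966'], ['12966'], and [] respectively.
    print(find_five_digit_chunks(sample))
```

The pattern has no capturing groups, so re.findall returns the matched digits directly.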

Method 2

Using a word boundary (\b), which matches at the beginning or end of a word:

>>> re.findall(r"\b\d\d\d\d\d\b", "Helpdesk-Agenten (m/w) Kennziffer: 12966")
['12966']

\d\d\d\d\d can be replaced with \d{5}:

>>> re.findall(r"\b\d{5}\b", "Helpdesk-Agenten (m/w) Kennziffer: 12966")
['12966']
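
One limitation worth illustrating: \b only matches between a word character and a non-word character, and letters and the underscore count as word characters just like digits. The sketch below (variable names are illustrative) shows where the word-boundary pattern finds nothing.

```python
import re

word_boundary = r"\b\d{5}\b"

# A space next to the digits gives a boundary on the left, and the end of
# the string gives one on the right.
plain = re.findall(word_boundary, "Kennziffer: 12966")

# Letters and "_" are word characters, so there is no boundary between
# the digits and "abc" or "_".
before_letters = re.findall(word_boundary, "Kennziffer: 12966abc")
after_underscore = re.findall(word_boundary, "id_12966")

print(plain, before_letters, after_underscore)
```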

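UPDATE: If you need to get 12966 out of 12966abc, see Wiktor Stribiżew's answer, which uses negative lookaround assertions. Alternatively, capture the digits in a group and require a non-digit (or the start or end of the string) on each side:

>>> [match.group(2) for match in re.finditer(r'(\D|^)(\d{5})(\D|$)', '12345abc')]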
['12345']

or combine a simple regular expression with a list comprehension:

>>> [match for match in re.findall(r'\d+', '12345abc') if len(match) == 5]
['12345']
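
These approaches differ when two 5-digit numbers are separated by a single character. The following sketch uses an input that is an assumption, not taken from the question. In the capturing-group version the trailing (\D|$) consumes the separator, so the second number is skipped, whereas lookarounds and the length filter consume nothing extra.

```python
import re

# Assumed test input: two 5-digit numbers separated by one space.
text = "12345 67890"

# The trailing (\D|$) consumes the space, so the next attempt starts at "6",
# where neither \D nor ^ can match. The second number is missed.
consuming = [m.group(2) for m in re.finditer(r"(\D|^)(\d{5})(\D|$)", text)]

# Lookarounds only inspect the neighbours, so both numbers are found.
lookaround = re.findall(r"(?<!\d)\d{5}(?!\d)", text)

# Filtering whole digit runs by length also finds both.
filtered = [run for run in re.findall(r"\d+", text) if len(run) == 5]

print(consuming, lookaround, filtered)
```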

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
