Python Regular Expression Match All 5 Digit Numbers but None Larger

I’m attempting to match 5-digit coupon codes spread throughout an HTML web page, for example 53232, 21032, 40021, etc. I can handle the simpler case of any string of 5 digits with [0-9]{5}, though this also matches inside 6, 7, 8, … n-digit numbers. Can someone please suggest how I would modify this regular expression to match only 5-digit numbers?

Answers:


Method 1

>>> import re
>>> s="four digits 1234 five digits 56789 six digits 012345"
>>> re.findall(r"\D(\d{5})\D", s)
['56789']

If the numbers can occur at the very beginning or the very end of the string, it’s easier to pad the string than to mess with special cases:

>>> re.findall(r"\D(\d{5})\D", " "+s+" ")
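To see why the padding matters, here is a small sketch that runs the same pattern with and without padding on a string whose first and last tokens are 5-digit numbers. The helper name find_codes_padded and the sample string are made up for illustration.

```python
import re

# Hypothetical helper: pad with one space on each side so that \D always
# has a character to match, even at the start or end of the text.
def find_codes_padded(text):
    return re.findall(r"\D(\d{5})\D", " " + text + " ")

sample = "12345 abc 678901 54321"  # made-up test string

unpadded = re.findall(r"\D(\d{5})\D", sample)
padded = find_codes_padded(sample)

print(unpadded)  # []
print(padded)    # ['12345', '54321']
```

Without the padding, both codes are missed because there is no non-digit character before the first one or after the last one.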

Method 2

Instead of padding the string to handle the start and end of the string, as John La Rooy’s answer does, one can use negative lookbehind and negative lookahead to handle both cases with a single regular expression:

>>> import re
>>> s = "88888 999999 3333 aaa 12345 hfsjkq 98765"
>>> re.findall(r"(?<!\d)\d{5}(?!\d)", s)
['88888', '12345', '98765']
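As a rough sketch of how this might look on the kind of input in the question, the lookaround pattern can be compiled once and run over a page fragment. The constant name FIVE_DIGITS and the HTML snippet are invented for this example.

```python
import re

# (?<!\d) : not preceded by a digit; (?!\d) : not followed by a digit.
# Lookarounds consume no characters, so codes separated by a single
# character are still found.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

# Made-up HTML fragment for illustration.
html = "<p>Codes: 53232,21032 and 40021. Order no. 1234567.</p>"

codes = FIVE_DIGITS.findall(html)
print(codes)  # ['53232', '21032', '40021']
```

The 7-digit order number is skipped because every 5-digit window inside it has a digit on at least one side.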

Method 3

full string: ^[0-9]{5}$

within a string: [^0-9][0-9]{5}[^0-9]
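For the full-string case, one possible variant (not part of the answer above) is re.fullmatch, available since Python 3.4. The sketch below uses a hypothetical helper name, is_five_digit_code, and also shows that $ on its own accepts a trailing newline.

```python
import re

# Hypothetical helper: True only if the whole string is exactly 5 digits.
def is_five_digit_code(s):
    # fullmatch succeeds only if the entire string matches the pattern
    return re.fullmatch(r"[0-9]{5}", s) is not None

print(is_five_digit_code("12345"))    # True
print(is_five_digit_code("123456"))   # False
print(is_five_digit_code("12345\n"))  # False

# By contrast, $ also matches just before a trailing newline:
dollar_accepts_newline = bool(re.match(r"^[0-9]{5}$", "12345\n"))
print(dollar_accepts_newline)  # True
```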

Method 4

Note: There is a problem with using \D: it matches (and consumes) any character that is not a digit, so it needs an actual character on each side of the number. Use \b instead.
\b is important here because it matches a word boundary, that is, only at the beginning or end of a word.

import re

text = "four digits 1234 five digits 56789 six digits 01234,56789,01234"


re.findall(r"\b\d{5}\b", text)

result: ['56789', '01234', '56789', '01234']

But if one uses
re.findall(r"\D(\d{5})\D", text)
output: ['56789', '01234']
\D is unable to handle numbers separated by a single comma (the comma consumed by one match is no longer available to the next) or a number at the very end of the string.

\b is the important part here: it matches the empty string, but only at the beginning or end of a word.
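One caveat when choosing \b: letters and underscores count as word characters, so a 5-digit run glued to a letter or underscore has no word boundary next to it. A quick sketch comparing \b with the lookaround pattern from Method 2 on a made-up string:

```python
import re

sample = "id12345 x 67890 ref_54321"  # made-up test string

# \b requires a transition between a word character and a non-word
# character, so digits directly after "id" or "_" are not matched.
with_boundary = re.findall(r"\b\d{5}\b", sample)

# The lookarounds only care whether the neighbours are digits.
with_lookaround = re.findall(r"(?<!\d)\d{5}(?!\d)", sample)

print(with_boundary)    # ['67890']
print(with_lookaround)  # ['12345', '67890', '54321']
```

Which behaviour is right depends on whether codes embedded in identifiers should count.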

More documentation: https://docs.python.org/2/library/re.html

More clarification on the usage of \D vs \b:

This example uses \D, but it doesn’t capture all of the five-digit numbers.

This example uses \b and captures all of the five-digit numbers.

Cheers

Method 5

A very simple way would be to match all groups of digits, for example with r'\d+', and then skip every match that isn’t five characters long when you process the results.
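A minimal sketch of that idea, reusing the sample string from Method 1 (the variable names runs and five are just for illustration):

```python
import re

s = "four digits 1234 five digits 56789 six digits 012345"

# Grab every maximal run of digits, whatever its length.
runs = re.findall(r"\d+", s)

# Keep only the runs that are exactly five characters long.
five = [m for m in runs if len(m) == 5]

print(runs)  # ['1234', '56789', '012345']
print(five)  # ['56789']
```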

Method 6

You probably want to match a non-digit before and after your string of 5 digits, like [^0-9]([0-9]{5})[^0-9]. Then you can capture the inner group (the actual string you want).

Method 7

You could try

\D\d{5}\D

or maybe

\b\d{5}\b

I’m not sure how Python treats line endings and whitespace there, though.

I believe ^\d{5}$ would not work for you, as you likely want to get numbers that are somewhere within other text.
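On the line-ending question, a short sketch over a made-up multi-line string suggests how the three patterns behave in Python: a newline counts as a non-digit for \D, \b works at line ends, and ^...$ only matches per line when re.MULTILINE is passed.

```python
import re

text = "12345\ncode 67890\n999999\n"  # made-up multi-line input

# \b finds codes at line starts and line ends alike.
boundary = re.findall(r"\b\d{5}\b", text)

# \D needs a real character on both sides, so the code at the very
# start of the text is missed; the newline after 67890 counts as \D.
non_digit = re.findall(r"\D(\d{5})\D", text)

# Without MULTILINE, ^ and $ anchor to the whole string.
anchored = re.findall(r"^\d{5}$", text)

# With MULTILINE, they anchor to each line, so only lines that consist
# of exactly five digits match.
anchored_multiline = re.findall(r"^\d{5}$", text, re.MULTILINE)

print(boundary)            # ['12345', '67890']
print(non_digit)           # ['67890']
print(anchored)            # []
print(anchored_multiline)  # ['12345']
```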


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
