I’m trying to match monetary amounts. Here are all the possibilities:
$5.6 million $4,1 million $8,1M $6.3M $333,333 $2 million $5 million
I already have this regex:
\$\d{1,3}(?:,\d{3})*(?:\s+(?:thousand|[mb]illion|[MB]illion)|[M])?
See online demo.
But I’m not able to match these:
$5.6 million $4,1 million $8,1M $6.3M
Any help would be appreciated.
Answers:
Method 1
Let’s look at your regular expression:
\$\d{1,3}(?:,\d{3})*(?:\s+(?:thousand|[mb]illion|[MB]illion)|[M])?
\$\d{1,3} is fine. What follows? One way to answer that is to consider the following three possibilities.
The string to be matched ends with ' million'
This string (which begins with a space, in case you missed that) is preceded by an empty string or a single digit preceded by a comma or period:
(?:[,.]\d)? million
Evidently, “million” can also be “thousand” or “billion”, and the first letter of “million” or “billion” might be capitalized, so we change the expression to
(?:[,.]\d)? (?:[MmBb]illion|thousand)
One potential problem is that this matches the '$5.6 million' in '$5.6 millionaire'. We can avoid that by tacking on a word boundary, which prevents the match from being followed by a word character:
(?:[,.]\d)? (?:[MmBb]illion|thousand)\b
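The effect of the word boundary can be checked with a small Python sketch. The helper name first_match is hypothetical, and the suffix is prefixed with \$\d{1,3} here only so that it can be tried on its own:

```python
import re

# Suffix pattern from the step above, prefixed with \$\d{1,3} (an assumption
# made only so the fragment can be tested in isolation).
without_boundary = re.compile(r"\$\d{1,3}(?:[,.]\d)? (?:[MmBb]illion|thousand)")
with_boundary = re.compile(r"\$\d{1,3}(?:[,.]\d)? (?:[MmBb]illion|thousand)\b")


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match(without_boundary, "$5.6 millionaire"))  # $5.6 million
print(first_match(with_boundary, "$5.6 millionaire"))     # None
print(first_match(with_boundary, "$4,1 Billion"))         # $4,1 Billion
```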
The string ends with 'M'
In this case the 'M' must be preceded by a single digit preceded by a comma or period:
[,.]\dM\b
You could accept 'B' as well by changing M to [MB].
The string ends with three digits preceded by a comma
Here we need
,\d{3}\b
Here the word boundary avoids matching, for example, '$333,3333'. It will, however, match only the leading '$333,333' of '$333,333,333' or '$333,333,333,333'. If we want to match those in full we could change the expression to
(?:,\d{3})+\b
or to match '$333' as well, change it to
(?:,\d{3})*\b
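The three digit-group variants above can be compared side by side. This is a minimal Python sketch; the variable names and the found helper are hypothetical, and each fragment is prefixed with \$\d{1,3} so it can run on its own:

```python
import re

# Each variant is prefixed with \$\d{1,3} (an assumption for standalone testing).
one_group = re.compile(r"\$\d{1,3},\d{3}\b")
one_or_more = re.compile(r"\$\d{1,3}(?:,\d{3})+\b")
zero_or_more = re.compile(r"\$\d{1,3}(?:,\d{3})*\b")


def found(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(found(one_group, "$333,3333"))            # None
print(found(one_group, "$333,333,333"))         # $333,333
print(found(one_or_more, "$333,333,333,333"))   # $333,333,333,333
print(found(one_or_more, "$333"))               # None
print(found(zero_or_more, "$333"))              # $333
print(found(zero_or_more, "$333,3333"))         # $333
```

Note the last line: because the * variant allows zero groups, it falls back to matching just '$333' in '$333,3333', so the looser form trades some strictness for coverage.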
Construct the alternation
We therefore can use the following regular expression.
\$\d{1,3}(?:(?:[,.]\d)? (?:[MmBb]illion|thousand)\b|[,.]\dM\b|,\d{3}\b)
Factoring out the word boundary we obtain
\$\d{1,3}(?:(?:[,.]\d)? (?:[MmBb]illion|thousand)|[,.]\dM|,\d{3})\b
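As a quick check, the final expression can be run against the sample amounts from the question. This is a sketch in Python; the variable names are hypothetical:

```python
import re

# Final Method 1 expression, with the word boundary factored out.
method1 = re.compile(
    r"\$\d{1,3}(?:(?:[,.]\d)? (?:[MmBb]illion|thousand)|[,.]\dM|,\d{3})\b"
)

sample = "$5.6 million $4,1 million $8,1M $6.3M $333,333 $2 million $5 million"

# The pattern has no capturing groups, so findall returns the full matches.
matches = method1.findall(sample)
print(matches)
# ['$5.6 million', '$4,1 million', '$8,1M', '$6.3M', '$333,333', '$2 million', '$5 million']
```

Since one of the three alternatives is required, a bare amount such as '$333' is not matched by this version.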
Method 2
You can use
(?i)\$\d+(?:[.,]\d+)*(?:\s+(?:thousand|[mb]illion)|m)?
If you need to make sure you do not match m that is part of another word:
(?i)\$\d+(?:[.,]\d+)*(?:\s+(?:thousand|[mb]illion)|m)?\b
See the regex demo. Details:
(?i) – case-insensitive option
\$ – a $ char
\d+ – one or more digits
(?:[.,]\d+)* – zero or more repetitions of . or , followed by one or more digits
(?:\s+(?:thousand|[mb]illion)|m)? – an optional occurrence of either of:
    \s+(?:thousand|[mb]illion) – one or more whitespace chars and then thousand, million or billion
    | – or
    m – an m char
\b – a word boundary.
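The same kind of check works for this pattern. Below is a minimal Python sketch (variable names are hypothetical); the inline (?i) flag sits at the very start of the pattern, which is where Python's re module expects a global flag:

```python
import re

# Method 2 expression with the trailing word boundary.
method2 = re.compile(r"(?i)\$\d+(?:[.,]\d+)*(?:\s+(?:thousand|[mb]illion)|m)?\b")

sample = "$5.6 million $4,1 million $8,1M $6.3M $333,333 $2 million $5 million"

# No capturing groups are used, so findall returns the full matches.
matches = method2.findall(sample)
print(matches)
# ['$5.6 million', '$4,1 million', '$8,1M', '$6.3M', '$333,333', '$2 million', '$5 million']

# The unit suffix is optional, so on '$5.6 millionaire' only the number is kept.
print(method2.search("$5.6 millionaire").group(0))  # $5.6
```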
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0