re.sub(".*", "(replacement)", "text") doubles replacement on Python 3.7

On Python 3.7 (tested on 64-bit Windows), replacing a string using the regex .* yields the replacement string twice!

On Python 3.7.2:

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)(replacement)'

On Python 3.6.4:

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'

On Python 2.7.5 (32 bits):

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'

What is wrong? How can I fix that?

Answers:

Method 1

This is not a bug but a bug fix, introduced in Python 3.7 by commit fbb490fd2f38bd817d99c20c05121ad0168a38ee.

In regex, a non-zero-width match moves the pointer to the end of the match, so that the next match attempt, zero-width or not, can continue from the position following the match. So in your example, after .* greedily matches and consumes the entire string, the pointer sits at the end of the string, which still leaves "room" for a zero-width match at that position. This is evident from the following code, which behaves the same in Python 2.7, 3.6 and 3.7:

>>> re.findall(".*", 'sample text')
['sample text', '']
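
To see where the two matches sit, a minimal sketch using re.finditer can print the start and end offsets of each one. The variable names text and spans are only illustrative and are not part of the original answer.

```python
import re

text = "sample text"

# Collect (start, end, matched text) for every match of ".*".
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(".*", text)]

# The first match covers the whole string; the second is empty and
# sits at the very end of the string (start == end == len(text)).
print(spans)  # [(0, 11, 'sample text'), (11, 11, '')]
```

The second tuple is the zero-width match at the end of the string that the explanation above refers to.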

So the bug fix, which concerns replacing a zero-width match immediately after a non-zero-width match, means that re.sub now correctly replaces both matches with the replacement text.
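
The question also asks how to get a single replacement. The answer above does not say, so the following is only a sketch of some common options, assuming the goal is to replace the whole of a single-line string exactly once. The variable names are hypothetical.

```python
import re

text = "sample text"

# Option 1: require at least one character, so the trailing
# zero-width match at the end of the string can no longer occur.
with_plus = re.sub(".+", "(replacement)", text)

# Option 2: anchor the pattern. Without re.MULTILINE, "^" only
# matches at the start of the string, so no second match is found.
anchored = re.sub(r"^.*$", "(replacement)", text)

# Option 3: keep ".*" but limit re.sub to a single substitution.
first_only = re.sub(".*", "(replacement)", text, count=1)

print(with_plus, anchored, first_only)
```

Each of the three calls returns '(replacement)' for this input. They are not interchangeable in general: with an empty input string, ".+" finds nothing to replace while ".*" still matches once, and because . does not match a newline, multi-line input is handled line by line.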


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
