I have the following:
<text top="52" left="20" width="383" height="15" font="0"><b>test</b></text>
and I have the following substitution:
fileText = re.sub("<b>(.*?)</b>", "\1", fileText, flags=re.DOTALL)
Here fileText is the string I posted above. When I print fileText after running the regex replacement, I get back
<text top="52" left="20" width="383" height="15" font="0"></text>
instead of the expected
<text top="52" left="20" width="383" height="15" font="0">test</text>
I'm fairly proficient at regex, and I know the pattern should work. In fact, I know it matches, because I can see the captured text when I run a search and print out the groups. But I'm new to Python and confused about why the back reference isn't working.
Answers:
Method 1
You need to use a raw string here so that Python doesn't process the backslash as an escape character:
>>> import re
>>> fileText = '<text top="52" left="20" width="383" height="15" font="0"><b>test</b></text>'
>>> fileText = re.sub("<b>(.*?)</b>", r"\1", fileText, flags=re.DOTALL)
>>> fileText
'<text top="52" left="20" width="383" height="15" font="0">test</text>'
>>>
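Notice how "\1" was changed to r"\1". It is a very small change (one character), but it has a big effect. In a normal string literal, "\1" is an octal escape for the single character chr(1), so re.sub never receives a backslash and has no back reference to expand. See below: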
>>> "\1"
'\x01'
>>> r"\1"
'\\1'
>>>
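The following sketch is not part of the original answer. It compares the raw string with three other common ways to hand the back reference to re.sub. The names file_text, pattern, and the with_* variables are only illustrative.

```python
import re

file_text = '<text top="52" left="20" width="383" height="15" font="0"><b>test</b></text>'
pattern = re.compile(r"<b>(.*?)</b>", flags=re.DOTALL)

# In a normal (non-raw) literal, "\1" is an octal escape: the single character chr(1).
assert "\1" == "\x01" == chr(1)

# Each replacement below gives re.sub a literal backslash reference (or a callable).
with_raw = pattern.sub(r"\1", file_text)                   # raw string keeps the backslash
with_escaped = pattern.sub("\\1", file_text)               # doubled backslash in a normal string
with_g = pattern.sub(r"\g<1>", file_text)                  # explicit group syntax
with_func = pattern.sub(lambda m: m.group(1), file_text)   # callable: no escape parsing at all

expected = '<text top="52" left="20" width="383" height="15" font="0">test</text>'
for result in (with_raw, with_escaped, with_g, with_func):
    print(result == expected)  # prints True for each variant
```

The `\g<1>` form avoids ambiguity when a literal digit follows the reference. For example, `r"\g<1>0"` means group 1 followed by "0", whereas `r"\10"` would be read as group 10. The callable form sidesteps replacement-string escaping entirely, which helps when the replacement is built from untrusted text.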
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.