I'm trying to get to grips with regular expressions in Python, and I want to output some HTML that highlights part of a URL. My input is
images/:id/size
my output should be
images/<span>:id</span>/size
If I do this in Javascript
method = 'images/:id/size';
method = method.replace(/:([a-z]+)/, '<span>$1</span>');
alert(method);
I get the desired result, but if I do this in Python
>>> method = 'images/:id/huge'
>>> re.sub(':([a-z]+)', '<span>$1</span>', method)
'images/<span>$1</span>/huge'
I don’t. How do I get Python to return the correct result rather than $1? Is re.sub even the right function for this?
Answers:
Method 1
Simply use \1 instead of $1:
In [1]: import re
In [2]: method = 'images/:id/huge'
In [3]: re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)
Out[3]: 'images/<span>:id</span>/huge'
Also note the use of raw strings (r'...') for regular expressions. It is not mandatory but removes the need to escape backslashes, arguably making the code slightly more readable.
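As a small sketch of that point, the raw-string form and the doubled-backslash form below should be interchangeable; the variable names are only illustrative.

```python
import re

method = 'images/:id/huge'

# Raw strings: the backslash reaches the re module untouched.
raw_result = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

# Regular strings: the backslash in the replacement has to be doubled.
escaped_result = re.sub('(:[a-z]+)', '<span>\\1</span>', method)

print(raw_result)
print(escaped_result)
```

Both calls hand the same two-character sequence (a backslash followed by 1) to re.sub, so which one to use is purely a readability choice.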
Method 2
Use \1 instead of $1.
\number Matches the contents of the group of the same number.
http://docs.python.org/library/re.html#regular-expression-syntax
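Because \1 refers to whatever group 1 captured, where the parentheses sit decides whether the colon ends up inside the span. A minimal sketch using the input from the question (the variable names are made up for illustration):

```python
import re

method = 'images/:id/huge'

# Colon outside the group: \1 holds only the parameter name.
name_only = re.sub(r':([a-z]+)', r'<span>\1</span>', method)

# Colon inside the group: \1 holds the colon and the name.
with_colon = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

print(name_only)   # images/<span>id</span>/huge
print(with_colon)  # images/<span>:id</span>/huge
```

The second form gives the output the question asks for, since the desired result keeps the colon inside the span.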
Method 3
A backreference to the whole match value is \g<0>; see the re.sub documentation:
The backreference \g<0> substitutes in the entire substring matched by the RE.
See the Python demo:
import re
method = 'images/:id/huge'
print(re.sub(r':[a-z]+', r'<span>\g<0></span>', method))
# => images/<span>:id</span>/huge
If you need to perform a case-insensitive search, add flags=re.I:
re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)
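To see what the flag changes, here is a short sketch with a hypothetical upper-case parameter (':ID' is not in the original question):

```python
import re

# Hypothetical input: the parameter name is upper case.
method = 'images/:ID/huge'

# Without the flag, [a-z] does not match 'I', so nothing is replaced.
case_sensitive = re.sub(r':[a-z]+', r'<span>\g<0></span>', method)

# With flags=re.I, [a-z] also matches upper-case letters.
case_insensitive = re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)

print(case_sensitive)    # images/:ID/huge
print(case_insensitive)  # images/<span>:ID</span>/huge
```

Passing the flag by keyword matters here: the fourth positional parameter of re.sub is count, so a bare re.I in that position would be read as a replacement limit rather than a flag.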
Method 4
For the replacement portion, Python uses \1 the way sed and vi do, not $1 the way Perl, Java, and JavaScript (amongst others) do. Furthermore, because \1 interpolates in regular strings as the character U+0001, you need to use a raw string or escape the backslash.
Python 3.2 (r32:88445, Jul 27 2011, 13:41:33)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> method = 'images/:id/huge'
>>> import re
>>> re.sub(':([a-z]+)', r'<span>\1</span>', method)
'images/<span>id</span>/huge'
>>>
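The U+0001 pitfall can be checked directly. In this sketch the names plain, raw, broken, and fixed are only illustrative:

```python
import re

method = 'images/:id/huge'

plain = '<span>\1</span>'  # regular string: \1 is an octal escape for U+0001
raw = r'<span>\1</span>'   # raw string: backslash and digit stay separate

# The regular string already contains the control character before re sees it.
print(plain == '<span>\x01</span>')  # True

broken = re.sub(':([a-z]+)', plain, method)
fixed = re.sub(':([a-z]+)', raw, method)

print(repr(broken))  # 'images/<span>\x01</span>/huge'
print(repr(fixed))   # 'images/<span>id</span>/huge'
```

The first call raises no error. It silently inserts a control character, which is why the mistake can be easy to miss.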
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.