re.sub replace with matched content

While getting to grips with regular expressions in Python, I'm trying to output some HTML that highlights part of a URL. My input is

images/:id/size

my output should be

images/<span>:id</span>/size

If I do this in JavaScript

method = 'images/:id/size';
method = method.replace(/:([a-z]+)/, '<span>$1</span>')
alert(method)

I get the desired result, but if I do this in Python

>>> import re
>>> method = 'images/:id/huge'
>>> re.sub(':([a-z]+)', '<span>$1</span>', method)
'images/<span>$1</span>/huge'

I don't. How do I get Python to return the correct result rather than $1? Is re.sub even the right function for this?

Answers:


Method 1

Simply use \1 instead of $1:

In [1]: import re

In [2]: method = 'images/:id/huge'

In [3]: re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)
Out[3]: 'images/<span>:id</span>/huge'

Also note the use of raw strings (r'...') for regular expressions. It is not mandatory but removes the need to escape backslashes, arguably making the code slightly more readable.
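To make the raw-string point concrete, here is a small sketch (reusing the sample URL from the question) showing that a raw string and a regular string with an escaped backslash give re.sub the same replacement template, and therefore the same result.

```python
import re

method = 'images/:id/huge'

# Raw string: the backslash reaches re.sub unchanged, so \1 is a group reference.
raw = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

# Regular string: the backslash itself must be escaped to build the same template.
escaped = re.sub(r'(:[a-z]+)', '<span>\\1</span>', method)

# The two literals spell exactly the same string.
assert r'<span>\1</span>' == '<span>\\1</span>'

print(raw)      # images/<span>:id</span>/huge
print(escaped)  # images/<span>:id</span>/huge
```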

Method 2

Use \1 instead of $1.

\number: Matches the contents of the group of the same number.

http://docs.python.org/library/re.html#regular-expression-syntax
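As a sketch of \number with more than one group, the example below uses a hypothetical URL with two placeholders (the sample string is an assumption, not from the question). It also shows that re.sub replaces every match by default, whereas the JavaScript call in the question, having no g flag, replaces only the first; count=1 gives the JavaScript behaviour.

```python
import re

# Hypothetical route with two placeholders, chosen only for illustration.
route = 'users/:user/images/:id'

# \1 is the first parenthesised group (the colon), \2 the second (the name).
both = re.sub(r'(:)([a-z]+)', r'\1<span>\2</span>', route)
print(both)   # users/:<span>user</span>/images/:<span>id</span>

# count=1 stops after the first match, like a JavaScript regex without /g.
first = re.sub(r'(:)([a-z]+)', r'\1<span>\2</span>', route, count=1)
print(first)  # users/:<span>user</span>/images/:id
```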

Method 3

A backreference to the whole match value is \g<0>; see the re.sub documentation:

The backreference \g<0> substitutes in the entire substring matched by the RE.

See the Python demo:

import re
method = 'images/:id/huge'
print(re.sub(r':[a-z]+', r'<span>\g<0></span>', method))
# => images/<span>:id</span>/huge
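The same \g<...> syntax also accepts a group name or number. A minimal sketch, where the group name param is an arbitrary choice for illustration: \g<name> refers to a (?P<name>...) group, and \g<1>0 means "group 1 followed by a literal 0", whereas \10 would be read as a reference to group 10.

```python
import re

method = 'images/:id/huge'

# Named group: (?P<param>...) in the pattern, \g<param> in the template.
named = re.sub(r'(?P<param>:[a-z]+)', r'<span>\g<param></span>', method)
print(named)   # images/<span>:id</span>/huge

# \g<1>0 is group 1 then a literal "0"; \10 would mean the (missing) group 10.
padded = re.sub(r'(:[a-z]+)', r'\g<1>0', method)
print(padded)  # images/:id0/huge
```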

If you need a case-insensitive search, add flags=re.I:

re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)
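Note that flags has to be passed by keyword: the fourth positional parameter of re.sub is count, so passing re.I positionally would not set the flag. The sketch below uses an upper-case placeholder (an assumed input) to show the difference, plus the equivalent precompiled pattern.

```python
import re

method = 'images/:ID/huge'  # upper-case name, assumed for illustration

# Without a flag, [a-z] does not match "ID", so nothing is replaced.
plain = re.sub(r':[a-z]+', r'<span>\g<0></span>', method)

# flags= by keyword; the fourth positional argument would be count instead.
insensitive = re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)

# A precompiled pattern carries the flag itself.
pattern = re.compile(r':[a-z]+', re.I)
compiled = pattern.sub(r'<span>\g<0></span>', method)

print(plain)        # images/:ID/huge
print(insensitive)  # images/<span>:ID</span>/huge
print(compiled)     # images/<span>:ID</span>/huge
```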

Method 4

For the replacement portion, Python uses \1 the way sed and vi do, not $1 the way Perl, Java, and JavaScript (amongst others) do. Furthermore, because \1 in a regular string literal is interpreted as the character U+0001, you need to use a raw string or escape the backslash.

Python 3.2 (r32:88445, Jul 27 2011, 13:41:33) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> method = 'images/:id/huge'
>>> import re
>>> re.sub(':([a-z]+)', r'<span>\1</span>', method)
'images/<span>id</span>/huge'
>>>
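A short sketch of the U+0001 pitfall described above: without the r prefix, '\1' is an octal escape, so re.sub receives a control character instead of a group reference and inserts it literally.

```python
import re

method = 'images/:id/huge'

# In a regular string literal, \1 is an octal escape for U+0001.
assert '\1' == '\x01'

# Without a raw string, the template holds that control character, not group 1.
wrong = re.sub(':([a-z]+)', '<span>\1</span>', method)
right = re.sub(':([a-z]+)', r'<span>\1</span>', method)

print(repr(wrong))  # 'images/<span>\x01</span>/huge'
print(repr(right))  # 'images/<span>id</span>/huge'
```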


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
