How do I return a string from a regex match in python?

I am running through lines in a text file using a python script.
I want to search for an img tag within the text document and return the tag as text.

When I run the regex re.match(line) it returns a _sre.SRE_MATCH object.
How do I get it to return a string?

import sys
import string
import re

f = open("sample.txt", 'r' )
l = open('writetest.txt', 'w')

count = 1

for line in f:
    line = line.rstrip()
    imgtag  = re.match(r'<img.*?>',line)
    print("yo it's a {}".format(imgtag))

When run it prints:

yo it's a None
yo it's a None
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e5e0>
yo it's a None
yo it's a None

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

You should use re.MatchObject.group(0). Like

imtag = re.match(r'<img.*?>', line).group(0)

Edit:

You also might be better off doing something like

imgtag  = re.match(r'<img.*?>',line)
if imtag:
    print("yo it's a {}".format(imgtag.group(0)))

to eliminate all the Nones.

Method 2

imgtag.group(0) or imgtag.group(). This returns the entire match as a string. You are not capturing anything else either.

http://docs.python.org/release/2.5.2/lib/match-objects.html

Method 3

Note that re.match(pattern, string, flags=0) only returns matches at the beginning of the string. If you want to locate a match anywhere in the string, use re.search(pattern, string, flags=0) instead (https://docs.python.org/3/library/re.html). This will scan the string and return the first match object. Then you can extract the matching string with match_object.group(0) as the folks suggested.

Method 4

Considering there might be several img tags I would recommend re.findall:

import re

with open("sample.txt", 'r') as f_in, open('writetest.txt', 'w') as f_out:
    for line in f_in:
        for img in re.findall('<img[^>]+>', line):
            print >> f_out, "yo it's a {}".format(img)


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x