I have an xml document in the following format:
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:gsa="http://schemas.google.com/gsa/2007">
...
<entry>
<id>https://ip.ad.dr.ess:8000/feeds/diagnostics/smb://ip.ad.dr.ess/path/to/file</id>
<updated>2011-11-07T21:32:39.795Z</updated>
<app:edited xmlns:app="http://purl.org/atom/app#">2011-11-07T21:32:39.795Z</app:edited>
<link rel="self" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/>
<link rel="edit" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener"/>
<gsa:content name="entryID">smb://ip.ad.dr.ess/path/to/directory</gsa:content>
<gsa:content name="numCrawledURLs">7</gsa:content>
<gsa:content name="numExcludedURLs">0</gsa:content>
<gsa:content name="type">DirectoryContentData</gsa:content>
<gsa:content name="numRetrievalErrors">0</gsa:content>
</entry>
<entry>
...
</entry>
...
</feed>
I need to retrieve all entry elements using xpath in lxml. My problem is that I can’t figure out how to use an empty namespace. I have tried the following examples, but none work. Please advise.
import lxml.etree as et tree=et.fromstring(xml)
The various things I have tried are:
for node in tree.xpath('//entry'):
or
namespaces = {None:"http://www.w3.org/2005/Atom" ,"openSearch":"http://a9.com/-/spec/opensearchrss/1.0/" ,"gsa":"http://schemas.google.com/gsa/2007"}
for node in tree.xpath('//entry', namespaces=ns):
or
for node in tree.xpath('//"{http://www.w3.org/2005/Atom}entry"'):
At this point I just don’t know what to try. Any help is greatly appreciated.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Something like this should work:
import lxml.etree as et
ns = {"atom": "http://www.w3.org/2005/Atom"}
tree = et.fromstring(xml)
for node in tree.xpath('//atom:entry', namespaces=ns):
print node
See also http://lxml.de/xpathxslt.html#namespaces-and-prefixes.
Alternative:
for node in tree.xpath("//*[local-name() = 'entry']"):
print node
Method 2
Use findall method.
for item in tree.findall('{http://www.w3.org/2005/Atom}entry'):
print item
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0