How would I, using BeautifulSoup, search for tags containing ONLY the attributes I search for?
For example, I want to find all <td valign="top"> tags.
The following code:
raw_card_data = soup.fetch('td', {'valign':re.compile('top')})
gets all of the data I want, but also grabs any <td> tag that has the attribute valign:top
I also tried:
raw_card_data = soup.findAll(re.compile('<td valign="top">'))
and this returns nothing (probably because of bad regex)
I was wondering if there was a way in BeautifulSoup to say "Find <td> tags whose only attribute is valign:top".
UPDATE
For example, if an HTML document contained the following <td> tags:
<td valign="top">.....</td>
<td width="580" valign="top">.......</td>
<td>.....</td>
I would want only the first <td> tag (<td valign="top">) to be returned.
Answers:
Method 1
As explained in the BeautifulSoup documentation, you can use this:
soup = BeautifulSoup(html)
results = soup.findAll("td", {"valign" : "top"})
EDIT:
To return tags that have only the valign="top" attribute, you can check the length of the tag's attrs property:
from bs4 import BeautifulSoup

# Triple quotes let the markup span several lines.
html = """<td valign="top">.....</td>
<td width="580" valign="top">.......</td>
<td>.....</td>"""
soup = BeautifulSoup(html, "html.parser")
results = soup.findAll("td", {"valign": "top"})
for result in results:
    # Keep only tags whose sole attribute is valign.
    if len(result.attrs) == 1:
        print(result)
That returns :
<td valign="top">.....</td>
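The length check is enough when a single attribute is requested. A slightly more general sketch compares each tag's whole attribute dictionary with the one being searched for, so it also covers searches on two or more attributes. It assumes bs4 (where tag.attrs is a dict) and the standard-library html.parser; find_with_exact_attrs is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<td valign="top">first</td>
<td width="580" valign="top">second</td>
<td>third</td>
</tr></table>
"""


def find_with_exact_attrs(soup, name, attrs):
    """Hypothetical helper: tags called `name` whose attributes equal `attrs` exactly."""
    # find_all narrows the candidates; the dict comparison rejects any tag
    # that carries extra attributes. Note that bs4 stores multi-valued
    # attributes such as class as lists, so this is meant for plain ones.
    return [tag for tag in soup.find_all(name, attrs=attrs) if tag.attrs == attrs]


soup = BeautifulSoup(html, "html.parser")
exact = find_with_exact_attrs(soup, "td", {"valign": "top"})
print(exact)
```

Calling the helper with {"width": "580", "valign": "top"} instead should pick out only the second cell.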
Method 2
You can pass a function (for example a lambda) to findAll, as explained in the documentation. In your case, to find td tags whose only attribute is valign="top", use the following:
td_tag_list = soup.findAll(
    lambda tag: tag.name == "td"
    and len(tag.attrs) == 1
    # .get returns None instead of raising KeyError when valign is missing
    and tag.get("valign") == "top")
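One detail worth checking in a filter like this is how the attribute is read: indexing a tag by an attribute it does not have raises KeyError, while tag.get returns None. The sketch below, assuming bs4 and html.parser, uses a named function and adds a <td> whose single attribute is width (a made-up test case, not from the question) to exercise that path.

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<td valign="top">first</td>
<td width="580" valign="top">second</td>
<td width="580">third</td>
<td>fourth</td>
</tr></table>
"""


def only_valign_top(tag):
    # tag.get returns None when the attribute is absent, so the
    # single-attribute <td width="580"> is rejected rather than raising.
    return (
        tag.name == "td"
        and len(tag.attrs) == 1
        and tag.get("valign") == "top"
    )


soup = BeautifulSoup(html, "html.parser")
matches = soup.find_all(only_valign_top)
print(matches)
```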
Method 3
If you only want to match on the attribute name, whatever its value:
from bs4 import BeautifulSoup
import re
# html is assumed to be a response object whose .text holds the markup
soup = BeautifulSoup(html.text, 'lxml')
results = soup.findAll("td", {"valign": re.compile(r".*")})
As Steve Lorimer points out, it is better to pass True instead of a regex:
results = soup.findAll("td", {"valign": True})
Method 4
The easiest way to do this is with the CSS-style select method:
soup = BeautifulSoup(html)
results = soup.select('td[valign="top"]')
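Note that the attribute selector matches every <td> with valign="top", including ones that carry other attributes as well. A minimal sketch of narrowing the result to tags with no other attributes, assuming a bs4 release whose select is backed by Soup Sieve, and using html.parser:

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<td valign="top">first</td>
<td width="580" valign="top">second</td>
<td>third</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# The attribute selector matches both cells that have valign="top".
selected = soup.select('td[valign="top"]')

# Keep only the tags that have no other attributes.
only_valign = [td for td in selected if len(td.attrs) == 1]

# If the unwanted attributes are known in advance, :not() can exclude them.
without_width = soup.select('td[valign="top"]:not([width])')

print(len(selected), only_valign, without_width)
```

The :not() form only excludes the attributes that are listed, so the length check is the safer choice when the extra attributes are not known ahead of time.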
Method 5
Just pass it as an argument of findAll:
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("""
... <html>
... <head><title>My Title!</title></head>
... <body><table>
... <tr><td>First!</td>
... <td valign="top">Second!</td></tr>
... </table></body></html>
... """)
>>>
>>> soup.findAll('td')
[<td>First!</td>, <td valign="top">Second!</td>]
>>>
>>> soup.findAll('td', valign='top')
[<td valign="top">Second!</td>]
Method 6
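To find tags of any name by attribute, pass attrs on its own. For the markup
<th class="team" data-sort="team">Team</th>
you can search by class:
soup.find_all(attrs={"class": "team"})
and for the markup
<th data-sort="team">Team</th>
you can search by data-sort:
soup.find_all(attrs={"data-sort": "team"})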
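The attrs dictionary is the natural form for names such as data-sort, because a hyphen is not valid in a Python keyword argument. Here is a small runnable sketch, assuming bs4 and html.parser; the extra <th> and <td> cells are made-up sample markup added to show that the tag name is not restricted.

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<th class="team" data-sort="team">Team</th>
<th data-sort="wins">Wins</th>
<td class="team">Example FC</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# No tag name is given, so any tag with a matching attribute qualifies.
by_class = soup.find_all(attrs={"class": "team"})
by_data = soup.find_all(attrs={"data-sort": "team"})

print([tag.name for tag in by_class])
print([tag.name for tag in by_data])
```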
Method 7
Combining Chris Redford's and Amr's answers, you can also search for an attribute name with any value using select:
from bs4 import BeautifulSoup as Soup

# Triple quotes let the markup span several lines.
html = """<td valign="top">.....</td>
<td width="580" valign="top">.......</td>
<td>.....</td>"""
soup = Soup(html, 'lxml')
results = soup.select('td[valign]')
Method 8
If you want every tag where a particular attribute is present at all, you can use the same code as the accepted answer, but pass True instead of a specific value for the attribute.
soup = BeautifulSoup(html)
results = soup.findAll("td", {"valign" : True})
This returns all td tags that have a valign attribute. It is useful when your project pulls information from a tag such as div that is used all over the page but may carry the specific attributes you are looking for.
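As a quick check of this behaviour, the sketch below (assuming bs4 with html.parser, and a Soup Sieve-backed select) runs the True filter next to the td[valign] selector on cells with different valign values; both should return the same tags.

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<td valign="top">first</td>
<td width="580" valign="bottom">second</td>
<td>third</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# True matches any <td> that has a valign attribute, whatever its value.
with_valign = soup.find_all("td", {"valign": True})

# The CSS presence selector expresses the same condition.
via_css = soup.select("td[valign]")

print([td.get_text() for td in with_valign])
```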
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.