I’m trying to get the city’s email address from http://www.comuni-italiani.it/110/index.html
Using xPath Finder I found the specific path to the element, which is /html/body/span[3]/table[2]/tbody/tr[1]/td[2]/table/tbody/tr[11]/td/b/a. Now I’m trying to retrieve the email from this page, but I know very little about the BeautifulSoup library (I’m just getting started). After reading several guides I managed to write the following code, but I haven’t succeeded in specifying the path to the child element correctly:
from bs4 import BeautifulSoup
import requests
# sample web page
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
# call get method to request that page
page = requests.get(sample_web_page)
# parse the response body with BeautifulSoup and the built-in html.parser
soup = BeautifulSoup(page.content, "html.parser")
child_soup = soup.find('span')
for i in child_soup.children:
    print("child : ", i)
What am I doing wrong?
Answers:
Method 1
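Here is my attempt at solving your problem. It starts the same way as your code, then uses a CSS selector to find the link sitting directly inside a <b> tag whose href starts with "mail" (the mailto: link), and prints the address that follows the colon.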
from bs4 import BeautifulSoup
import requests
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
page = requests.get(sample_web_page)
soup = BeautifulSoup(page.content, "html.parser")
email = soup.select_one('b > a[href^="mail"]')['href']
print(email.split(':')[1])
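As for why the original attempt did not work: soup.find('span') returns only the first <span> on the page, and a path copied from a browser tool often contains <tbody> elements that the browser adds to its DOM, which may not exist in the raw HTML that html.parser sees. Selecting the link by its href avoids depending on the exact nesting. The snippet above also raises a TypeError if no matching link is found, because select_one then returns None. Below is a minimal sketch of a more defensive variant. The helper name extract_email and the sample markup are assumptions made for illustration and are not taken from the real page.

```python
from urllib.parse import unquote, urlsplit

from bs4 import BeautifulSoup


def extract_email(html):
    """Return the first mailto: address found in html, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Match any <a> whose href starts with "mailto:", wherever it is nested.
    link = soup.select_one('a[href^="mailto:"]')
    if link is None:
        return None
    # urlsplit separates the address (path) from any "?subject=..." query.
    return unquote(urlsplit(link["href"]).path)


# Hypothetical markup for illustration only, not copied from the real page.
sample_html = """
<table><tr><td>
  <b><a href="mailto:info@example.org?subject=Hello">Write to us</a></b>
</td></tr></table>
"""

print(extract_email(sample_html))
print(extract_email("<p>no links here</p>"))
```

To use it on the live page, pass page.content from the requests call above to extract_email and check the result for None before using it.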
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.