I tried all the solutions mentioned here, but none of them works with my code.
My problem is that I only want to get the text from span tags that are children of h2 tags (and not h3 tags) on this Wikipedia page (https://fr.wikipedia.org/wiki/Manga).
This is my code:
import numbers
import urllib.request
from bs4 import BeautifulSoup
quote_page ='https://fr.wikipedia.org/wiki/Manga#:~:text=Un%20manga%20(%E6%BC%AB%E7%94%BB)%20est%20une,quelle%20que%20soit%20son%20origine.'
page = urllib.request.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
spans = soup.find_all('h2 > span.mw-heading')
#not working, results show all spans in h2 AND h3
for span in spans:
    print(span.text)
#div_span = soup.find_all('span', class_="mw-headline")
#for spans in div_span:
# print(spans.text) #or string ?
If someone has a solution, I would be thankful 😉
(The commented-out code works, but it also returns the span tags inside h3 tags.)
Answers:
Method 1
You are close to your goal, but you are mixing two APIs: find_all() does not accept CSS selectors. Use select() when working with CSS selectors:
soup.select('h2 > span.mw-headline')
Another issue here is that the class is named mw-headline, not mw-heading.
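To see the difference without fetching the page, here is a minimal offline sketch. The SAMPLE_HTML string is a hypothetical snippet modelled on the heading structure described above, not the real Wikipedia markup, and the variable names are chosen for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption), modelled on the structure described in the answer.
SAMPLE_HTML = """
<h2><span class="mw-headline">Histoire</span></h2>
<h3><span class="mw-headline">Avant 1945</span></h3>
<h2><span class="mw-headline">Diffusion</span></h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() treats a string argument as a tag name, so a CSS selector matches nothing.
by_find_all = soup.find_all("h2 > span.mw-headline")

# select() parses CSS selectors; "h2 > span" keeps only spans that are direct children of an h2.
by_select = [span.get_text() for span in soup.select("h2 > span.mw-headline")]

# Filtering by class alone also picks up the span inside the h3.
by_class = [span.get_text() for span in soup.find_all("span", class_="mw-headline")]

print(by_find_all)
print(by_select)
print(by_class)
```

On this sample, only the select() call returns the two h2 titles and nothing else.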
Example
import urllib.request
from bs4 import BeautifulSoup
quote_page ='https://fr.wikipedia.org/wiki/Manga#:~:text=Un%20manga%20(%E6%BC%AB%E7%94%BB)%20est%20une,quelle%20que%20soit%20son%20origine.'
page = urllib.request.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
for e in soup.select('h2 > span.mw-headline'):
    print(e.text)
Output
Étymologie
Genre et nombre du mot « manga » en français
Histoire des mangas
Caractéristiques du manga
Diffusion
Influence du manga
Produits dérivés
Notes et références
Voir aussi
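If you would rather stay with find_all(), as in the commented-out code in the question, one possible alternative is to loop over the h2 tags and search only their direct children. This is a sketch under assumptions: h2_headlines is a hypothetical helper name, and SAMPLE_HTML is an invented snippet rather than the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption) standing in for the Wikipedia page.
SAMPLE_HTML = """
<h2><span class="mw-headline">Étymologie</span><span class="mw-editsection">[modifier]</span></h2>
<h3><span class="mw-headline">Sous-section</span></h3>
<h2><span class="mw-headline">Voir aussi</span></h2>
"""


def h2_headlines(html):
    """Return the text of mw-headline spans that are direct children of an h2."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for h2 in soup.find_all("h2"):
        # recursive=False restricts the search to direct children of this h2.
        for span in h2.find_all("span", class_="mw-headline", recursive=False):
            titles.append(span.get_text(strip=True))
    return titles


print(h2_headlines(SAMPLE_HTML))
```

The select() version above is shorter; this variant is mainly useful if the rest of your code already relies on find_all().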
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.