I am following a tutorial to learn how to use BeautifulSoup. I am trying to pull the names out of the links on an HTML page I downloaded. It works fine up to this point:
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("43rd-congress.html"))
final_link = soup.p.a
final_link.decompose()

links = soup.find_all('a')
for link in links:
    print link
but when I enter this next part
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("43rd-congress.html"))
final_link = soup.p.a
final_link.decompose()

links = soup.find_all('a')
for link in links:
    names = link.contents[0]
    fullLink = link.get('href')
    print names
    print fullLink
I get this error
Traceback (most recent call last):
  File "C:/Python27/python tutorials/soupexample.py", line 13, in <module>
    print names
  File "C:\Python27\lib\idlelib\PyShell.py", line 1325, in write
    return self.shell.write(s, self.tags)
  File "C:\Python27\lib\idlelib\rpc.py", line 595, in __call__
    value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 210, in remotecall
    seq = self.asynccall(oid, methodname, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 225, in asynccall
    self.putmessage((seq, request))
  File "C:\Python27\lib\idlelib\rpc.py", line 324, in putmessage
    s = pickle.dumps(message)
  File "C:\Python27\lib\copy_reg.py", line 74, in _reduce_ex
    getstate = self.__getstate__
RuntimeError: maximum recursion depth exceeded
Answers:
Method 1
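This is a buggy interaction between IDLE and BeautifulSoup’s NavigableString objects (which subclass unicode). The traceback shows where it goes wrong: IDLE pickles the printed value to send it to its shell process, and the recursion limit is hit inside pickle.dumps. See Python issue 1757057; it’s been around for a while.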
The work-around is to convert the object to a plain unicode value first:
print unicode(names)
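To see why the conversion helps without needing IDLE or the tutorial's HTML file, here is a minimal sketch. It does not use BeautifulSoup at all: TextWithExtras is a hypothetical stand-in for NavigableString (a text subclass that carries an extra reference), and to_plain_text, text_type and the sample value are names made up for the illustration. The try/except is only there so the sketch also runs on Python 3, where the plain text type is str rather than unicode.

```python
# Pick the plain text type for the running interpreter.
try:
    text_type = unicode  # Python 2, as used in the question
except NameError:
    text_type = str      # Python 3 has no separate unicode type


class TextWithExtras(text_type):
    """Hypothetical stand-in for NavigableString: text plus extra state."""

    def __new__(cls, value, parent=None):
        obj = text_type.__new__(cls, value)
        obj.parent = parent  # extra attribute that plain text does not carry
        return obj


def to_plain_text(value):
    """Return a copy of value as the exact built-in text type."""
    return text_type(value)


name = TextWithExtras(u"Example Name", parent=object())
plain = to_plain_text(name)

print(type(name) is text_type)   # False: still the subclass
print(type(plain) is text_type)  # True: a plain text object
print(plain)
```

The converted value has the same characters but none of the subclass's extra attributes, so there is nothing left for IDLE to choke on when it pickles the value. In the loop from the question, that corresponds to printing unicode(names) instead of names.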
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.