Parsing HTML in Python

What’s my best bet for parsing HTML if I can’t use BeautifulSoup or lxml? I’ve got some code that uses sgmllib, but it’s a bit low-level and it’s now deprecated. I would prefer something that can stomach a bit of malformed HTML, although I’m pretty sure most of the input will be fairly clean.
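One built-in option, assuming Python 3 (where sgmllib no longer exists), is the standard library's html.parser.HTMLParser. It is event-based like sgmllib: you subclass it and override callbacks such as handle_starttag and handle_data. It does not build a tree or repair nesting, but it does not raise on unclosed or stray tags either; it just keeps reporting what it sees. The class name LinkCollector and the helper parse() below are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags plus non-blank text chunks."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True is the Python 3.5+ default
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs
        # with lowercased names, so <A HREF="..."> is matched too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only text that is not pure whitespace.
        if data.strip():
            self.text_parts.append(data.strip())


def parse(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()  # flush any buffered data
    return parser.links, parser.text_parts


# Unclosed <p> and <a>, mixed-case tag, and a stray </b>: no exception.
messy = "<p>Hello <a href='/one'>one<p>unclosed <A HREF=\"/two\">two</b>"
links, text = parse(messy)
print(links)
print(text)
```

Because it only streams events, anything tree-shaped (finding the parent of a tag, for example) has to be tracked by hand in the callbacks.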

Parsing HTML in Python – lxml or BeautifulSoup? Which of these is better for what kinds of purposes?

From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I’ve chosen BeautifulSoup for a project I’m working on, but only because I found its syntax a bit easier to learn and understand. However, a lot of people seem to favour lxml, and I’ve heard that lxml is faster.
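To make the comparison concrete, here is a minimal sketch of the same task (collecting every link's href) in each library, assuming beautifulsoup4 and lxml are installed from PyPI. The sample HTML and the function names are hypothetical. Each function imports its library lazily, so the sketch still runs, skipping one side, if only one library is available.

```python
# Same task in both libraries: collect the href of every <a> tag.
# Assumes the third-party packages beautifulsoup4 (imported as bs4) and lxml.

SAMPLE = "<html><body><a href='/a'>A</a><p>text<a href='/b'>B</a></body></html>"


def links_with_bs4(html):
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4
    # "html.parser" is the stdlib backend; "lxml" also works if installed.
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute.
    return [a["href"] for a in soup.find_all("a", href=True)]


def links_with_lxml(html):
    import lxml.html  # third-party: lxml
    tree = lxml.html.fromstring(html)
    # XPath returns the attribute values directly as strings.
    return [str(href) for href in tree.xpath("//a/@href")]


if __name__ == "__main__":
    for fn in (links_with_bs4, links_with_lxml):
        try:
            print(fn.__name__, fn(SAMPLE))
        except ImportError as exc:
            print(fn.__name__, "skipped:", exc)
```

The two are not strictly either/or: BeautifulSoup can use lxml as its underlying parser by passing "lxml" instead of "html.parser", which keeps BeautifulSoup's navigation API while delegating the parsing itself to lxml.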