Python and BeautifulSoup encoding issues

I’m writing a crawler in Python using BeautifulSoup, and everything was going swimmingly until I ran into this site: http://www.elnorte.ec/. I’m getting the contents with the requests library:

    r = requests.get('http://www.elnorte.ec/')
    content = r.content

If I print the content variable at that point, all the Spanish special characters seem to be working …
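The excerpt is cut off before the actual failure is described, so the following is only a sketch of one common cause of this kind of symptom, not a diagnosis of that site. In requests, r.content is raw bytes, and the characters only come out right if those bytes are decoded with the encoding the page was really written in. The sample string, the function name decode_with_fallback and the candidate list are all made up for illustration; only the standard library is used.

```python
# Hypothetical sample standing in for the raw bytes of a Spanish-language page.
SAMPLE_TEXT = "Noticias: educación, niños, señal"
raw_utf8 = SAMPLE_TEXT.encode("utf-8")

# Decoding UTF-8 bytes with the wrong charset does not fail; it silently
# produces mojibake, with each accented letter turning into two characters.
mojibake = raw_utf8.decode("iso-8859-1")


def decode_with_fallback(raw, candidates=("utf-8", "iso-8859-1")):
    """Try each candidate encoding in order; return (text, encoding_used).

    This is an assumed helper, not part of requests or BeautifulSoup.
    Strict UTF-8 goes first because it rejects most non-UTF-8 input,
    while ISO-8859-1 accepts any byte sequence and so acts as a last resort.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reached if every candidate rejects the bytes.
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback(raw_utf8)
legacy_text, legacy_used = decode_with_fallback("señal".encode("iso-8859-1"))
```

With real pages, the decoded text (rather than the raw bytes) would then be handed to the parser. Guessing by trial decoding is a crude heuristic, so treat it as a starting point only.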

Parsing HTML in Python – lxml or BeautifulSoup? Which of these is better for what kinds of purposes?

From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I’ve chosen BeautifulSoup for a project I’m working on, but I chose it for no particular reason other than finding the syntax a bit easier to learn and understand. But I see a lot of people seem to favour lxml and I’ve heard that lxml is faster.
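The two libraries are not strictly either/or: BeautifulSoup takes a parser name as its second argument and can use lxml as that parser when it is installed. A rough way to check the "lxml is faster" claim on your own documents might look like the sketch below. The test document, the function names and the repeat count are arbitrary assumptions, and the numbers will vary by machine and input, so none are quoted here.

```python
import importlib.util
import timeit

# Hypothetical test document: 200 identical paragraphs.
HTML = "<html><body>" + "<p class='item'>row</p>" * 200 + "</body></html>"


def available_parsers():
    """Return the BeautifulSoup parser names usable in this environment."""
    if importlib.util.find_spec("bs4") is None:
        return []  # BeautifulSoup itself is not installed
    names = ["html.parser"]  # backed by the standard library
    if importlib.util.find_spec("lxml") is not None:
        names.append("lxml")
    return names


def time_parsers(html=HTML, repeat=5):
    """Time `repeat` parses of `html` per parser; return {name: seconds}."""
    results = {}
    for name in available_parsers():
        from bs4 import BeautifulSoup  # imported only when bs4 is present

        results[name] = timeit.timeit(
            lambda: BeautifulSoup(html, name), number=repeat
        )
    return results


timings = time_parsers()
for parser_name, seconds in timings.items():
    print(f"{parser_name}: {seconds:.4f}s for 5 parses")
```

This only measures parsing through the BeautifulSoup interface. Using lxml directly, without BeautifulSoup on top, is a separate option with its own API and would need its own measurement.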