html-content-extraction Archives

Parse a .Net Page with Postbacks

August 28, 2022 by Magenaut

I need to read data from an online database that the UN publishes through an .aspx page. I’ve done HTML parsing before, but always by manipulating query-string values. This site uses ASP.NET postbacks instead: you click a value in box one, box two appears, then you click a value in box two and click a button to get your results.
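
One way to approach a postback-driven page is to replay the form submission yourself: read the hidden state fields that ASP.NET WebForms pages normally carry (such as `__VIEWSTATE` and `__EVENTVALIDATION`), add the control that "fired" as `__EVENTTARGET`, and POST everything back. The sketch below uses only the standard library. The helper names (`build_postback`, `post_back`) and the control name `ddlFirst` are hypothetical, and the real page may need additional fields or cookies.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, extra=None):
    """Return the form fields for one simulated postback (hypothetical helper)."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    payload = dict(parser.fields)            # e.g. __VIEWSTATE, __EVENTVALIDATION
    payload["__EVENTTARGET"] = event_target  # the control that triggers the postback
    payload.setdefault("__EVENTARGUMENT", "")
    payload.update(extra or {})              # the values you "clicked"
    return payload


def post_back(url, payload):
    """POST the payload and return the response HTML (assumes UTF-8 pages)."""
    request = Request(url, data=urlencode(payload).encode("utf-8"))
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Each response contains fresh hidden fields, so a multi-step flow (box one, then box two, then the button) would call `build_postback` again on every page returned by `post_back`. If the site depends on a session cookie, a cookie-aware opener or a session-based HTTP client would be needed in place of the bare `urlopen` call.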

Extracting text from HTML file using Python

August 21, 2022 by Magenaut

I’d like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into Notepad.
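
A rough approximation of "copy from the browser, paste into Notepad" can be built with the standard library's `html.parser`: keep text nodes, skip the contents of non-rendered elements, and start a new line at block-level tags. This is a minimal sketch, not a full renderer. The `SKIP` and `BLOCK` tag sets and the function name `html_to_text` are assumptions chosen for illustration, and CSS-hidden elements are not detected.

```python
from html.parser import HTMLParser

# Elements whose text a browser does not show in the page body (assumed set).
SKIP = {"script", "style", "title", "noscript"}
# Elements that usually start a new line when rendered (assumed set).
BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()      # character references are decoded by default
        self.parts = []
        self.skip_depth = 0     # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For a file on disk, the usage would be along the lines of `html_to_text(open("page.html", encoding="utf-8").read())`, where the file name and encoding are placeholders.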

BeautifulSoup Grab Visible Webpage Text

August 21, 2022 by Magenaut

Basically, I want to use BeautifulSoup to grab strictly the visible text on a webpage. For instance, this webpage is my test case. I mainly want the body text (the article) and maybe a few tab names here and there. I tried the suggestion in this SO question, but it returns lots of <script> tags and HTML comments, which I don’t want. I can’t figure out which arguments findAll() needs in order to get just the visible text on a webpage.
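
One common approach is to ask BeautifulSoup for every string in the document and then filter out the ones a browser would not render: HTML comments and strings whose parent is a tag such as `<script>` or `<style>`. The sketch below assumes Beautiful Soup 4 (`bs4`) is installed. The set of hidden parent tags and the function names are illustrative choices, and elements hidden through CSS are not detected.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Parents whose strings are not part of the rendered page body (assumed set).
HIDDEN_PARENTS = {"style", "script", "head", "title", "meta", "[document]"}


def is_visible(node):
    """Return True for strings that would normally be shown by a browser."""
    if isinstance(node, Comment):
        return False
    return node.parent.name not in HIDDEN_PARENTS


def visible_text(html):
    """Join all visible strings of an HTML document with single spaces."""
    soup = BeautifulSoup(html, "html.parser")
    strings = soup.find_all(string=True)   # every string, including comments
    return " ".join(s.strip() for s in strings if is_visible(s) and s.strip())
```

The filter only looks at each string's direct parent, which is enough to drop script bodies, style sheets, the page title, and comments while keeping article text and link or tab labels.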

Using BeautifulSoup to find a HTML tag that contains certain text

August 13, 2022 by Magenaut

I’m trying to get the elements in an HTML doc that contain the following pattern of text: #\S{11}
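
BeautifulSoup's `find_all` accepts a compiled regular expression for its `string` argument, so one way to do this is to search the document's strings and then step up to each match's parent tag. The sketch below assumes Beautiful Soup 4 is installed and reads the pattern as a regular expression meaning `#` followed by 11 non-whitespace characters (`#\S{11}`). That reading and the function name `tags_containing` are assumptions.

```python
import re

from bs4 import BeautifulSoup

# Assumed reading of the pattern: '#' followed by 11 non-whitespace characters.
PATTERN = re.compile(r"#\S{11}")


def tags_containing(html, pattern=PATTERN):
    """Return the parent tag of every string that matches the pattern."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.find_all(string=pattern)   # strings where pattern.search() hits
    return [s.parent for s in matches]
```

The regular expression is applied with search semantics, so the match can occur anywhere inside the string. A tag whose text is split across several child strings is matched per string, not on its combined text.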