web-crawler Archives

Pulling data from a webpage, parsing it for specific pieces, and displaying it

September 2, 2022 by Magenaut

I’ve been using this site for a long time to find answers to my questions, but I wasn’t able to find the answer to this one.

Asp.net Request.Browser.Crawler – Dynamic Crawler List?

August 31, 2022 by Magenaut

I learned Why Request.Browser.Crawler is Always False in C# (http://www.digcode.com/default.aspx?page=ed51cde3-d979-4daf-afae-fa6192562ea9&article=bc3a7a4f-f53e-4f88-8e9c-c9337f6c05a0).

How can I bring Google-like recrawling to my application (web or console)?

August 29, 2022 by Magenaut

How can I bring Google-like recrawling to my application (web or console)? I need to recrawl only those pages that were updated after a particular date.
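One common way to approach "only pages updated after a date" is an HTTP conditional request: the crawler sends an If-Modified-Since header and the server may answer 304 Not Modified for unchanged pages. The sketch below is illustrative only and assumes the target server honours that header and sends Last-Modified; the helper names conditional_request and modified_after, and the example.com URL, are hypothetical and not from the original post.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional
import urllib.request


def conditional_request(url: str, cutoff: datetime) -> urllib.request.Request:
    """Build a GET request that asks for the page only if it changed after cutoff.

    cutoff must be a timezone-aware datetime (assumption of this sketch).
    """
    stamp = format_datetime(cutoff.astimezone(timezone.utc), usegmt=True)
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})


def modified_after(last_modified: Optional[str], cutoff: datetime) -> bool:
    """Decide from a Last-Modified header value whether a page needs recrawling."""
    if not last_modified:
        # No header: we cannot tell, so recrawl to be safe.
        return True
    parsed = parsedate_to_datetime(last_modified)
    if parsed.tzinfo is None:
        # Treat a zone-less date as UTC so the comparison stays valid.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed > cutoff


cutoff = datetime(2022, 8, 29, tzinfo=timezone.utc)
request = conditional_request("http://example.com/", cutoff)
print(request.get_header("If-modified-since"))
print(modified_after("Wed, 31 Aug 2022 10:00:00 GMT", cutoff))
```

Passing the request to urllib.request.urlopen would perform the fetch; a 304 response surfaces there as an HTTPError with code 304, which a crawler can treat as "skip this page".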

Is it possible to crawl ASP.NET pages?

August 23, 2022 by Magenaut

Is there a way to crawl ASP.NET pages that use doPostBack to trigger events?
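For context, __doPostBack in ASP.NET Web Forms fills two hidden fields, __EVENTTARGET and __EVENTARGUMENT, and then submits the page's form, so a crawler can often reproduce the event with an ordinary POST. The following is a minimal sketch, assuming the page's other hidden fields (such as __VIEWSTATE) must be echoed back unchanged; HiddenFieldParser, build_postback, the sample HTML and the control name are hypothetical and not from the original post.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html: str, event_target: str, event_argument: str = "") -> dict:
    """Return the form payload that a __doPostBack call would submit."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return payload


# Hypothetical page fragment, used only to exercise the helper.
sample = """
<form method="post" action="list.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <a href="javascript:__doPostBack('grid$next','')">Next</a>
</form>
"""
payload = build_postback(sample, "grid$next")
body = urlencode(payload).encode("ascii")  # bytes suitable for a POST body
print(sorted(payload))
```

The encoded body would then be sent as a POST to the form's action URL, with the session cookies from the first request; pages that build their state in client-side script may still need a real browser.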

How to continuously crawl a webpage for articles using Selenium in Python

August 22, 2022 by Magenaut

I’m trying to crawl bloomberg.com and find links to all English news articles. The problem with the code below is that it finds a lot of articles on the first page, but then it goes into a loop where it returns nothing and only proceeds once in a while.

Sending “User-agent” using Requests library in Python

August 21, 2022 by Magenaut

I want to send a value for "User-agent" while requesting a webpage using Python Requests. I am not sure if it is okay to send this as a part of the header, as in the code below:
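The excerpt's own code is not reproduced on this page. As an illustration of the general idea, Requests accepts a headers dictionary, and User-Agent is an ordinary request header, so it can be set there. This is a hedged sketch rather than the poster's code: the helper prepare_with_user_agent, the example.com URL and the agent string are hypothetical, and the request is only prepared, not sent.

```python
import requests


def prepare_with_user_agent(url: str, user_agent: str) -> requests.PreparedRequest:
    """Build (but do not send) a GET request carrying a custom User-Agent."""
    headers = {"User-Agent": user_agent}
    return requests.Request("GET", url, headers=headers).prepare()


prepared = prepare_with_user_agent("https://example.com/", "my-crawler/1.0")
print(prepared.headers["User-Agent"])
```

To actually send it, the same dictionary can be passed as requests.get(url, headers=headers); header names are matched case-insensitively, so "User-agent" and "User-Agent" refer to the same header.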

TypeError: can’t use a string pattern on a bytes-like object in re.findall()

August 18, 2022 by Magenaut

I am trying to learn how to automatically fetch urls from a page. In the following code I am trying to get the title of the webpage:
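The excerpt's code is not shown here, but the error in the title typically arises when a str regular expression is applied to the bytes returned by reading an HTTP response. A minimal sketch of one fix, decoding the bytes before matching, follows; extract_title, the sample page and the UTF-8 default are assumptions for illustration, and a real crawler would take the encoding from the response headers.

```python
import re
from typing import Optional

# str pattern: it can only be applied to str input, not bytes.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(raw: bytes, encoding: str = "utf-8") -> Optional[str]:
    """Decode the raw page bytes, then search for the <title> text."""
    text = raw.decode(encoding, errors="replace")
    match = TITLE_RE.search(text)
    return match.group(1).strip() if match else None


page = b"<html><head><title> Example Page </title></head></html>"

try:
    # Reproduces the problem: str pattern applied to a bytes object.
    TITLE_RE.findall(page)
except TypeError as exc:
    print("TypeError:", exc)

print(extract_title(page))
```

An alternative is to keep the data as bytes and use a bytes pattern such as rb"<title>(.*?)</title>", in which case the matches are bytes as well.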

Scrapy – Reactor not Restartable

August 17, 2022 by Magenaut

“[…] starts a Twisted reactor, adjusts its pool size to REACTOR_THREADPOOL_MAXSIZE, and installs a DNS cache based on DNSCACHE_ENABLED and DNSCACHE_SIZE.”

Anyone know of a good Python based web crawler that I could use?

August 16, 2022 by Magenaut

I’m half-tempted to write my own, but I don’t really have enough time right now. I’ve seen the Wikipedia list of open source crawlers but I’d prefer something written in Python. I realize that I could probably just use one of the tools on the Wikipedia page and wrap it in Python. I might end …

Click a Button in Scrapy

August 12, 2022 by Magenaut

I’m using Scrapy to crawl a webpage. Some of the information I need only pops up when you click on a certain button (of course also appears in the HTML code after clicking).