What is the fastest way to send 100,000 HTTP requests in Python?

I am opening a file which has 100,000 URL’s. I need to send an HTTP request to each URL and print the status code. I am using Python 2.6, and so far looked at the many confusing ways Python implements threading/concurrency. I have even looked at the python concurrence library, but cannot figure out how to write this program correctly. Has anyone come across a similar problem? I guess generally I need to know how to perform thousands of tasks in Python as fast as possible – I suppose that means ‘concurrently’.

How do you send a HEAD HTTP request in Python 2?

What I’m trying to do here is get the headers of a given URL so I can determine the MIME type. I want to be able to see if http://somedomain/foo/ will return an HTML document or a JPEG image for example. Thus, I need to figure out how to send a HEAD request so that I can read the MIME type without having to download the content. Does anyone know of an easy way of doing this?

How to use Python to login to a webpage and retrieve cookies for later usage?

I want to download and parse webpage using python, but to access it I need a couple of cookies set. Therefore I need to login over https to the webpage first. The login moment involves sending two POST params (username, password) to /login.php. During the login request I want to retrieve the cookies from the response header and store them so I can use them in the request to download the webpage /data.php.

Using an HTTP PROXY – Python

I familiar with the fact that I should set the HTTP_RPOXY environment variable to the proxy address. Generally urllib works fine, the problem is dealing with urllib2. >>> urllib2.urlopen("http://www.google.com").read() returns urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it> or urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed> … Read more