How do I prevent Python’s urllib(2) from following a redirect?

I am currently trying to log into a site using Python; however, the site seems to send a cookie and a redirect in the same response. Python follows that redirect, which prevents me from reading the cookie sent by the login page. How do I prevent Python’s urllib (or urllib2) urlopen from following the redirect?

Answers:

Method 1

You could do a couple of things:

  1. Build your own HTTPRedirectHandler that intercepts each redirect.
  2. Create an instance of HTTPCookieProcessor and build an opener with it so that you have access to the cookie jar.

This quick example shows both:

import urllib2

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Inspect the headers (including Set-Cookie) here, before the redirect is followed.
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # Handle the other redirect status codes the same way.
    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("WHEREEVER")  # placeholder URL
print response.read()

print cookieprocessor.cookiejar
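
For readers on Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar, the same idea might look like the sketch below. The names LoggingRedirectHandler, build_logging_opener, and the seen attribute are made up for illustration. The handler only records each redirect's status code, Location, and Set-Cookie headers, then lets urllib follow the redirect as usual.

```python
import http.cookiejar
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: note each redirect, then follow it as usual."""

    def __init__(self):
        super().__init__()
        # One (status code, Location, Set-Cookie list) tuple per redirect.
        self.seen = []

    def http_error_302(self, req, fp, code, msg, headers):
        cookies = headers.get_all("Set-Cookie") or []
        self.seen.append((code, headers.get("Location"), cookies))
        return super().http_error_302(req, fp, code, msg, headers)

    # Treat the other redirect status codes the same way.
    http_error_301 = http_error_303 = http_error_307 = http_error_302


def build_logging_opener():
    """Return (opener, redirect handler, cookie jar) wired together."""
    jar = http.cookiejar.CookieJar()
    redirects = LoggingRedirectHandler()
    opener = urllib.request.build_opener(
        redirects, urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, redirects, jar
```

After a call such as opener.open(url), redirects.seen should list the redirects that were followed, and jar should hold whatever cookies the cookie processor accepted along the way.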

Method 2

If all you need is to stop redirection, there is a simple way to do it. For example, suppose I only want to get the cookies and, for better performance, I don’t want to be redirected to any other page. I also want the response code to stay as 3xx. Let’s use 302 as an example.

import cookielib
import urllib2

class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # Only this line is added: return a 302 as-is instead of following it.
        if code == 302: return response

        # Default behaviour: hand other non-2xx responses to the error handlers.
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response

    https_response = http_response

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)

This way, you don’t even need to go into urllib2.HTTPRedirectHandler.http_error_302().

A more common case is that we simply want to stop all redirection:

class NoRedirection(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        return response

    https_response = http_response

It is normally used like this:

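import cookielib
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}  # form fields to POST
response = opener.open('http://www.example.com', urllib.urlencode(data))
if response.code == 302:
    # The redirect was not followed; its target is in the Location header.
    redirection_target = response.headers['Location']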
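
As a rough Python 3 equivalent of the NoRedirection approach, the sketch below uses urllib.request, urllib.parse, and http.cookiejar in place of urllib2, urllib, and cookielib. The helper name open_without_redirect is hypothetical and not part of the original answer.

```python
import http.cookiejar
import urllib.parse
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Hand every response back untouched, so a 3xx is never followed."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def open_without_redirect(url, data=None):
    """Open url (POST if data is given); return (response, cookie jar)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        NoRedirection, urllib.request.HTTPCookieProcessor(jar)
    )
    body = None
    if data is not None:
        # Python 3 requires the request body to be bytes.
        body = urllib.parse.urlencode(data).encode("ascii")
    return opener.open(url, body), jar
```

The caller can then check response.status and read response.headers.get("Location"). One consequence of this design is that 4xx and 5xx responses are also returned as-is rather than raised as HTTPError, because the error processor is the component that normally dispatches them.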

Method 3

urllib2.urlopen calls build_opener() which uses this list of handler classes:

handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
            HTTPDefaultErrorHandler, HTTPRedirectHandler,
            FTPHandler, FileHandler, HTTPErrorProcessor]

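You could try building an opener yourself that leaves out HTTPRedirectHandler, then call the open() method on the result to open your URL. Note that urllib2.build_opener() takes handlers as separate arguments and adds every default handler class you have not overridden with a subclass or instance, so simply omitting HTTPRedirectHandler from the arguments does not remove it. To drop it, you would have to create a urllib2.OpenerDirector and call add_handler() for each handler you want. If you really dislike redirects, you could even call urllib2.install_opener(opener) to make your non-redirecting opener the default.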
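
One way to actually leave HTTPRedirectHandler out is to assemble an OpenerDirector by hand, since build_opener() adds back any default handler class you do not override. The sketch below is written for Python 3 (urllib.request rather than urllib2); the function names build_non_redirecting_opener and redirect_target are hypothetical, and the FTP and file handlers are left out for brevity.

```python
import urllib.error
import urllib.request


def build_non_redirecting_opener():
    """Assemble an opener from default handlers minus HTTPRedirectHandler."""
    opener = urllib.request.OpenerDirector()
    handler_classes = [
        urllib.request.ProxyHandler,
        urllib.request.UnknownHandler,
        urllib.request.HTTPHandler,
        urllib.request.HTTPDefaultErrorHandler,
        urllib.request.HTTPErrorProcessor,
    ]
    # HTTPSHandler is only defined when Python was built with SSL support.
    if hasattr(urllib.request, "HTTPSHandler"):
        handler_classes.append(urllib.request.HTTPSHandler)
    for handler_class in handler_classes:
        opener.add_handler(handler_class())
    return opener


def redirect_target(opener, url):
    """Return the Location header if url answers with a 3xx, else None."""
    try:
        opener.open(url).close()
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400:
            return err.headers.get("Location")
        raise
    return None
```

With no redirect handler installed, a 3xx response falls through to the default error handler and surfaces as an HTTPError, so the cookie headers can be read from err.headers.get_all("Set-Cookie") in the except branch.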

It sounds like your real problem is that urllib2 isn’t doing cookies the way you’d like. See also How to use Python to login to a webpage and retrieve cookies for later usage?

Method 4

This question was asked before here.

EDIT: If you have to deal with quirky web applications you should probably try out mechanize. It’s a great library that simulates a web browser. You can control redirecting, cookies, page refreshes… If the website doesn’t rely [heavily] on JavaScript, you’ll get along very nicely with mechanize.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
