Changing user agent on urllib2.urlopen

How can I download a webpage with a user agent other than the default one on urllib2.urlopen?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

I answered a similar question a couple weeks ago.

There is example code in that question, but basically you can do something like this: (Note the capitalization of User-Agent as of RFC 2616, section 14.43.)

opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
response = opener.open('http://www.stackoverflow.com')

Method 2

headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request('www.example.com', None, headers)
html = urllib2.urlopen(req).read()

Or, a bit shorter:

req = urllib2.Request('www.example.com', headers={ 'User-Agent': 'Mozilla/5.0' })
html = urllib2.urlopen(req).read()

Method 3

Setting the User-Agent from everyone’s favorite Dive Into Python.

The short story: You can use Request.add_header to do this.

You can also pass the headers as a dictionary when creating the Request itself, as the docs note:

headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is often used to “spoof” the User-Agent header, which is used by a browser to identify itself – some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while urllib2‘s default user agent string is "Python-urllib/2.6" (on Python 2.6).

Method 4

For python 3, urllib is split into 3 modules…

import urllib.request
req = urllib.request.Request(url="http://localhost/", headers={'User-Agent':' Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'})
handler = urllib.request.urlopen(req)

Method 5

All these should work in theory, but (with Python 2.7.2 on Windows at least) any time you send a custom User-agent header, urllib2 doesn’t send that header. If you don’t try to send a User-agent header, it sends the default Python / urllib2

None of these methods seem to work for adding User-agent but they work for other headers:

opener = urllib2.build_opener(proxy)
opener.addheaders = {'User-agent':'Custom user agent'}
urllib2.install_opener(opener)

request = urllib2.Request(url, headers={'User-agent':'Custom user agent'})

request.headers['User-agent'] = 'Custom user agent'

request.add_header('User-agent', 'Custom user agent')

Method 6

For urllib you can use:

from urllib import FancyURLopener

class MyOpener(FancyURLopener, object):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

myopener = MyOpener()
myopener.retrieve('https://www.google.com/search?q=test', 'useragent.html')

Method 7

Another solution in urllib2 and Python 2.7:

req = urllib2.Request('http://www.example.com/')
req.add_unredirected_header('User-Agent', 'Custom User-Agent')
urllib2.urlopen(req)

Method 8

there are two properties of urllib.URLopener() namely:
addheaders = [('User-Agent', 'Python-urllib/1.17'), ('Accept', '*/*')] and
version = 'Python-urllib/1.17'.
To fool the website you need to changes both of these values to an accepted User-Agent. for e.g.
Chrome browser : 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.149 Safari/537.36'
Google bot : 'Googlebot/2.1'
like this

import urllib
page_extractor=urllib.URLopener()  
page_extractor.addheaders = [('User-Agent', 'Googlebot/2.1'), ('Accept', '*/*')]  
page_extractor.version = 'Googlebot/2.1'
page_extractor.retrieve(<url>, <file_path>)

changing just one property does not work because the website marks it as a suspicious request.

Method 9

Try this :

html_source_code = requests.get("http://www.example.com/",
                   headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36',
                            'Upgrade-Insecure-Requests': '1',
                            'x-runtime': '148ms'}, 
                   allow_redirects=True).content


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x