I have this program that check a website, and I want to know how can I check it via proxy in Python…
this is the code, just for example
while True:
try:
h = urllib.urlopen(website)
break
except:
print '['+time.strftime('%Y/%m/%d %H:%M:%S')+'] '+'ERROR. Trying again in a few seconds...'
time.sleep(5)
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:
$ export http_proxy='http://myproxy.example.com:1234' $ python myscript.py # Using http://myproxy.example.com:1234 as a proxy
If you instead want to specify a proxy inside your application, you can give a proxies argument to urlopen:
proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)
Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?
candidate_proxies = ['http://proxy1.example.com:1234',
'http://proxy2.example.com:1234',
'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
print("Trying HTTP proxy %s" % proxy)
try:
result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
print("Got URL using proxy %s" % proxy)
break
except:
print("Trying next proxy in 5 seconds")
time.sleep(5)
Method 2
Python 3 is slightly different here. It will try to auto detect proxy settings but if you need specific or manual proxy settings, think about this kind of code:
#!/usr/bin/env python3
import urllib.request
proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="572736242417243225213225">[email protected]</a>:port',
'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
with urllib.request.urlopen(url) as response:
# ... implement things such as 'html = response.read()'
Refer also to the relevant section in the Python 3 docs
Method 3
Here example code guide how to use urllib to connect via proxy:
authinfo = urllib.request.HTTPBasicAuthHandler()
proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.google.com/')
"""
Method 4
For http and https use:
proxies = {'http':'http://proxy-source-ip:proxy-port',
'https':'https://proxy-source-ip:proxy-port'}
more proxies can be added similarly
proxies = {'http':'http://proxy1-source-ip:proxy-port',
'http':'http://proxy2-source-ip:proxy-port'
...
}
usage
filehandle = urllib.urlopen( external_url , proxies=proxies)
Don’t use any proxies (in case of links within network)
filehandle = urllib.urlopen(external_url, proxies={})
Use proxies authentication via username and password
proxies = {'http':'http://username:<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="c9b9a8bababea6bbad89b9bba6b1b0e4baa6bcbbaaace4a0b9">[email protected]</a>:proxy-port',
'https':'https://username:<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="9dedfceeeeeaf2eff9ddedeff2e5e4b0eef2e8effef8b0f4ed">[email protected]</a>:proxy-port'}
Note: avoid using special characters such as
:,@in username and passwords
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0