I’m trying to use Python to download the HTML source code of a website but I’m receiving this error.
Traceback (most recent call last):
  File "C:\Users\Sergio.Tapia\Documents\NetBeansProjects\DICParser\src\WebDownload.py", line 3, in <module>
    file = urllib.urlopen("http://www.python.org")
AttributeError: 'module' object has no attribute 'urlopen'
I’m following the guide here: http://www.boddie.org.uk/python/HTML.html
import urllib
file = urllib.urlopen("http://www.python.org")
s = file.read()
file.close()
#I'm guessing this would output the html source code?
print(s)
I’m using Python 3.
Answers:
Method 1
The code in the question works in Python 2.x.
In Python 3, urlopen moved to urllib.request; see the docs:
import urllib.request
with urllib.request.urlopen("http://www.python.org") as url:
    s = url.read()
    # I'm guessing this would output the html source code?
    print(s)
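One thing the snippet above leaves implicit: in Python 3, read() returns bytes, so print(s) shows a b'...' literal rather than readable text. Here is a minimal sketch of decoding the body. It assumes the server declares its charset in the Content-Type header and falls back to UTF-8 otherwise; the helper names decode_body and fetch_text are made up for illustration.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Decode response bytes; UTF-8 is an assumed fallback when no charset is given."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url):
    """Download url and return its body as a str."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() reads the charset from the Content-Type header, if any.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

Calling fetch_text("http://www.python.org") and printing the result should then show the HTML as text instead of a bytes literal.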
Method 2
A Python 2+3 compatible solution is:
import sys
from contextlib import closing
if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    # Not Python 3 - today, it is most likely to be Python 2.
    # Note that this might need an update if Python 4
    # ever comes around.
    from urllib import urlopen
# Your code where you can use urlopen.
# closing() is needed because the Python 2 response object
# does not support the with statement on its own.
with closing(urlopen("http://www.python.org")) as url:
    s = url.read()
print(s)
Method 3
import urllib.request as ur
s = ur.urlopen("http://www.google.com")
sl = s.read()
print(sl)
In Python 3, urlopen lives in the urllib.request submodule, which has to be imported explicitly; a plain "import urllib" does not make urllib.urlopen available.
Method 4
To get 'dataX = urllib.urlopen(url).read()' working in Python 3 (it was correct in Python 2), you only need to change two small things.
1: The call itself (add .request in the middle):
dataX = urllib.request.urlopen(url).read()
2: The import statement preceding it (change 'import urllib' to):
import urllib.request
And it should work in Python 3 🙂
Method 5
Change two lines:
import urllib.request  # line 1
# Replace
urllib.urlopen("http://www.python.org")
# with
urllib.request.urlopen("http://www.python.org")  # line 2
If you get an "HTTP Error 403: Forbidden" exception, try sending a browser-like User-Agent header:
siteurl = "http://www.python.org"
req = urllib.request.Request(siteurl, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36'})
pageHTML = urllib.request.urlopen(req).read()
I hope this resolves your problem.
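If the 403 only shows up for some sites, it can help to see the status code the server actually returned. Here is a rough sketch, assuming a shortened, arbitrary User-Agent string and two hypothetical helper names (build_request and fetch). urllib.error.HTTPError is the exception urlopen raises for 4xx and 5xx responses.

```python
import urllib.error
import urllib.request

# Arbitrary browser-like value chosen for illustration; use whatever the site accepts.
USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Return the page bytes, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code is the numeric status (e.g. 403); err.reason is the short text.
        print("Request failed:", err.code, err.reason)
        return None
```

Returning None on failure is just one design choice; re-raising the exception is equally reasonable if the caller should decide what to do.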
Method 6
import urllib.request as ur
filehandler = ur.urlopen('http://www.google.com')
for line in filehandler:
    print(line.strip())
Method 7
For Python 3, try something like this:
import urllib.request
urllib.request.urlretrieve('http://crcv.ucf.edu/THUMOS14/UCF101/UCF101/v_YoYo_g19_c02.avi', "video_name.avi")
It will download the video to the current working directory and save it as video_name.avi.
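The urllib documentation lists urlretrieve among its legacy functions, so an alternative worth knowing is to stream the response into a file yourself. Here is a small sketch, assuming a hypothetical helper name download and a writable destination path.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Stream the body of url into dest_path without loading it all into memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        # copyfileobj copies in chunks, which matters for large files such as videos.
        shutil.copyfileobj(response, out_file)
    return dest_path
```

It would be called as download(url, "video_name.avi"), with url set to the address of the file you want.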
Method 8
Solution for Python 3:
from urllib.request import urlopen
url = 'http://www.python.org'
file = urlopen(url)
html = file.read()
print(html)
Method 9
import urllib
import urllib.request
from bs4 import BeautifulSoup
with urllib.request.urlopen("http://www.newegg.com/") as url:
    s = url.read()
    print(s)
soup = BeautifulSoup(s, "html.parser")
all_tag_a = soup.find_all("a", limit=10)
for links in all_tag_a:
    #print(links.get('href'))
    print(links)
Method 10
One possible way to do it:
import urllib
...
try:
    # Python 2
    from urllib2 import urlopen
except ImportError:
    # Python 3
    from urllib.request import urlopen
Method 11
If your code was written for Python 2.x, you can make it run on Python 3 by importing urlopen from urllib.request:
from urllib.request import urlopen
urlopen(url)
By the way, I suggest another module called requests, which is friendlier to use. You can install it with pip (pip install requests) and use it like this:
import requests
requests.get(url)
requests.post(url)
Method 12
Use the third-party six module to make your code compatible with both Python 2 and Python 3.
from six.moves import urllib
urllib.request.urlopen("<your-url>")
Method 13
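imgResp = urllib3.PoolManager().request("GET", url)
With the third-party urllib3 package (import urllib3 first), requests go through a PoolManager instance, which inherits from RequestMethods, rather than through a module-level urlopen.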
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.