Stream large binary files with urllib2 to file

I use the following code to stream large files from the Internet into a local file:

fp = open(file, 'wb')
req = urllib2.urlopen(url)
for line in req:
    fp.write(line)
fp.close()

This works but it downloads quite slowly. Is there a faster way? (The files are large so I don’t want to keep them in memory.)

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

No reason to work line by line (small chunks AND requires Python to find the line ends for you!-), just chunk it up in bigger chunks, e.g.:

# from urllib2 import urlopen # Python 2
from urllib.request import urlopen # Python 3

response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            break
        f.write(chunk)

Experiment a bit with various CHUNK sizes to find the “sweet spot” for your requirements.

Method 2

You can also use shutil:

import shutil
try:
    from urllib.request import urlopen # Python 3
except ImportError:
    from urllib2 import urlopen # Python 2

def get_large_file(url, file, length=16*1024):
    req = urlopen(url)
    with open(file, 'wb') as fp:
        shutil.copyfileobj(req, fp, length)

Method 3

I used to use mechanize module and its Browser.retrieve() method. In the past it took 100% CPU and downloaded things very slowly, but some recent release fixed this bug and works very quickly.

Example:

import mechanize
browser = mechanize.Browser()
browser.retrieve('http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.32-rc1.tar.bz2', 'Downloads/my-new-kernel.tar.bz2')

Mechanize is based on urllib2, so urllib2 can also have similar method… but I can’t find any now.

Method 4

You can use urllib.retrieve() to download files:

Example:

try:
    from urllib import urlretrieve # Python 2

except ImportError:
    from urllib.request import urlretrieve # Python 3

url = "http://www.examplesite.com/myfile"
urlretrieve(url,"./local_file")


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x