I’m uploading potentially large files to a web server. Currently I’m doing this:
import urllib2
f = open('somelargefile.zip','rb')
request = urllib2.Request(url,f.read())
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)
However, this reads the entire file’s contents into memory before posting it. How can I have it stream the file to the server?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Reading through the mailing list thread linked to by systempuntoout, I found a clue towards the solution.
The mmap module allows you to open file that acts like a string. Parts of the file are loaded into memory on demand.
Here’s the code I’m using now:
import urllib2
import mmap
# Open the file as a memory mapped string. Looks like a string, but
# actually accesses the file behind the scenes.
f = open('somelargefile.zip','rb')
mmapped_file_as_string = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
# Do the request
request = urllib2.Request(url, mmapped_file_as_string)
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)
#close everything
mmapped_file_as_string.close()
f.close()
Method 2
Have you tried with Mechanize?
from mechanize import Browser
br = Browser()
br.open(url)
br.form.add_file(open('largefile.zip'), 'application/zip', 'largefile.zip')
br.submit()
or, if you don’t want to use multipart/form-data, check this old post.
It suggests two options:
1. Use mmap, Memory Mapped file object 2. Patch httplib.HTTPConnection.send
Method 3
The documentation doesn’t say you can do this, but the code in urllib2 (and httplib) accepts any object with a read() method as data. So using an open file seems to do the trick.
You’ll need to set the Content-Length header yourself. If it’s not set, urllib2 will call len() on the data, which file objects don’t support.
import os.path
import urllib2
data = open(filename, 'r')
headers = { 'Content-Length' : os.path.getsize(filename) }
response = urllib2.urlopen(url, data, headers)
This is the relevant code that handles the data you supply. It’s from the HTTPConnection class in httplib.py in Python 2.7:
def send(self, data):
"""Send `data' to the server."""
if self.sock is None:
if self.auto_open:
self.connect()
else:
raise NotConnected()
if self.debuglevel > 0:
print "send:", repr(data)
blocksize = 8192
if hasattr(data,'read') and not isinstance(data, array):
if self.debuglevel > 0: print "sendIng a read()able"
datablock = data.read(blocksize)
while datablock:
self.sock.sendall(datablock)
datablock = data.read(blocksize)
else:
self.sock.sendall(data)
Method 4
Try pycurl. I don’t have anything setup will accept a large file that isn’t in a multipart/form-data POST, but here’s a simple example that reads the file as needed.
import os
import pycurl
class FileReader:
def __init__(self, fp):
self.fp = fp
def read_callback(self, size):
return self.fp.read(size)
c = pycurl.Curl()
c.setopt(pycurl.URL, url)
c.setopt(pycurl.UPLOAD, 1)
c.setopt(pycurl.READFUNCTION, FileReader(open(filename, 'rb')).read_callback)
filesize = os.path.getsize(filename)
c.setopt(pycurl.INFILESIZE, filesize)
c.perform()
c.close()
Method 5
Using the requests library you can do
with open('massive-body', 'rb') as f:
requests.post('http://some.url/streamed', data=f)
as mentioned here in their docs
Method 6
Below is the working example for both Python 2 / Python 3:
try:
from urllib2 import urlopen, Request
except:
from urllib.request import urlopen, Request
headers = { 'Content-length': str(os.path.getsize(filepath)) }
with open(filepath, 'rb') as f:
req = Request(url, data=f, headers=headers)
result = urlopen(req).read().decode()
The requests module is great, but sometimes you cannot install any extra modules…
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0