Fastest save and load options for a numpy array

I have a script that generates two-dimensional numpy arrays with dtype=float and shape on the order of (1e3, 1e6). Right now I’m using np.save and np.load to perform IO operations with the arrays. However, these functions take several seconds for each array. Are there faster methods for saving and loading the entire arrays (i.e., without making assumptions about their contents or reducing them)? I’m open to converting the arrays to another type before saving as long as the data are retained exactly.
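Here is a minimal Python sketch for measuring the options. It relies only on NumPy. It times `np.save`/`np.load`, `np.load` with `mmap_mode="r"`, and raw `ndarray.tofile`/`np.fromfile`, then checks that every round trip reproduces the data exactly. The helper names (`save_raw`, `load_raw`, `timed`, `compare`) are hypothetical. Two caveats apply. The raw format stores no dtype or shape, so those must be recorded separately. A memory-mapped load returns quickly only because it defers the actual reading until the data are accessed. At the question's shape, each float64 array is about 8 GB (1000 × 1,000,000 × 8 bytes), so storage throughput probably dominates whichever method is used.

```python
import os
import tempfile
import time

import numpy as np


def save_raw(path, arr):
    """Write only the array's bytes (tofile always writes in C order); no dtype/shape header."""
    arr.tofile(path)


def load_raw(path, dtype, shape):
    """Read raw bytes back; the caller must supply the dtype and shape used when saving."""
    return np.fromfile(path, dtype=dtype).reshape(shape)


def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print the wall-clock time it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return result


def compare(shape=(1000, 10000)):
    """Time each save/load method on a random float64 array; return True if all round trips are exact."""
    arr = np.random.default_rng(0).random(shape)  # float64 values
    with tempfile.TemporaryDirectory() as d:
        npy = os.path.join(d, "a.npy")
        raw = os.path.join(d, "a.bin")

        timed("np.save", np.save, npy, arr)
        timed("tofile", save_raw, raw, arr)

        a1 = timed("np.load", np.load, npy)
        # Memory-mapped: returns almost immediately, reads happen on access.
        a2 = timed("np.load (mmap)", np.load, npy, mmap_mode="r")
        a3 = timed("fromfile", load_raw, raw, arr.dtype, arr.shape)

        ok = (np.array_equal(a1, arr)
              and np.array_equal(a2, arr)
              and np.array_equal(a3, arr))
        del a2  # drop the memmap before the temporary directory is removed
    return ok
```

Running `compare()` on a modest shape first is a cheap way to check whether the differences are large enough to matter before trying the full-size arrays.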

How can I limit the output speed of stdout?

I’m running CentOS 5.7 and I have a backup utility that has the option of dumping its backup file to stdout. The backup file is rather large (multiple gigabytes). The target is an SSHFS filesystem. To ensure that I don’t hog the bandwidth and degrade the performance of the network, I would like to limit the speed with which data is written to the “disk”.
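One way to cap the rate is a small filter that sits between the backup utility and the SSHFS target and sleeps whenever it gets ahead of a target byte rate. The sketch below is in Python and assumes a Python 3 interpreter is available, which stock CentOS 5 does not provide (it ships an older Python 2). While the filter sleeps, the pipe buffer fills, so the backup utility is simply blocked. It does not buffer without limit. The script name `throttle.py` and the function `throttled_copy` are hypothetical. Existing tools such as `pv` with its `-L` option do the same job.

```python
import sys
import time


def throttled_copy(src, dst, rate, chunk_size=64 * 1024,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst, sleeping so the average rate stays at or below `rate` bytes/s.

    Returns the number of bytes copied. `clock` and `sleep` are injectable for testing.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    start = clock()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # Earliest time at which `copied` bytes may have gone out at the target rate.
        target = start + copied / rate
        delay = target - clock()
        if delay > 0:
            sleep(delay)
    dst.flush()
    return copied


# Act as a filter only when a rate argument is given, e.g.
#   backup_cmd | python3 throttle.py 1048576 > /mnt/sshfs/backup.img
if __name__ == "__main__" and len(sys.argv) == 2:
    throttled_copy(sys.stdin.buffer, sys.stdout.buffer, int(sys.argv[1]))
```

The limit is enforced as an average since the start, so output still leaves in bursts of up to `chunk_size` bytes. A smaller chunk smooths the traffic at the cost of more system calls.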

How can I monitor disk I/O in a particular directory?

I’ve got a few processes with a known name that all write to files in a single directory. I’d like to log the number of disk block reads and writes over a period (not just file access) to test whether a parameter change reduces the amount of I/O significantly. I’m currently using iostat -d -p, but it only reports totals for the whole partition.
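Here is a sketch of one possible approach. It substitutes per-process accounting for per-directory accounting. On Linux kernels with task I/O accounting, `/proc/<pid>/io` reports `read_bytes` and `write_bytes`, which are the bytes each process caused to be read from or written to the storage layer. If the named processes do little I/O outside that directory, summing these counters over all of them approximates the directory's I/O. Several assumptions should be checked first. It is not certain that the CentOS 5.7 kernel exposes `/proc/<pid>/io`. Reading another user's counters needs root. The helper names (`parse_io`, `pids_named`, `total_io`, `sample`) are hypothetical.

```python
import os
import time


def parse_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value)
    return fields


def pids_named(name, proc="/proc"):
    """Return PIDs whose 'Name:' in /proc/<pid>/status equals `name`.

    The kernel truncates process names to 15 characters, so `name` is truncated too.
    """
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                for line in f:
                    if line.startswith("Name:"):
                        if line.split(":", 1)[1].strip() == name[:15]:
                            pids.append(int(entry))
                        break
        except OSError:
            continue  # process exited or its status is not readable
    return pids


def total_io(pids, proc="/proc"):
    """Sum storage-level read_bytes/write_bytes over the given PIDs."""
    totals = {"read_bytes": 0, "write_bytes": 0}
    for pid in pids:
        try:
            with open(os.path.join(proc, str(pid), "io")) as f:
                fields = parse_io(f.read())
        except OSError:
            continue  # exited, or no permission to read its counters
        for key in totals:
            totals[key] += fields.get(key, 0)
    return totals


def sample(name, interval, proc="/proc"):
    """Bytes read/written by processes named `name` over `interval` seconds.

    The PID list is taken once: processes started during the interval are missed,
    and ones that exit drop out of the second reading (which can skew the delta).
    """
    pids = pids_named(name, proc)
    before = total_io(pids, proc)
    time.sleep(interval)
    after = total_io(pids, proc)
    return {key: after[key] - before[key] for key in before}
```

For example, `sample("myproc", 60)` (where `myproc` stands in for the real process name) returns the bytes read and written during one minute. The counters are in bytes, so dividing by the filesystem block size gives an approximate block count for comparing runs before and after the parameter change.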