Working with big data in Python and NumPy: not enough RAM, how do I save partial results to disk?
I am trying to implement algorithms for 1000-dimensional data with 200k+ data points in Python. I want to use NumPy, SciPy, scikit-learn, NetworkX, and other useful libraries. I want to compute pairwise distances between all of the points and run clustering on all of them. I have implemented working algorithms that do what I want with reasonable complexity, but when I try to scale them to all of my data I run out of RAM. That is expected: a dense pairwise distance matrix for 200k points has 4 × 10^10 entries, which is about 320 GB in float64.
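To make the "partial results on disk" idea concrete, here is a minimal sketch of one possible approach, not a tested solution for the full data set. It computes Euclidean distances one block of rows at a time and writes each block into a file-backed array created with `np.memmap`. The function name `pairwise_distances_to_disk`, the `block_size` parameter, the file name, and the choice of float32 are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


def pairwise_distances_to_disk(X, path, block_size=1000):
    """Write the Euclidean distance matrix of X to a file-backed array (hypothetical helper)."""
    n = X.shape[0]
    # mode="w+" creates the file; the array lives on disk rather than in RAM.
    D = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
    sq_norms = np.einsum("ij,ij->i", X, X)  # squared norm of each row
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, for one block of rows vs. all rows
        sq = sq_norms[start:stop, None] + sq_norms[None, :] - 2.0 * (X[start:stop] @ X.T)
        np.maximum(sq, 0.0, out=sq)  # clip tiny negatives caused by round-off
        D[start:stop] = np.sqrt(sq)  # cast to float32 on assignment
    D.flush()  # make sure everything is written to the file
    return D


# Small demo on random data (sizes chosen only so that it runs quickly).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
path = os.path.join(tempfile.mkdtemp(), "distances.dat")
D = pairwise_distances_to_disk(X, path, block_size=128)
```

Only one `block_size × n` block is held in memory at a time (about 1.6 GB in float64 for `block_size=1000` and `n=200000`), but the file itself would still be roughly 160 GB in float32 at that scale, so this only helps if the disk can hold it.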