I’ve noticed that when Python packages are installed with different package managers, they end up in /home/user/anaconda3/envs/env_name/ when using conda and in /home/user/anaconda3/envs/env_name/lib/python3.6/site-packages/ when using pip inside a conda environment.
But conda caches all the recently downloaded packages too.
So, my question is:
Why doesn’t conda install all packages in a central location and then, when a package is installed in a specific environment, create a link to that location rather than installing a separate copy there?
I’ve noticed that environments grow quite big and that this method would probably be able to save a bit of space.
Answers:
Method 1
Conda already does this. However, because it leverages hardlinks, it is easy to overestimate the space really being used, especially if one only looks at the size of a single env at a time.
To illustrate the case, let’s use du to inspect the real disk usage. First, if I count each environment directory individually, I get the uncorrected per-env usage:
$ for d in envs/*; do du -sh $d; done
2.4G  envs/pymc36
1.7G  envs/pymc3_27
1.4G  envs/r-keras
1.7G  envs/stan
1.2G  envs/velocyto
which is what it might look like from a GUI.
Instead, if I let du count them together (i.e., correcting for the hardlinks), I get:
$ du -sh envs/*
2.4G  envs/pymc36
326M  envs/pymc3_27
820M  envs/r-keras
927M  envs/stan
548M  envs/velocyto
One can see that a significant amount of space is already being saved here.
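The effect can be reproduced in isolation. The following is a minimal sketch, not conda itself: the scratch directory, the pkgs and envs/a, envs/b layout, and the file name lib.bin are all made up for illustration. It hardlinks one file into two fake environments and runs du both ways.

```shell
# Scratch directory standing in for the conda root (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/pkgs" "$demo/envs/a" "$demo/envs/b"

# 1 MiB of random data as a stand-in for a package file.
dd if=/dev/urandom of="$demo/pkgs/lib.bin" bs=1024 count=1024 2>/dev/null

# Hardlink the same file into both environments; no data is copied.
ln "$demo/pkgs/lib.bin" "$demo/envs/a/lib.bin"
ln "$demo/pkgs/lib.bin" "$demo/envs/b/lib.bin"

# Counted separately, each env appears to hold the full file.
for d in "$demo"/envs/*; do du -sk "$d"; done

# Counted in one invocation, du charges the shared file only once,
# so the second env is reported as much smaller.
du -sk "$demo"/envs/*
```

The scratch directory can be removed afterwards with rm -r "$demo".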
Most of the hardlinks go back to the pkgs directory, so if we include that as well:
$ du -sh pkgs envs/*
8.2G  pkgs
400M  envs/pymc36
116M  envs/pymc3_27
92M   envs/r-keras
62M   envs/stan
162M  envs/velocyto
one can see that outside of the shared packages, the envs are fairly light. If you’re concerned about the size of my pkgs, note that I have never run conda clean on this system, so my pkgs directory is full of tarballs and superseded packages, plus some infrastructure I keep in base (e.g., Jupyter, Git, etc.).
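To check how much of a given env is shared, one option is to count the files whose link count is greater than one. This is a rough sketch: the function name hardlink_summary and its output format are made up here, and a link count above one only shows that the file has another name somewhere, not that the other name lives in pkgs. File names containing newlines would also be miscounted.

```shell
# Report how many regular files under a directory have more than one hard link.
# hardlink_summary is a hypothetical helper, not part of conda.
hardlink_summary() {
    dir=$1
    # Total number of regular files under the directory.
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Regular files that have at least one other hard link.
    shared=$(find "$dir" -type f -links +1 | wc -l | tr -d ' ')
    echo "$shared of $total files are hardlinked"
}
```

It would be called as hardlink_summary envs/env_name, with envs/env_name replaced by the path of a real env.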
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.