I do a ton of file compression. Most of the stuff I am compressing is just code, so I need to use lossless compression.
I wondered if there was anything that offers a better size reduction than 7zip. It doesn’t matter how long it takes to compress or decompress; size is all that matters.
Does anyone know how the various tools and compression algorithms available in Linux compare for compressing text? Or is 7zip the best for compressing source code?
Answers:
Method 1
lrzip is what you’re really looking for, especially if you’re compressing source code!
Quoting the README:
This is a compression program optimised for large files. The larger the file and the more memory you have, the better the compression advantage this will provide, especially once the files are larger than 100MB. The advantage can be chosen to be either size (much smaller than bzip2) or speed (much faster than bzip2).
[…] The unique feature of lrzip is that it tries to make the most of the available ram in your system at all times for maximum benefit.
lrzip works by first scanning for and removing any long-distance data redundancy with an rzip-based algorithm, then compressing the non-redundant data.
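To make the two-stage idea concrete, here is a toy Python sketch. It assumes fixed 4 KiB blocks and SHA-256 hashing, both hypothetical choices: the real rzip stage uses a rolling hash so it can match at arbitrary offsets, and lrzip’s default second stage is LZMA rather than zlib. The first stage replaces every repeated block with a reference to its first copy, and the second stage runs zlib over what is left. Because zlib only looks back 32 KiB, it cannot see a duplicate sitting a megabyte earlier, while the dedup stage can. Requires Python 3.9+ (for randbytes).

```python
import hashlib
import random
import zlib

BLOCK = 4096  # hypothetical fixed block size; rzip itself matches at arbitrary offsets


def dedup_encode(data: bytes) -> bytes:
    """Stage 1: replace each repeated block with a reference to its first copy."""
    first_seen = {}  # SHA-256 digest -> index of the literal block
    out = bytearray()
    for start in range(0, len(data), BLOCK):
        block = data[start:start + BLOCK]
        key = hashlib.sha256(block).digest()
        if key in first_seen:
            # 'R' + 4-byte index of an earlier literal block
            out += b"R" + first_seen[key].to_bytes(4, "big")
        else:
            first_seen[key] = len(first_seen)
            # 'L' + 4-byte length + the raw bytes
            out += b"L" + len(block).to_bytes(4, "big") + block
    return bytes(out)


def dedup_decode(encoded: bytes) -> bytes:
    """Inverse of dedup_encode: expand references back into blocks."""
    literals = []
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        tag = encoded[pos:pos + 1]
        value = int.from_bytes(encoded[pos + 1:pos + 5], "big")
        pos += 5
        if tag == b"L":
            block = encoded[pos:pos + value]
            pos += value
            literals.append(block)
        else:
            block = literals[value]
        out += block
    return bytes(out)


def compress(data: bytes) -> bytes:
    """Stage 2: a conventional compressor (zlib here) on the deduplicated stream."""
    return zlib.compress(dedup_encode(data), 9)


def decompress(blob: bytes) -> bytes:
    return dedup_decode(zlib.decompress(blob))


if __name__ == "__main__":
    rng = random.Random(0)
    chunk = rng.randbytes(256 * 1024)          # 256 KiB of incompressible data
    data = chunk + bytes(1024 * 1024) + chunk  # the same chunk again, 1 MiB later
    assert decompress(compress(data)) == data
    print("zlib alone:  ", len(zlib.compress(data, 9)))
    print("dedup + zlib:", len(compress(data)))
```

On this demo input the deduplicated version should come out at roughly half the size of plain zlib, because the second copy of the chunk costs only a few hundred bytes of references.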
Con Kolivas provides a fantastic example on the Linux Kernel Mailing List, in which he compresses a 10.3GB tarball of forty Linux kernel releases down to 163.9MB (1.6%), and does so faster than xz. He wasn’t even using the most aggressive second-pass algorithm!
I’m sure you’ll have great results compressing massive tarballs of source code 🙂
sudo apt-get install lrzip
Example (using the defaults for other options):
Ultra compression, dog slow:
lrzip -z file
For directories, just replace lrzip with lrztar.
Method 2
7zip is more an archiver (like PKZIP) than a compressor. It’s available for Linux, but it can only create compressed archives in regular files; it can’t compress a stream, for instance. It also can’t store most Unix file attributes, such as ownership, ACLs, extended attributes, or hard links.
On Linux, as a compressor, you have xz, which uses the same compression algorithm as 7zip (LZMA2). You can use it to compress tar archives.
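As a minimal sketch of the tar + xz combination using only Python’s standard tarfile and lzma modules, the following writes a directory into a .tar.xz with the strongest xz preset, roughly what piping tar into xz -9e produces. The tar format records ownership and permissions, which a plain 7z archive would not. The make_tar_xz name and the demo paths are hypothetical.

```python
import lzma
import pathlib
import tarfile
import tempfile


def make_tar_xz(src_dir: str, out_path: str) -> None:
    """Write src_dir into a tar archive compressed with xz (LZMA2)."""
    # preset 9 | PRESET_EXTREME corresponds to `xz -9e`
    with tarfile.open(out_path, "w:xz", preset=9 | lzma.PRESET_EXTREME) as tar:
        tar.add(src_dir, arcname=pathlib.Path(src_dir).name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp, "project")
        src.mkdir()
        (src / "main.c").write_text("int main(void) { return 0; }\n" * 100)
        out = pathlib.Path(tmp, "project.tar.xz")
        make_tar_xz(str(src), str(out))
        with tarfile.open(str(out), "r:xz") as tar:
            print(tar.getnames())
```

Note that preset 9 reserves a 64 MiB dictionary, so compression needs several hundred MiB of RAM; lower presets use far less.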
As with gzip and bzip2 (pigz and pbzip2), there’s a parallel variant, pixz, that can use several processors to speed up compression (xz can also do this natively since version 5.2.0 with the -T option). pixz also supports indexing a compressed tar archive, which means it can extract a single file without having to decompress the archive from the start.
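Both the speed-up and the single-file extraction rely on the same idea: pieces compressed independently can be compressed in parallel and decompressed on their own. Here is a rough Python sketch of that idea, assuming a hypothetical 1 MiB chunk size. Each chunk becomes its own xz stream, the streams are concatenated (Python’s lzma.decompress, like the xz tool, accepts concatenated streams), and a list of compressed sizes serves as the index. xz -T and pixz actually split the input into blocks inside a single stream, so treat this as an illustration of the principle, not of their on-disk format.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # hypothetical 1 MiB chunk size


def parallel_xz(data: bytes, workers: int = 4) -> tuple[bytes, list[int]]:
    """Compress fixed-size chunks independently and concatenate the xz streams.

    Returns the concatenated blob plus an index of compressed chunk sizes.
    """
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: lzma.compress(c, preset=6), chunks))
    return b"".join(parts), [len(p) for p in parts]


def read_chunk(blob: bytes, index: list[int], k: int) -> bytes:
    """Decompress only chunk k, using the index to find where its stream starts."""
    start = sum(index[:k])
    return lzma.decompress(blob[start:start + index[k]])


if __name__ == "__main__":
    data = b"".join(b"line %d\n" % i for i in range(150_000))
    blob, index = parallel_xz(data)
    assert lzma.decompress(blob) == data               # whole archive
    assert read_chunk(blob, index, 1) == data[CHUNK:]  # second chunk only
    print(len(data), "->", len(blob), "bytes in", len(index), "streams")
```

The tradeoff is that matches cannot cross chunk boundaries, so smaller chunks give more parallelism and finer random access at the cost of a slightly worse ratio.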
Method 3
If you’re looking for greatest size reduction regardless of compression speed, LZMA is likely your best option.
When comparing the various compressions, generally the tradeoff is time vs. size. gzip tends to compress and decompress relatively quickly while yielding a good compression ratio. bzip2 is somewhat slower than gzip both in compression and decompression time, but yields even greater compression ratios. LZMA has the longest compression time but yields the best ratios while also having a decompression rate outperforming that of bzip2.
http://tukaani.org/lzma/benchmarks.html
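If you want to check the time-versus-size tradeoff on your own code rather than relying on published benchmarks, Python’s standard library wraps all three algorithms. This is a minimal sketch; the codec settings are my assumption of reasonable high levels, and results will depend heavily on your input.

```python
import bz2
import gzip
import lzma
import sys
import time

# name -> (compress function, decompress function)
CODECS = {
    "gzip": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    # like xz -6e (8 MiB dictionary); preset 9 (64 MiB) only helps on bigger inputs
    "xz": (lambda d: lzma.compress(d, preset=6 | lzma.PRESET_EXTREME), lzma.decompress),
}


def benchmark(data: bytes) -> dict[str, tuple[int, float, float]]:
    """Return {codec: (compressed size, compress seconds, decompress seconds)}."""
    results = {}
    for name, (comp, decomp) in CODECS.items():
        t0 = time.perf_counter()
        blob = comp(data)
        t1 = time.perf_counter()
        restored = decomp(blob)
        t2 = time.perf_counter()
        assert restored == data, f"{name} round trip failed"
        results[name] = (len(blob), t1 - t0, t2 - t1)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:  # e.g. a tarball of your source tree
        data = f.read()
    for name, (size, ct, dt) in benchmark(data).items():
        print(f"{name:6} {size:>10} bytes {100 * size / len(data):5.1f}%  "
              f"compress {ct:.2f}s  decompress {dt:.2f}s")
```

Usage, with a hypothetical script name: python bench.py src.tar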
Method 4
(Updated answer) If time doesn’t matter, use ZPAQ v1.10 (or newer), e.g.:
zpaq pvc/usr/share/doc/zpaq/examples/max.cfg file.zpaq file.tar
(the location of max.cfg may vary; check your installed package’s file list)
zpaq actually compressed better than kgb (kgb -9 newFileName.kgb yourFileName.tar), which is based on the older PAQ6 algorithm and is very slow.
I tested it against other compressors (7zip, lrzip, bzip2, kgb…), and zpaq produced the smallest output!
If kgb still interests you (it was my initial choice in this answer, so I am keeping the information here):
Ubuntu 14.04 has kgb 1.0b4; run sudo apt-get install kgb to install it.
The notes below are about the Windows version, which you can try to run or compile on Linux, although I did not succeed.
Version 2 beta2 can be found on SourceForge, but no Linux binaries are available. You can try to run it in a console with wine kgb2_console.exe -a7 -m9 (method -a6 -m9 seems to be equivalent to the best method in 1.0b4; -a7 is new in 2 beta2). I had better stability by installing .NET 2.0 with winetricks and running wine "KGB Archiver 2 .net.exe", but I don’t much like doing that, so I will stick with the native Linux 1.0b4, which gives almost the same result as 2 beta2.
Anyway, version 2 beta2 seriously deserves a native Linux version too! Maybe something can be accomplished with MinGW, but this command still fails badly: i586-mingw32msvc-g++ kgb2_console.cpp -o kgb. Maybe try compiling it with dmcs (Mono)?
Method 5
7zip is not a single technology; it supports several different compression methods (see the Wikipedia article on 7z).
A set of tests was performed with different tools specifically on C source files. I’m not sure which of the tools exist for Linux, or whether they still exist at all. However, note that the best algorithm was PPM with modifications (PPMII, then PPMZ).
If you are interested in the tools, you can browse the site; it’s in Russian, but Google Translate may help. There is a large repository of binaries, which you may (or may not) be able to use from Linux with wine, if really needed.
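To illustrate why context-modelling compressors such as PPM do well on source code, here is a simplified Python sketch. It is not PPM itself: it predicts each byte from counts of what followed the same k preceding bytes so far (add-one smoothing over 256 symbols) and sums the ideal code length, -log2(p) bits per byte, which an arithmetic coder driven by those predictions would approach. Real PPM variants such as PPMII blend several context orders using escape symbols; this fixed-order model is a hypothetical stand-in, and the function name adaptive_code_length is mine.

```python
import math
from collections import defaultdict


def adaptive_code_length(data: bytes, order: int = 2) -> float:
    """Ideal code length in bits of `data` under an adaptive order-`order` byte model."""
    counts = defaultdict(lambda: [0] * 256)  # context -> per-symbol counts
    totals = defaultdict(int)                # context -> number of symbols seen
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]      # the preceding `order` bytes
        p = (counts[ctx][sym] + 1) / (totals[ctx] + 256)
        bits -= math.log2(p)                 # cost of coding this byte
        counts[ctx][sym] += 1                # then learn from it
        totals[ctx] += 1
    return bits


if __name__ == "__main__":
    source = b"int main(void) { return 0; }\n" * 1000
    for k in (0, 1, 2, 4):
        bpb = adaptive_code_length(source, k) / len(source)
        print(f"order {k}: {bpb:.2f} bits per byte")
```

On repetitive code, higher orders should push the cost well below 8 bits per byte; on real text, very high orders tend to learn more slowly because each context is seen less often, which is the problem PPM’s fallback to shorter contexts addresses.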
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.