parallel-compression
1 Many times faster (de)compression using multiple processors
To use a system with multiple processors effectively for file
(de)compression, you need modified software. Most tools in common
use lack this capability, which is often a big waste
considering that nearly linear speedups can be achieved in most
cases.
Switching to a parallel (de)compressor will likely save you a
few minutes (possibly hours!) of waiting.
The list below can help.
1.1 gz format
http://zlib.net/pigz/ is "A parallel implementation of gzip for
modern multi-processor, multi-core machines", and also a replacement
(file-compatible) for the standard gzip/gunzip tools. Use pigz
-p<threads> or pigz -p <threads> (both forms are supported).
"For gz files compression can be parallelized effectively, but
decompression can't be parallelized, at least not without specially
prepared deflate streams for that purpose. As a result, pigz uses a
single thread (the main thread) for decompression, but will create
three other threads for reading, writing, and check calculation,
which can speed up decompression under some circumstances."
Full details here: http://zlib.net/pigz/pigz.pdf
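To illustrate why gz compression parallelizes well, here is a minimal
Python sketch. It is not how pigz works internally; it uses a simpler
technique that relies on the gzip format allowing several members to be
concatenated into one valid file. The names parallel_gzip and
CHUNK_SIZE are hypothetical, and the 1 MiB chunk size is an arbitrary
assumption.

```python
import gzip
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk; arbitrary choice for this sketch


def parallel_gzip(data: bytes, threads: int = 4) -> bytes:
    """Compress data as a series of independent gzip members."""
    # Split the input into fixed-size chunks (a single empty chunk if no data).
    chunks = [data[i:i + CHUNK_SIZE]
              for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    # CPython's zlib releases the GIL while compressing, so worker
    # threads can run on separate cores. map() preserves chunk order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        members = list(pool.map(gzip.compress, chunks))
    # Concatenated gzip members form a valid .gz stream.
    return b"".join(members)


if __name__ == "__main__":
    payload = os.urandom(1024) * 4096  # 4 MiB of repetitive data
    packed = parallel_gzip(payload, threads=4)
    # Any standard gzip reader decompresses all members in sequence.
    assert gzip.decompress(packed) == payload
```

The trade-off of this approach is a slightly worse compression ratio,
since each chunk is compressed without knowledge of the previous one.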
1.2 bzip2 format
Parallel bzip2 at http://compression.ca/pbzip2/ is a drop-in
replacement for bzip2 that is capable of using multiple threads to
perform its work. Use the -p<threads> flag for this (no space).
As a side note, xz and lzip generally provide better compression
ratios and faster decompression than bzip2, at the cost of greater
memory usage.
Some benchmarks: http://mattmahoney.net/dc/text.html
http://mattmahoney.net/dc/uiq/ (thanks, http://news.ycombinator.com/user?id=alecco)
1.3 lzma
There are multiple lzma-based implementations & file formats.
http://lpar.ath0.com/2009/09/25/documentation-as-an-indicator-of-code-quality/ has some interesting notes about this.
1.3.1 lzip format (I am using this currently).
plzip (which can be fetched at http://lzip.nongnu.org/plzip.html) is
a drop-in replacement for http://lzip.nongnu.org/lzip.html that is
capable of using multiple threads for its work. Use the -n <threads>
flag.
1.3.2 xz format
- xz-utils
The 5.1.1alpha (2011-04-12) version of the standard xz-utils
(http://tukaani.org/xz/) supports multithreaded compression
via the -T <threads> flag.
- pxz
http://jnovy.fedorapeople.org/pxz/ also does the trick, but uses
temporary files and doesn't combine them until the whole file is
compressed.
- pixz
https://github.com/vasi/pixz does parallel compression.
gotcha: use the -t flag to avoid truncated output in a rare case.
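The thread flags mentioned so far (pigz -p, pbzip2 -p, plzip -n and
xz -T) can be collected in a small Python helper that builds the
command line for a given format. This is only a sketch: the names
PARALLEL_TOOLS, compress_command and is_installed are hypothetical, and
it assumes the tools are on your PATH and that your xz is recent enough
to accept -T.

```python
import os
import shutil

# Tool name and thread-flag builder per format, as described in this list.
PARALLEL_TOOLS = {
    "gz": ("pigz", lambda n: ["-p", str(n)]),
    "bz2": ("pbzip2", lambda n: [f"-p{n}"]),  # no space for pbzip2
    "lz": ("plzip", lambda n: ["-n", str(n)]),
    "xz": ("xz", lambda n: ["-T", str(n)]),
}


def compress_command(path, fmt, threads=None):
    """Return the argv list that compresses path using `threads` threads."""
    tool, flag = PARALLEL_TOOLS[fmt]
    # Default to the number of processors the OS reports.
    threads = threads or os.cpu_count() or 1
    return [tool, *flag(threads), path]


def is_installed(fmt):
    """Check whether the tool for this format can be found on PATH."""
    return shutil.which(PARALLEL_TOOLS[fmt][0]) is not None
```

The returned list can be passed to subprocess.run, for example
subprocess.run(compress_command("big.tar", "xz"), check=True), after
checking is_installed("xz").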
1.4 pcompress, a new format built from scratch for parallel compression: http://moinakg.github.com/pcompress/
1.5 .zip file format.
I am not aware of any tools capable of generating or decompressing
.zip files in parallel. If you know of any, please drop me a note at
igorhvr at (remove www from the domain name of this website).