parallel-compression


1 Many times faster (de)compression using multiple processors - 2012-03-07 Wed

To effectively use a system with multiple processors for file
(de)compression you need modified software. Most tools in common
use lack this capability, which is often a big waste, considering
that nearly linear speedups (in the number of cores) can be achieved
in most cases.


Switching to a parallel (de)compressor will likely save you a few
minutes (possibly hours!) of waiting.


The list below can help.

1.1 gz format

http://zlib.net/pigz/ is "A parallel implementation of gzip for
modern multi-processor, multi-core machines" and a file-compatible
replacement for the standard gzip/gunzip tools. Use pigz
-p<threads> or pigz -p <threads> (both forms are supported).


"For gz files compression can be parallelized effectively, but
decompression can't be parallelized, at least not without specially
prepared deflate streams for that purpose. As a result, pigz uses a
single thread (the main thread) for decompression, but will create
three other threads for reading, writing, and check calculation,
which can speed up decompression under some circumstances."
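The quote above says compression is the part that parallelizes well. Below is a minimal Python sketch of the general idea, assuming a simpler scheme than the one pigz actually uses: split the input into fixed-size chunks, compress each chunk into its own gzip member on a thread pool, and concatenate the members. The gzip format allows several members back to back, so any gzip reader can decompress the result. As I understand it, pigz instead emits a single deflate stream and primes each block with the tail of the previous one, which costs less compression ratio; the function names and the 128 KiB chunk size here are my own choices for illustration.

```python
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 128 * 1024  # illustrative block size chosen for this sketch


def _gzip_member(chunk: bytes, level: int) -> bytes:
    # wbits=31 asks zlib for a complete gzip member (header, deflate data, CRC32 trailer)
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    return comp.compress(chunk) + comp.flush()


def parallel_gzip(data: bytes, threads: int = 4, level: int = 6,
                  chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress independent chunks concurrently and concatenate the gzip members."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    # CPython's zlib releases the GIL while deflating, so threads can use separate cores
    with ThreadPoolExecutor(max_workers=threads) as pool:
        members = list(pool.map(lambda c: _gzip_member(c, level), chunks))
    # map() preserves input order, so the members come out in sequence
    return b"".join(members)


if __name__ == "__main__":
    original = b"parallel compression example\n" * 100_000
    packed = parallel_gzip(original, threads=4)
    # gzip.decompress understands multi-member data, so an ordinary
    # single-threaded reader recovers the original bytes
    assert gzip.decompress(packed) == original
    print(len(original), "->", len(packed), "bytes")
```

A thread pool is enough here (no separate processes) because the heavy lifting happens inside zlib with the GIL released. Decompression of the result is still sequential, which matches what the quote says about ordinary deflate streams.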


Full details here: http://zlib.net/pigz/pigz.pdf


1.2 bzip2 format

Parallel bzip2 at http://compression.ca/pbzip2/ is a drop-in
replacement for bzip2 that is capable of using multiple threads to
perform its work. Use the -p<threads> flag for this (no space).
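pbzip2 gets its speedup by splitting the input into blocks and compressing them as independent bzip2 streams that are concatenated into one .bz2 file. Here is a rough Python sketch of that idea; the block size, thread count, and the offsets index are my own illustrative assumptions, not pbzip2's internals. Because the streams are independent, decompression can be parallel too, as long as you know where each stream begins, so the sketch simply records those (hypothetical) offsets at compression time.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Roughly bzip2's largest block size (-9); an illustrative choice for this sketch
BLOCK = 900_000


def parallel_bz2(data: bytes, threads: int = 4, block: int = BLOCK):
    """Compress fixed-size blocks concurrently as independent bzip2 streams.

    Returns the concatenated .bz2 bytes plus a hypothetical index of the
    byte offset where each stream starts.
    """
    chunks = [data[i:i + block] for i in range(0, len(data), block)] or [b""]
    # CPython's bz2 module releases the GIL while compressing, so threads can overlap
    with ThreadPoolExecutor(max_workers=threads) as pool:
        streams = list(pool.map(bz2.compress, chunks))
    offsets, pos = [], 0
    for stream in streams:
        offsets.append(pos)
        pos += len(stream)
    return b"".join(streams), offsets


def parallel_bunzip2(packed: bytes, offsets, threads: int = 4) -> bytes:
    """Decompress each stream on its own worker, using the recorded offsets."""
    ends = offsets[1:] + [len(packed)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        parts = list(pool.map(lambda span: bz2.decompress(packed[span[0]:span[1]]),
                              zip(offsets, ends)))
    return b"".join(parts)


if __name__ == "__main__":
    original = bytes(range(256)) * 8000
    packed, offsets = parallel_bz2(original, threads=4, block=500_000)
    # A plain single-threaded reader also works: bz2.decompress handles
    # concatenated streams
    assert bz2.decompress(packed) == original
    assert parallel_bunzip2(packed, offsets) == original
    print(len(offsets), "streams,", len(original), "->", len(packed), "bytes")
```

Without an index, a parallel decompressor would have to find stream boundaries by scanning the data itself; recording them up front just keeps the sketch short.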


As a side note, xz and lzip both usually compress better than bzip2
and decompress faster, at the cost of greater memory usage.


Some benchmarks: http://mattmahoney.net/dc/text.html and
http://mattmahoney.net/dc/uiq/ (thanks, http://news.ycombinator.com/user?id=alecco).

1.3 lzma

There are multiple LZMA-based implementations and file formats;
http://lpar.ath0.com/2009/09/25/documentation-as-an-indicator-of-code-quality/
has some interesting notes about this.

1.3.1 lzip format (I am using this currently).

plzip (available at http://lzip.nongnu.org/plzip.html) is a drop-in
replacement for lzip (http://lzip.nongnu.org/lzip.html) that can use
multiple threads for its work. Use the -n <threads> flag.

1.3.2 xz format

  • xz-utils
    Version 5.1.1alpha (2011-04-12) of the standard xz-utils
    (http://tukaani.org/xz/) supports multithreaded compression
    via the -T <threads> flag.
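Multithreaded xz compression rests on the same trick of independent pieces. A minimal Python sketch, assuming concatenated .xz streams are acceptable output (xz-utils, as I understand it, instead writes independent blocks inside a single stream): each chunk becomes its own .xz stream, and lzma.decompress reads the concatenation. The function name, chunk size, and preset are illustrative assumptions. Smaller chunks expose more parallelism, but each chunk starts with an empty dictionary, so the ratio typically gets a bit worse; memory use also grows with the number of threads, since every worker holds its own encoder.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def parallel_xz(data: bytes, threads: int = 4, chunk_size: int = 4 * 1024 * 1024,
                preset: int = 6) -> bytes:
    """Compress each chunk as its own .xz stream on a thread pool (a sketch)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]

    def compress_chunk(chunk: bytes) -> bytes:
        # Each chunk starts from an empty dictionary, so matches that would
        # cross a chunk boundary are lost; that is the ratio cost of parallelism
        return lzma.compress(chunk, format=lzma.FORMAT_XZ, preset=preset)

    # CPython's lzma module releases the GIL while compressing
    with ThreadPoolExecutor(max_workers=threads) as pool:
        streams = list(pool.map(compress_chunk, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    original = b"xz block example " * 60_000
    packed = parallel_xz(original, threads=2, chunk_size=256 * 1024)
    # lzma.decompress reads concatenated .xz streams one after another
    assert lzma.decompress(packed) == original
    print(len(original), "->", len(packed), "bytes")
```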

1.4 pcompress, a new format built from scratch for parallel compression: http://moinakg.github.com/pcompress/

1.5 .zip file format.

I am not aware of any tools capable of generating or decompressing
.zip files in parallel. If you know of any, please drop me a note at
igorhvr at this website's domain name (without the www).

Author: Igor Hjelmstrom Vinhas Ribeiro (igor.ribeiro@movile.com)

Date: 2013-01-09 09:28:25 BRST

Generated by Org version 7.8.02 with Emacs version 23
