F21 System Wide Change: lbzip2 as default bzip2 implementation

Mikolaj Izdebski mizdebsk at redhat.com
Fri Apr 4 14:15:59 UTC 2014


As I promised, I prepared a benchmark of lbzip2 and bzip2.
I also added pbzip2 for comparison.


Basic information
=================

Test date:     2014-04-04
Tester:        Mikolaj Izdebski
Test subjects: lbzip2 2.5
               bzip2 1.0.6
               pbzip2 1.1.6
Test purpose:  compare performance, memory usage and compression
               ratio of lbzip2, bzip2 and pbzip2 in Fedora

CPU:           Haswell B0, Genuine Intel(R) CPU @ 2.20GHz
bogomips:      4389.60
Processors:    56
NUMA Nodes:    2
Memory:        31966 MB

System:        Fedora release 20 (Heisenbug)
Arch:          x86_64
Inst. method:  anaconda 20.25.15-1 (kickstart)

File system:   tmpfs (/dev/shm)


Methodology
===========

Compress and decompress two different payloads:
 - Linux kernel sources.
 - A tarball created from /usr.

The Linux source tarball was chosen because it is a fairly large bz2
file that can easily be downloaded from the Internet to reproduce the
test results.  MD5 sums are provided for reproducibility.

  linux-3.12.6.tar      544061440  02d8601f28c519a9d4d0a2ae99bb597a
  linux-3.12.6.tar.bz2   91104346  2e1e42cf9c164d8c24bc1e33bb3c7b2b
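
For anyone reproducing the test, a minimal Python sketch for checking
a downloaded file against the sizes and MD5 sums listed above.  The
helper names (md5_of_file, verify) are made up for illustration, and
the script assumes the files sit in the current directory.

```python
import hashlib
import os

# Sizes (bytes) and MD5 sums copied from the list above.
EXPECTED = {
    "linux-3.12.6.tar": (544061440, "02d8601f28c519a9d4d0a2ae99bb597a"),
    "linux-3.12.6.tar.bz2": (91104346, "2e1e42cf9c164d8c24bc1e33bb3c7b2b"),
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_size, expected_md5):
    """True if the file has both the expected size and the expected MD5."""
    return (os.path.getsize(path) == expected_size
            and md5_of_file(path) == expected_md5)


for name, (size, md5) in EXPECTED.items():
    if not os.path.exists(name):
        print(f"{name}: not found, skipped")
    else:
        print(f"{name}: {'OK' if verify(name, size, md5) else 'MISMATCH'}")
```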

The tarball created by running "tar cf payload.tar /usr" was chosen
because it contains different types of data (text files, executables,
incompressible files) while still allowing the results to be
reproduced fairly easily.

  payload.tar      1463183360
  payload.tar.bz2   424518771

Each compression and decompression was run three times.  The run with
the median real (wall clock) time was kept; the other two were
discarded.
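
The selection rule can be sketched in a few lines of Python.  The
function name and the three sample runs are hypothetical (only the
kept run of each test appears in this report); the point is just that
the whole row of the median run is kept, not an average.

```python
def pick_median_run(runs):
    """Return the run whose 'real' time is the median of the given runs.

    Assumes an odd number of runs (three in this benchmark).
    """
    ordered = sorted(runs, key=lambda run: run["real"])
    return ordered[len(ordered) // 2]


# Hypothetical measurements of one command, for illustration only.
sample_runs = [
    {"real": 1.34, "user": 61.90, "sys": 2.40},
    {"real": 1.30, "user": 61.45, "sys": 2.35},
    {"real": 1.28, "user": 61.10, "sys": 2.31},
]
kept = pick_median_run(sample_runs)
print(kept)
```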

Times and memory usage were measured using the GNU time utility.
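
The exact invocation is not recorded here, so the following is only a
sketch of how such a measurement could be scripted.  It assumes GNU
time is installed as /usr/bin/time and uses its -f and -o options
with the %e, %U, %S and %M format specifiers (real, user, sys,
maximum resident set size in kbytes).  The function names are made up
for this example.

```python
import os
import subprocess
import tempfile

# GNU time format: real, user, sys (seconds) and max RSS (kbytes).
TIME_FORMAT = "%e %U %S %M"


def parse_time_report(text):
    """Parse the last line of a report written with TIME_FORMAT."""
    real, user, sys_time, max_rss = text.strip().splitlines()[-1].split()
    return {
        "real": float(real),
        "user": float(user),
        "sys": float(sys_time),
        "memory": int(max_rss),
    }


def measure(command, time_binary="/usr/bin/time"):
    """Run command (a list of arguments) under GNU time.

    The command's standard output is discarded; the timing report is
    written to a temporary file so it cannot mix with stderr output.
    """
    fd, report_path = tempfile.mkstemp()
    os.close(fd)
    try:
        subprocess.run(
            [time_binary, "-f", TIME_FORMAT, "-o", report_path] + list(command),
            check=True,
            stdout=subprocess.DEVNULL,
        )
        with open(report_path) as report:
            return parse_time_report(report.read())
    finally:
        os.remove(report_path)
```

A call could look like
measure(["lbzip2", "-dc", "/dev/shm/linux-3.12.6.tar.bz2"]), where
both the path and the choice of decompressing to standard output are
assumptions, not necessarily what was done for this report.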


Results
=======

real        - elapsed real time (wall clock, seconds)
user        - CPU time spent in user mode (seconds)
sys         - CPU time spent in kernel mode (seconds)
memory      - maximum resident set size (kbytes)
compr. size - size of resulting compressed file (bytes)


Decompression of linux-3.12.6.tar.bz2
-------------------------------------

command    |   real |   user |  sys | memory
-----------+--------+--------+------+-------
lbzip2     |   0.79 |  30.72 | 1.70 | 448804
lbzip2 -u  |   5.85 |  18.62 | 1.83 |  80992
pbzip2     |  24.48 |  24.27 | 0.61 |  98444
bzip2      |  23.95 |  23.46 | 0.44 |   4212


Compression of linux-3.12.6.tar
-------------------------------

command    |   real |   user |  sys | memory | compr. size
-----------+--------+--------+------+--------+------------
lbzip2     |   1.30 |  61.45 | 2.35 | 360280 | 91383535
lbzip2 -u  |   2.51 |  44.11 | 1.43 | 211456 | 91084544
pbzip2     |   2.69 | 105.79 | 4.11 | 488840 | 91411005
bzip2      |  66.16 |  65.82 | 0.22 |   7996 | 91104346


Decompression of payload.tar.bz2
--------------------------------

command    |   real |   user |  sys | memory
-----------+--------+--------+------+-------
lbzip2     |   2.19 |  95.16 | 3.81 | 750548
lbzip2 -u  |  23.34 |  60.31 | 5.04 | 120140
pbzip2     |  69.55 |  69.07 | 1.92 | 139060
bzip2      |  68.30 |  66.93 | 1.27 |   4216


Compression of payload.tar
--------------------------

command    |   real |   user |  sys | memory | compr. size
-----------+--------+--------+------+--------+------------
lbzip2     |   3.36 | 170.07 | 6.38 | 380448 | 424676188
lbzip2 -u  |   6.45 | 123.14 | 3.80 | 255524 | 424518771
pbzip2     |   6.78 | 288.33 | 8.90 | 491644 | 425213134
bzip2      | 176.68 | 175.76 | 0.67 |   8000 | 425108407


Conclusions
===========

Memory usage depended on the number of threads used.  The difference
in memory usage between parallel and non-parallel runs can be ignored,
as even the parallel tools can be run in non-parallel mode.

"lbzip2" was the fastest compressor and decompressor in all tests.
It is the best command for interactive use.
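
To put a number on "fastest", the wall clock speedup over bzip2 can
be computed from the "real" column of the four tables.  This is a
small sketch; the dictionary layout and function name are only for
illustration, the figures are copied from the results above.

```python
# "real" column (seconds) of the four result tables.
REAL_SECONDS = {
    "decompress linux-3.12.6.tar.bz2":
        {"lbzip2": 0.79, "lbzip2 -u": 5.85, "pbzip2": 24.48, "bzip2": 23.95},
    "compress linux-3.12.6.tar":
        {"lbzip2": 1.30, "lbzip2 -u": 2.51, "pbzip2": 2.69, "bzip2": 66.16},
    "decompress payload.tar.bz2":
        {"lbzip2": 2.19, "lbzip2 -u": 23.34, "pbzip2": 69.55, "bzip2": 68.30},
    "compress payload.tar":
        {"lbzip2": 3.36, "lbzip2 -u": 6.45, "pbzip2": 6.78, "bzip2": 176.68},
}


def speedup_over_bzip2(times, command="lbzip2"):
    """How many times faster (wall clock) command was than bzip2."""
    return times["bzip2"] / times[command]


for test, times in REAL_SECONDS.items():
    fastest = min(times, key=times.get)
    print(f"{test}: fastest is {fastest}, "
          f"lbzip2 speedup over bzip2 {speedup_over_bzip2(times):.1f}x")
```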

"lbzip2 -u" always produced smallest files (even smaller than bzip2)
while consuming the least CPU time of all tools and the least memory
of the parallel ones.  This translates directly into the lowest bills
in the cloud, which makes "lbzip2 -u" the best choice there.
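
The resource claim can be checked against the compression tables by
adding user and sys time and comparing output sizes.  A sketch with
the figures copied from above; the names used are illustrative only.

```python
# (user seconds, sys seconds, compressed size in bytes) per command.
COMPRESSION = {
    "linux-3.12.6.tar": {
        "lbzip2": (61.45, 2.35, 91383535),
        "lbzip2 -u": (44.11, 1.43, 91084544),
        "pbzip2": (105.79, 4.11, 91411005),
        "bzip2": (65.82, 0.22, 91104346),
    },
    "payload.tar": {
        "lbzip2": (170.07, 6.38, 424676188),
        "lbzip2 -u": (123.14, 3.80, 424518771),
        "pbzip2": (288.33, 8.90, 425213134),
        "bzip2": (175.76, 0.67, 425108407),
    },
}


def cpu_seconds(row):
    """Total CPU time (user + sys) of one table row."""
    user, sys_time, _size = row
    return user + sys_time


for payload, rows in COMPRESSION.items():
    least_cpu = min(rows, key=lambda cmd: cpu_seconds(rows[cmd]))
    smallest = min(rows, key=lambda cmd: rows[cmd][2])
    saved = rows["bzip2"][2] - rows[smallest][2]
    print(f"{payload}: least CPU time {least_cpu} "
          f"({cpu_seconds(rows[least_cpu]):.2f} s), smallest output "
          f"{smallest} ({saved} bytes smaller than bzip2)")
```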

"pbzip2" did not allow parallel decompression.  During compression it
was always the slowest of the parallel tools, used the most memory and
offered the worst compression ratio.

You don't have to take this report on trust; you are free to try
lbzip2 and see for yourself.


-- 
Mikolaj Izdebski
Software Engineer, Red Hat
IRC: mizdebsk

