F21 System Wide Change: lbzip2 as default bzip2 implementation
mizdebsk at redhat.com
Fri Apr 4 17:31:58 UTC 2014
On 04/04/2014 07:01 PM, Susi Lehtola wrote:
> On Fri, 4 Apr 2014 12:49:25 -0400
> Matthew Miller <mattdm at fedoraproject.org> wrote:
>> On Fri, Apr 04, 2014 at 04:15:59PM +0200, Mikolaj Izdebski wrote:
>>> "lbzip2 -u" always produced smallest files (even smaller than bzip2)
>>> while consuming the least amount of resources (CPU power and memory).
>>> This directly translates to lowest bills in cloud, which makes "lbzip2
>>> -u" the best choice here.
>> But... the size difference in your test cases appear to be 0.1% and
>> 0.02%. Am I reading that right? And, compressing linux-3.12.6.tar with xz
>> instead of bzip2 gives a 15.6%, or with xz -9, 19.7%. Of course, that's very
>> slow, and the other resource factors are important too. (And lbzip2 is
>> impressively fast.)
> Well, looking at the table, I calculate size differences of -0.10% and
> -0.14% for lbzip2 and lbzip2 -u, respectively, compared to bzip2 for
> compression of payload.tar.
In general, lbzip2 achieves a compression ratio very close to that of bzip2.
lbzip2 -u almost always produces marginally smaller files than bzip2.
Without -u it varies: sometimes lbzip2 produces marginally bigger bz2
files, sometimes marginally smaller ones.
For some types of data, bz2 compression works better than xz. Examples:
sparse disk images containing lots of zeroes, or genomic DNA sequences.
$ dd if=/dev/zero of=zero bs=1000000 count=100
$ lbzip2 -ku zero
$ xz -k zero
-rw-rw-r--. 1 100000000 Apr 4 19:17 zero
-rw-rw-r--. 1 113 Apr 4 19:17 zero.bz2
-rw-rw-r--. 1 14676 Apr 4 19:17 zero.xz
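The same comparison can be roughly reproduced without the command-line tools. The following is only a sketch: it uses Python's standard bz2 and lzma modules as stand-ins for lbzip2 and xz, so the exact byte counts will likely differ from the listing above, and the helper name compressed_sizes is made up for this illustration.

```python
import bz2
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes of `data` for each format.

    Assumption: the stdlib codecs stand in for the lbzip2 and xz tools;
    they write the same container formats but are not the same encoders.
    """
    return {
        "bz2": len(bz2.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data, format=lzma.FORMAT_XZ)),
    }


if __name__ == "__main__":
    # 100 MB of zero bytes, matching the dd invocation above.
    zeros = bytes(100_000_000)
    for name, size in compressed_sizes(zeros).items():
        print(f"{name}: {size} bytes")
```

Running it on other inputs (for example a real disk image) is the more interesting test, since all-zero data is the extreme case.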
xz doesn't allow parallel decompression in general. When restoring
backups you are under time pressure, and fast decompression can come in
very handy.
When an xz file is damaged, all data after the point of damage is
lost. But the lbzrecover tool from lbzip2-utils allows easy recovery of
data from the undamaged parts of any bz2 file.
Personally, for the above reasons I recommend that people use lbzip2 for
backups rather than xz. But I admit xz is a better format for some use
cases.
Software Engineer, Red Hat