F21 System Wide Change: lbzip2 as default bzip2 implementation

Mikolaj Izdebski mizdebsk at redhat.com
Thu Apr 3 18:29:47 UTC 2014


On 04/03/2014 06:08 PM, Miloslav Trmač wrote:
> Looking at http://lbzip2.org/news , lbzip2 is still fixing crashes during
> compression and decompression.  That's rather troubling: we need the bzip2
> implementation to be roughly as stable as file system.

They say that every non-trivial piece of software has at least one bug.
bzip2 also has bugs; I am aware of a few of them myself.

> The Change page
> implies that bzip2 is not actively maintained; that may be true but looking
> at bugzilla.redhat.com, there has AFAICT never been a bug reporting that
> something can't be compressed or decompressed--that's a *very* high bar to
> match.  (I do appreciate that assertion failure and silent miscompression
> are not the same thing.)

Neither has there been one for lbzip2.  And as a matter of fact, bzip2 is
compiled with most of its assertions disabled.  But I understand the point.

It may be true that bzip2 is more stable, but that's because it was
given a chance by being included in popular operating systems, and after
the initial bugs were fixed it has stayed there unchanged for years.

I care about lbzip2 quality very much.  I run a test suite consisting of
over 320,000 automated test cases, I compile it with all possible
warnings enabled, and I test it with multiple static analysis tools.  Any
bugs found during testing in Fedora will be taken care of with high
priority.

I believe that lbzip2 deserves to be given a chance and if for some
reason it turns out not to be ready, the Change can be reverted very
easily with a single spec file modification.

> Having the library implementation and the command-line implementation
> completely separate may frustrate debugging efforts when using an
> application-builtin compression and saving uncompressed and compressing
> manually may give different results.  That's not a deal-breaker but having
> a single implementation would certainly simplify things.

Users will still be able to run bzip2 explicitly if needed, or configure
alternatives to use it as the implementation of /usr/bin/bzip2 on their
systems.

Besides that, I am willing to provide a library interface for lbzip2 in
the future if there is demand.

> Ultimately the easiest way to make this implementation change happen, not
> only in Fedora but in all distributions, would be for the improvements to
> be integrated into the upstream bzip2 codebase; has that possibility been
> explored at all?

lbzip2 as it is now is a merger of 2 projects -- a parallel bzip2-like
tool using libbz2 by Laszlo Ersek (started in 2008) and an improved
low-level bzip2 library by me (started in 2007).  The 2 projects were
merged in 2010 and I have maintained lbzip2 since then.

While I would like the improvements and new features of lbzip2 to be
included in bzip2, I have lost hope of this ever happening.  Both Laszlo
and I tried contacting Julian Seward (the author of bzip2) and
contributing to bzip2, but without any success.

Initially Julian admitted that it would be desirable to parallelise
bzip2.  The plan was to evaluate existing implementations and decide
which one to integrate with bzip2, or start the work from scratch.  But
none of that happened.  The last conversation about improving bzip2
took place in 2009, and only a single patch has been included in bzip2
since then -- a fix for an important security bug which I discovered
(CVE-2010-0405 [1]; it took 6 months to be applied in bzip2).

That said, I am still willing to cooperate with Julian and discuss
possibilities of merging some code or improving bzip2 in any way.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2010-0405

-- 
Mikolaj Izdebski
Software Engineer, Red Hat
IRC: mizdebsk

