Software Management call for RFEs

Florian Weimer fweimer at redhat.com
Mon May 27 10:17:25 UTC 2013


On 05/27/2013 11:48 AM, Zdenek Pavlas wrote:
>> And there are package diffs, which are ed-style diffs of the
>> Packages file I mentioned above.  This approach would work quite well
>> for primary.xml because it doesn't contain cross-references between
>> packages using non-natural keys.  It doesn't work for the SQLite
>> database, either in binary or SQL dump format, because of the reliance
>> on artificial primary keys (such as package IDs).
>
> I've once tried this. With about 10k packages in fedora-updates, the delta
> over 2-3 days was +491 -479. Assuming deletions are cheap, the delta should
> ideally be 5%. As expected, binary bsdiff yields a much bigger (~29%) delta.

A line-wise diff is much smaller because dependencies and package 
descriptions mostly stay the same.  (This assumes consistent sorting of 
the primary.xml file.)

Can you point me to the primary.xml -> SQLite translation in yum?  I've 
got a fairly efficient primary.xml parser.  It might be interesting to 
see if it's possible to reduce the latency introduced by the SQLite 
conversion to close to zero.  (Decompression and INSERTs can be 
interleaved with downloading, and maybe the index creation improvements 
in SQLite are sufficient these days.)
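
A rough sketch of the interleaving idea, not taken from yum: compressed chunks are decompressed and parsed as they arrive, and each package is inserted as soon as its element is complete. The two-column table, the index name, and the function `load_streaming` are assumptions made for the example; the real yum schema has many more tables and columns.

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET
import zlib

NS = "{http://linux.duke.edu/metadata/common}"  # namespace used by primary.xml

def load_streaming(chunks, db):
    """Insert packages into `db` while gzip-compressed chunks are still arriving."""
    db.execute("CREATE TABLE packages (name TEXT, arch TEXT)")  # hypothetical schema
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip header
    parser = ET.XMLPullParser(events=("end",))
    count = 0

    def drain():
        # Insert every <package> element that has been fully parsed so far.
        nonlocal count
        for _event, elem in parser.read_events():
            if elem.tag == NS + "package":
                db.execute(
                    "INSERT INTO packages VALUES (?, ?)",
                    (elem.findtext(NS + "name"), elem.findtext(NS + "arch")),
                )
                elem.clear()  # drop children to keep memory use low
                count += 1

    for chunk in chunks:  # in practice: blocks read from the network
        parser.feed(inflater.decompress(chunk))
        drain()
    parser.feed(inflater.flush())
    parser.close()
    drain()

    # Build the index once, after all rows are in place.
    db.execute("CREATE INDEX packages_name ON packages (name)")
    db.commit()
    return count

# Simulated download: a tiny document delivered in 16-byte pieces.
sample = (
    '<metadata xmlns="http://linux.duke.edu/metadata/common">'
    '<package type="rpm"><name>bash</name><arch>x86_64</arch></package>'
    '<package type="rpm"><name>zlib</name><arch>i686</arch></package>'
    "</metadata>"
).encode()
blob = gzip.compress(sample)
db = sqlite3.connect(":memory:")
n = load_streaming((blob[i:i + 16] for i in range(0, len(blob), 16)), db)
rows = db.execute("SELECT name, arch FROM packages ORDER BY name").fetchall()
```

Creating the index after the bulk insert rather than before is a design choice in this sketch; whether that is fast enough on real metadata would need measuring.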

>> However, for many users that follow unstable or testing, package diffs
>> are currently slower than downloading the full Packages file because the
>> diffs are incremental (i.e., they contain the changes from file version
>> N to N+1, and you have to apply all of them to get to the current
>> version) and apt-get can easily write 100 MB or more because the
>> Packages file is rewritten locally multiple times.
>
> Yes, patch chaining should be avoided.  I'd like to use N => 1 deltas,
> that could be applied to many recent snapshots.

The Debian package diffs could be combined efficiently in the client 
because it's possible to combine diffs for two adjacent versions without 
actually knowing what the old or new versions look like.  But this 
hasn't been implemented in APT because of the ABI impact (which is a bit 
puzzling, but anyway).  Instead, the diffs should soon be combined on 
the archive side.
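
The reason combination works without the file contents is that an ed-style hunk only names a line range and the replacement lines; it never quotes the old text. A minimal sketch of that idea follows. The hunk format `(start, old_count, new_lines)` with 0-based line numbers in ascending order is a simplification invented for the example (real ed scripts use 1-based addresses and list hunks bottom-up), and this is not APT's or the archive's actual code.

```python
def hunks_to_ops(hunks):
    """Turn sorted (start, old_count, new_lines) hunks into copy/data ops."""
    ops, pos = [], 0
    for start, old_count, new_lines in hunks:
        if start > pos:
            ops.append(("copy", pos, start - pos))
        if new_lines:
            ops.append(("data", list(new_lines)))
        pos = start + old_count
    ops.append(("copy", pos, None))  # None: copy through the end of the input
    return ops

def apply_ops(ops, lines):
    out = []
    for op in ops:
        if op[0] == "data":
            out.extend(op[1])
        else:
            end = None if op[2] is None else op[1] + op[2]
            out.extend(lines[op[1]:end])
    return out

def slice_ops(ops, start, length):
    """Express output lines [start, start+length) of `ops` in terms of its input."""
    out, pos = [], 0  # pos: output index of the current op's first line
    end = None if length is None else start + length
    for op in ops:
        n = len(op[1]) if op[0] == "data" else op[2]
        op_end = None if n is None else pos + n
        if op_end is not None and op_end <= start:
            pos = op_end  # op lies entirely before the requested range
            continue
        if end is not None and pos >= end:
            break
        lo = max(start, pos) - pos
        if end is None:
            hi = n
        elif op_end is None:
            hi = end - pos
        else:
            hi = min(end, op_end) - pos
        if op[0] == "data":
            out.append(("data", op[1][lo:hi]))
        else:
            out.append(("copy", op[1] + lo, None if hi is None else hi - lo))
        if op_end is None:
            break
        pos = op_end
    return out

def compose(ops_a, ops_b):
    """Combine old->mid (ops_a) and mid->new (ops_b) into old->new, using no file data."""
    out = []
    for op in ops_b:
        if op[0] == "data":
            out.append(op)
        else:
            out.extend(slice_ops(ops_a, op[1], op[2]))
    return out

old = list("abcdefghij")
diff_a = hunks_to_ops([(1, 1, ["B1", "B2"]), (5, 1, [])])   # old -> mid
diff_b = hunks_to_ops([(2, 2, ["X"]), (9, 1, ["J", "K"])])  # mid -> new
sequential = apply_ops(diff_b, apply_ops(diff_a, old))
combined = apply_ops(compose(diff_a, diff_b), old)
```

`compose` never looks at `old`, so the same combined patch can be built by the client or on the archive side, and the local file is rewritten only once. This sketch rescans the first patch for every copy range; a production version would presumably walk both patches in a single pass.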

-- 
Florian Weimer / Red Hat Product Security Team

