Software Management call for RFEs

Zdenek Pavlas zpavlas at redhat.com
Mon May 27 09:48:17 UTC 2013


> And there are package diffs, which are ed-style diffs of the
> Packages file I mentioned above.  This approach would work quite well
> for primary.xml because it doesn't contain cross-references between
> packages using non-natural keys.  It doesn't work for the SQLite
> database, either in binary or SQL dump format, because of the reliance
> on artificial primary keys (such as package IDs).

I tried this once. With about 10k packages in fedora-updates, the delta
over 2-3 days was +491 -479. Assuming deletions are cheap, the delta should
ideally be about 5% (491 of ~10k). As expected, a binary bsdiff yields a
much bigger (~29%) delta.

Very roughly, it's 5% that really describes new packages, plus an almost
constant 24% overhead to fix up the inevitable changes in surrogate keys.
Not as bad as I was afraid, but still not worth it (IMO).
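
To illustrate where that kind of overhead comes from, here is a toy sketch
(not a measurement of real repodata). It assumes, purely for illustration,
that surrogate IDs are handed out in sorted order on every rebuild; the
helpers natural_rows, surrogate_rows and changed_lines are made-up names.

```python
import difflib

def natural_rows(names):
    # One line per package, identified only by its (natural) name.
    return [f"name={n}" for n in sorted(names)]

def surrogate_rows(names):
    # Assumption for illustration: IDs are reassigned in sorted order
    # every time the metadata is regenerated.
    return [f"id={i} name={n}" for i, n in enumerate(sorted(names), 1)]

def changed_lines(old, new):
    # Count the lines a line-based diff would have to remove or add.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")

old = {"bash", "curl", "gcc", "vim", "zsh"}
new = old | {"bzip2"}  # exactly one new package

natural_delta = changed_lines(natural_rows(old), natural_rows(new))
surrogate_delta = changed_lines(surrogate_rows(old), surrogate_rows(new))
print(natural_delta, surrogate_delta)  # prints: 1 9
```

With natural keys the diff only carries the new package. With renumbered
IDs, every row after the insertion point changes as well.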

So, we need *.xml deltas.  Yum can rebuild xml => .sqlite locally, but
this needs quite a lot of memory and takes TENS of seconds.  Add the time
needed to patch the quite large uncompressed xml file, and suddenly the
fact that you're downloading just 1/10th of the data hardly pays off
(ignoring very specific use cases, like mobile data, for a moment).
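
The trade-off can be put as a back-of-the-envelope break-even calculation.
This is only a sketch: the sizes and the 30 second local cost below are
hypothetical placeholders, not measurements, and break_even_bandwidth is a
made-up helper name.

```python
def break_even_bandwidth(full_bytes, delta_bytes, local_seconds):
    """Bandwidth (bytes/s) at which both approaches take equally long.

    full download:  full_bytes / bw
    delta path:     delta_bytes / bw + local_seconds
    """
    return (full_bytes - delta_bytes) / local_seconds

# Hypothetical numbers, chosen only to make the arithmetic concrete.
full_bytes = 10_000_000   # full metadata download
delta_bytes = 1_000_000   # delta at 1/10th of the size
local_seconds = 30        # patching plus the xml => .sqlite rebuild

threshold = break_even_bandwidth(full_bytes, delta_bytes, local_seconds)
print(threshold)  # prints: 300000.0
```

On any link faster than the threshold, the full download wins under these
assumptions. The delta only pays off on slower (or metered) links.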

For DNF, it's different.  It has to rebuild xml => .solv anyway, so this
comes for free.

> However, for many users that follow unstable or testing, package diffs
> are currently slower than downloading the full Packages file because the
> diffs are incremental (i.e., they contain the changes from file version
> N to N+1, and you have to apply all of them to get to the current
> version) and apt-get can easily write 100 MB or more because the
> Packages file is rewritten locally multiple times.

Yes, patch chaining should be avoided.  I'd like to use N => 1 deltas
that could be applied to many recent snapshots.
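
One possible reading of such an N => 1 delta, as a rough sketch and not a
proposed format: ship the set of keys that are valid in the latest version,
plus every record that at least one snapshot in the window is missing. This
assumes records are identified by natural keys, so the same key always
means the same content. make_delta and apply_delta are hypothetical names.

```python
def make_delta(snapshots, latest):
    # Keys present in every snapshot of the window; no client needs
    # these records sent again.
    common = set.intersection(*(set(s) for s in snapshots))
    keep = set(latest)
    added = {k: v for k, v in latest.items() if k not in common}
    return keep, added

def apply_delta(snapshot, delta):
    keep, added = delta
    # Drop records that are gone, then add whatever might be missing.
    result = {k: v for k, v in snapshot.items() if k in keep}
    result.update(added)
    return result

# Toy snapshots keyed by name-version (illustrative data only).
s1 = {"bash-4.2": "...", "curl-7.29": "...", "gcc-4.8.0": "..."}
s2 = {"bash-4.2": "...", "curl-7.30": "...", "gcc-4.8.0": "..."}
latest = {"bash-4.2": "...", "curl-7.30": "...",
          "gcc-4.8.1": "...", "vim-7.3": "..."}

delta = make_delta([s1, s2], latest)
results = [apply_delta(s, delta) for s in (s1, s2)]
print(all(r == latest for r in results))  # prints: True
```

Each client applies one delta and rewrites its local file once, no matter
which snapshot in the window it starts from. The price is that clients
on newer snapshots receive some records they already have.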

