Software Management call for RFEs
zpavlas at redhat.com
Mon May 27 09:48:17 UTC 2013
> And there are package diffs, which are ed-style diffs of the
> Packages file I mentioned above. This approach would work quite well
> for primary.xml because it doesn't contain cross-references between
> packages using non-natural keys. It doesn't work for the SQLite
> database, either in binary or SQL dump format, because of the reliance
> on artificial primary keys (such as package IDs).
I tried this once. With about 10k packages in fedora-updates, the delta
over 2-3 days was +491 -479. Assuming deletions are cheap, the delta should
ideally be about 5% (491 additions out of ~10k packages). As expected, a
binary bsdiff yields a much bigger (~29%) delta. Very roughly, that is 5%
that really describes new packages, plus an almost constant 24% overhead
to fix up the inevitable changes in surrogate keys.
Not as bad as I was afraid, but still not worth it (IMO).
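
To illustrate why surrogate keys inflate a delta, here is a toy sketch. It
is not the measurement above: the package names, the line format and the
way row IDs are assigned (renumbered in sort order on every rebuild) are
all assumptions made for the example.

```python
# Toy model: count how many lines of a text dump differ between two
# snapshots, with and without surrogate row IDs. All names are hypothetical.

def natural_rows(pkgs):
    # One line per package, identified only by its name and version.
    return [f"{name} {ver}" for name, ver in sorted(pkgs.items())]

def surrogate_rows(pkgs):
    # Same data, but each line carries a row ID assigned in sort order,
    # as a dump regenerated from scratch might do (an assumption).
    return [f"{i} {name} {ver}"
            for i, (name, ver) in enumerate(sorted(pkgs.items()), start=1)]

def new_lines(old, new):
    # Lines the delta has to carry: present in new, absent from old.
    return len(set(new) - set(old))

old = {f"pkg{i:04d}": "1.0" for i in range(1000)}
new = dict(old)
# Add 50 packages spread evenly through the sorted name space.
for i in range(0, 1000, 20):
    new[f"pkg{i:04d}-extra"] = "1.0"

natural_delta = new_lines(natural_rows(old), natural_rows(new))
surrogate_delta = new_lines(surrogate_rows(old), surrogate_rows(new))
print(natural_delta, surrogate_delta)
```

With natural keys only the 50 added lines differ. With renumbered IDs
almost every line differs, because each insertion shifts the IDs of all
rows after it. A real SQLite file keeps most IDs stable, so the overhead
there is smaller than in this worst case.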
So, we need *.xml deltas. Yum can rebuild xml => .sqlite locally, but
this needs quite a lot of memory and takes TENS of seconds. Add the time
needed to patch the quite large uncompressed xml file, and suddenly the
fact that you're downloading just 1/10th of the data hardly pays off
(ignoring, for a moment, very specific use cases such as mobile data).
For DNF, it's different. It has to rebuild xml => .solv anyway, so this
comes for free.
> However, for many users that follow unstable or testing, package diffs
> are currently slower than downloading the full Packages file because the
> diffs are incremental (i.e., they contain the changes from file version
> N to N+1, and you have to apply all of them to get to the current
> version) and apt-get can easily write 100 MB or more because the
> Packages file is rewritten locally multiple times.
Yes, patch chaining should be avoided. I'd like to use N => 1 deltas
that could be applied to many recent snapshots.
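
One possible reading of an N => 1 delta, sketched under assumptions: the
snapshots are modeled as dicts keyed by package name (a natural key), and
the helper names make_multi_source_delta and apply_delta are hypothetical.
The delta is built so that applying it to any snapshot in the window gives
the latest one in a single pass, with no chaining.

```python
def make_multi_source_delta(window, latest):
    # Keys seen in any older snapshot but gone from the latest one.
    seen = set().union(*window)
    removed = seen - set(latest)
    # Entries that at least one older snapshot lacks or has different.
    upserts = {k: v for k, v in latest.items()
               if any(s.get(k) != v for s in window)}
    return removed, upserts

def apply_delta(snapshot, delta):
    # Drop removed keys, then add or overwrite the changed entries.
    removed, upserts = delta
    result = {k: v for k, v in snapshot.items() if k not in removed}
    result.update(upserts)
    return result

# Three made-up snapshots of a tiny repository.
s0 = {"bash": "4.2-1", "zlib": "1.2.7-1", "foo": "1.0-1"}
s1 = {"bash": "4.2-2", "zlib": "1.2.7-1", "foo": "1.0-1", "tmp": "0.1-1"}
latest = {"bash": "4.2-2", "zlib": "1.2.7-1", "bar": "2.0-1"}

window = [s0, s1]
delta = make_multi_source_delta(window, latest)
for snap in window:
    assert apply_delta(snap, delta) == latest
```

The price is a somewhat larger delta than a strict N-1 => N one, since it
carries every change made anywhere in the window. In exchange the client
rewrites its local metadata once, whichever recent snapshot it holds.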