On Sat, Jan 29, 2005 at 02:23:25PM -0500, seth vidal wrote:
On Sat, 2005-01-29 at 17:11 -0200, Alexandre Oliva wrote:
> On Jan 29, 2005, Axel Thimm <Axel.Thimm(a)ATrpms.net> wrote:
> > How about having multiple repodatas, the base one and small
> > incremental ones, the incremental ones containing also package
> > cancelations? As a side effect this would also reduce download
> > bandwidth and thus make even clients/users happy (not only repo
> > maintainers).
> +1. Heck, make it +2.
> I agree with Axel, for a change :-)
How would it reduce bandwidth? You'd have to download and parse
multiple entries, and you'd STILL have to do just as much work on the
repo side b/c you'd have to check all the packages for changes.
For N packages the balanced load is log_2 N bins. Adding M packages
touches only about log_2 M bins. Bin i has a maximum size of 2^i
packages, where i goes from 0 to log_2 N - 1. And the good news is
that you only touch the bins with i up to about log_2 M, i.e. the
small ones.
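The binning above works like incrementing a binary counter: new packages land in the smallest bin, and a bin that overflows is merged upward into the next larger one. A minimal sketch in Python (purely illustrative; the function name and data layout are mine, not any existing yum/createrepo interface):

```python
# Sketch of the logarithmic binning scheme: bins[i] holds at most 2**i
# packages. Adding packages only cascades through the small bins, the
# way a carry propagates through the low digits of a binary counter.

def add_packages(bins, new_pkgs):
    """Merge new_pkgs into bins, keeping the invariant len(bins[i]) <= 2**i."""
    carry = list(new_pkgs)
    i = 0
    while carry:
        if i == len(bins):
            bins.append([])
        if len(bins[i]) + len(carry) <= 2 ** i:
            bins[i].extend(carry)   # fits: this is the only bin rewritten
            carry = []
        else:
            # bin overflows: sweep its contents into the carry and
            # promote everything to the next (larger) bin
            carry.extend(bins[i])
            bins[i] = []
            i += 1
    return bins

bins = []
add_packages(bins, ["pkg%d" % n for n in range(4000)])  # initial repo: one big bin
add_packages(bins, ["fix1", "fix2"])                    # small update: only a tiny bin changes
```

After the initial load all 4000 packages sit in one large bin; the two-package update then rewrites only a 2-package bin, so clients re-fetch only the small files.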
The statistical net effect is that for M package additions to a
repository of arbitrary size N, clients download about log_2 M files
totalling about 2M packages' worth of metadata.
In relevant numbers:
o N ~= 4000, log_2 N ~= 12
You have 12 bins.
o 10 security/bug fix updates: (statistically) only bins 0 to 4 are
changed, together holding at most 1+2+4+8+16 = 31 packages.
Clients download only 5 files, about 31 packages' worth of metadata in
size. Compare with the current situation, where you need to download
the metadata for the whole lot of N packages on each update.
For this to work you need to
o introduce package cancellation (anti-packages ;)
o introduce multiple repodata components
o keep a manifest of the last state and feed the repo-creation system
with the differences (packages lost, packages gained).
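On the client side, the three pieces above combine by replaying the incremental components over the base repodata, with a cancellation removing a package from the view. A hypothetical sketch (the dict-based format and names are mine, not any actual repodata schema; here None marks an anti-package):

```python
# Hypothetical client-side merge of base repodata plus incremental
# components. Each increment maps package name -> metadata, with None
# meaning "cancelled" (an anti-package that removes an earlier entry).

def apply_increments(base, increments):
    """Return the effective package view after replaying increments in order."""
    view = dict(base)
    for inc in increments:
        for name, meta in inc.items():
            if meta is None:
                view.pop(name, None)  # package cancelled in this increment
            else:
                view[name] = meta     # package added or updated
    return view

base = {"foo": "1.0", "bar": "2.0"}
increments = [{"foo": "1.1"},            # foo updated
              {"bar": None, "baz": "0.9"}]  # bar cancelled, baz added
view = apply_increments(base, increments)
```

Replaying in order is what makes the manifest-driven diffs safe: a later increment can always cancel or supersede what an earlier one introduced.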
This kind of logarithmic binning is rather common and very efficient
in high-performance statistics for accumulating large sums.
Axel.Thimm at ATrpms.net