Request for Comments: updating RPMs using binary deltas.

Lamar Owen lowen at pari.edu
Thu Jan 8 16:56:16 UTC 2004


On Thursday 08 January 2004 11:15 am, seth vidal wrote:
> > 1.)	Use rsync or something similar to generate an incremental backup of

> This will beat up mirror/repository servers pretty badly.

s,mirror/repository servers,build farms,

The rpmdiff would be generated by the build process and then uploaded.

> What 'original distributed RPM'? There could be hundreds of iterations.
> You'd need a pile of these files.

The 'original distributed RPM' would be whatever was on the 'Fedora Core 1' 
ISOs as released.  For the kernel, for instance, that would be 2.4.22-1.2115 
as opposed to the current 1.2138.  We would have to be very careful in 
picking the baseline.

> What if I've installed a local rpm of the same package name and I want
> to update. The concept of 'pristine' is gonna bite you.

Require the CD/ISO/as-released RPM to be used.  Check its signature, or even 
its MD5/SHA checksum, to make sure.  You can get bitten by locally generated 
packages even with full RPM updates.
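
As a rough illustration of the checksum half of that check, here is a minimal 
Python sketch.  The names file_digest and is_pristine are hypothetical, and 
SHA-256 is my choice for the example; the expected digest is assumed to be 
published alongside the delta.  It does not cover the signature check.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_pristine(path, expected_hex, algorithm="sha256"):
    """True if the file at `path` matches the published baseline digest."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

A delta would only be applied when is_pristine() returns True for the RPM 
pulled off the CD; otherwise the tool could fall back to the full download.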

> And if you can't guarantee that then the mirrors are still going to have
> to carry all this data and you still lose.

Yes, making sure the original RPM is really the Original RPM is critical.  But 
this was true with the rhmask mechanism Red Hat used for stuff like the 
pnserver RPM distributed with Red Hat 5.  Then you had to use the RPM found 
on the CD or it wouldn't work.

Having an MD5/SHA sum of the reconstructed RPM can help check the integrity 
of the final RPM to be installed.

> So you've got a dialup user who will have to go through N steps in order
> to get updates that they may or may not need?

No.  The update tool picks the updates that need to be installed (just like it 
does now), and then asks the user to insert the CDs.  It pulls the 
necessary originals off the CDs into a cache area, applies the deltas, and 
installs the updated RPMs after an integrity check.  As far as the user is 
concerned, the only additional step is inserting the original CD.
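
To make the apply-then-verify step concrete, here is a small Python sketch.  
The delta format is a toy one invented for this example (a list of "copy" and 
"insert" operations), not a proposal for the real rpmdiff format, and the 
function names are hypothetical.  SHA-256 again stands in for whichever 
checksum would actually be published.

```python
import hashlib

def apply_delta(original, delta):
    """Rebuild the updated file from the original plus a list of operations.

    Each operation is either ("copy", offset, length), which reuses bytes
    from the original, or ("insert", data), which adds new bytes.
    """
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            if offset < 0 or length < 0 or offset + length > len(original):
                raise ValueError("copy range outside the original")
            out += original[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown delta operation: {op[0]!r}")
    return bytes(out)

def rebuild_and_verify(original, delta, expected_sha256):
    """Apply the delta, then refuse the result unless its digest matches."""
    rebuilt = apply_delta(original, delta)
    if hashlib.sha256(rebuilt).hexdigest() != expected_sha256:
        raise ValueError("reconstructed file failed its integrity check")
    return rebuilt
```

Only the "insert" data and the operation list travel over the wire; the bulk 
of the bytes come from the original on the CD, which is where the bandwidth 
saving would come from.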

> > 8.)	The updates repository enjoys being able to service many more users
> > per hour, since each user takes less time and less bandwidth.  And
> > hundreds of GB are no longer required for a full mirror of all the
> > updates.

> yes they would - you'd still need the original rpm or you'd never be
> able to recreate all the data, not to mention you'd still want the srpm
> around.

But the original RPM is typically in a separate tree already on the mirror, 
outside the updates tree.  SRPMS could be delta'd too, if you'd like.  If 
the mirror were only a mirror of the updates tree, then, yes, if you wanted to 
mirror the distribution's pristine RPMs you would need more space.  But why 
couldn't you mirror just the update deltas?

Good points, though.  That's why I wanted comments, because I know I can't 
think of everything.  But something needs to be done about the size of 
updates when the patch behind an update is just a few dozen bytes but the 
update itself is a 50MB download.  Something is critically wrong with 
that picture.
-- 
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC  28772
(828)862-5554
www.pari.edu




