yum differential updates

Jon Burgess jburgess at uklinux.net
Mon Apr 10 18:52:42 UTC 2006


On Mon, 2006-04-10 at 13:35 -0400, Jesse Keating wrote:
> On Mon, 2006-04-10 at 19:19 +0200, Rudolf Kastl wrote:
> > 
> > well the reason for being able to mirror the update repos with a
> > permanently updated torrent would be simply that people are able to
> > share the bandwidth they have open. rsync causes lots of server load
> > afaik. mirroring is useful for a variety of reasons. 
> 
> But does torrent offer the ability that rsync does, to only grab the
> differences?  If you're re-torrenting the whole thing every day that
> seems less than optimal.

It only grabs the differences, but my understanding is that every
operation is done in units of one "piece". The pieces are all of a fixed
size which is set when the torrent file is created, e.g. 256kB in
bordeaux-DVD-i386.torrent. 
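
To put rough numbers on that granularity, here is a small Python sketch.
The function names are my own and purely illustrative (not from any real
client), and the 1 GiB total is just a made-up example size.

```python
PIECE_LENGTH = 256 * 1024  # 256 kB pieces, as in the example torrent


def piece_count(total_bytes, piece_length=PIECE_LENGTH):
    """Number of fixed-size pieces covering total_bytes (the last may be short)."""
    return (total_bytes + piece_length - 1) // piece_length


def pieces_touched(offset, length, piece_length=PIECE_LENGTH):
    """Indices of the pieces overlapping bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // piece_length
    last = (offset + length - 1) // piece_length
    return range(first, last + 1)


# A hypothetical 1 GiB torrent is split into 4096 pieces.
print(piece_count(1024 ** 3))  # 4096

# A single changed byte still costs one whole piece of download.
changed = pieces_touched(300000, 1)
print(len(changed) * PIECE_LENGTH)  # 262144
```

So the saving compared with rsync depends on how many pieces a change
touches, not on how many bytes actually changed.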

I think it would work as follows:-

1) RH create a torrent with all current updates, publish it on a
tracker and start seeding. 

2) A user starts off with no updates and downloads the torrent. His
client downloads all the update files from the seed and other users
(other users that have been doing the same thing).

3) Some time later, RH publish a new torrent which has a mixture of some
of the old files, with some added and some removed.

4) User downloads new torrent. The user adds this to his torrent
program, making sure to select the same location as the previous
download (this is key).

5) The torrent software will go through every file listed in the new
torrent, some of which will be found and some will not.

6) Every "piece" in the new torrent will be part of one or more files.
If the user has all the bytes contained in the piece then the software
will checksum them to ensure they are correct and then note that this
piece is already downloaded. 

7) For pieces which have missing data, e.g. the piece contains data
from a file which the user doesn't have, the software will ignore the
current contents of the piece and put it in the list of pieces which
need to be downloaded. 

8) The software proceeds to exchange pieces with other users and the
seeds to collect all pieces of the torrent. As each is received it
verifies the checksum and writes the contents out to the appropriate
files.
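
Steps 5 to 7 above can be sketched in Python roughly as follows. This is
only my reading of the behaviour, not code from a real client: the
function names and the tiny 4-byte piece size are invented for the
example, and file contents are held in memory rather than on disk. The
one thing I am relying on is that torrents treat the files as one
concatenated stream and store a SHA-1 checksum per piece.

```python
import hashlib


def piece_hashes(files, piece_length):
    """Publisher side: SHA-1 of each piece of the concatenated file data.

    files is a list of (name, data) pairs in torrent order.
    """
    stream = b"".join(data for _, data in files)
    return [hashlib.sha1(stream[i:i + piece_length]).digest()
            for i in range(0, len(stream), piece_length)]


def pieces_to_download(torrent_files, expected, piece_length, local):
    """Client side: indices of pieces that still have to be fetched.

    torrent_files: list of (name, length) in torrent order.
    expected:      list of SHA-1 digests, one per piece.
    local:         dict of name -> bytes already in the download location.
    """
    stream = bytearray()
    missing = []  # byte ranges for which there is no usable local file
    for name, length in torrent_files:
        start = len(stream)
        data = local.get(name)
        if data is not None and len(data) == length:
            stream.extend(data)
        else:
            if length:
                missing.append((start, start + length))
            stream.extend(bytes(length))  # placeholder zeros

    needed = []
    for index, digest in enumerate(expected):
        lo = index * piece_length
        hi = min(lo + piece_length, len(stream))
        has_gap = any(s < hi and lo < e for s, e in missing)
        # Step 7: any gap means the whole piece is fetched again.
        # Step 6: otherwise the piece is kept only if its checksum matches.
        if has_gap or hashlib.sha1(stream[lo:hi]).digest() != digest:
            needed.append(index)
    return needed


# New torrent adds c.rpm between two files the user already has.
new_files = [("a.rpm", b"AAAAAA"), ("c.rpm", b"CCCCC"), ("b.rpm", b"BBBBBB")]
hashes = piece_hashes(new_files, 4)
layout = [(name, len(data)) for name, data in new_files]
have = {"a.rpm": b"AAAAAA", "b.rpm": b"BBBBBB"}
print(pieces_to_download(layout, hashes, 4, have))  # [1, 2]
```

Note that piece 1 straddles a.rpm and c.rpm, so the last two bytes of
a.rpm get downloaded again even though the user already has them.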

A long list of observations and thoughts:-

- The user must keep downloading to the same location to gain the
benefit (/var/cache/yum/update/packages might be good).

- The downloads are not as efficient as a delta-RPM since the torrent
will still need to download the complete contents of any new RPM. It
does, however, reduce the load on the mirror system.

- The torrent will only exchange data with users running exactly the
same torrent file, so if you are the first one to download a new RH
torrent then there will be no-one else to get data from (except the
initial seed). 

- Due to the problem above, it probably makes sense to only update the
torrent infrequently (maybe once per week). The user should probably
rely on using the normal yum mechanisms to download the very latest
updates. Provided these get done to the same location as the torrent
download and are cached then they won't be downloaded again once the
torrent is updated (the user will immediately act as a seed for these
once he gets the updated torrent).

- Nothing will automatically remove old files from the user's download
location. "yum clean packages" would remove the downloaded files, but
the torrent would then have to download all the current updates again.

- The user may need to make available several GB of storage to hold all
the updates even though he might never install some of these RPMs on his
system.

- It would probably make sense to create separate torrents for the
normal and debug RPMs. I guess there should be 2 torrents per
ARCH/Release pair, plus maybe a SRPM torrent.

- Some users will be unable to use the torrent since they are behind
corporate firewalls which block it, so it isn't a replacement for yum.

- The new "LAN peer mode" in Azureus may enable clients to exchange
pieces on a local network at high speeds minimising the need to download
from the Internet. This would be a useful addition to the current yum
behaviour.

- Yum may need to be a little smarter about making certain that RPMs
have the right checksum before using them. I know RPM does verify
checksums, but I don't think yum does right now. The torrent will
typically create all the new RPMs with 0 length and then use sparse
writes to reconstitute each file one piece at a time whenever it
receives some data. If the torrent download is stopped part-way then
many files will not contain the complete data (even though the length
may be correct). Yum might want to delete such a file and re-download
it.

- There might be scope for a specialised torrent client to automate some
of the behaviour above, e.g. pro-actively downloading new torrents,
perhaps only downloading "pieces" which contain data relevant to
updates of RPMs currently installed (a client doesn't have to download
and store the complete torrent), and deleting files which are no longer
present in the latest torrent.
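
For that last idea, the bookkeeping such a client would need is fairly
small. A rough Python sketch, with hypothetical helper names of my own
and a toy 4-byte piece size, assuming the client knows the (name, length)
list from the torrent and which RPM files it cares about:

```python
def pieces_for_files(torrent_files, piece_length, wanted):
    """Indices of the pieces overlapping any file named in wanted.

    torrent_files is a list of (name, length) in torrent order; the files
    are treated as one concatenated stream, so a piece may span files.
    """
    needed = set()
    offset = 0
    for name, length in torrent_files:
        if name in wanted and length > 0:
            first = offset // piece_length
            last = (offset + length - 1) // piece_length
            needed.update(range(first, last + 1))
        offset += length
    return sorted(needed)


def stale_files(local_names, torrent_files):
    """Names in the download location that the latest torrent no longer lists."""
    current = {name for name, _ in torrent_files}
    return sorted(set(local_names) - current)


layout = [("a.rpm", 6), ("c.rpm", 5), ("b.rpm", 6)]

# Only c.rpm is an update for something installed here.
print(pieces_for_files(layout, 4, {"c.rpm"}))  # [1, 2]

# old.rpm was dropped from the latest torrent and could be deleted.
print(stale_files(["a.rpm", "old.rpm", "b.rpm"], layout))  # ['old.rpm']
```

A client that only fetches some pieces would end up with partial data
for the neighbouring files at each end, so it would need to avoid
treating those as complete RPMs.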

	Jon




