InstantMirror Proposal Re: ApacheMirror.py for a site-local Fedora mirror

Ed Swierk eswierk at arastra.com
Tue Nov 20 16:21:49 UTC 2007


On 11/19/07, Warren Togami <wtogami at redhat.com> wrote:
> http://fedoraproject.org/wiki/Infrastructure/ProjectHosting/RequestingNewProject
> Could you please create an "upstream" project for it at
> hosted.fedoraproject.org?  I think there are a number of improvements
> that can be made.

Done. I like the name InstantMirror.

> I didn't read deeply into your code yet, but I imagine that it needs
> improvement to handle unique synchronization and expiration issues that
> yum repos and rawhide install trees create when file contents change
> without changing filenames.

If a requested file already exists in the local mirror, the handler
compares the Last-Modified time of the upstream file with that of the
local copy, and downloads the file again if the upstream version is newer. I'm not
familiar with rawhide, but this seems to work okay for the updates
repos where metadata files are frequently regenerated. It doesn't
remove files that no longer exist upstream, of course.
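
As a rough illustration of that freshness check (a sketch in present-day
Python 3, not the actual ApacheMirror.py code), the comparison could look
something like the following. The function names upstream_is_newer and
needs_refresh are made up for this example, and treating a missing
Last-Modified header as "refetch" is an assumption.

```python
import os
import urllib.request
from email.utils import parsedate_to_datetime


def upstream_is_newer(last_modified_header, local_mtime):
    """Return True if the upstream Last-Modified value is later than local_mtime.

    last_modified_header: raw HTTP header string, or None if the header is absent.
    local_mtime: local file modification time, in seconds since the epoch.
    """
    if not last_modified_header:
        # Assumption: with nothing to compare against, fetch again.
        return True
    upstream_time = parsedate_to_datetime(last_modified_header).timestamp()
    return upstream_time > local_mtime


def needs_refresh(upstream_url, local_path):
    """Hypothetical helper: HEAD the upstream file and compare it with the local copy."""
    if not os.path.exists(local_path):
        return True
    request = urllib.request.Request(upstream_url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        header = response.headers.get("Last-Modified")
    return upstream_is_newer(header, os.path.getmtime(local_path))
```

Note that this only decides whether to download again; as said above, it
does nothing about files that have disappeared upstream.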

> Perhaps a separate, asynchronous daemon can monitor upstream (via HTTP
> or whatever) for repomd.xml changes.  It should then parse the
> repomd.xml so it knows when to expire the repodata/* files.  Then it
> should parse the .xml files in repodata/ to compare it to local storage,
> and intelligently expire the packages if any changed (as happens during
> signing).  It can then know exactly which files to delete from the local
> cache because they are no longer in the upstream.  This daemon interacts
> with ApacheMirror.py only in deleting files from the local directories,
> effectively expiring the cache.  Very simple.
>
> That daemon could be configured to handle intelligent expiry of various
> parts of the mirror tree in different ways.  For example:
> - development (rawhide) repo changes at least once per day.  It also
> contains install images (boot.iso, bootdisk.img, stage2, etc.) that need
> to be expired every time the tree changes.  (We might need to add a
> hashes file to the mirror tree to allow the tool to monitor these changes.)
> - Released distros never change, so don't need to monitor their
> repomd.xml for changes.
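
For illustration only, the repodata expiry step of the daemon proposed
above might be sketched like this. It assumes the usual repomd.xml layout
(a <data> element per metadata file, each with <location href=...> and
<checksum> children in the http://linux.duke.edu/metadata/repo
namespace). The function names are hypothetical, and comparing packages
against the parsed repodata is not covered.

```python
import xml.etree.ElementTree as ET

# Assumed XML namespace of repomd.xml.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def repodata_checksums(repomd_xml):
    """Map each repodata file's href to its checksum text."""
    root = ET.fromstring(repomd_xml)
    result = {}
    for data in root.findall("repo:data", REPO_NS):
        location = data.find("repo:location", REPO_NS)
        checksum = data.find("repo:checksum", REPO_NS)
        if location is None or checksum is None:
            continue  # skip entries this sketch does not understand
        result[location.get("href")] = (checksum.text or "").strip()
    return result


def expired_repodata(cached_repomd_xml, upstream_repomd_xml):
    """List cached repodata files whose checksum changed or vanished upstream."""
    cached = repodata_checksums(cached_repomd_xml)
    upstream = repodata_checksums(upstream_repomd_xml)
    return sorted(
        href for href, digest in cached.items() if upstream.get(href) != digest
    )
```

The daemon would delete the files returned by expired_repodata() from the
local cache, so the next request fetches them fresh.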

An even simpler approach is to have the daemon iterate through every
local file, checking whether the file exists upstream and deleting the
local copy if it doesn't. This requires no repodata parsing, but
flooding the upstream server with HEAD requests might be considered
unfriendly.
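
A minimal sketch of that brute-force pass, in Python 3, might look like
the following. The names prune_local_mirror and exists_upstream are
invented for the example, treating only a 404 as "gone" is an assumption,
and the optional delay is just one possible way to be gentler on the
upstream server.

```python
import os
import time
import urllib.error
import urllib.parse
import urllib.request


def exists_upstream(url):
    """HEAD the URL; treat 404 as 'gone' and re-raise any other HTTP error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request):
            return True
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise


def prune_local_mirror(local_root, upstream_base, exists=exists_upstream, delay=0.0):
    """Delete local files that no longer exist upstream; return their relative paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            relative = os.path.relpath(local_path, local_root).replace(os.sep, "/")
            url = upstream_base.rstrip("/") + "/" + urllib.parse.quote(relative)
            if not exists(url):
                os.remove(local_path)
                removed.append(relative)
            time.sleep(delay)  # pause between requests to avoid flooding upstream
    return sorted(removed)
```

The exists parameter lets the upstream check be swapped out, for instance
to test the pass without touching the network.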

> The default definitions for mirroring download.fedoraproject.org could
> be included in a Fedora/EPEL package that requires ApacheMirror.py and
> the monitor/expiry daemon.  That way a sysadmin who wants to create an
> instant Fedora mirror need only install that package and enable it in
> /etc/httpd/conf.d/.  yum update handles pulling in updates for tree
> changes (repo locations, how often to poll for repomd.xml changes, etc.)
>
> Example:
> yum install InstantMirror-fedora
> vim /etc/httpd/conf.d/InstantMirror-fedora.conf
> #(enable stuff)
> service httpd restart
> # http://fedora.localdomain.com
> Instant Fedora mirror!
>
> InstantMirror-fedora.noarch.rpm    : instant Fedora mirror
> InstantMirror-centos.noarch.rpm    : instant CentOS mirror
> InstantMirror-rpmfusion.noarch.rpm : instant RPMFusion mirror
> InstantMirror-foo.noarch.rpm       : instant Foo mirror

Sounds good.

> p.p.s.
> Another idea before I forget about it:
> Later add configurable fallbacks to a different upstream if
> download.fp.org is down.  mirrors.kernel.org might be a good alternative
> for default, for example.

Yes, it would be easy to configure a list of upstream servers instead
of a single one, and hit them either in priority order or randomly.
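
Something along these lines, as a sketch only: the function names are
hypothetical, and treating any OSError (which includes urllib's URLError)
as "try the next server" is an assumption.

```python
import random
import urllib.request


def upstream_order(servers, shuffle=False):
    """Return the servers in priority order, or in random order if shuffle is set."""
    ordered = list(servers)
    if shuffle:
        random.shuffle(ordered)
    return ordered


def fetch_url(url):
    """Default fetcher: download the whole body of the URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_with_fallback(path, servers, fetch=fetch_url, shuffle=False):
    """Try each upstream server in turn and return the first successful result."""
    last_error = None
    for base in upstream_order(servers, shuffle):
        try:
            return fetch(base.rstrip("/") + "/" + path.lstrip("/"))
        except OSError as error:
            last_error = error  # remember the failure and try the next server
    if last_error is None:
        raise OSError("no upstream servers configured")
    raise last_error
```

With shuffle left off, the first entry in the list acts as the primary
upstream and the rest are only used when it fails.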

--Ed



