repo-mirrorlist quality control?

Wed Feb 25 17:22:08 UTC 2015

On Wed, 25 Feb 2015 17:47:47 +0100
Ralf Corsepius <rc040203 at freenet.de> wrote:

> The user-side of this problem is quite simple:
> 
> The metalinks/mirrorlists as being used in yum or mock configs are 
> pointing to mirrors, which are broken or out of sync. I.e. the 
> metalinks/mirrorlist are incorrect.

Sure. Although in many cases there's also cached repodata causing this,
so the mirrors are ok, but the client is requesting old data that none
of them have anymore. 

> I don't know how fedora mirror works, esp. whether they are polling
> or whether they served with pushes.

They pull on whatever schedule they like. Many mirrors also mirror other
distros or open source projects. Many of them could be running some
other Linux, or not even Linux at all.

> In another (much smaller) project, when pushing updates, we had
> removed all mirrors from our mirrorlists and had let the server poll 
> "repodata/repomd.xml" from the mirrors to re-add a mirror to the 
> mirrorlists if this file matched. Of course, this is a primitive 
> heuristic, but it had worked quite well for us.

Yeah, sadly we don't have any ability to push to mirrors; they have to
pull from us. The mirrormanager crawler pulls repomd.xml from each
mirror when it crawls and compares it. However, as noted, we can't poll
all mirrors all the time.
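
To make the crawler check concrete, here is a rough sketch of the idea,
not the actual mirrormanager code. It assumes the comparison is against
the master copy of repomd.xml and that a plain content hash is enough;
the function names and the dict of fetched files are made up for
illustration.

```python
import hashlib


def repomd_digest(repomd_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of a repomd.xml payload."""
    return hashlib.sha256(repomd_bytes).hexdigest()


def mirror_in_sync(master_repomd: bytes, mirror_repomd: bytes) -> bool:
    """A mirror counts as in sync if its repomd.xml is byte-identical to master's."""
    return repomd_digest(master_repomd) == repomd_digest(mirror_repomd)


def filter_synced(master_repomd: bytes, fetched: dict) -> list:
    """Keep only mirrors whose fetched repomd.xml matches the master copy.

    `fetched` is a hypothetical mapping of mirror URL -> repomd.xml bytes,
    with None standing in for a mirror that could not be reached.
    """
    return [
        url
        for url, data in fetched.items()
        if data is not None and mirror_in_sync(master_repomd, data)
    ]
```

A mirror that was fine at crawl time can still fall behind before the
next crawl, which is the gap described above.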

> >> It is a provable matter of fact that it points users (yum,
> >> mock, ...) to broken and out of sync mirrors.
> >
> > There will be such times, sure.
> Here (Germany), in recent times, they happen almost daily. My guess
> is the "fedora flavors", the launch of f22 and the mass-rebuild on
> rawhide are showing their nasty side.

So, to be clear: daily you see 'yum update' fail with all mirrors
erroring out, so you cannot get an update list? And 'yum clean all'
doesn't help?

The next time this happens, can you file a ticket with the output? We
can try to see what's happening and how to improve it.

> > I don't see any way of eliminating
> > that. We can reduce it as much as we can with the resources we have.
> 
> IMO, the issues yum/mock/dnf have, partially need to be worked-around
> by heuristics in yum/mock/dnf.

The caching part for sure. 

> I sometimes observe yum and mock seemingly to lock up to dead mirrors 
> (downloaded metadata is older than previous one) and not to try a
> more recent mirror.

Yeah, that sounds like a bug indeed. 
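
For illustration only, the kind of client-side check being discussed
could look roughly like the sketch below. This is not how yum or dnf
actually behave; it assumes the newest <timestamp> in repomd.xml is a
good enough freshness marker, and the function names and the
(url, xml) candidate list are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml elements.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"


def newest_timestamp(repomd_xml: str) -> int:
    """Return the newest <timestamp> value found in a repomd.xml document."""
    root = ET.fromstring(repomd_xml)
    stamps = [
        int(float(el.text))
        for el in root.iter(REPO_NS + "timestamp")
        if el.text
    ]
    if not stamps:
        raise ValueError("no timestamps in repomd.xml")
    return max(stamps)


def pick_fresh_mirror(cached_ts: int, candidates):
    """Return the first mirror whose metadata is not older than the cache.

    `candidates` is an iterable of (url, repomd_xml) pairs in the order the
    client would try them. Stale or unparsable mirrors are skipped instead
    of being accepted; None means every candidate was stale.
    """
    for url, repomd_xml in candidates:
        try:
            if newest_timestamp(repomd_xml) >= cached_ts:
                return url
        except (ET.ParseError, ValueError):
            continue
    return None
```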

> I also occasionally see consecutive yum/mock runs to re-iterate and
> fail over apparently the same mirrorlists (I feel it retries by
> priority and therefore always stumbles over the same broken/dead
> mirrors).
> 
> Having a past of scientific work on Genetic Algorithms, my first try 
> would be to introduce some "randomness" into (mirror) selection - It 
> would help to breakout of deterministic causes :-)

There should be randomness in mirrorlists/metalinks. Each mirror has a
weight, and the list is mixed up so clients are not always hitting the
highest weight ones.
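
As a sketch of what a weighted mix-up can look like (this is not the
actual mirrormanager algorithm, just one common way to do a weighted
shuffle, and the names and weights here are invented for illustration):

```python
import random


def weighted_shuffle(mirrors, rng=None):
    """Return mirror URLs in a random order biased by weight.

    `mirrors` is a list of (url, weight) pairs with weight > 0. Each mirror
    draws a key u ** (1 / weight) with u uniform in [0, 1); sorting by key in
    descending order puts heavier mirrors first more often, but not always.
    """
    rng = rng or random.Random()
    keyed = [(rng.random() ** (1.0 / weight), url) for url, weight in mirrors]
    keyed.sort(reverse=True)
    return [url for _, url in keyed]
```

Two clients asking for the same list would then usually get different
orderings, so one dead high-weight mirror is not always tried first.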

(BTW, I would advise everyone to use metalinks (the default) over plain
mirrorlists). 

> > There are some things we can do:
> >
> > * If there's a mirror that is reporting that it's up to date, but is
> >    not, we can remove it from the list and ask mirror admins to
> > check it.
> >
> > * In mirrorlists and metalinks we provide a list of mirrors. If some
> >    are out of sync, yum/dnf should move on to the next and retry.
> My feel is, this currently doesn't work. The worst cases seem to be
> yum alternating between several high-priorized broken/dead mirrors.

We should then figure out what's causing this and notify the causing
component. ;) 

> > It
> >    sounds like you might see some cases where your metadata is
> > updated locally and all mirrors fail? Please report such bugs.
> > https://fedorahosted.org/mirrormanager/
> Yes, this had happened for a longer period (>24 hours) some time last 
> week (IIRC, Thursday) and for a shorter period (2-4 hours) yesterday.
> 
> My guess is, all EU f22/rawhide mirrors were out of sync during these
> times.

Odd. mirrorlists are one of our most visible services. 

On those rare occasions we have broken them, we get users reporting the
issue in minutes. If all EU mirrors were not working I would expect a
number of reports. Also, we have a number of tier1 mirrors in EU that
stay pretty in sync normally. 

> > * We can try and urge more mirrors to sync based on fedmsg (using
> >    last-sync) instead of just randomly N times a day. They would
> > then reduce load on master mirrors and get content faster:
> > https://fedoraproject.org/wiki/Infrastructure/Mirroring#Mirror_Frequency
> >
> > * We can finish deploying our mirrormanager2 re-write. It doesn't
> > add many/any new features, but it moves mirrormanager to a codebase
> > we can work on more easily and bugfix/add features to.
> > https://github.com/fedora-infra/mirrormanager2
> Sorry, lack of knowledge on these details :(
> 
> > Constructive bugs, ideas or patches welcome.
> In this case, I am mostly a "p***ed user", who is struggling ;)

I understand. I'd love to make things better, but I think we first need
to track down which part or parts aren't working right.

kevin
