Thoughts and question about MM2's UMDL script

Adrian Reber adrian at lisas.de
Fri Jun 26 19:00:37 UTC 2015


On Fri, Jun 26, 2015 at 10:50:07AM -0600, Kevin Fenzi wrote:
> > Yesterday and today I spent a little time going over the UMDL script
> > of MirrorManager2.
> > Going through it, I ended up with a few questions regarding it.
> > 

[...]

> > * Readable status of directories
> > The Directory table has a 'readable' property, but none of our
> > directories is unreadable.
> > 
> > Question is: what is the use-case for this boolean?
> 
> Could it be for when we have a release about to come out, but it's not
> readable to the public yet? Typically we stage a release on friday and
> until the actual release on tuesday the directory isn't open, only
> mirrors with the acls can sync it. 

Yes, that's also what I remember about it. Although I am not sure if we
still need it. As we bitflip much earlier we have the chance to crawl
everything before the release. For the last few releases we didn't even use
the data under releases but under development for the first few weeks.
So this functionality has become pretty useless with the current release
mechanism.

> > * Changes while running
> > Looking at the code, the UMDL seems to be very careful to handle
> > changes on the FS while it is running.
> > One hope I have is to speed up the UMDL run time, but I'm curious.
> > 
> > Question: Does anyone know if the FS changes often while the UMDL is
> > actually running?
> > Gaining speed of course does not mean being reckless, but I'm curious
> > as to how often this situation occurs. IIRC, we trigger the UMDL via
> > fedmsg now, right? So in theory, the FS shouldn't change too much
> > under the UMDL's feet.
> 
> Well, I can think of one common case: 
> 
> 1. Fedora updates push finishes, umdl starts. 
> 2. EPEL updates push finishes while umdl is in the middle of its
> directories. 

Yes, the data definitely changes during umdl's runtime.

> We could of course fix this by making it crawl them separately? 
> For each category?

+1

[...]

> I had one additional thought based out of recent issues we have had:
> 
> Right now when an updates push happens, umdl starts and crawls
> everything in all directory trees. Perhaps we could be much more
> targeted here? If the fedmsg says 'rawhide' was updated, only crawl that
> area. If it says "Fedora 21" or "EPEL 7" only crawl those. 
> 
> And then of course we would need some way to have it crawl everything
> in order to add new releases like Fedora 23 Alpha or whatever, but it
> could just do that once a day? Or on demand?
> 
> Just a thought. 

I had the same idea. Not as fine-grained as yours but similar.
UMDL crawling EPEL is really fast and Fedora Other takes many hours.
For 'Fedora Linux' and 'Fedora EPEL' we have working fedmsg triggers so
we should definitely crawl each category separately and only on demand.
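
A rough sketch of how a trigger could be mapped to a single category,
so that a burst of messages results in one crawl per affected category.
This is not the existing UMDL code; the topic suffixes and the function
names are assumptions for illustration only.

```python
# Hypothetical mapping from fedmsg topic suffixes to MirrorManager categories.
# The topic strings are assumptions for illustration, not verified topic names.
TOPIC_TO_CATEGORY = {
    "bodhi.updates.fedora.sync": "Fedora Linux",
    "bodhi.updates.epel.sync": "Fedora EPEL",
}


def category_for_topic(topic):
    """Return the category to crawl for a fedmsg topic, or None if unrelated."""
    for suffix, category in TOPIC_TO_CATEGORY.items():
        if topic.endswith(suffix):
            return category
    return None


def categories_to_crawl(topics):
    """Collapse a burst of messages into an ordered, de-duplicated crawl list."""
    seen = []
    for topic in topics:
        category = category_for_topic(topic)
        if category is not None and category not in seen:
            seen.append(category)
    return seen
```

With this shape, a Fedora push followed by an EPEL push queues two
independent crawls instead of one crawl over every directory tree.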

There is still this ticket: https://fedorahosted.org/rel-eng/ticket/6157
If we could get a fedmsg trigger for Fedora Secondary we would only need
to crawl Fedora Other maybe once per day, and Fedora Archive could even
be changed to a manual run after a release has been moved. This would
reduce the I/O load on the NFS and make changes to fast-changing
categories like 'Fedora Linux' and 'Fedora EPEL' available in the
database faster.
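
The resulting policy could be written down roughly like this. It is only
a sketch: the POLICY table and due_categories() are hypothetical names,
and 'Fedora Secondary' is listed as fedmsg-triggered on the assumption
that the ticket above gets resolved.

```python
from datetime import datetime, timedelta

# Per-category crawl policy (hypothetical table, not existing UMDL config).
POLICY = {
    "Fedora Linux": "fedmsg",
    "Fedora EPEL": "fedmsg",
    "Fedora Secondary": "fedmsg",  # assumes a fedmsg trigger becomes available
    "Fedora Other": "daily",
    "Fedora Archive": "manual",
}


def due_categories(now, last_run, triggered, requested=()):
    """Return the categories that should be crawled right now.

    now       -- current datetime
    last_run  -- dict: category -> datetime of its last finished crawl
    triggered -- categories for which a fedmsg trigger arrived
    requested -- categories an admin asked for by hand
    """
    due = []
    for category, mode in POLICY.items():
        if mode == "fedmsg" and category in triggered:
            due.append(category)
        elif mode == "daily":
            last = last_run.get(category)
            # Crawl if it never ran or the last run is at least a day old.
            if last is None or now - last >= timedelta(days=1):
                due.append(category)
        elif mode == "manual" and category in requested:
            due.append(category)
    return due
```

Nothing is crawled unless a trigger arrived, a day has passed, or
somebody asked for it, which is where the I/O saving on the NFS would
come from.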

So this is something I will implement next week, as far as it is
possible right now.

One of the main problems I see with MirrorManager is that the number of
mirrors and the amount of data served by MirrorManager have grown
enormously, so that we have to get much smarter in many places to scale
reasonably.

		Adrian