MirrorManager, for what I really wanted to see by the Fedora 7 release, has been a success. But there are still several gotchas I'd like to iron out before Fedora 8.
* The mirrorlist mod_python applet consumes too much memory on the app servers. It basically reads in a 2MB mirrorlist_cache pickle file which lists, by directory, which mirrors hold what content. Handy to have, but in mod_python, that blows the RSS out to ~27MB per process, times all the httpd processes that have run that code, each with its own private copy. Not pretty.
The mirrormanager TurboGears backend isn't fast enough to handle all the client requests for mirrorlists, hence I exported the data for mod_python to use. But the mod_python trick takes too much memory.
The way out? Split the mod_python applet into two pieces:
1) Yet another daemon, listening on a local UNIX socket, that has a copy of the mirrorlist cache. It calculates the answers to return.
2) The mod_python applet connects to the daemon, passes its list of args, and gets back the answer list. It handles redirects too.
In this way, the daemon can fork() itself if necessary to handle the traffic, but those forks() use copy-on-write memory, and the children will never touch the pickle, so they'll all share mostly the same memory. One copy of the mirrorlist_cache, used by all children.
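The two-piece split described above can be sketched roughly as follows. This is only an illustration in modern Python 3, not the actual MirrorManager code: the cache layout (a dict mapping directory to a list of URLs), the one-line request protocol, and the names load_cache, lookup, serve and query are all assumptions made for the example.

```python
import os
import pickle
import socket
import socketserver

# Hypothetical cache layout: {directory: [mirror URL, ...]}.
# The real mirrorlist_cache pickle may be structured differently.
CACHE = {}


def load_cache(path):
    """Load the pickle once, in the parent, before any fork()."""
    global CACHE
    with open(path, "rb") as f:
        CACHE = pickle.load(f)


def lookup(directory):
    """Read-only lookup; children never modify CACHE."""
    return CACHE.get(directory, [])


class MirrorlistHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Assumed protocol: one request per connection, one line naming a directory.
        directory = self.rfile.readline().decode("utf-8").strip()
        reply = "\n".join(lookup(directory)) + "\n"
        self.wfile.write(reply.encode("utf-8"))


class ForkingUnixServer(socketserver.ForkingMixIn, socketserver.UnixStreamServer):
    """Forks one child per connection; children inherit CACHE copy-on-write."""


def serve(socket_path, cache_path):
    """Daemon side: load the cache once, then answer requests forever."""
    load_cache(cache_path)
    if os.path.exists(socket_path):
        os.unlink(socket_path)  # remove a stale socket from a previous run
    with ForkingUnixServer(socket_path, MirrorlistHandler) as server:
        server.serve_forever()


def query(socket_path, directory):
    """Web-handler side: connect, send the request, read the answer list."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)
        s.sendall((directory + "\n").encode("utf-8"))
        s.shutdown(socket.SHUT_WR)
        data = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("utf-8").split()
```

One caveat on the memory argument: in CPython, merely reading an object updates its reference count, so some pages of the cache will still be copied into children over time. Sharing should be substantial but is unlikely to be perfect.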
Since I'm saving so much RSS memory here, I can add back into the mirrorlist_cache all the directories which are being omitted now. So, we will be able to return the list for any dir or file that the public mirrors know about, not just a few as we do now.
I've got a stab at this, but am still working on the details. I'll want to do some time tests against the new code, to make sure it isn't too much slower for clients, but a quick swag shows it'll be OK; 0.3sec or so per request, even in parallel, which IMHO is "good enough".
* Mike's redirection stuff is included in the above already, so that'll be online as soon as the rest is.
Now, to find the time before F8...
Still to come, provided I find a lot of time (unlikely), or someone else steps up to help:
* Designate a way for mirrors to claim themselves to be always up-to-date. Probably will require a sysadmin to set this bit, as it's somewhat dangerous. But there are cases, e.g. a local out-of-line squid proxy, where it makes sense to do it. This change will change the schema, and has repercussions throughout the code, so I haven't wanted to make it lightly.
* Some people want metalink support. Conceptually it's possible, and even pretty easy once we've got the daemon above working right. But as noted on f-d-l, it's been 10 weeks since someone asked for it and even sent some code that doesn't quite integrate but was a starting point, and I haven't had time to get to it. It's not looking good for me to add that right now, but I'd be happy to review patches.
* I've wanted to add the libgeoip country->continent mappings, so we can fall back netblock -> country -> continent -> global but I don't know C->Python bindings code at all, and need that exported in python-GeoIP for mm to use.
* I've got pending a request to change the fedora.repo files to make yum treat the list as in priority order. I really want the continent mappings in place before doing that though...
Should we let countries with <3 mirrors return their own lists? Right now if a country has <3 mirrors, the users get the global list back.
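To make the fallback question concrete, here is a small sketch of the netblock -> country -> continent -> global chain with a minimum-mirror threshold. It is an illustration only: the function name, the dict-based inputs, the choice to return a netblock match regardless of its size, and applying the same threshold at the continent level are all assumptions, not how mm currently behaves.

```python
MIN_MIRRORS = 3  # threshold from the thread; below it, widen the scope


def pick_mirrors(client, by_netblock, by_country, by_continent,
                 global_list, country_to_continent):
    """Return (scope, mirrors) for the narrowest scope with enough mirrors.

    client is a dict with optional "netblock" and "country" keys (hypothetical).
    """
    netblock_hits = by_netblock.get(client.get("netblock"), [])
    if netblock_hits:
        # Assumption: a preferred-netblock match is used as-is, however small.
        return "netblock", netblock_hits

    country = client.get("country")
    in_country = by_country.get(country, [])
    if len(in_country) >= MIN_MIRRORS:
        return "country", in_country

    # This lookup is the piece that needs the country->continent data.
    continent = country_to_continent.get(country)
    in_continent = by_continent.get(continent, [])
    if len(in_continent) >= MIN_MIRRORS:
        return "continent", in_continent

    return "global", global_list
```

Letting small countries return their own lists would amount to lowering MIN_MIRRORS for the country step only.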
Anything else people really need to see?
Thanks, Matt
Hi all. I may have introduced myself before, but please bear with me again :-)
I am very interested in assisting wherever I can. My strengths, however, do not include anything in the development sphere. I am a UNIX/Linux junkie with a long background in sysadmin and technical support environments (20+ years). I have not done much programming outside of a little PHP and shell script. I have a historical programming background, but it was long lost on the VAX and was mostly BASIC, COBOL and RPG :-)
Is there anything I can help with regarding monitoring, sysadmin, or technical architecture? If I can help, I would love to.
Scott Thistle
Scott Thistle wrote:
Hi all. I may have introduced myself before, but please bear with me again :-)
I am very interested in assisting wherever I can. My strengths, however, do not include anything in the development sphere. I am a UNIX/Linux junkie with a long background in sysadmin and technical support environments (20+ years). I have not done much programming outside of a little PHP and shell script. I have a historical programming background, but it was long lost on the VAX and was mostly BASIC, COBOL and RPG :-)
Is there anything I can help with regarding monitoring, sysadmin, or technical architecture? If I can help, I would love to.
Welcome! Can you make it to our weekly meetings from time to time? http://fedoraproject.org/wiki/Infrastructure/Meetings If you can, that's a great way to learn about what's going on so you can offer help on specific items. Also:
https://hosted.fedoraproject.org/projects/fedora-infrastructure/report/1
If any of those are interesting to you, make a comment in the ticket and see what can be done to help.
-Mike
Matt, you've done a great job with this. I look forward to reserving some time to hack on it with you. Most likely not before the Fedora 8 release.
Jonathan Steffan daMaestro
On Mon, 2007-09-03 at 18:05 -0500, Matt Domsch wrote:
- I've got pending a request to change the fedora.repo files to make yum treat the list as in priority order. I really want the continent mappings in place before doing that though...
Que? You have a request in to change fedora.repo for this? What's the request?
-sv
On Tue, Sep 04, 2007 at 07:24:06AM -0400, seth vidal wrote:
On Mon, 2007-09-03 at 18:05 -0500, Matt Domsch wrote:
- I've got pending a request to change the fedora.repo files to make yum treat the list as in priority order. I really want the continent mappings in place before doing that though...
Que? You have a request in to change fedora.repo for this? What's the request?
BZ 243698 low low All Jesse Keating NEED fedora-release should use yum failovermethod=priority
Description of problem: By default, yum uses failovermethod=roundrobin to achieve load balancing. As soon as mirrormanager itself manages the mirrorlist and can return it in priority order with internal sub-list randomization, fedora-release should put failovermethod=priority into each [repository] section that also has a mirrorlist= line pointing at mirrors.fedoraproject.org.
Coordinate with Matt Domsch as to when mirrormanager will return results in priority order.
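As a rough illustration of what that bug asks for, the sketch below adds failovermethod=priority to every repo section whose mirrorlist= line points at mirrors.fedoraproject.org and leaves other sections alone. The function name and the sample repo contents are made up for the example; only the option name, its value, and the host come from the bug text. Note that Python's configparser drops comment lines when it rewrites a file, so this is a demonstration of the rule, not a tool for editing the packaged fedora.repo.

```python
import configparser
import io

MIRRORLIST_HOST = "mirrors.fedoraproject.org"


def add_priority_failover(repo_text):
    """Return repo_text with failovermethod=priority added where it applies."""
    # interpolation=None so '%' or '$' in URLs is left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    for section in parser.sections():
        mirrorlist = parser.get(section, "mirrorlist", fallback="")
        if MIRRORLIST_HOST in mirrorlist:
            parser.set(section, "failovermethod", "priority")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical input: one mirrorlist-backed repo and one static baseurl repo.
SAMPLE = """\
[fedora]
name=Fedora $releasever - $basearch
mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-$releasever&arch=$basearch
enabled=1

[local]
name=Local mirror
baseurl=http://192.0.2.10/fedora/
enabled=1
"""

if __name__ == "__main__":
    print(add_priority_failover(SAMPLE))
```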
Matt Domsch wrote:
MirrorManager, for what I really wanted to see by the Fedora 7 release, has been a success.
You're right, it is! I've set up this preferred netblock and any of the roaming clients in my home network can just use stock fedora-*repo configuration (as can my servers but they use static configuration anyway) ;-)
Anything else people really need to see?
Is there a way to 'query' the mirrorlist and tell it explicitly not to use any preferred netblock?
Could we possibly filter by protocol (http/ftp, rsync)?
I'm not sure this one is still current, but formerly I had to move directories around because some mirror I was syncing from did not use the exact same full tree the master mirror did; if it is still current, could that be flagged and filtered on-request by mirror-manager?
Thanks, Matt
Thank you!
On Tue, Sep 04, 2007 at 02:03:57PM +0200, Jeroen van Meeuwen wrote:
Matt Domsch wrote:
MirrorManager, for what I really wanted to see by the Fedora 7 release, has been a success.
You're right, it is! I've set up this preferred netblock and any of the roaming clients in my home network can just use stock fedora-*repo configuration (as can my servers but they use static configuration anyway) ;-)
Anything else people really need to see?
Is there a way to 'query' the mirrorlist and tell it explicitly not to use any preferred netblock?
I've added that to my working tree now. Append '&netblock=0' to disable the netblock code.
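For anyone scripting against this, a small sketch of appending that parameter to an existing mirrorlist URL. The helper name is made up, and the example URL (its path and its repo/arch parameters) is an assumption for illustration; only the netblock=0 parameter comes from this message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def disable_netblock(url):
    """Return url with netblock=0 set, replacing any existing netblock value."""
    parts = urlsplit(url)
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "netblock"]
    query.append(("netblock", "0"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical example URL.
    print(disable_netblock(
        "http://mirrors.fedoraproject.org/mirrorlist?repo=fedora-7&arch=i386"))
```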
Could we possibly filter by protocol (http/ftp, rsync)?
Uhh, hmm. Could we, yes. If a host serves both http and ftp right now, mm only returns the http URLs (faster setup time). To change that would be somewhat difficult; we'd have to know a lot more data at client lookup time than I currently keep track of in the mirrorlist_cache.
I'm not sure this one is still current, but formerly I had to move directories around because some mirror I was syncing from did not use the exact same full tree the master mirror did; if it is still current, could that be flagged and filtered on-request by mirror-manager?
They only need to have sub-trees (e.g. everything below pub/fedora/linux) the same. They can put that sub-tree anywhere they want as long as they tell mm about it. If they put things willy-nilly, then yes, mm won't find it.
Thanks, Matt
infrastructure@lists.fedoraproject.org