Hi.
I am getting strange data from the mirrorlist plugin while trying to update my rawhide machine.
The line for the development repo is mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=rawhide&arch=$basearch
The (current) result from this query is
# repo = rawhide country = global arch = i386
http://download.fedoraproject.org/pub/fedora/linux/core/development/$ARCH/os...
http://distro.ibiblio.org/pub/linux/distributions/fedora/linux/core/developm...
Nonetheless, the repodata I get is quite outdated, it lists dbus-0.62 and firefox 1.5.0.3, for example.
Since no amount of debugging will make yum tell which servers it is retrieving data from I can not tell where the bad data actually comes from.
On Sat, 2006-07-29 at 13:09 +0200, Ralf Ertzinger wrote:
Hi.
I am getting strange data from the mirrorlist plugin while trying to update my rawhide machine.
The line for the development repo is mirrorlist=http://mirrors.fedoraproject.org/mirrorlist?repo=rawhide&arch=$basearch
The (current) result from this query is
# repo = rawhide country = global arch = i386
http://download.fedoraproject.org/pub/fedora/linux/core/development/$ARCH/os...
http://distro.ibiblio.org/pub/linux/distributions/fedora/linux/core/developm...
Nonetheless, the repodata I get is quite outdated, it lists dbus-0.62 and firefox 1.5.0.3, for example.
Since no amount of debugging will make yum tell which servers it is retrieving data from I can not tell where the bad data actually comes from.
1. is there a proxy running somewhere which might be returning bogus results?
2. put each of those urls into your repo file as the one and sole baseurl - then you can tell which is which.
3. The mirrorlist cgi does the following:
   - take repomd.xml from the canonical mirror at redhat.com
   - get the timestamp of the primary.xml.gz entry from inside that file
   - download repomd.xml from each of the mirrors and compare this value
   - only mirrors where the value is the same are kept
-sv
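Editorial sketch: the check described in point 3 can be illustrated roughly as below. This is a minimal Python 3 sketch, not the actual cgi code. The function names `primary_timestamp` and `in_sync_mirrors` are made up for illustration, and the repomd.xml contents are assumed to have been fetched already and passed in as strings.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def primary_timestamp(repomd_xml):
    """Return the <timestamp> text of the 'primary' data entry, or None."""
    root = ET.fromstring(repomd_xml)
    for data in root.findall("repo:data", REPO_NS):
        if data.get("type") == "primary":
            stamp = data.findtext("repo:timestamp", namespaces=REPO_NS)
            return stamp.strip() if stamp else None
    return None


def in_sync_mirrors(canonical_xml, mirror_xml_by_url):
    """Keep only mirrors whose primary timestamp equals the canonical one.

    mirror_xml_by_url is a hypothetical mapping: mirror URL -> repomd.xml text.
    """
    wanted = primary_timestamp(canonical_xml)
    if wanted is None:
        return []
    kept = []
    for url, xml_text in mirror_xml_by_url.items():
        try:
            if primary_timestamp(xml_text) == wanted:
                kept.append(url)
        except ET.ParseError:
            # A mirror that returns unparsable data is treated as out of sync.
            continue
    return kept
```

Comparing a single value from repomd.xml keeps the check cheap: only one small file per mirror has to be downloaded.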
Hi.
seth vidal skvidal@linux.duke.edu wrote:
- is there a proxy running somewhere which might be returning bogus
results?
There is no proxy.
- put each of those urls into your repo file as the one and sole
baseurl - then you can tell which is which.
Of course this is no longer happening now that I am trying to figure it out.
- The mirrorlist cgi does the following:
- take repomd.xml from the canonical mirror at redhat.com
- get timestamp from inside file of the primary.xml.gz entry
- download repomd.xml from each of the mirrors and compare this value
- only mirrors where the value is the same are kept
How often is this done?
Would it be possible to make yum output the URI of every file it tries to retrieve on some debug level?
On Sat, 2006-07-29 at 14:21 +0200, Ralf Ertzinger wrote:
Hi.
seth vidal skvidal@linux.duke.edu wrote:
- is there a proxy running somewhere which might be returning bogus
results?
There is no proxy.
- put each of those urls into your repo file as the one and sole
baseurl - then you can tell which is which.
Of course this is no longer happening now that I am trying to figure it out.
- The mirrorlist cgi does the following:
- take repomd.xml from the canonical mirror at redhat.com
- get timestamp from inside file of the primary.xml.gz entry
- download repomd.xml from each of the mirrors and compare this value
- only mirrors where the value is the same are kept
How often is this done?
every hour.
Would it be possible to make yum output the URI of every file it tries to retrieve on some debug level?
probably. -sv
- The mirrorlist cgi does the following:
- take repomd.xml from the canonical mirror at redhat.com
- get timestamp from inside file of the primary.xml.gz entry
- download repomd.xml from each of the mirrors and compare this value
- only mirrors where the value is the same are kept
this sounds interesting... would there be interest to also feed these results to a dns server, so that mirrors.fedora.<something> is a rotating dns between all "validated" mirrors? (this assumes the same location of the files on all these mirrors obviously)
dns probably scales better than a cgi ....
On Sat, 2006-07-29 at 14:22 +0200, Arjan van de Ven wrote:
- The mirrorlist cgi does the following:
- take repomd.xml from the canonical mirror at redhat.com
- get timestamp from inside file of the primary.xml.gz entry
- download repomd.xml from each of the mirrors and compare this value
- only mirrors where the value is the same are kept
this sounds interesting... would there be interest to also feed these results to a dns server, so that mirrors.fedora.<something> is a rotating dns between all "validated" mirrors? (this assumes the same location of the files on all these mirrors obviously)
it also generates country-specific files - and I've generally found relying on dns servers to honor TTLs all over the world is a bad idea.
dns probably scales better than a cgi ....
not if you have to use the ip to respond with a proper list for that country, or, failing the existence of that country's list - respond with the global one.
I've found that relying on dns for anything frequently updating is just asking for agonizing caching issues you can do nothing about and irritated users.
-sv
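Editorial sketch: the per-country fallback described above amounts to something like the following. This is a hypothetical Python 3 illustration only. The function name `mirrors_for` and the idea that the country code has already been derived from the client IP are assumptions, not the actual cgi code.

```python
def mirrors_for(country, country_specific, glob_urls):
    """Return the mirror list for `country`, or the global list as fallback.

    country          -- country code looked up from the client IP ('' if unknown)
    country_specific -- dict mapping country code -> list of mirror URLs
    glob_urls        -- the global list of mirror URLs
    """
    urls = country_specific.get(country) if country else None
    # Fall back when the country is unknown or has no (non-empty) list.
    return urls if urls else glob_urls
```

A plain round-robin DNS name cannot make this per-client decision, which is the point being argued here.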
dns probably scales better than a cgi ....
not if you have to use the ip to respond with a proper list for that country, or, failing the existence of that country's list - respond with the global one.
The check-script to only offer working mirrors is great. I'd hope that also the old locations are updated with working mirrorlist files.
I still think we should just provide working mirrorlist files, but not add geoip into the servers within FedoraProject.org. I thought we wanted to keep running on Open Source solutions here...
regards,
Florian La Roche
On 7/30/06, Florian La Roche laroche@redhat.com wrote:
I still think we should just provide working mirrorlist files, but not add geoip into the servers within FedoraProject.org. I thought we wanted to keep running on Open Source solutions here...
Static mirrorlists do not track the current status of mirrors, as anyone who has hit yum's "metadata do not match checksums" error can attest.
GeoIP is in Extras.
John
On Sun, 2006-07-30 at 20:12 +0200, Florian La Roche wrote:
dns probably scales better than a cgi ....
not if you have to use the ip to respond with a proper list for that country, or, failing the existence of that country's list - respond with the global one.
The check-script to only offer working mirrors is great. I'd hope that also the old locations are updated with working mirrorlist files.
that's up to other folks than me.
I still think we should just provide working mirrorlist files, but not add geoip into the servers within FedoraProject.org. I thought we wanted to keep running on Open Source solutions here...
What's not open source about geoip? The software is open at the very least, it's in extras.
I read through the rules; there appears to be a functional free version - which is what we're using - the one that's in extras.
According to extras it is GPL'd.
-sv
On Sun, Jul 30, 2006 at 03:54:00PM -0400, seth vidal wrote:
On Sun, 2006-07-30 at 20:12 +0200, Florian La Roche wrote:
dns probably scales better than a cgi ....
not if you have to use the ip to respond with a proper list for that country, or, failing the existence of that country's list - respond with the global one.
The check-script to only offer working mirrors is great. I'd hope that also the old locations are updated with working mirrorlist files.
that's up to other folks than me.
Ok, I'll have to ping some other people on this.
I still think we should just provide working mirrorlist files, but not add geoip into the servers within FedoraProject.org. I thought we wanted to keep running on Open Source solutions here...
What's not open source about geoip? The software is open at the very least, it's in extras.
I read through the rules; there appears to be a functional free version - which is what we're using - the one that's in extras.
From their website I didn't think they have a reasonable free version.
The Fedora Extras package mentions a 97% accurate database for the free version which does sound ok to me. Let's see if .cgi holds up or if e.g. we can use the timezone setting or other items to start with sane defaults.
According to extras it is GPL'd.
Yep.
Thanks a lot,
Florian La Roche
On Sun, 2006-07-30 at 22:16 +0200, Florian La Roche wrote:
On Sun, Jul 30, 2006 at 03:54:00PM -0400, seth vidal wrote:
On Sun, 2006-07-30 at 20:12 +0200, Florian La Roche wrote:
dns probably scales better than a cgi ....
not if you have to use the ip to respond with a proper list for that country, or, failing the existence of that country's list - respond with the global one.
The check-script to only offer working mirrors is great. I'd hope that also the old locations are updated with working mirrorlist files.
that's up to other folks than me.
Ok, I'll have to ping some other people on this.
The only problem I have with updating the original mirror lists is that the mirrors are not always out of sync. sometimes they're only out of sync for a day or so. So fixing them would mean having to put back the broken mirror later if it suddenly resynchronized.
Now, if I got the time I wouldn't mind putting all the mirror results in a little database and tracking them over time. so that mirrors that are in sync the majority of the time don't get checked as often or some such thing.
From their website I didn't think they have a reasonable free version.
The Fedora Extras package mentions a 97% accurate database for the free version which does sound ok to me.
97% is much better than what we had before :)
Let's see if .cgi holds up or if e.g. we can use the timezone setting or other items to start with sane defaults.
the code is checked into /cvs/fedora/check-mirrors if you want to see if there are other ways of doing it.
-sv
Ok, I'll have to ping some other people on this.
The only problem I have with updating the original mirror lists is that the mirrors are not always out of sync. sometimes they're only out of sync for a day or so. So fixing them would mean having to put back the broken mirror later if it suddenly resynchronized.
Now, if I got the time I wouldn't mind putting all the mirror results in a little database and tracking them over time. so that mirrors that are in sync the majority of the time don't get checked as often or some such thing.
So this should be the following code then:

    for m in mirrors:
        if m.timestamps.has_key(arch):
            if m.timestamps[arch] == canon.timestamps[arch]:
                if debug:
                    print 'adding %s' % m.url
                glob_urls.append(m.url)
            if m.country:
                if not country_specific.has_key(m.country):
                    country_specific[m.country] = []
                if debug:
                    print 'adding to %s: %s' % (m.country, m.url)
                country_specific[m.country].append(m.url)

Items I could think of:
- Timestamps are only checked for the big list, not for the per-country lists.
- You could also allow all timestamps which are not older than 2 days in addition to the exact match (which will then depend on good time setting on the machine doing the checks). Another possibility would be to allow older timestamps if their delta to the current timestamp is not too high, but that would then depend on frequent updates to go out.
From their website I didn't think they have a reasonable free version.
The Fedora Extras package mentions a 97% accurate database for the free version which does sound ok to me.
97% is much better than what we had before :)
If that data is freely available, I'll start playing around with it. Much better than taking the ending of hostnames or similar.
Let's see if .cgi holds up or if e.g. we can use the timezone setting or other items to start with sane defaults.
the code is checked into /cvs/fedora/check-mirrors if you want to see if there are other ways of doing it.
Using the timezone sounds like another good guess, but will again need some database mapping AFAIK. But I think we could also use this to set a better default timezone selection based on the language selection within the installer. I'll put this on my list for post-FC6 items to check out.
regards,
Florian La Roche
On Mon, 2006-07-31 at 19:23 +0200, Florian La Roche wrote:
Ok, I'll have to ping some other people on this.
The only problem I have with updating the original mirror lists is that the mirrors are not always out of sync. sometimes they're only out of sync for a day or so. So fixing them would mean having to put back the broken mirror later if it suddenly resynchronized.
Now, if I got the time I wouldn't mind putting all the mirror results in a little database and tracking them over time. so that mirrors that are in sync the majority of the time don't get checked as often or some such thing.
So this should be the following code then:

    for m in mirrors:
        if m.timestamps.has_key(arch):
            if m.timestamps[arch] == canon.timestamps[arch]:
                if debug:
                    print 'adding %s' % m.url
                glob_urls.append(m.url)
            if m.country:
                if not country_specific.has_key(m.country):
                    country_specific[m.country] = []
                if debug:
                    print 'adding to %s: %s' % (m.country, m.url)
                country_specific[m.country].append(m.url)

Items I could think of:
- Timestamps are only checked for the big list, not for the per-country lists.
fixed in cvs. Thank you - I just hadn't indented it one layer :)
- You could also allow all timestamps which are not older than 2 days in addition to the exact match (which will then depend on good time setting on the machine doing the checks). Another possibility would be to allow older timestamps if their delta to the current timestamp is not too high, but that would then depend on frequent updates to go out.
but that means the repos don't match - which will play hell when retrieving metadata. I thought about this one a while and decided it's either an exact match or it is a failure.
-sv
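Editorial sketch: the loop quoted above, with the per-country block moved inside the timestamp comparison, which appears to be the "one layer" of indentation referred to. This is a minimal Python 3 rendering under stated assumptions. The `Mirror` class and the `build_lists` function are hypothetical stand-ins, and the real script in /cvs/fedora/check-mirrors may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Mirror:
    # Hypothetical stand-in for the mirror objects used in the thread.
    url: str
    country: str = ""                               # '' when unknown
    timestamps: dict = field(default_factory=dict)  # arch -> timestamp


def build_lists(mirrors, canon, arch, debug=False):
    """Return (glob_urls, country_specific) for mirrors that match exactly."""
    glob_urls = []
    country_specific = defaultdict(list)
    wanted = canon.timestamps[arch]
    for m in mirrors:
        # Exact match or failure: skip mirrors with a missing or
        # different timestamp for this arch.
        if m.timestamps.get(arch) != wanted:
            continue
        if debug:
            print("adding %s" % m.url)
        glob_urls.append(m.url)
        # The country list is now filled only for in-sync mirrors.
        if m.country:
            if debug:
                print("adding to %s: %s" % (m.country, m.url))
            country_specific[m.country].append(m.url)
    return glob_urls, dict(country_specific)
```

With this structure an out-of-sync mirror can no longer end up in a country list while being excluded from the global one.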
Florian La Roche wrote :
What's not open source about geoip? The software is open at the very least, it's in extras.
I read through the rules; there appears to be a functional free version - which is what we're using - the one that's in extras.
From their website I didn't think they have a reasonable free version.
The Fedora Extras package mentions a 97% accurate database for the free version which does sound ok to me. Let's see if .cgi holds up or if e.g. we can use the timezone setting or other items to start with sane defaults.
According to extras it is GPL'd.
Yep.
Well, not entirely. The data is definitely not GPL. It's shipped by upstream in the tarball containing all the GPL code and API without any mention of its different license... this is clearly a bug that they should take care of, and that we should take care of in the meantime.
I've already reported it here : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=198137
Matthias