new rawhide glibc

Ulrich Drepper

15 May 2008

5 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

With the next push you'll see the first new glibc version for rawhide (2.8.90-1). This version has quite a few changes, most prominently changes to getaddrinfo.

The new implementation should be faster since it parallelizes DNS lookups and avoids code duplication in many other cases. But the DNS optimization is also what I'm a bit concerned about. I tested the code with a local bind daemon and all works well. What I haven't done is test it with other DNS servers.

Everybody should look out for changes in the host name resolution. There shouldn't be any but who knows.

Those who are using DNS servers other than bind might want to give the code a pounding, especially the fallback code that issues lookups over TCP (you can force this by setting the RES_USEVC bit in _res.options, see <resolv.h>).
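For anyone who wants to poke at a non-bind server over TCP without touching _res.options, here is a minimal Python sketch of what a DNS query looks like on a TCP connection. It is not the glibc fallback code. It only shows the RFC 1035 framing, where each message on a TCP stream is preceded by a two-byte length. The function names (build_query, frame_for_tcp, read_exact, query_over_tcp, rcode) are made up for this illustration, and the server address is whatever you choose to pass in.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query message (RFC 1035) with one question."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends it.
    labels = [label.encode("ascii") for label in name.rstrip(".").split(".")]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE (1 = A, 28 = AAAA) followed by QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message):
    """Prefix a DNS message with the two-byte length used on TCP."""
    return struct.pack("!H", len(message)) + message


def read_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        data += chunk
    return data


def query_over_tcp(server, name, qtype=1, timeout=5.0):
    """Send one query to `server` on TCP port 53 and return the raw reply."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(name, qtype)))
        (length,) = struct.unpack("!H", read_exact(sock, 2))
        return read_exact(sock, length)


def rcode(reply):
    """Extract the response code (0 = no error, 2 = SERVFAIL, 3 = NXDOMAIN)."""
    return reply[3] & 0x0F
```

Calling rcode(query_over_tcp(server, "www.example.com", qtype=28)) against the server under test shows how it answers an AAAA query over TCP, independent of the libc resolver.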

- -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (GNU/Linux) iD8DBQFILGwS2ijCOnn/RHQRAihZAJwNHfdmsO04nh7xcX9iircly+lbzgCgv6zC UDOseG5c+GMt71Rhh2alMlY= =nN2p -----END PGP SIGNATURE-----

Jesse Keating

16 May

12:16 a.m.

On Thu, 2008-05-15 at 10:00 -0700, Ulrich Drepper wrote:

...

With the next push you'll see the first new glibc version for rawhide (2.8.90-1). This version has quite a few changes, most prominently changes to getaddrinfo.

The new implementation should be faster since it parallelizes DNS lookups and avoids code duplication in many other cases. But the DNS optimization is also what I'm a bit concerned about. I tested the code with a local bind daemon and all works well. What I haven't done is testing it with other DNS servers.

Everybody should look out for changes in the host name resolution. There shouldn't be any but who knows.

Those who are using DNS servers other than bind might want to give the code bounding. Especially also the fallback code to issue lookups with TCP (you can force this by setting the RES_USEVC bit in _res.options, see <resolv.h>).

This glibc made it into the chroot we use to produce rawhide, which promptly started causing invalid pointer free errors when yum was called:

*** glibc detected *** /usr/bin/python: free(): invalid pointer: 0x00007fffa6ead9ab ***

Granted, this was a chroot on a FC6 box, not sure if that's going to "taint" things, but I had to roll back glibc in order to get a rawhide compose going.

-- Jesse Keating Fedora -- Freedom² is a feature!

Tomas Mraz

7:26 a.m.

On Thu, 2008-05-15 at 20:16 -0400, Jesse Keating wrote:

...

On Thu, 2008-05-15 at 10:00 -0700, Ulrich Drepper wrote:

...
With the next push you'll see the first new glibc version for rawhide (2.8.90-1). This version has quite a few changes, most prominently changes to getaddrinfo.

The new implementation should be faster since it parallelizes DNS lookups and avoids code duplication in many other cases. But the DNS optimization is also what I'm a bit concerned about. I tested the code with a local bind daemon and all works well. What I haven't done is testing it with other DNS servers.

Everybody should look out for changes in the host name resolution. There shouldn't be any but who knows.

Those who are using DNS servers other than bind might want to give the code bounding. Especially also the fallback code to issue lookups with TCP (you can force this by setting the RES_USEVC bit in _res.options, see <resolv.h>).

This glibc made it into the chroot we use to produce rawhide, which promptly started causing invalid pointer free errors when yum was called:

Also ssh/sshd is completely broken :( https://bugzilla.redhat.com/show_bug.cgi?id=446801

-- Tomas Mraz No matter how far down the wrong road you've gone, turn back. Turkish proverb

Michal Jaegermann

2:05 p.m.

On Fri, May 16, 2008 at 09:26:50AM +0200, Tomas Mraz wrote:

...

On Thu, 2008-05-15 at 20:16 -0400, Jesse Keating wrote:

...
On Thu, 2008-05-15 at 10:00 -0700, Ulrich Drepper wrote:

...
With the next push you'll see the first new glibc version for rawhide (2.8.90-1). This version has quite a few changes, most prominently changes to getaddrinfo.

This glibc made it into the chroot we use to produce rawhide, which promptly started causing invalid pointer free errors when yum was called:

Also ssh/sshd is completely broken :( https://bugzilla.redhat.com/show_bug.cgi?id=446801

It appears that cupsd is another victim. It promptly crashes on startup, but I have not had time yet to look closer. sshd keeps running, only it is not that useful at the moment.

Michal

Michal Jaegermann

17 May

4:45 p.m.

On Fri, May 16, 2008 at 09:26:50AM +0200, Tomas Mraz wrote:

...

...
On Thu, 2008-05-15 at 10:00 -0700, Ulrich Drepper wrote:

...
With the next push you'll see the first new glibc version for rawhide (2.8.90-1).

...

Also ssh/sshd is completely broken :( https://bugzilla.redhat.com/show_bug.cgi?id=446801

An update to glibc-2.8.90-2 appears to take care of at least that.

Michal

Yanko Kaneti

6:50 p.m.

On Sat, 2008-05-17 at 10:45 -0600, Michal Jaegermann wrote:

...

On Fri, May 16, 2008 at 09:26:50AM +0200, Tomas Mraz wrote:

...
...
On Thu, 2008-05-15 at 10:00 -0700, Ulrich Drepper wrote:

...
With the next push you'll see the first new glibc version for rawhide (2.8.90-1).

...
Also ssh/sshd is completely broken :( https://bugzilla.redhat.com/show_bug.cgi?id=446801

An update to glibc-2.8.90-2 appears to take care at least of that.

While not crashing, 2.8.90-2 as used by ssh fails to resolve for me in ~90% of the attempts. Quite random. Same for wget and yum. Ping seems to work every time.

Ulrich Drepper

18 May

8:56 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Yanko Kaneti wrote:

...

While not crashing 2.8.90-2 as used by ssh fails to resolve for me in ~90% of the attempts. Quite random... . Same for wget, yum. Ping seems to work every time.

I've fixed a couple of bugs in getaddrinfo- and nscd-related code today. This should fix the problems people have seen, including bug 445656. I hope Jakub has time to build a new version soon.

- -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkgv708ACgkQ2ijCOnn/RHSIzwCgmmMwqO0QHXCiuTgNp7uuNgX1 YyMAnApZ9oX5PHvyLpGYvbCrig25avLy =jcQY -----END PGP SIGNATURE-----

Horst H. von Brand

10:22 p.m.

Ulrich Drepper drepper@redhat.com wrote:

...

Yanko Kaneti wrote:

...
While not crashing 2.8.90-2 as used by ssh fails to resolve for me in ~90% of the attempts. Quite random... . Same for wget, yum. Ping seems to work every time.

...

I've fixed a couple of bugs in getaddrinfo- and nscd-related code today. This should fix the problems people have seen, including bug 445656. I hope Jakub has time to build a new version soon.

Thanks for your work, both of you (and all Fedora upstream developers, and Fedora package maintainers). We all depend on you day after day, and rarely think of all the work going on behind the scenes here.

-- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 2654431 Universidad Tecnica Federico Santa Maria +56 32 2654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 2797513

Yanko Kaneti

19 May

6:41 a.m.

On Sun, 2008-05-18 at 01:56 -0700, Ulrich Drepper wrote:

...

Yanko Kaneti wrote:

...
While not crashing 2.8.90-2 as used by ssh fails to resolve for me in ~90% of the attempts. Quite random... . Same for wget, yum. Ping seems to work every time.

I've fixed a couple of bugs in getaddrinfo- and nscd-related code today. This should fix the problems people have seen, including bug 445656. I hope Jakub has time to build a new version soon.

2.8.90-3 from koji works for me, so far without glitches.

Thanks.

Yanko Kaneti

5:04 p.m.

On Mon, 2008-05-19 at 09:41 +0300, Yanko Kaneti wrote:

...

On Sun, 2008-05-18 at 01:56 -0700, Ulrich Drepper wrote:

...
Yanko Kaneti wrote:

...
While not crashing 2.8.90-2 as used by ssh fails to resolve for me in ~90% of the attempts. Quite random... . Same for wget, yum. Ping seems to work every time.

I've fixed a couple of bugs in getaddrinfo- and nscd-related code today. This should fix the problems people have seen, including bug 445656. I hope Jakub has time to build a new version soon.

2.8.90-3 from koji works for me with, so far without glitches.

Whoops, a glitch. I get repeated failures to resolve www.newegg.com from firefox and wget. Again, ping works every time. This was corroborated by at least one tester on #fedora-devel. This is on a machine using an identical resolv.conf to a nearby F9 machine, which doesn't exhibit the same problem with that particular hostname.

Tom London

5:12 p.m.

On Mon, May 19, 2008 at 10:04 AM, Yanko Kaneti yaneti@declera.com wrote:

...

On Mon, 2008-05-19 at 09:41 +0300, Yanko Kaneti wrote:

...
On Sun, 2008-05-18 at 01:56 -0700, Ulrich Drepper wrote:

...
Yanko Kaneti wrote:

...
While not crashing 2.8.90-2 as used by ssh fails to resolve for me in ~90% of the attempts. Quite random... . Same for wget, yum. Ping seems to work every time.

I've fixed a couple of bugs in getaddrinfo- and nscd-related code today. This should fix the problems people have seen, including bug 445656. I hope Jakub has time to build a new version soon.

2.8.90-3 from koji works for me with, so far without glitches.

whoops. a glitch. I get repeated failures to resolve www.newegg.com from firefox and wget. Again ping works everytime. corroborated by at least one tester on #fedora-devel. This is on a machine using a identical resolv.conf as a nearby F9 machine which doesn't exhibit the same problem with that particular hostname.

I can confirm:

[tbl@localhost ~]$ curl http://www.newegg.com
curl: (6) Couldn't resolve host 'www.newegg.com'
[tbl@localhost ~]$ ping www.newegg.com
PING www.newegg.com (204.14.213.185) 56(84) bytes of data.
64 bytes from 204.14.213.185: icmp_seq=1 ttl=246 time=86.2 ms
64 bytes from 204.14.213.185: icmp_seq=2 ttl=246 time=89.6 ms

-- Tom London

Ulrich Drepper

5:36 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Yanko Kaneti wrote:

...

whoops. a glitch. I get repeated failures to resolve www.newegg.com from firefox and wget. Again ping works everytime. corroborated by at least one tester on #fedora-devel.

Fixed in cvs now. I hope Jakub has time to build a new version today.

- -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkgxuoIACgkQ2ijCOnn/RHRNxACfbLoRT7dvCcC0h0cGJ/SgoPy1 zAkAnRN5J1fF77quN9mHQKFYJdkugWdi =iAKG -----END PGP SIGNATURE-----

Yanko Kaneti

21 May

10:33 p.m.

On Mon, 2008-05-19 at 10:36 -0700, Ulrich Drepper wrote:

...

Yanko Kaneti wrote:

...
whoops. a glitch. I get repeated failures to resolve www.newegg.com from firefox and wget. Again ping works everytime. corroborated by at least one tester on #fedora-devel.

Fixed in cvs now. I hope Jakub has time to build a new version today.

Here, 2.8.90-4, which as far as I can tell contains the commit, still fails the same way with www.newegg.com.

Ulrich Drepper

22 May

2:17 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Yanko Kaneti wrote:

...

Here, 2.8.90-4 which as far as I can tell contains the commit, still fails the same way with www.newegg.com

Are you sure you don't use a cached entry in nscd? It definitely works here and I was able to reproduce the problem before.

- -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkg0178ACgkQ2ijCOnn/RHSynACeJdnndo2z5ScP2toEVnLLP/jY bKQAnRpPF+BC3th4FPk4S1mz53K2g9ZO =hFK5 -----END PGP SIGNATURE-----

Yanko Kaneti

6:40 a.m.

On Wed, 2008-05-21 at 19:17 -0700, Ulrich Drepper wrote:

...

Yanko Kaneti wrote:

...
Here, 2.8.90-4 which as far as I can tell contains the commit, still fails the same way with www.newegg.com

Are you sure you don't use a cached entry in nscd? It definitely works here and I was able to reproduce the problem before.

nscd is not installed. I have not used it at any point. Did try on another machine and against different nameservers with the same result. One machine has search in resolv.conf the other doesn't.

# curl www.newegg.com
curl: (6) Couldn't resolve host 'www.newegg.com'

Bill Crawford

10:01 a.m.

2008/5/22 Yanko Kaneti yaneti@declera.com:

...

# curl www.newegg.com curl: (6) Couldn't resolve host 'www.newegg.com'

Under F8, 'host' gives me:

[bill@bill ~]$ host www.newegg.com
www.newegg.com has address 204.14.213.185
Host www.newegg.com not found: 3(NXDOMAIN)
Host www.newegg.com not found: 3(NXDOMAIN)

So it may well be that there are some odd, and geographically distributed, problems with their DNS data?

John Summerfield

10:48 p.m.

Bill Crawford wrote:

...

2008/5/22 Yanko Kaneti yaneti@declera.com:

...
# curl www.newegg.com curl: (6) Couldn't resolve host 'www.newegg.com'

Under F8, 'host' gives me:

[bill@bill ~]$ host www.newegg.com www.newegg.com has address 204.14.213.185 Host www.newegg.com not found: 3(NXDOMAIN) Host www.newegg.com not found: 3(NXDOMAIN)

Same here, Sl5, Greenmount, Western Australia

...

So it may well be that there are some odd, and geographically distributed, problems with their DNS data?

-- Cheers John -- spambait 1aaaaaaa@coco.merseine.nu Z1aaaaaaa@coco.merseine.nu -- Advice http://webfoot.com/advice/email.top.php http://www.catb.org/~esr/faqs/smart-questions.html http://support.microsoft.com/kb/555375 You cannot reply off-list:-)

Ulrich Drepper

10:54 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

John Summerfield wrote:

...

Same here, Sl5, Greenmount, Western Australia

Which version? Make sure you have glibc-2.8.90-4 installed and the cache of nscd is cleared.

- -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iEYEARECAAYFAkg1+Y0ACgkQ2ijCOnn/RHTpkwCgwwTlrKTCTEzQ3qv7QLpGdTaQ 5xkAni+gJkqWY+iPJY+abzfzZ2HK8F7f =EySb -----END PGP SIGNATURE-----

John Summerfield

23 May

4:03 a.m.

Ulrich Drepper wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

John Summerfield wrote:

...
Same here, Sl5, Greenmount, Western Australia

Which version? Make sure you have glibc-2.8.90-4 installed and the cache of nscd is cleared.

Eh? sl is a clone of rhel. Like CentOS, but different.

Ulrich Drepper

3:04 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

John Summerfield wrote:

...

Eh? sl is a clone of rhel. Like CentOS, but different.

Why are you then posting on the Fedora list?

- -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkg23OwACgkQ2ijCOnn/RHTC8gCgnIQX4yshfiYq/ZMePIkwMqPP kacAnRRQW+3isoIcDlMKxMuTaWz8lg7y =D1ol -----END PGP SIGNATURE-----

John Summerfield

3:11 p.m.

Ulrich Drepper wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

John Summerfield wrote:

...
Eh? sl is a clone of rhel. Like CentOS, but different.

What are you then posting on the Fedora list?

1. To illustrate to you that that particular DNS problem is not with your precious new glibc.
2. I run several computers, running several distributions, releases and clones thereof, not to mention other operating systems. One has Fedora 9 installed, though it's running Windows at the moment.
3. Do I actually need to post to this list from a Fedora system? If so, you've just expelled half the list.

Horst H. von Brand

24 May

3:29 p.m.

Ulrich Drepper drepper@redhat.com wrote:

...

John Summerfield wrote:

...
Same here, Sl5, Greenmount, Western Australia

Which version? Make sure you have glibc-2.8.90-4 installed and the cache of nscd is cleared.

Yep, that's the one; recently booted

BTW, on i386 it works OK, but on x86_64 yum can't find the repos with 2.8.90 glibc. Had to go back.

Horst H. von Brand

3:25 p.m.

John Summerfield debian@herakles.homelinux.org wrote:

...

Bill Crawford wrote:

...
2008/5/22 Yanko Kaneti yaneti@declera.com:

...
# curl www.newegg.com curl: (6) Couldn't resolve host 'www.newegg.com'

Under F8, 'host' gives me: [bill@bill ~]$ host www.newegg.com www.newegg.com has address 204.14.213.185 Host www.newegg.com not found: 3(NXDOMAIN) Host www.newegg.com not found: 3(NXDOMAIN)

Same here, Sl5, Greenmount, Western Australia

Valparaiso, Chile. Same.

$ dig www.newegg.com any

; <<>> DiG 9.5.0rc1 <<>> www.newegg.com any
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8431
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.newegg.com. IN ANY

;; Query time: 280 msec
;; SERVER: 190.160.0.11#53(190.160.0.11)
;; WHEN: Sat May 24 11:21:57 2008
;; MSG SIZE rcvd: 32

...

...
So it may well be that there are some odd, and geographically distributed, problems with their DNS data?

They have a /large/ set of nameservers, by the look of it scattered all over the globe.

Ulrich Drepper

23 May

3:08 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

...

nscd is not installed. I have not used it at any point. Did try on another machine and against different nameservers with the same result. One machine has search in resolv.conf the other doesn't.

# curl www.newegg.com curl: (6) Couldn't resolve host 'www.newegg.com'

You'll have to provide more information. strace output, wireshark/tcpdump logs, ...

The problem I saw and which I fixed was that the server answering for that domain deliberately doesn't reply to IPv6 (T_AAAA) queries. I.e., it depends on the client to time out. That's terribly bad practice and the provider should change that.
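To get a feel for how expensive an unanswered query can be, here is a rough sketch in Python. It assumes a deliberately simplified model in which every configured name server is tried once per attempt and each try waits the full timeout. The real resolver's timing is more involved. The defaults used (timeout 5 seconds, 2 attempts, at most 3 nameserver lines) are the ones documented in resolv.conf(5). The function name worst_case_wait is made up for this illustration.

```python
def worst_case_wait(resolv_conf_text, default_timeout=5, default_attempts=2):
    """Rough upper bound, in seconds, on waiting for a reply that never comes.

    Simplifying assumption of this sketch: each name server is tried once
    per attempt, and every try waits for the full timeout.
    """
    nameservers = 0
    timeout = default_timeout
    attempts = default_attempts
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers += 1
        elif fields[0] == "options":
            for option in fields[1:]:
                key, _, value = option.partition(":")
                if key == "timeout" and value.isdigit():
                    timeout = int(value)
                elif key == "attempts" and value.isdigit():
                    attempts = int(value)
    # With no nameserver line the local machine is used; at most three count.
    nameservers = min(max(nameservers, 1), 3)
    return nameservers * attempts * timeout
```

With two nameserver lines and default options this gives 2 x 2 x 5 = 20 seconds for a query that is never answered, which is why a server that stays silent instead of replying hurts so much.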

Anyway, I think I fixed that in the new lookup code and, as I said, it works for me. I cannot imagine what you're seeing.

So, make really sure you're using the -4 rpms and then run

strace -o LOG getent ahosts www.newegg.com

and also capture the net traffic (just port 53 data) and stuff it all into a new BZ.

- -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkg23gQACgkQ2ijCOnn/RHQAqACfUtp71vQeXcX4vge2oZXJhm8y ckwAoK18tvvWlOEyeeFAN3C+FZ6yaXyl =UA7t -----END PGP SIGNATURE-----

John Summerfield

3:13 p.m.

Ulrich Drepper wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

...
nscd is not installed. I have not used it at any point. Did try on another machine and against different nameservers with the same result. One machine has search in resolv.conf the other doesn't.

# curl www.newegg.com curl: (6) Couldn't resolve host 'www.newegg.com'

You'll have to provide more information. strace output, wireshark/tcpdump logs, ...

I've illustrated that that particular problem is reproducible on an entirely different system (Sl5), and therefore almost certainly has nothing to do with anything you've done.

Bill Crawford

3:41 p.m.

2008/5/23 John Summerfield debian@herakles.homelinux.org:

...

I've illustrated that that particular problem is reproducible on an entirely different system (Sl5), and therefore almost certainly has nothing to do with anything you've done.

... and Ulrich is trying to provide a workaround in glibc for badly behaving hosts, like this one. You're complaining.

... huh?

Horst H. von Brand

24 May

3:42 p.m.

Bill Crawford billcrawford1970@gmail.com wrote:

...

2008/5/23 John Summerfield debian@herakles.homelinux.org:

...
I've illustrated that that particular problem is reproducible on an entirely different system (Sl5), and therefore almost certainly has nothing to do with anything you've done.

...

... and Ulrich is trying to provide a workaround in glibc for badly behaving hosts, like this one. You're complaining.

Is it really helpful to mess up glibc as a reward for bad behaviour?

John Summerfield

26 May

11:35 p.m.

Bill Crawford wrote:

...

2008/5/23 John Summerfield debian@herakles.homelinux.org:

...
I've illustrated that that particular problem is reproducible on an entirely different system (Sl5), and therefore almost certainly has nothing to do with anything you've done.

... and Ulrich is trying to provide a workaround in glibc for badly behaving hosts, like this one. You're complaining.

Are you mad? I was merely adding to the pool of information.

Horst H. von Brand

24 May

3:40 p.m.

John Summerfield debian@herakles.homelinux.org wrote:

...

Ulrich Drepper wrote:

...
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

...
nscd is not installed. I have not used it at any point. Did try on another machine and against different nameservers with the same result. One machine has search in resolv.conf the other doesn't.

# curl www.newegg.com curl: (6) Couldn't resolve host 'www.newegg.com'

You'll have to provide more information. strace output, wireshark/tcpdump logs, ...

I've illustrated that that particular problem is reproducible on an entirely different system (Sl5), and therefore almost certainly has nothing to do with anything you've done.

Here on Aurora (Fedora-ish on SPARC) with glibc-2.8-5.sparcv9 I see the same nonsense with www.newegg.com

Yanko Kaneti

23 May

3:45 p.m.

On Fri, 2008-05-23 at 08:08 -0700, Ulrich Drepper wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

...
nscd is not installed. I have not used it at any point. Did try on another machine and against different nameservers with the same result. One machine has search in resolv.conf the other doesn't.

# curl www.newegg.com curl: (6) Couldn't resolve host 'www.newegg.com'

You'll have to provide more information. strace output, wireshark/tcpdump logs, ...

The problem I saw and which I fixed was that the server answering for that domain deliberately doesn't reply to IPv6 (T_AAAA) queries. I.e., it depends on the client to time out. That's terribly bad practice and the provider should change that.

Anyway, I think I fixed that in the new lookup code and, as I said, it works for me. I cannot imagine what you're seeing.

So, make really sure you're using the -4 rpms and then run

strace -o LOG getent ahosts www.newegg.com

and also capture the net traffic (just port 53 data) and stuff it all into a new BZ.

https://bugzilla.redhat.com/show_bug.cgi?id=448117

There are two nameservers in resolv.conf, which might be the tripping point? Sorry for failing to mention that before.

P.S. John, I believe this is about the resolver code in the new glibc (in Fedora rawhide only) behaving differently than the old code, and not about newegg.com having crappy DNS.

Ulrich Drepper

24 May

6:03 p.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Yanko Kaneti wrote:

...

https://bugzilla.redhat.com/show_bug.cgi?id=448117

There are two nameservers in resolv.conf, which might be the tripping point? Sorry for failing to mention that before.

That's not an issue.

I think it's fixed now in cvs. The reason why I didn't fix this before is that the server I (indirectly) use doesn't return any answer for the T_AAAA query. The server you're using is sending back a SERVFAIL error.

Anyway, I'm treating SERVFAILs now like timeouts when it comes to having received one reply.
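As a toy model of the rule described above (not the actual glibc code), the decision for a paired A/AAAA lookup can be sketched in Python as follows. The Outcome enum, the combine function, and the servfail_like_timeout switch are all made up for this illustration. The switch set to False is only meant to mimic the failure reported against 2.8.90-4, where one SERVFAIL sank a lookup that already had a usable reply.

```python
from enum import Enum


class Outcome(Enum):
    """What came back for one of the two parallel queries."""
    ANSWER = "answer"
    TIMEOUT = "timeout"
    SERVFAIL = "servfail"


def combine(a_outcome, aaaa_outcome, servfail_like_timeout=True):
    """Return True if the paired lookup can hand addresses to the caller.

    With servfail_like_timeout=True, a SERVFAIL on one query is ignored
    in the same way as a timeout, provided the other query got an answer.
    """
    outcomes = {a_outcome, aaaa_outcome}
    if not servfail_like_timeout and Outcome.SERVFAIL in outcomes:
        # Assumed older behaviour: a SERVFAIL fails the whole lookup.
        return False
    # One usable reply is enough for the lookup to succeed.
    return Outcome.ANSWER in outcomes
```

In this model combine(Outcome.ANSWER, Outcome.SERVFAIL) succeeds, while the same pair with servfail_like_timeout=False fails, which matches the www.newegg.com symptom where the A query was answered and the AAAA query got SERVFAIL.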

Jakub will hopefully build a new glibc sometime soon.

To comment on a few mails in this thread:

- - the problem is new and not new. It is new because of new code paths which are obviously not yet well tested. It is not new in that we had to treat the same situations before.

The good news is that the new code should be faster since it waits less in total.

- - it is of course necessary to handle all these situations. I know that the server setup quality out there is bad. This is only the tip of the iceberg. Nevertheless, there should be an effort to rectify the situation. For instance, in this specific case, the server I talk to doesn't provide a SERVFAIL answer and therefore the resolver has to rely on a timeout. Multiply this by the number of DNS servers in /etc/resolv.conf and the number of retries and you'll arrive at substantial delays. There is nothing the resolver can do. All one can do is to use nscd to cache the results (albeit it's only effective with the new libc).

- - the reason why ping is not affected is that it doesn't look up IPv6 addresses. The problem case is when both IPv4 and IPv6 addresses are needed at the same time using the getaddrinfo interface and AF_UNSPEC as the type.
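The difference between the two kinds of lookup can be reproduced from Python, whose socket.getaddrinfo wraps the C library's getaddrinfo. This is a minimal sketch. The helper names addresses and compare are made up for this illustration, and the host name to test is up to the reader.

```python
import socket


def addresses(host, family):
    """Return the sorted, de-duplicated addresses getaddrinfo reports."""
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # Resolution failed for this address family.
        return []
    return sorted({info[4][0] for info in infos})


def compare(host):
    """Resolve `host` IPv4-only and with AF_UNSPEC (IPv4 and IPv6 together)."""
    return {
        "AF_INET": addresses(host, socket.AF_INET),
        "AF_UNSPEC": addresses(host, socket.AF_UNSPEC),
    }
```

If compare("www.newegg.com") returns addresses under "AF_INET" but an empty list under "AF_UNSPEC", the machine shows the symptom discussed in this thread: the IPv4-only lookup works while the combined lookup fails.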

There are likely more problems looming. So keep your eyes open. When problems appear, capture the net traffic to and from port 53 as well as strace of, say, getent.

- -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkg4WF4ACgkQ2ijCOnn/RHSpBQCfe1zs1QN70FV2PeTnvJxUHuiT 91kAnjgfqyThJ3FqtMtofoqlP1bENr6h =ypOw -----END PGP SIGNATURE-----

Ulrich Drepper

25 May

1:46 a.m.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Ulrich Drepper wrote:

...

Jakub will hopefully build a new glibc sometime soon.

It's built as 2.8.90-5. There might not be a rawhide push so you might want to pull down the binaries explicitly from:

http://koji.fedoraproject.org/koji/buildinfo?buildID=50374

- -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkg4xNkACgkQ2ijCOnn/RHTSIQCgto/x5iEmZZOwM8Ky2NjTnNwn wEEAn2nDzbp/HoASzT4B/Xu6DWrLm7RQ =ekTw -----END PGP SIGNATURE-----

Yanko Kaneti

8:25 p.m.

On Sat, 2008-05-24 at 18:46 -0700, Ulrich Drepper wrote:

...

Ulrich Drepper wrote:

...
Jakub will hopefully build a new glibc sometime soon.

It's built as 2.8.90-5. There might not be a rawhide push so you might want to pull down the binaries explicitly from:

http://koji.fedoraproject.org/koji/buildinfo?buildID=50374

Works for me.

John Summerfield

26 May

11:44 p.m.

Ulrich Drepper wrote:

...

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Yanko Kaneti wrote:

...
https://bugzilla.redhat.com/show_bug.cgi?id=448117

There are two nameservers in resolv.conf, which might be the tripping point? Sorry for failing to mention that before.

That's not an issue.

I think it's fixed now in cvs. The reason why I didn't fix this before is that the server I (indirectly) use doesn't return any answer for the T_AAAA query. The server you're using is sending back a SERVFAIL error.

Anyway, I'm treating SERVFAILs now like timeouts when it comes to having received one reply.

I'm not sure that papering over their problem is the right solution; one of my great frustrations administering a network is when stuff doesn't work and doesn't say why (presumably so as to not confuse naive users).

Michal Jaegermann

23 May

4:48 p.m.

On Fri, May 23, 2008 at 08:08:52AM -0700, Ulrich Drepper wrote:

...

...
nscd is not installed. I have not used it at any point. Did try on another machine and against different nameservers with the same result. One machine has search in resolv.conf the other doesn't.

# curl www.newegg.com curl: (6) Couldn't resolve host 'www.newegg.com'

The problem I saw and which I fixed was that the server answering for that domain deliberately doesn't reply to IPv6 (T_AAAA) queries. I.e., it depends on the client to time out.

A DNS record for www.newegg.com is broken in more ways than one. It is possible to get 204.14.213.185 but 'host 204.14.213.185' replies with "Host 185.213.14.204.in-addr.arpa. not found: 3(NXDOMAIN)"
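The forward and reverse check described here can be sketched in Python. This is only an illustration. The helper names reverse_name and forward_and_reverse are made up, and the results depend on whichever resolver and name servers the machine is using.

```python
import ipaddress
import socket


def reverse_name(address):
    """Return the PTR query name for an IP address, with the trailing dot."""
    return ipaddress.ip_address(address).reverse_pointer + "."


def forward_and_reverse(host):
    """Look up `host`, then try a reverse lookup for each IPv4 address.

    Returns a list of (address, ptr_name_or_None) pairs.
    """
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    results = []
    for address in sorted({info[4][0] for info in infos}):
        try:
            ptr_name = socket.gethostbyaddr(address)[0]
        except (socket.herror, socket.gaierror):
            # No usable PTR record, e.g. NXDOMAIN for the in-addr.arpa name.
            ptr_name = None
        results.append((address, ptr_name))
    return results
```

For the address above, reverse_name("204.14.213.185") gives "185.213.14.204.in-addr.arpa.", the name that 'host' reported as not found.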

...

That's terribly bad practice and the provider should change that.

Good luck with that. I have the impression that the quality of DNS records in general is going downhill, and talking to providers is, in my experience, absolutely unproductive. If anything, the mess is getting worse even faster.

OTOH, even when records are OK, name resolution "stutters" for alias names. For example:

$ host www.ualberta.ca
www.ualberta.ca has address 129.128.98.86
www.ualberta.ca is an alias for web1.srv.ualberta.ca.
www.ualberta.ca is an alias for web1.srv.ualberta.ca.

This is with glibc-2.8.90-4 and nscd NOT running. In strace I can notice:
....
futex(0x7f7570e77020, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigaction(SIGHUP, {0x7f75720dee23, ~[RTMIN RT_1], SA_RESTORER, 0x7f7571a7a110}, NULL, 8) = 0
rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
....

which seems to be related to the above.

With glibc-2.7-2 (F8) or glibc-2.6-4 (F7) I do not see that "stutter".

Just for a change 'host clock.redhat.com' responds with

clock.redhat.com has address 66.187.224.4
clock.redhat.com has address 66.187.233.4

in all three cases with a reverse query on that address bringing 'clock2.redhat.com'.

Michal

Horst H. von Brand

19 May

9:21 p.m.

Yanko Kaneti yaneti@declera.com wrote:

...

On Mon, 2008-05-19 at 09:41 +0300, Yanko Kaneti wrote:

...
On Sun, 2008-05-18 at 01:56 -0700, Ulrich Drepper wrote:

...
Yanko Kaneti wrote:

...
While not crashing 2.8.90-2 as used by ssh fails to resolve for me in ~90% of the attempts. Quite random... . Same for wget, yum. Ping seems to work every time.

I've fixed a couple of bugs in getaddrinfo- and nscd-related code today. This should fix the problems people have seen, including bug 445656. I hope Jakub has time to build a new version soon.

2.8.90-3 from koji works for me with, so far without glitches.

...

whoops. a glitch. I get repeated failures to resolve www.newegg.com from firefox and wget. Again ping works everytime. corroborated by at least one tester on #fedora-devel. This is on a machine using a identical resolv.conf as a nearby F9 machine which doesn't exhibit the same problem with that particular hostname.

Here (x86_64), yum stopped finding the repos to update from, while ping(1) does find the servers. Had to go back.

Horst H. von Brand

17 May

1:07 a.m.

Jesse Keating jkeating@redhat.com wrote:

...

On Thu, 2008-05-15 at 10:00 -0700, Ulrich Drepper wrote:

...
With the next push you'll see the first new glibc version for rawhide (2.8.90-1). This version has quite a few changes, most prominently changes to getaddrinfo.

The new implementation should be faster since it parallelizes DNS lookups and avoids code duplication in many other cases. But the DNS optimization is also what I'm a bit concerned about. I tested the code with a local bind daemon and all works well. What I haven't done is testing it with other DNS servers.

Everybody should look out for changes in the host name resolution. There shouldn't be any but who knows.

Those who are using DNS servers other than bind might want to give the code bounding. Especially also the fallback code to issue lookups with TCP (you can force this by setting the RES_USEVC bit in _res.options, see <resolv.h>).

This glibc made it into the chroot we use to produce rawhide, which promptly started causing invalid pointer free errors when yum was called:

*** glibc detected *** /usr/bin/python: free(): invalid pointer: 0x00007fffa6ead9ab ***

Granted, this was a chroot on a FC6 box, not sure if that's going to "taint" things, but I had to roll back glibc in order to get a rawhide compose going.

I get lots of ugly messages scrolling by starting X, and no X.

i686, dual core.

test@lists.fedoraproject.org

36 comments

9 participants

participants (9)

Bill Crawford
Horst H. von Brand
Jesse Keating
John Summerfield
Michal Jaegermann
Tom London
Tomas Mraz
Ulrich Drepper
Yanko Kaneti