I have a system with a Tyan 2885 motherboard (S2885-ANRF) that uses dual Opteron 244 processors. Each processor has 1 GB of memory for a total of 2 GB. I am using a SATA HD. I am running the latest stock release of the SMP version of the FC3 kernel for x86_64. uname -a output follows:
Linux maggie 2.6.11-1.14_FC3smp #1 SMP Thu Apr 7 19:36:23 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
The computer is the worldly node for a small cluster of computers. It is responsible for building a code that is run on the cluster. A shell script is used to start the build process. Occasionally when the script is started it crashes and the following messages are placed in /var/log/messages (sorry for the ugly line wrap):
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4000(0000000000000008).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4010(0000000000000009).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4018(0000000000401b80).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4020(000000000000000b).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4028(0000000000000220).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4030(000000000000000c).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4038(0000000000000220).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4040(000000000000000d).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4048(00000000000001f7).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4050(000000000000000e).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4058(00000000000001f7).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4060(0000000000000017).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4070(000000000000000f).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4078(00007ffffffff098).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4098(000034365f363878).
If the shell script is started immediately after the crash it works. I have been seeing "bad pmd" messages for quite some time and the shell script that builds the code is not the only event that triggers them.
I'm pretty sure the memory in the machine is fine. I have noticed a thread on linux-kernel list discussing the problem, but haven't had a chance to post there yet.
I have appended the output of lspci and lsmod to the end of this message.
Has anyone on this list noticed similar messages?
The linux-kernel thread indicates the problem is x86_64 specific and seems to be highlighted by Tyan HW.
Any insight would be greatly appreciated. Pete
/bin/lspci
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:07.5 Multimedia audio controller: Advanced Micro Devices [AMD] AMD-8111 AC97 Audio (rev 03)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
02:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:0b.0 Unknown mass storage controller: Silicon Image, Inc. (formerly CMD Technology Inc) SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
03:0c.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
04:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-8151 System Controller (rev 13)
04:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8151 AGP Bridge (rev 13)
05:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
/sbin/lsmod
Module                  Size  Used by
md5                     5953  1
ipv6                  297665  22
parport_pc             32809  0
lp                     16145  0
parport                45773  2 parport_pc,lp
autofs4                24521  0
sunrpc                169017  1
pcmcia                 30549  0
yenta_socket           25033  0
rsrc_nonstatic         11969  1 yenta_socket
pcmcia_core            57241  3 pcmcia,yenta_socket,rsrc_nonstatic
video                  20169  0
button                  9185  0
battery                12233  0
ac                      6857  0
ohci1394               38361  0
ieee1394              385721  1 ohci1394
ohci_hcd               25429  0
i2c_amd8111             8129  0
i2c_core               28353  1 i2c_amd8111
hw_random               7393  0
snd_intel8x0           38977  0
snd_ac97_codec         91537  1 snd_intel8x0
snd_pcm_oss            62193  0
snd_mixer_oss          22209  1 snd_pcm_oss
snd_pcm               109257  3 snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_timer              29897  1 snd_pcm
snd                    65417  6 snd_intel8x0,snd_ac97_codec,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
soundcore              12641  1 snd
snd_page_alloc         13513  2 snd_intel8x0,snd_pcm
r8169                  33485  0
tg3                    91717  0
floppy                 68881  0
dm_snapshot            19713  0
dm_zero                 4033  0
dm_mirror              25553  0
ext3                  148561  2
jbd                    69105  1 ext3
dm_mod                 69761  6 dm_snapshot,dm_zero,dm_mirror
sata_sil               11589  2
libata                 54601  1 sata_sil
sd_mod                 20929  3
scsi_mod              155665  2 libata,sd_mod
On Wed, May 11, 2005 at 05:45:38PM -0700, Peter J. Stieber wrote:
> I have a system with a Tyan 2885 motherboard (S2885-ANRF) that uses dual Opteron 244 processors. Each processor has 1 GB of memory for a total of 2 GB. I am using a SATA HD. I am running the latest stock release of the SMP version of the FC3 kernel for x86_64. uname -a output follows:
> Linux maggie 2.6.11-1.14_FC3smp #1 SMP Thu Apr 7 19:36:23 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
> The computer is the worldly node for a small cluster of computers. It is responsible for building a code that is run on the cluster. A shell script is used to start the build process. Occasionally when the script is started it crashes and the following messages are placed in /var/log/messages (sorry for the ugly line wrap):
> May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4000(0000000000000008).
Please grab the latest test kernel from http://people.redhat.com/davej/kernels/Fedora/FC3 and try to reproduce this. It contains debugging code that hopefully will help nail this.
Dave
PJS = Peter J. Stieber
PJS>> I have a system with a Tyan 2885 motherboard
PJS>> (S2885-ANRF) that uses dual Opteron 244 processors.
PJS>> Each processor has 1 GB of memory for a total of
PJS>> 2 GB. I am using a SATA HD. I am running the latest
PJS>> stock release of the SMP version of the FC3 kernel
PJS>> for x86_64. uname -a output follows:
PJS>>
PJS>> Linux maggie 2.6.11-1.14_FC3smp #1 SMP
PJS>> Thu Apr 7 19:36:23 EDT 2005
PJS>> x86_64 x86_64 x86_64 GNU/Linux
PJS>>
PJS>> The computer is the worldly node for a small cluster of
PJS>> computers. It is responsible for building a code that
PJS>> is run on the cluster. A shell script is used to
PJS>> start the build process. Occasionally when the script
PJS>> is started it crashes and the following messages are
PJS>> placed in /var/log/messages (sorry for the ugly line
PJS>> wrap):
PJS>>
PJS>> May 11 16:26:56 maggie kernel: mm/memory.c:97:
PJS>> bad pmd ffff81002f6a4000(0000000000000008).
DJ = Dave Jones
DJ> Please grab the latest test kernel from
DJ> http://people.redhat.com/davej/kernels/Fedora/FC3
DJ> and try to reproduce this. It contains debugging code
DJ> that hopefully will help nail this.
Thanks Dave. I loaded the kernel:
Linux maggie 2.6.11-1.24_FC3smp #1 SMP Tue May 10 19:12:22 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
I'm trying to force the problem to occur, but as was reported on the linux-kernel list, it isn't obvious how to make the problem rear its ugly head.
Are you looking for /var/log/messages output when it happens?
Thanks again for the help. I'm very willing to serve as a debug test bed as my worldly node is a Tyan S2885 Thunder K8W motherboard running the SMP version of x86_64 FC3 and my compute nodes are Tyan S2850 Tomcat K8S motherboards running the non-SMP version of x86_64 FC3.
Will reply to this thread when the problem pops up, Pete
The problem is occurring again with Dave's test kernel.
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d008(0000000000000008).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d018(0000000000000009).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d020(0000000000401b80).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d028(000000000000000b).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d030(00000000000001f4).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d038(000000000000000c).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d040(00000000000001f4).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d048(000000000000000d).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d050(00000000000001f7).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d058(000000000000000e).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d060(00000000000001f7).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d068(0000000000000017).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d078(000000000000000f).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d080(00007ffffffff0a4).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d0a0(5f36387800000000).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d0a8(0000000000003436).
and from today's logs:
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d38(00000037e5100a88).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d40(0000000000000003).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d48(00007ffffffffee9).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d50(00007ffffffffeea).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d58(00007ffffffffeeb).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d68(00007ffffffffeec).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d70(00007ffffffffeed).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d78(00007ffffffffeee).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d80(00007ffffffffeef).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d88(00007ffffffffef0).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d90(00007ffffffffef1).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d98(00007ffffffffef2).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898da0(00007ffffffffef3).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898da8(00007ffffffffef4).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898db0(00007ffffffffef5).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898db8(00007ffffffffef6).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898dc0(00007ffffffffef7).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898dc8(00007ffffffffef8).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898dd8(0000000000000010).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898de0(00000000078bfbff).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898de8(0000000000000006).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898df0(0000000000001000).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898df8(0000000000000011).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e00(0000000000000064).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e08(0000000000000003).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e10(0000000000400040).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e18(0000000000000004).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e20(0000000000000038).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e28(0000000000000005).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e30(0000000000000009).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e38(0000000000000007).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e48(0000000000000008).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e58(0000000000000009).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e60(0000000000417b10).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e68(000000000000000b).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e78(000000000000000c).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e88(000000000000000d).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e98(000000000000000e).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898ea8(0000000000000017).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898eb8(000000000000000f).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898ec0(00007ffffffffee2).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898ee0(34365f3638780000).
Dave,
I'm willing to provide what you need to debug, or try other test kernels. I also posted to the linux-kernel list. Pete
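Since both collect2 and sh show up in the logs above, a quick tally of "bad pmd" messages per command can make the pattern easier to see. The following is only a minimal sketch, assuming the two line formats quoted in this thread (with and without the "COMMAND:PID:" prefix); the names BAD_PMD and count_bad_pmd are made up for illustration.

```python
import re
from collections import Counter

# Assumed formats, taken from the log excerpts in this thread:
#   kernel: mm/memory.c:97: bad pmd ADDR(VALUE).
#   kernel: COMMAND:PID: mm/memory.c:98: bad pmd ADDR(VALUE).
BAD_PMD = re.compile(
    r"kernel: (?:(?P<comm>[^\s:]+):(?P<pid>\d+): )?"
    r"mm/memory\.c:\d+: bad pmd (?P<addr>[0-9a-f]{16})\((?P<value>[0-9a-f]{16})\)"
)

def count_bad_pmd(lines):
    """Count "bad pmd" messages per (command, pid); both are None if absent."""
    counts = Counter()
    for line in lines:
        m = BAD_PMD.search(line)
        if m:
            pid = int(m.group("pid")) if m.group("pid") else None
            counts[(m.group("comm"), pid)] += 1
    return counts

sample = [
    "May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d008(0000000000000008).",
    "May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d018(0000000000000009).",
    "May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d38(00000037e5100a88).",
    "May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4000(0000000000000008).",
]

for (comm, pid), n in sorted(count_bad_pmd(sample).items(), key=str):
    print(comm, pid, n)
```

To run it against a real log, pass an open file such as /var/log/messages to count_bad_pmd instead of the sample list.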
Dave Jones,
I see there has been another kernel upgrade. Before I update I was wondering if you wanted me to provide any more information concerning the x86_64 bad pmd message problem. A new kernel might mask the true underlying problem.
I apologize for being a pest, but I was wondering if you had time to look at
https://www.redhat.com/archives/fedora-list/2005-May/msg02180.html
from the fedora list or
http://www.lib.uaa.alaska.edu/linux-kernel/archive/2005-Week-19/1397.html
from the linux-kernel list. I was wondering what to do next?
Thanks, Pete
On Mon, May 23, 2005 at 11:15:51AM -0700, Peter J. Stieber wrote:
> Dave Jones,
> I see there has been another kernel upgrade. Before I update I was wondering if you wanted me to provide any more information concerning the x86_64 bad pmd message problem. A new kernel might mask the true underlying problem.
> I apologize for being a pest, but I was wondering if you had time to look at
> https://www.redhat.com/archives/fedora-list/2005-May/msg02180.html
> from the fedora list or
> http://www.lib.uaa.alaska.edu/linux-kernel/archive/2005-Week-19/1397.html
> from the linux-kernel list. I was wondering what to do next?
Still no real answers on this one, sorry.
Dave
PJS = Peter J. Stieber
PJS>> I see there has been another kernel upgrade. Before I
PJS>> update I was wondering if you wanted me to provide
PJS>> any more information concerning the x86_64 bad pmd
PJS>> message problem. A new kernel might mask the true
PJS>> underlying problem.
PJS>>
PJS>> I apologize for being a pest, but I was wondering if you had time to
PJS>> look at
PJS>>
PJS>> https://www.redhat.com/archives/fedora-list/2005-May/msg02180.html
PJS>>
PJS>> from the fedora list or
PJS>>
PJS>> http://www.lib.uaa.alaska.edu/linux-kernel/archive/2005-Week-19/1397.html
PJS>>
PJS>> from the linux-kernel list. I was wondering what to do next?
DJ = Dave Jones wrote:
DJ> Still no real answers on this one, sorry.
I was under the impression the test kernel had some type of debug messages in it someone would be interested in. Was I wrong about that?
I guess you're saying I should go ahead and update and see what happens?
Thanks for taking the time to reply, Pete
On Mon, May 23, 2005 at 01:14:10PM -0700, Peter J. Stieber wrote:
> I was under the impression the test kernel had some type of debug messages in it someone would be interested in. Was I wrong about that?
No, you are correct. But nothing triggered with the latest builds.
> I guess you're saying I should go ahead and update and see what happens?
There are a number of other fixes in there which may have caused the problem to go into hiding. I can't reproduce it at all any more, and some others who were seeing it haven't seen it recently either.
Dave
PJS = Peter J. Stieber
PJS>> I was under the impression the test kernel had
PJS>> some type of debug messages in it someone
PJS>> would be interested in. Was I wrong about that?
DJ = Dave Jones
DJ> No, you are correct. But nothing triggered with the latest builds.
PJS>> I guess you're saying I should go ahead and update
PJS>> and see what happens?
DJ> There are a number of other fixes in there which may have
DJ> caused the problem to go into hiding. I can't reproduce
DJ> it at all any more, and some others who were seeing it
DJ> haven't seen it recently either.
This morning I ran Memtest-86 v3.2 on the machine in question. I let it run for a little over 6 hours. It made 7 passes of the memory tests and had no errors.
Next I updated the kernel to 2.6.11-1.27_FC3smp. The problem happened pretty quickly for me:
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe000(0000000000401b80). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe008(000000000000000b). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe010(0000000000000220). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe018(000000000000000c). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe020(0000000000000220). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe028(000000000000000d). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe030(00000000000001f7). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe038(000000000000000e). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe040(00000000000001f7). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe048(0000000000000017). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe058(000000000000000f). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe060(00007ffffffff081). May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe080(0034365f36387800).
The collect2 command is coming from the build sequence I described in earlier emails. If I try it a second time it runs successfully. I've seen sh cause it too.
Note that 34365f363878 in the last line = 46_68x in ASCII, which is x86_64 in reverse.
Is there anything else I should be looking for?
I'd be willing to try any debug kernel to help find the problem.
Thanks for the help Dave. Pete
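The hex observation in the message above can be checked mechanically. This is a small sketch, assuming the logged values are 64-bit words read from little-endian memory (which is the case on x86_64); the helper name words_to_ascii is made up for illustration. It also covers the logs where the string straddles two consecutive words.

```python
import struct

def words_to_ascii(words):
    """Lay 64-bit values out as little-endian bytes, drop NULs, decode as ASCII."""
    raw = b"".join(struct.pack("<Q", w) for w in words)
    return raw.replace(b"\x00", b"").decode("ascii")

# Single word from the 2.6.11-1.27_FC3smp log.
print(words_to_ascii([0x0034365F36387800]))

# Two consecutive words from the May 14 log (d0a0 and d0a8).
print(words_to_ascii([0x5F36387800000000, 0x0000000000003436]))
```

Read in memory order the bytes spell the string forwards; it only looks reversed because the kernel prints each word most significant byte first.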
On Mon, May 23, 2005 at 04:48:29PM -0700, Peter J. Stieber wrote:
> Is there anything else I should be looking for?
> I'd be willing to try any debug kernel to help find the problem.
Give the test kernel at http://people.redhat.com/davej/kernels/Fedora/ a shot (-28_FC3). That should be slightly different output.
Dave
DJ = Dave Jones
DJ> Give the test kernel at http://people.redhat.com/davej/kernels/Fedora/
DJ> a shot (-28_FC3). That should be slightly different output.
Dave,
Here's the latest with 2.6.11-1.28_FC3smp
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa000(0000000000000064).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa008(0000000000000003).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa010(0000000000400040).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa018(0000000000000004).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa020(0000000000000038).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa028(0000000000000005).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa030(0000000000000008).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa038(0000000000000007).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa048(0000000000000008).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa058(0000000000000009).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa060(0000000000401b80).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa068(000000000000000b).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa070(0000000000000220).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa078(000000000000000c).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa080(0000000000000220).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa088(000000000000000d).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa090(00000000000001f7).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa098(000000000000000e).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa0a0(00000000000001f7).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa0a8(0000000000000017).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa0b8(000000000000000f).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa0c0(00007ffffffff0dc).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa0d8(5f36387800000000).
May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa0e0(0000000000003436).
Does this help?
Pete
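One possible reading of the values in the log above, offered purely as an assumption and not confirmed anywhere in this thread: the words alternate between small integers and larger values, which resembles the key/value layout of an ELF auxiliary vector. The sketch below walks the logged words as (key, value) pairs using the AT_* numbers from elf.h. The function name read_as_auxv, the choice of start address, and the idea that unlogged addresses held zero are all illustrative assumptions.

```python
# AT_* numbers as defined in <elf.h>; only the ones relevant here.
AT_NAMES = {
    3: "AT_PHDR", 4: "AT_PHENT", 5: "AT_PHNUM", 6: "AT_PAGESZ",
    7: "AT_BASE", 8: "AT_FLAGS", 9: "AT_ENTRY", 11: "AT_UID",
    12: "AT_EUID", 13: "AT_GID", 14: "AT_EGID", 15: "AT_PLATFORM",
    16: "AT_HWCAP", 17: "AT_CLKTCK", 23: "AT_SECURE",
}

def read_as_auxv(entries, start):
    """Walk 16-byte (key, value) pairs from `start` until a zero key.

    `entries` maps address -> logged value. Addresses missing from the
    log are treated as zero (assumption: zero slots are not reported).
    """
    pairs = []
    addr = start
    while True:
        key = entries.get(addr, 0)
        if key == 0:
            break
        name = AT_NAMES.get(key, f"unknown({key})")
        pairs.append((name, entries.get(addr + 8, 0)))
        addr += 16
    return pairs

# First few entries of the 2.6.11-1.28_FC3smp log, as address -> value.
logged = {
    0xFFFF8100773AA008: 0x3,
    0xFFFF8100773AA010: 0x400040,
    0xFFFF8100773AA018: 0x4,
    0xFFFF8100773AA020: 0x38,
    0xFFFF8100773AA028: 0x5,
    0xFFFF8100773AA030: 0x8,
    0xFFFF8100773AA038: 0x7,
    0xFFFF8100773AA048: 0x8,
    0xFFFF8100773AA058: 0x9,
    0xFFFF8100773AA060: 0x401B80,
}

for name, value in read_as_auxv(logged, 0xFFFF8100773AA008):
    print(f"{name:12s} {value:#x}")
```

If this reading is right, the page being reported as a bad pmd would contain the initial stack of a freshly exec'd process, which would also fit the x86_64 platform string at the end of each log. Treat that as a hypothesis to test, not a diagnosis.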
On Mon, May 23, 2005 at 09:15:37PM -0700, Peter J. Stieber wrote:
> May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa0e0(0000000000003436).
> Does this help?
Hrmph. No, I screwed up the patch. -29 is rebuilding, it'll appear in the FC3/ subdir of my people page a little while after it's done building.
Dave
Dave Jones - Tue, May 24 2005 01:39:16 -0400:
> On Mon, May 23, 2005 at 09:15:37PM -0700, Peter J. Stieber wrote:
> > May 23 21:13:45 maggie kernel: collect2:5556: mm/memory.c:107: bad pmd ffff8100773aa0e0(0000000000003436).
> > Does this help?
> Hrmph. No, I screwed up the patch. -29 is rebuilding, it'll appear in the FC3/ subdir of my people page a little while after it's done building.
> Dave
Having similar hardware and the same problems I tried -29 and get errors as follows:
May 24 07:47:19 sun kernel: mm/memory.c:109: bad pmd ffff810031a22c20(000000000000000b).
May 24 07:47:19 sun kernel: cut:4154 free pmd ffff810031a22c30 freed by 0xffffffffffffffff
May 24 07:47:19 sun kernel: mm/memory.c:109: bad pmd ffff810031a22c30(000000000000000c).
May 24 07:47:19 sun kernel: cut:4154 free pmd ffff810031a22c40 freed by 0xffffffffffffffff
May 24 07:47:19 sun kernel: mm/memory.c:109: bad pmd ffff810031a22c40(000000000000000d).
May 24 07:47:19 sun kernel: cut:4154 free pmd ffff810031a22c50 freed by 0xffffffffffffffff
May 24 07:47:19 sun kernel: mm/memory.c:109: bad pmd ffff810031a22c50(000000000000000e).
May 24 07:47:19 sun kernel: cut:4154 free pmd ffff810031a22c60 freed by 0xffffffffffffffff
May 24 07:47:19 sun kernel: mm/memory.c:109: bad pmd ffff810031a22c60(0000000000000017).
May 24 07:47:19 sun kernel: cut:4154 free pmd ffff810031a22c70 freed by 0xffffffffffffffff
May 24 07:47:19 sun kernel: mm/memory.c:109: bad pmd ffff810031a22c70(000000000000000f).
May 24 07:47:19 sun kernel: cut:4154 free pmd ffff810031a22c78 freed by 0xffffffffffffffff
May 24 07:47:19 sun kernel: mm/memory.c:109: bad pmd ffff810031a22c78(00007ffffffffc99).
May 24 07:47:19 sun kernel: cut:4154 free pmd ffff810031a22c98 freed by 0xffffffffffffffff
May 24 07:47:19 sun kernel: mm/memory.c:109: bad pmd ffff810031a22c98(0034365f36387800).
I can reproduce these errors by running the rootkit hunter [1].
Regards
Christoph
DJ = Dave Jones
DJ> Hrmph. No, I screwed up the patch.
DJ> -29 is rebuilding, it'll appear in the FC3/ subdir of my
DJ> people page a little while after it's done building.
Here's what I get with 2.6.11-1.29_FC3smp:
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1000 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1000(0000000000000064).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1008 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1008(0000000000000003).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1010 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1010(0000000000400040).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1018 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1018(0000000000000004).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1020 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1020(0000000000000038).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1028 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1028(0000000000000005).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1030 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1030(0000000000000008).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1038 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1038(0000000000000007).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1048 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1048(0000000000000008).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1058 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1058(0000000000000009).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1060 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1060(0000000000401b80).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1068 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1068(000000000000000b).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1070 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1070(0000000000000220).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1078 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1078(000000000000000c).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1080 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1080(0000000000000220).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1088 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1088(000000000000000d).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1090 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1090(00000000000001f7).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1098 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1098(000000000000000e).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f10a0 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f10a0(00000000000001f7).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f10a8 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f10a8(0000000000000017).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f10b8 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f10b8(000000000000000f).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f10c0 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f10c0(00007ffffffff0dc).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f10d8 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f10d8(5f36387800000000).
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f10e0 freed by 0xffffffffffffffff
May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f10e0(0000000000003436).
I noticed Christoph Franke's reply. It is reassuring to see someone else with the same problem, although I doubt it's reassuring for Christoph ;-)
Does the latest output help?
Thanks again Dave, Pete
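For anyone who wants to compare these dumps across kernel builds, here is a minimal Python sketch, not part of the original reports, that pulls the address/value pairs out of the "bad pmd" messages. The names `BAD_PMD` and `parse_bad_pmd` are hypothetical, and the regular expression assumes exactly the message format shown in the log above (16 hex digits for the entry address, 16 for its contents).

```python
import re

# Matches kernel messages such as:
#   mm/memory.c:109: bad pmd ffff8100777f1000(0000000000000064).
# The "free pmd ... freed by ..." debug lines are deliberately not matched.
BAD_PMD = re.compile(r"bad pmd ([0-9a-f]{16})\(([0-9a-f]{16})\)")


def parse_bad_pmd(text):
    """Return a list of (entry_address, entry_value) integer pairs."""
    return [(int(addr, 16), int(value, 16)) for addr, value in BAD_PMD.findall(text)]


# Two entries copied from the log in this message.
sample = (
    "May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1000(0000000000000064). "
    "May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1008(0000000000000003)."
)

entries = parse_bad_pmd(sample)
for addr, value in entries:
    print(f"{addr:016x} {value:#x}")
```

On a real system the same function could be fed the contents of /var/log/messages; because it uses `findall`, it does not matter whether the entries arrive one per line or wrapped together as they are in the mail.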
On Tue, May 24, 2005 at 07:02:40AM -0700, Peter J. Stieber wrote:
DJ = Dave Jones DJ> Hrmph. No, I screwed up the patch. DJ> -29 is rebuilding, it'll appear in the FC3/ subdir of my DJ> people page a little while after its done building.
here's what I get with 2.6.11-1.29_FC3smp.
[snip]
I noticed Christoph Franke's reply. It is reassuring to see someone else with the same problem, although I doubt it's reassuring for Christoph ;-)
Does the latest output help?
Hmm, not really. Need to stare at it some more.
Dave
PJS = Peter J. Stieber
PJS>> here's what I get with 2.6.11-1.29_FC3smp.

[snip]

PJS>> I noticed Christoph Franke's reply.
PJS>> It is reassuring to see someone else
PJS>> with the same problem, although I
PJS>> doubt it's reassuring for Christoph ;-)
PJS>>
PJS>> Does the latest output help?

DJ = Dave Jones
DJ> Hmm, not really. Need to stare at it some more.
I really appreciate you looking into this. I will check the list periodically for further instructions/suggestions.
Thanks again, Pete
Dave Jones - Tue, May 24 2005 12:57:15 -0400:
On Tue, May 24, 2005 at 07:02:40AM -0700, Peter J. Stieber wrote:
DJ = Dave Jones DJ> Hrmph. No, I screwed up the patch. DJ> -29 is rebuilding, it'll appear in the FC3/ subdir of my DJ> people page a little while after its done building.
here's what I get with 2.6.11-1.29_FC3smp.
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1000 freed by 0xffffffffffffffff May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1000(0000000000000064).
[snip]
I noticed a -30 build on your webspace at people.redhat.com. Is this based on 2.6.11.11, and does it nail the bad pmd bug?
Regards
Christoph
On Tue, 31 May 2005 22:30:43 +0200 Christoph Franke news@thefranke.net wrote:
Dave Jones - Tue, May 24 2005 12:57:15 -0400:
On Tue, May 24, 2005 at 07:02:40AM -0700, Peter J. Stieber wrote:
DJ = Dave Jones DJ> Hrmph. No, I screwed up the patch. DJ> -29 is rebuilding, it'll appear in the FC3/ subdir of my DJ> people page a little while after its done building.
here's what I get with 2.6.11-1.29_FC3smp.
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1000 freed by 0xffffffffffffffff May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1000(0000000000000064).
[snip]
I noticed a -30 build on your webspace at people.redhat.com. Is this based on 2.6.11.11, and does it nail the bad pmd bug?
Christoph,
I am not Dave, sorry ;). I am testing Dave's .30 build right now on my compute nodes and have yet to see any errors, even with memory-intensive multi-threaded jobs. But I can't test with my fileserver at the moment, since jobs that need NFS access are already running. The fileserver box is the one that gave me the most headaches before, so I will try rebooting it under the .30 build in the next 24 hours and post results ASAP.
Cheers, Ivan
On Tue, May 31, 2005 at 10:30:43PM +0200, Christoph Franke wrote:
Dave Jones - Tue, May 24 2005 12:57:15 -0400:
On Tue, May 24, 2005 at 07:02:40AM -0700, Peter J. Stieber wrote:
DJ = Dave Jones DJ> Hrmph. No, I screwed up the patch. DJ> -29 is rebuilding, it'll appear in the FC3/ subdir of my DJ> people page a little while after its done building.
here's what I get with 2.6.11-1.29_FC3smp.
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1000 freed by 0xffffffffffffffff May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1000(0000000000000064).
[snip]
I noticed a -30 build on your webspace at people.redhat.com. Is this based on 2.6.11.11, and does it nail the bad pmd bug?
No idea yet, you tell me :)
Dave
Dave Jones - Tue, May 31 2005 19:16:34 -0400:
On Tue, May 31, 2005 at 10:30:43PM +0200, Christoph Franke wrote:
Dave Jones - Tue, May 24 2005 12:57:15 -0400:
On Tue, May 24, 2005 at 07:02:40AM -0700, Peter J. Stieber wrote:
DJ = Dave Jones DJ> Hrmph. No, I screwed up the patch. DJ> -29 is rebuilding, it'll appear in the FC3/ subdir of my DJ> people page a little while after its done building.
here's what I get with 2.6.11-1.29_FC3smp.
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1000 freed by 0xffffffffffffffff May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1000(0000000000000064).
[snip]
I noticed a -30 build on your webspace at people.redhat.com. Is this based on 2.6.11.11, and does it nail the bad pmd bug?
No idea yet, you tell me :)
Ok :-) But could you tell us if you already based this one upon 2.6.11.11, which contains some x86_64 fixes?
Regards
Christoph
--- Christoph Franke news@thefranke.net wrote:
Dave Jones - Tue, May 31 2005 19:16:34 -0400:
On Tue, May 31, 2005 at 10:30:43PM +0200, Christoph Franke wrote:
Dave Jones - Tue, May 24 2005 12:57:15 -0400:
On Tue, May 24, 2005 at 07:02:40AM -0700, Peter J. Stieber wrote:
DJ = Dave Jones DJ> Hrmph. No, I screwed up the patch. DJ> -29 is rebuilding, it'll appear in the FC3/ subdir of my DJ> people page a little while after its done building.
here's what I get with 2.6.11-1.29_FC3smp.
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1000 freed by 0xffffffffffffffff May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1000(0000000000000064).
[snip]
I noticed a -30 build on your webspace at people.redhat.com. Is this based on 2.6.11.11, and does it nail the bad pmd bug?
No idea yet, you tell me :)
Ok :-) But could you tell us if you already based this one upon 2.6.11.11, which contains some x86_64 fixes?
Regards
Christoph
I had the same problem with memory, but I was unfortunately on Windows at the time. I bought a new memory piece for the slot. I don't suggest this if you're as poor as I am, going broke as we type. >.<
DON'T KILL THE NEWBIE (me) -ICE
On Wed, Jun 01, 2005 at 07:50:25AM +0200, Christoph Franke wrote:
Dave Jones - Tue, May 31 2005 19:16:34 -0400:
On Tue, May 31, 2005 at 10:30:43PM +0200, Christoph Franke wrote:
Dave Jones - Tue, May 24 2005 12:57:15 -0400:
On Tue, May 24, 2005 at 07:02:40AM -0700, Peter J. Stieber wrote:
DJ = Dave Jones DJ> Hrmph. No, I screwed up the patch. DJ> -29 is rebuilding, it'll appear in the FC3/ subdir of my DJ> people page a little while after its done building.
here's what I get with 2.6.11-1.29_FC3smp.
May 24 06:55:43 maggie kernel: collect2:5519 free pmd ffff8100777f1000 freed by 0xffffffffffffffff May 24 06:55:43 maggie kernel: mm/memory.c:109: bad pmd ffff8100777f1000(0000000000000064).
[snip]
I noticed a -30 build on your webspace at people.redhat.com. Is this based on 2.6.11.11, and does it nail the bad pmd bug?
No idea yet, you tell me :)
Ok :-) But could you tell us if you already based this one upon 2.6.11.11, which contains some x86_64 fixes?
Yes, it does include 2.6.11.11
Dave
Dave Jones - Wed, Jun 01 2005 02:20:46 -0400:
Ok :-) But could you tell us if you already based this one upon 2.6.11.11, which contains some x86_64 fixes?
Yes, it does include 2.6.11.11
Great. Ok, another good news is: so far I am unable to reproduce the problem any longer!
/home/christoph$ cat davej >> ./notes/people_to_hug_before_I_die.txt
Regards
Christoph
On Wednesday 01 June 2005 02:47 am, Christoph Franke wrote:
Dave Jones - Wed, Jun 01 2005 02:20:46 -0400:
Ok :-) But could you tell us if you already based this one upon 2.6.11.11, which contains some x86_64 fixes?
Yes, it does include 2.6.11.11
Great. Ok, another good news is: so far I am unable to reproduce the problem any longer!
/home/christoph$ cat davej >> ./notes/people_to_hug_before_I_die.txt
Just rebooted fileserver under -30 build, looks like running clean, 2 compute nodes also have not seen errors since the new kernel installed, quite promising I would say. I'll report if I see any errors.
Cheers, Ivan
Ivan Adzhubey - Wed, Jun 01 2005 04:06:19 -0400:
On Wednesday 01 June 2005 02:47 am, Christoph Franke wrote:
Dave Jones - Wed, Jun 01 2005 02:20:46 -0400:
Ok :-) But could you tell us if you already based this one upon 2.6.11.11, which contains some x86_64 fixes?
Yes, it does include 2.6.11.11
Great. Ok, another good news is: so far I am unable to reproduce the problem any longer!
/home/christoph$ cat davej >> ./notes/people_to_hug_before_I_die.txt
Just rebooted fileserver under -30 build, looks like running clean, 2 compute nodes also have not seen errors since the new kernel installed, quite promising I would say. I'll report if I see any errors.
Great V2.0. Meanwhile we got a -31 build, but I don't find a changelog. Perhaps Dave can give us a hint.
Regards
Christoph
CF = Christoph Franke
CF>>>> Ok :-) But could you tell us if you already based this one upon
CF>>>> 2.6.11.11, which contains some x86_64 fixes?

DJ = Dave Jones
DJ>>> Yes, it does include 2.6.11.11

CF>> Great. Ok, another good news is: so far I am
CF>> unable to reproduce the
CF>> problem any longer!
CF>>
CF>> /home/christoph$ cat davej >> ./notes/people_to_hug_before_I_die.txt

IA = Ivan Adzhubey
IA> Just rebooted fileserver under -30 build, looks
IA> like running clean, 2 compute nodes also have not
IA> seen errors since the new kernel installed, quite
IA> promising I would say. I'll report if I see any errors.
Dave,
I only see -31 in your web content. I must have missed a version.
Was there a fix specifically added to address the x86_64 bad pmd issue, or is it simply masked due to other changes?
I'll load 2.6.11-1.31smp and report back.
Pete
On Wed, Jun 01, 2005 at 11:46:06AM +0200, Christoph Franke wrote:
Ivan Adzhubey - Wed, Jun 01 2005 04:06:19 -0400:
On Wednesday 01 June 2005 02:47 am, Christoph Franke wrote:
Dave Jones - Wed, Jun 01 2005 02:20:46 -0400:
Ok :-) But could you tell us if you already based this one upon 2.6.11.11, which contains some x86_64 fixes?
Yes, it does include 2.6.11.11
Great. Ok, another good news is: so far I am unable to reproduce the problem any longer!
/home/christoph$ cat davej >> ./notes/people_to_hug_before_I_die.txt
Just rebooted fileserver under -30 build, looks like running clean, 2 compute nodes also have not seen errors since the new kernel installed, quite promising I would say. I'll report if I see any errors.
Great V2.0. Meanwhile we got a -31 build, but I don't find a changelog. Perhaps Dave can give us a hint.
Exec-shield update. Might fix some of the 'ntpd segfault' and similar bugs.
Dave
Dave Jones - Wed, Jun 01 2005 15:10:50 -0400:
Exec-shield update. Might fix some of the 'ntpd segfault' and similar bugs.
Thanks for this info and sorry for my blindness. The spec file contains the changelog with the last entry as follows:
%changelog
* Wed Jun 1 2005 Dave Jones davej@redhat.com
- Exec-shield improvements (Should fix #154759)
Regards
Christoph
CF = Christoph Franke
CF>>>>> Ok :-) But could you tell us if you already based this one upon
CF>>>>> 2.6.11.11, which contains some x86_64 fixes?

DJ = Dave Jones
DJ>>>> Yes, it does include 2.6.11.11

CF>>> Great. Ok, another good news is: so far I am
CF>>> unable to reproduce the
CF>>> problem any longer!
CF>>>
CF>>> /home/christoph$ cat davej >>
CF>>> ./notes/people_to_hug_before_I_die.txt

IA = Ivan Adzhubey
IA>> Just rebooted fileserver under -30 build, looks
IA>> like running clean, 2 compute nodes also have not
IA>> seen errors since the new kernel installed, quite
IA>> promising I would say. I'll report if I see any errors.

PS = Peter J. Stieber
PS> Dave,
PS>
PS> I only see -31 in your web content. I must have
PS> missed a version.
PS>
PS> Was there a fix specifically added to address the
PS> x86_64 bad pmd issue, or is it simply masked due
PS> to other changes?
PS>
PS> I'll load 2.6.11-1.31smp and report back.
I have been running 2.6.11-1.31_FC3smp for over a day and loading the system in a manner that has caused the bad pmd problem in the past, but haven't seen the problem :-))
Dave, I'm still wondering if a specific patch was added to this version to address the bad pmd problem?
Thanks for all your efforts, Pete
On Thursday 02 June 2005 11:29 am, Peter J. Stieber wrote:
[snip]
I have been running 2.6.11-1.31_FC3smp for over a day and loading the system in a manner that has caused the bad pmd problem in the past, but haven't seen the problem :-))
I can confirm, 3 boxes here now running 48 hours under heavy to regular load, no more pmd errors.
Dave, I'm still wondering if a specific patch was added to this version to address the bad pmd problem?
Thank for all your efforts, Pete
On Thu, Jun 02, 2005 at 11:38:27AM -0400, Ivan Adzhubey wrote:
I have been running 2.6.11-1.31_FC3smp for over a day and loading the system in a manner that has caused the bad pmd problem in the past, but haven't seen the problem :-))
I can confirm, 3 boxes here now running 48 hours under heavy to regular load, no more pmd errors.
Two positive reports, and no negatives so far. It is starting to look good, but I'll give it a few more days before I pronounce this bug 'dead'.
There were a number of x86-64 changes in 2.6.11.11. Looks like Andi picked the right bits to backport.
Dave
PS = Peter J. Stieber
PS>>> I have been running 2.6.11-1.31_FC3smp for over a day
PS>>> and loading the system in a manner that has caused the
PS>>> bad pmd problem in the past, but haven't seen the
PS>>> problem :-))

IA = Ivan Adzhubey
IA>> I can confirm, 3 boxes here now running 48 hours under
IA>> heavy to regular load, no more pmd errors.

DJ = Dave Jones
DJ> Two positive reports, and no negatives so far. It is starting
DJ> to look good, but I'll give it a few more days before I
DJ> pronounce this bug 'dead'.
Very wise. It has taken a while (a day or two) to show up in the past after a reboot, but I have been running my system pretty hard and still no problems :-)
DJ> There were a number of x86-64 changes in 2.6.11.11.
DJ> Looks like Andi picked the right bits to backport.
I mentioned your patched kernel in
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=155857
so hopefully others will confirm that all is well after testing.
Thanks again for your efforts. Pete
Dave Jones - Thu, Jun 02 2005 13:21:20 -0400:
On Thu, Jun 02, 2005 at 11:38:27AM -0400, Ivan Adzhubey wrote:
I have been running 2.6.11-1.31_FC3smp for over a day and loading the system in a manner that has caused the bad pmd problem in the past, but haven't seen the problem :-))
I can confirm, 3 boxes here now running 48 hours under heavy to regular load, no more pmd errors.
Two positive reports, and no negatives so far. It is starting to look good, but I'll give it a few more days before I pronounce this bug 'dead'.
Three, to be exact. :-)
There were a number of x86-64 changes in 2.6.11.11. Looks like Andi picked the right bits to backport.
Seems to be. But the -31 build made some trouble. Not only didn't it prevent ntpd from segfaulting (on boot, later on I can restart the service), I had a few applications that didn't like the exec shield patch, e.g. the teamspeak linux server, which segfaults when started with this build. But as far as I know this could easily be a problem of the application, which is nice to know but doesn't help really.
Regards
Christoph
On Thu, Jun 02, 2005 at 08:15:01PM +0200, Christoph Franke wrote:
Dave Jones - Thu, Jun 02 2005 13:21:20 -0400:
On Thu, Jun 02, 2005 at 11:38:27AM -0400, Ivan Adzhubey wrote:
I have been running 2.6.11-1.31_FC3smp for over a day and loading the system in a manner that has caused the bad pmd problem in the past, but haven't seen the problem :-))
I can confirm, 3 boxes here now running 48 hours under heavy to regular load, no more pmd errors.
Two positive reports, and no negatives so far. It is starting to look good, but I'll give it a few more days before I pronounce this bug 'dead'.
Three, to be exact. :-)
There were a number of x86-64 changes in 2.6.11.11. Looks like Andi picked the right bits to backport.
Seems to be. But the -31 build made some trouble. Not only didn't it prevent ntpd from segfaulting (on boot, later on I can restart the service), I had a few applications that didn't like the exec shield patch, e.g. the teamspeak linux server, which segfaults when started with this build. But as far as I know this could easily be a problem of the application, which is nice to know but doesn't help really.
Is it repeatable? Does it behave again if you boot with exec-shield=0, or with exec-shield-randomize=0?
Dave
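When testing the two boot parameters Dave mentions, it helps to confirm which ones the running kernel was actually booted with. The following is a minimal Python sketch, not something posted in the thread: `exec_shield_params` is a hypothetical helper, and the example command line is invented for illustration. Only the parameter names exec-shield and exec-shield-randomize come from the message above.

```python
def exec_shield_params(cmdline):
    """Return {name: value} for exec-shield boot parameters in a kernel command line."""
    params = {}
    for token in cmdline.split():
        # Split at the first '=' only, so values like root=LABEL=/ stay intact.
        name, sep, value = token.partition("=")
        if sep and name.startswith("exec-shield"):
            params[name] = value
    return params


# Hypothetical command line, for illustration only.
example = "ro root=LABEL=/ exec-shield=0"
print(exec_shield_params(example))
```

On a running system the string can be read from /proc/cmdline, for example `exec_shield_params(open("/proc/cmdline").read())`. An empty result means neither parameter was given at boot.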
Dave Jones - Thu, Jun 02 2005 14:25:48 -0400:
Is it repeatable ? Does it behave again if you boot with exec-shield=0 ? or exec-shield-randomize=0 ?
Yes, it is repeatable, teamspeak segfaults on every start. Ntpd segfaults on both build -30 and -31 during boot, but can be restarted afterwards. Will try a boot with exec-shield=0 tomorrow morning and pass through the results.
Besides: may I ask if your plans to release a 2.6.12 kernel rpm for FC3 are still valid?
Regards
Christoph
On Thu, Jun 02, 2005 at 08:55:40PM +0200, Christoph Franke wrote:
Dave Jones - Thu, Jun 02 2005 14:25:48 -0400:
Is it repeatable ? Does it behave again if you boot with exec-shield=0 ? or exec-shield-randomize=0 ?
Yes, it is repeatable, teamspeak segfaults on every start. Ntpd segfaults on both build -30 and -31 during boot, but can be restarted afterwards. Will try a boot with exec-shield=0 tomorrow morning and pass through the results.
Ok.
Besides: may I ask if your plans to release a 2.6.12 kernel rpm for FC3 are still valid?
Sure, but from the looks of things, that's still a way into the future, so I'd expect to see another 2.6.11 based errata for FC3 first. Upstream still seems a few weeks off (I've heard a few rumours about a -rc6 being planned). FC4 will get a rebase to 2.6.12, and a week or so later (if all goes well), I'll do a backport of the FC4 kernel (minus bits like Xen) to FC3, and see what breaks.
Dave
Dave Jones - Thu, Jun 02 2005 15:01:20 -0400:
On Thu, Jun 02, 2005 at 08:55:40PM +0200, Christoph Franke wrote:
Dave Jones - Thu, Jun 02 2005 14:25:48 -0400:
Is it repeatable ? Does it behave again if you boot with exec-shield=0 ? or exec-shield-randomize=0 ?
Yes, it is repeatable, teamspeak segfaults on every start. Ntpd segfaults on both build -30 and -31 during boot, but can be restarted afterwards. Will try a boot with exec-shield=0 tomorrow morning and pass through the results.
Ok.
Ok, I did a reboot with parameter exec-shield=0 and teamspeak started right away. Ntpd is unchanged, here is the output
3 Jun 06:29:10 ntpd[4293]: synchronized to 192.53.103.104, stratum 1
3 Jun 06:46:04 ntpd[4293]: synchronized to 192.53.103.103, stratum 1
3 Jun 07:15:59 ntpd[4293]: ntpd exiting on signal 15
3 Jun 07:18:40 ntpd[3066]: signal_no_reset: signal 17 had flags 4000000
3 Jun 07:18:42 ntpd[3066]: signal_no_reset: signal 14 had flags 4000000
A restart though works as already mentioned.
Besides: may I ask if your plans to release a 2.6.12 kernel rpm for FC3 are still valid?
Sure, but from the looks of things, that's still a way into the future, so I'd expect to see another 2.6.11 based errata for FC3 first. Upstream still seems a few weeks off (I've heard a few rumours about a -rc6 being planned). FC4 will get a rebase to 2.6.12, and a week or so later (if all goes well), I'll do a backport of the FC4 kernel (minus bits like Xen) to FC3, and see what breaks.
Ok. Thanks in advance.
Christoph
Dave Jones - Thu, Jun 02 2005 15:01:20 -0400:
On Thu, Jun 02, 2005 at 08:55:40PM +0200, Christoph Franke wrote:
Dave Jones - Thu, Jun 02 2005 14:25:48 -0400:
Is it repeatable ? Does it behave again if you boot with exec-shield=0 ? or exec-shield-randomize=0 ?
Yes, it is repeatable, teamspeak segfaults on every start. Ntpd segfaults on both build -30 and -31 during boot, but can be restarted afterwards. Will try a boot with exec-shield=0 tomorrow morning and pass through the results.
Ok.
Oh, while staring at teamspeak I didn't immediately notice that our old friend has come back with the -31 build (booted with "exec-shield=0").
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caadde8(0000003000000a88).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caaddf0(0000000000000003).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caaddf8(00007fffffffff54).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade00(00007fffffffff55).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade08(00007fffffffff56).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade18(00007fffffffff57).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade20(00007fffffffff58).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade28(00007fffffffff59).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade30(00007fffffffff5a).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade38(00007fffffffff5b).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade48(0000000000000010).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade50(00000000078bfbff).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade58(0000000000000006).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade60(0000000000001000).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade68(0000000000000011).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade70(0000000000000064).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade78(0000000000000003).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade80(0000000000400040).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade88(0000000000000004).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade90(0000000000000038).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caade98(0000000000000005).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caadea0(0000000000000009).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caadea8(0000000000000007).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caadeb8(0000000000000008).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caadec8(0000000000000009).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caaded0(0000000000417b10).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caaded8(000000000000000b).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caadee8(000000000000000c).
Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd ffff81007caadef8(000000000000000d).
Jun 3 07:20:03 sun kernel: mm/memory.c:109: bad pmd ffff81007caadf08(000000000000000e).
Jun 3 07:20:03 sun kernel: mm/memory.c:109: bad pmd ffff81007caadf18(0000000000000017).
Jun 3 07:20:03 sun kernel: mm/memory.c:109: bad pmd ffff81007caadf28(000000000000000f).
Jun 3 07:20:03 sun kernel: mm/memory.c:109: bad pmd ffff81007caadf30(00007fffffffff49).
Jun 3 07:20:03 sun kernel: mm/memory.c:109: bad pmd ffff81007caadf48(0034365f36387800).
Along with the "memory.c bad pmd" issue I keep getting this message again:
Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]
You have invoked `ld.so', the helper program for shared library executables. This program usually lives in the file `/lib/ld.so', and special directives in executable files using ELF shared libraries tell the system's program loader to load the helper program from this file. This helper program loads the shared libraries needed by the program executable, prepares the program to run, and runs it. You may invoke this helper program directly from the command line to load and run an ELF executable file; this is like executing that file itself, but always uses this helper program from the file you specified, instead of the helper program file specified in the executable file you run. This is mostly of use for maintainers to test new versions of this helper program; chances are you did not intend to run this program.

  --list                list all dependencies and how they are resolved
  --verify              verify that given object really is a dynamically linked object we can handle
  --library-path PATH   use given PATH instead of content of the environment variable LD_LIBRARY_PATH
  --inhibit-rpath LIST  ignore RUNPATH and RPATH information in object names in LIST
This occurs during compilation of programs as well as in a cron job that renices some processes, and it always coincides with the memory.c log entries. The older builds all showed this; -30 didn't, but -31 does again.
Regards
Christoph
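For anyone comparing their own logs against the entries above, here is a minimal Python sketch (not from the thread) that pulls the "bad pmd" records out of a syslog excerpt. The names BadPmd, parse_bad_pmd and as_ascii are made up for illustration, and the regular expression assumes exactly the message layout quoted in this thread; other kernel versions may print it differently.

```python
import re
from typing import List, NamedTuple

# Assumed layout, taken from the log lines quoted in this thread:
#   mm/memory.c:<line>: bad pmd <16 hex digits>(<16 hex digits>).
BAD_PMD = re.compile(
    r"mm/memory\.c:(?P<line>\d+): bad pmd "
    r"(?P<addr>[0-9a-f]{16})\((?P<value>[0-9a-f]{16})\)"
)


class BadPmd(NamedTuple):
    source_line: int  # line number in mm/memory.c that printed the message
    addr: int         # address of the offending entry
    value: int        # contents found at that address


def parse_bad_pmd(text: str) -> List[BadPmd]:
    """Return every 'bad pmd' record found in text, in order of appearance."""
    return [
        BadPmd(int(m["line"]), int(m["addr"], 16), int(m["value"], 16))
        for m in BAD_PMD.finditer(text)
    ]


def as_ascii(value: int) -> str:
    """Show the 8 bytes of value (little-endian), non-printable bytes as '.'."""
    raw = value.to_bytes(8, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    sample = (
        "Jun 3 07:20:02 sun kernel: mm/memory.c:109: bad pmd "
        "ffff81007caadea8(0000000000000007). "
        "Jun 3 07:20:03 sun kernel: mm/memory.c:109: bad pmd "
        "ffff81007caadf48(0034365f36387800)."
    )
    for entry in parse_bad_pmd(sample):
        print(hex(entry.addr), hex(entry.value), as_ascii(entry.value))
```

Because the parser uses finditer, it works whether the entries arrive one per line or run together as in a wrapped mail. as_ascii is only an inspection aid for eyeballing the logged values; it says nothing about the cause of the bug.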
On Fri, Jun 03, 2005 at 09:52:09AM +0200, Christoph Franke wrote:
Dave Jones - Thu, Jun 02 2005 15:01:20 -0400:
On Thu, Jun 02, 2005 at 08:55:40PM +0200, Christoph Franke wrote:
Dave Jones - Thu, Jun 02 2005 14:25:48 -0400:
Is it repeatable ? Does it behave again if you boot with exec-shield=0 ? or exec-shield-randomize=0 ?
Yes, it is repeatable, teamspeak segfaults on every start. Ntpd segfaults on both build -30 and -31 during boot, but can be restarted afterwards. Will try a boot with exec-shield=0 tomorrow morning and pass through the results.
Ok.
Oh, staring at teamspeak, I didn't immediately notice that the old fellow came up again with the -31 build (booted with "exec-shield=0").
This occurs during compilation of programs as well as in a cron job that renices some processes, and it always coincides with the memory.c log entries. The older builds all showed this; -30 didn't, but -31 does again.
I think that was more by chance than by design. This has dragged on so long, and with no resolution in sight, that I'm actually getting more and more tempted to backport the current FC4 kernel (based on 2.6.12rc5) to FC3.
plus sides
- I've not seen any reports of the bad pmd bug on 2.6.12rc kernels. I'm pretty confident that this bug is dead there.
- Upstream aren't going to devote much more time to tracking down 2.6.11.x bugs with .12 'due soon', so if it is still present in 12rc, it'll likely get more attention.
- It's actually pretty stable in my experience so far.
- An increased userbase for the FC4 kernel is going to shake out bugs faster.

downsides
- lots of code change
- the usual potential breakage of existing userspace
- .12 isn't 'final' yet.
- There are a few things that aren't 'quite right' in the FC4 kernel that will be shipping that I intend to fix in an update, so it's by no means a perfect kernel, just trading some bugs for some different ones. (Nothing new there eh?)

(The first two above we'd have to deal with when .12 is final anyway.)
I'll think it over some more. The actual backporting of the FC4 kernel to FC3 is probably just an afternoon's work.
Dave
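The exec-shield=0 / exec-shield-randomize=0 boot test discussed above only tells you something if the option really reached the kernel. A small Python sketch for checking that, assuming the running kernel's command line is available as text (on Linux it can be read from /proc/cmdline); the function names and the sample command line are hypothetical:

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}.

    Parameters without '=' (for example 'ro' or 'quiet') map to None.
    Only the first '=' separates name and value, so 'root=LABEL=/'
    yields name 'root' and value 'LABEL=/'.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def exec_shield_settings(cmdline: str) -> Dict[str, Optional[str]]:
    """Return the two boot options mentioned in this thread (None if absent)."""
    params = parse_cmdline(cmdline)
    return {
        name: params.get(name)
        for name in ("exec-shield", "exec-shield-randomize")
    }


if __name__ == "__main__":
    # Hypothetical command line; on a live system use open("/proc/cmdline").read()
    example = "ro root=LABEL=/ exec-shield=0"
    print(exec_shield_settings(example))
```

A value of None here only means the option was not given at boot; it does not show what the kernel's default setting is.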
On Fri, Jun 03, 2005 at 11:36:06AM -0400, Dave Jones wrote:
I think that was more by chance than by design. This has dragged on so long, and with no resolution in sight, that I'm actually getting more and more tempted to backport the current FC4 kernel (based on 2.6.12rc5) to FC3.
.. I'll think it over some more. The actual backporting of the FC4 kernel to FC3 is probably just an afternoon's work.
Ok, here's something for folks to chew on over the weekend: http://people.redhat.com/davej/kernels/test/ has a 2.6.12rc5-based kernel for FC3.
I've not even had chance to test-boot this one yet, so buyer-beware.. There's no guarantee I won't do another 2.6.11 update for FC3 before pushing this out as an update to updates-testing (where it'll sit for a week or two). It all depends on how this works out.
Dave
Dave Jones - Fri, Jun 03 2005 23:56:58 -0400:
Ok, here's something for folks to chew on over the weekend: http://people.redhat.com/davej/kernels/test/ has a 2.6.12rc5-based kernel for FC3.
I've not even had chance to test-boot this one yet, so buyer-beware.. There's no guarantee I won't do another 2.6.11 update for FC3 before pushing this out as an update to updates-testing (where it'll sit for a week or two). It all depends on how this works out.
Thanks, Dave. I will try to get this booted tomorrow morning and see if I can reproduce the memory.c problem, the ntpd segfault etc. It would be nice if at least someone could tell me whether this one boots flawlessly, because my machine is hosted in a remote location, and if it hangs I'll have to pay 50 Euros to get a technician as a "remote hand".
Regards
Christoph
Dave Jones - Fri, Jun 03 2005 23:56:58 -0400:
I've not even had chance to test-boot this one yet, so buyer-beware.. There's no guarantee I won't do another 2.6.11 update for FC3 before pushing this out as an update to updates-testing (where it'll sit for a week or two). It all depends on how this works out.
[root@sun src]# rpm -ihv kernel-smp-2.6.11-1.1369_FC3.x86_64.rpm
error: Failed dependencies:
        selinux-policy-targeted < 1.23.16-1 conflicts with kernel-smp-2.6.11-1.1369_FC3.x86_64
Regards
Christoph
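The rpm message above follows the pattern "NAME OP VERSION conflicts with PACKAGE": as far as I read it, the new kernel package refuses to install next to a selinux-policy-targeted older than 1.23.16-1. A minimal Python sketch that splits such a line into its parts, assuming that layout; the names Conflict and parse_conflict are made up for illustration, and the sketch does not attempt rpm's version comparison:

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of one dependency line, taken from the rpm output in this thread:
#   <name> <op> <version> conflicts with <package>
CONFLICT = re.compile(
    r"(?P<name>\S+)\s+(?P<op><=|>=|<|>|=)\s+(?P<version>\S+)"
    r"\s+conflicts with\s+(?P<package>\S+)"
)


class Conflict(NamedTuple):
    name: str      # package that triggers the conflict
    op: str        # comparison operator on its version
    version: str   # version-release the operator refers to
    package: str   # package that declares the conflict


def parse_conflict(text: str) -> Optional[Conflict]:
    """Return the first conflict found in text, or None if there is none."""
    m = CONFLICT.search(text)
    if m is None:
        return None
    return Conflict(m["name"], m["op"], m["version"], m["package"])


if __name__ == "__main__":
    line = (
        "selinux-policy-targeted < 1.23.16-1 conflicts with "
        "kernel-smp-2.6.11-1.1369_FC3.x86_64"
    )
    print(parse_conflict(line))
```

On the affected machine, `rpm -q selinux-policy-targeted` shows the installed policy version to compare against the one named in the message.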
On Sat, Jun 04, 2005 at 11:40:51AM +0200, Christoph Franke wrote:
Dave Jones - Fri, Jun 03 2005 23:56:58 -0400:
I've not even had chance to test-boot this one yet, so buyer-beware.. There's no guarantee I won't do another 2.6.11 update for FC3 before pushing this out as an update to updates-testing (where it'll sit for a week or two). It all depends on how this works out.
[root@sun src]# rpm -ihv kernel-smp-2.6.11-1.1369_FC3.x86_64.rpm
error: Failed dependencies:
        selinux-policy-targeted < 1.23.16-1 conflicts with kernel-smp-2.6.11-1.1369_FC3.x86_64
Ohh, crap. I forgot about that. This'll have to wait until Monday when Dan Walsh gets back from the Red Hat Summit in New Orleans.
Dan (or any other SELinux policy bods that may be listening), until you get a policy package update ready for FC3, is it going to bite folks if they use the FC4 packages on FC3 systems?
Dave
Dave Jones wrote:
On Wed, Jun 01, 2005 at 11:46:06AM +0200, Christoph Franke wrote:
Great V2.0. Meanwhile we got a -31 build, but I don't find a changelog. Perhaps Dave can give us a hint.
Exec-shield update. Might fix some of the 'ntpd segfault' and similar bugs.
I can confirm that kernel-2.6.11-1.31_FC3 fixes the ntpd segfault problem on my i686 system.