memory.c - bad pmd - x86_64

Peter J. Stieber developer at toyon.com
Sun May 15 16:50:53 UTC 2005


> PJS = Peter J. Stieber
> PJS>> I have a system with a Tyan 2885 motherboard
> PJS>> (S2885-ANRF) that uses dual Opteron 244 processors.
> PJS>> Each processor has 1 GB of memory for a total of
> PJS>> 2 GB. I am using a SATA HD. I am running the latest
> PJS>> stock release of the SMP version of the FC3 kernel
> PJS>> for x86_64. uname -a output follows:
> PJS>>
> PJS>> Linux maggie 2.6.11-1.14_FC3smp #1 SMP
> PJS>> Thu Apr 7 19:36:23 EDT 2005
> PJS>> x86_64 x86_64 x86_64 GNU/Linux
> PJS>>
> PJS>> The computer is the worldly node for a small cluster of
> PJS>> computers. It is responsible for building a code that
> PJS>> is run on the cluster. A shell script is used to
> PJS>> start the build process. Occasionally when the script
> PJS>> is started it crashes and the following messages are
> PJS>> placed in /var/log/messages (sorry for the ugly line
> PJS>> wrap):
> PJS>>
> PJS>> May 11 16:26:56 maggie kernel: mm/memory.c:97:
> PJS>> bad pmd ffff81002f6a4000(0000000000000008).
>
> DJ = Dave Jones
> DJ> Please grab the latest test kernel from
> DJ> http://people.redhat.com/davej/kernels/Fedora/FC3
> DJ> and try to reproduce this. It contains debugging code
> DJ> that hopefully will help nail this.
>
> Thanks Dave.
> I loaded the kernel:
>
> Linux maggie 2.6.11-1.24_FC3smp #1
> SMP Tue May 10 19:12:22 EDT 2005
> x86_64 x86_64 x86_64 GNU/Linux
>
> I'm trying to force the problem to occur, but as was reported on the
> linux-kernel list, it isn't obvious how to make the problem rear its
> ugly head.
>
> Are you looking for /var/log/messages output when it happens?
>
> Thanks again for the help. I'm very willing to serve as a debug test 
> bed as my worldly node is a Tyan S2885 Thunder K8W motherboard running 
> the SMP version of x86_64 FC3 and my compute nodes are Tyan S2850 
> Tomcat K8S motherboards running the non-SMP version of x86_64 FC3.
>
> Will reply to this thread when the problem pops up,

The problem is occurring again with Dave's test kernel.

May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d008(0000000000000008).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d018(0000000000000009).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d020(0000000000401b80).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d028(000000000000000b).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d030(00000000000001f4).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d038(000000000000000c).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d040(00000000000001f4).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d048(000000000000000d).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d050(00000000000001f7).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d058(000000000000000e).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d060(00000000000001f7).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d068(0000000000000017).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d078(000000000000000f).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d080(00007ffffffff0a4).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d0a0(5f36387800000000).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd 
ffff81005856d0a8(0000000000003436).

and from today's logs:

May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d38(00000037e5100a88).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d40(0000000000000003).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d48(00007ffffffffee9).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d50(00007ffffffffeea).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d58(00007ffffffffeeb).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d68(00007ffffffffeec).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d70(00007ffffffffeed).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d78(00007ffffffffeee).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d80(00007ffffffffeef).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d88(00007ffffffffef0).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d90(00007ffffffffef1).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898d98(00007ffffffffef2).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898da0(00007ffffffffef3).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898da8(00007ffffffffef4).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898db0(00007ffffffffef5).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898db8(00007ffffffffef6).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898dc0(00007ffffffffef7).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898dc8(00007ffffffffef8).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898dd8(0000000000000010).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898de0(00000000078bfbff).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898de8(0000000000000006).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898df0(0000000000001000).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898df8(0000000000000011).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e00(0000000000000064).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e08(0000000000000003).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e10(0000000000400040).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e18(0000000000000004).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e20(0000000000000038).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e28(0000000000000005).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e30(0000000000000009).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e38(0000000000000007).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e48(0000000000000008).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e58(0000000000000009).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e60(0000000000417b10).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e68(000000000000000b).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e78(000000000000000c).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e88(000000000000000d).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898e98(000000000000000e).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898ea8(0000000000000017).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898eb8(000000000000000f).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898ec0(00007ffffffffee2).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd 
ffff810062898ee0(34365f3638780000).
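
For anyone who wants to summarize output like the above, here is a rough
sketch of a parser for the "bad pmd" lines. It is only an illustration: the
function name parse_bad_pmd and the regular expression are my own
assumptions based on the lines shown in this message, not anything taken
from the kernel sources. The pattern tolerates the mail client's line wrap
between "bad pmd" and the address, and it treats the "process:pid:" prefix
as optional, since the stock kernel's message does not include it.

```python
import re
from collections import Counter

# Hypothetical pattern derived from the log lines quoted in this message.
# \s+ after "bad pmd" absorbs the line wrap introduced by the mail client.
BAD_PMD = re.compile(
    r"kernel:\s+(?:(?P<comm>\S+?):(?P<pid>\d+):\s+)?"
    r"mm/memory\.c:(?P<line>\d+):\s+bad pmd\s+"
    r"(?P<addr>[0-9a-f]{16})\((?P<value>[0-9a-f]{16})\)"
)


def parse_bad_pmd(text):
    """Return one dict per 'bad pmd' report found in text."""
    reports = []
    for m in BAD_PMD.finditer(text):
        reports.append({
            "comm": m.group("comm"),  # None for the stock kernel format
            "pid": int(m.group("pid")) if m.group("pid") else None,
            "line": int(m.group("line")),  # source line in mm/memory.c
            "addr": int(m.group("addr"), 16),  # address printed first
            "value": int(m.group("value"), 16),  # value in parentheses
        })
    return reports


# Two entries copied from the logs above, wrapped as they appear here.
sample = """\
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd
ffff81005856d008(0000000000000008).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd
ffff810062898d38(00000037e5100a88).
"""

reports = parse_bad_pmd(sample)

# Count reports per (process, pid) to see which tasks triggered them.
counts = Counter((r["comm"], r["pid"]) for r in reports)
for (comm, pid), n in counts.items():
    print(f"{comm}[{pid}]: {n} bad pmd report(s)")
```

Reading the whole of /var/log/messages into a string and passing it to
parse_bad_pmd should give the same kind of per-process summary, assuming
the log uses the format shown here.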

Dave,

I'm willing to provide what you need to debug, or try other test 
kernels.
I also posted to the linux-kernel list.
Pete 




