On Mon, May 23, 2005 at 04:48:29PM -0700, Peter J. Stieber wrote:
PJS = Peter J. Stieber
PJS>> I was under the impression the test kernel had
PJS>> some type of debug messages in it someone
PJS>> would be interested in. Was I wrong about that?

DJ = Dave Jones
DJ> No, you are correct. But nothing triggered with the latest builds.

PJS>> I guess you're saying I should go ahead and update
PJS>> and see what happens?

DJ> There's a number of other fixes in there which may have
DJ> caused the problem to go into hiding.. I can't reproduce
DJ> it at all any more, and some others who were seeing it
DJ> haven't seen it recently either.
This morning I ran Memtest-86 v3.2 on the machine in question. I let it run for a little over 6 hours. It made 7 passes of the memory tests and had no errors.
Next I updated the kernel to 2.6.11-1.27_FC3smp. The problem happened pretty quickly for me:
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe000(0000000000401b80).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe008(000000000000000b).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe010(0000000000000220).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe018(000000000000000c).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe020(0000000000000220).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe028(000000000000000d).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe030(00000000000001f7).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe038(000000000000000e).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe040(00000000000001f7).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe048(0000000000000017).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe058(000000000000000f).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe060(00007ffffffff081).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe080(0034365f36387800).
The collect2 command is coming from the build sequence I described in earlier emails. If I try it a second time it runs successfully. I've seen sh cause it too.
Note that 34365f363878 in the last line = 46_68x in ASCII, which is x86_64 in reverse.
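For anyone who wants to check that decoding, here is a minimal sketch in Python. The helper name decode_pmd_value is made up for illustration, and it assumes the value is a 64-bit word stored little-endian, as on x86_64, so the bytes sit in memory in the opposite order from how the log prints them.

```python
def decode_pmd_value(hex_value: str) -> str:
    """Show a 64-bit value as the ASCII it would form in little-endian memory.

    Non-printable bytes (such as NUL) are shown as '.'.
    """
    # Assumption: the logged value is a 64-bit word laid out little-endian.
    raw = int(hex_value, 16).to_bytes(8, byteorder="little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# The digits read left to right, exactly as printed in the log line:
as_printed = bytes.fromhex("34365f363878").decode("ascii")
print(as_printed)        # 46_68x
print(as_printed[::-1])  # x86_64

# The same word in memory byte order, with the surrounding NUL bytes:
print(decode_pmd_value("0034365f36387800"))  # .x86_64.
```

The NUL byte on each side is consistent with a C string "x86_64" having been written over that page table entry, though the sketch itself says nothing about where the string came from.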
Is there anything else I should be looking for?
I'd be willing to try any debug kernel to help find the problem.
Give the test kernel at http://people.redhat.com/davej/kernels/Fedora/ a shot (-28_FC3). That should give slightly different output.
Dave