On Jan 22, 2019, at 10:43, stan stanl-fedorauser@vfemail.net wrote:
On Mon, 21 Jan 2019 18:48:04 -0500 Nate Pearlstein darknater@gmail.com wrote:
I normally run w/o quiet and rhgb anyway. I added earlyprintk=vga and it’s clear the system panics early. I tried adding boot_delay=500 and also boot_delay=10 to try to capture the spew with my phone camera capturing at 60fps. Only leaving off boot_delay can I see the panic but the output is coming faster than 60fps.
From what I can piece together without using a serial console and capturing from another host:
kernel BUG at mm/page_alloc.c:791!
Invalid opcode: 0000 [#10 SMP PTI] (not sure about this too jumbled)
I can’t really see the stack trace either
__free_page_ok
free_all_bootmem
mem_init
start_kernel
secondary_startup_64
[1.860030] free_one_page RIP: 0010:free_one_page
[1.863221] Code: 08 0e 03 00 0f 0b 48 89 da be 0c 00 00 00 4c 89 ff e8 56 02 00 e9 9c fb ff ff 48 c7 c6 08 86 0d 92 4c 89 f7 e8 e2 0d 03 00 <0f> ob 48 c6 30 86 0d 92 48 89 df e8 d1 0d 03 00 0f 0b 31 d2 e9
[1.872806] RSP: 0000:ffffffff92203e20 EFLAGS: 00010046
.
.
[1.923827] Kernel panic - not syncing
Samuel might be able to decipher this, but I have an off-the-wall idea. Kernels get bigger with each release. I wonder if there is a memory problem that the earlier kernels don't trigger but the larger kernels do. Could you run a memory test?
The other thing to try is re-installing the kernel. A really long shot, but worth a try.
And maybe it is a kernel bug. The line you are referring to is VM_BUG_ON_PAGE(bad_range(zone, page), page); and it occurs when trying to deallocate a page.
static inline void __free_one_page(struct page *page, unsigned long pfn, struct zone *zone, unsigned int order, int migratetype) {
I interpret the errors as saying that the kernel is trying to deallocate a page, and the CPU receives a 0000 opcode. That would be an error. But is it coming from the kernel, or is the kernel reading a bad location?
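A side note on the "invalid opcode: 0000" line, offered as a reading aid rather than a diagnosis: on x86 the kernel's BUG() is implemented with the ud2 instruction (bytes 0f 0b), which deliberately raises an invalid-opcode trap, and the 0000 in the oops header is the trap's error code rather than the opcode that was fetched. The byte in angle brackets on the "Code:" line marks where RIP was pointing. A minimal sketch that checks this against the "Code:" line from the serial capture further down in this thread (the function name trapping_bytes is made up for illustration):

```python
def trapping_bytes(code_line, count=2):
    """Return `count` bytes starting at the <xx>-marked byte of an oops 'Code:' line.

    The marked byte is where RIP pointed when the trap was taken.
    Returns None if the line carries no <xx> marker.
    """
    tokens = code_line.split("Code:", 1)[-1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            marked = [tok.strip("<>")] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in marked)
    return None


# "Code:" line copied from the 4.20.3 serial-console capture in this thread.
CODE = (
    "Code: 08 16 03 00 0f 0b 48 89 da be 0c 00 00 00 4c 89 ff e8 56 07 02 00 "
    "e9 9c fb ff ff 48 c7 c6 18 f8 0d 89 4c 89 f7 e8 e2 15 03 00 <0f> 0b 48 c7 "
    "c6 40 f8 0d 89 48 89 df e8 d1 15 03 00 0f 0b 31 d2 e9"
)

UD2 = bytes([0x0F, 0x0B])  # x86 encoding of the ud2 instruction

insn = trapping_bytes(CODE)
print(insn.hex(" "), "-> ud2" if insn == UD2 else "-> something else")
# prints: 0f 0b -> ud2
```

If that reading is right, the trap itself is just the kernel's assertion firing on purpose; the interesting question is why bad_range() thought the page was outside its zone.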
I think it has to be something about your hardware, because if the kernel were actually having trouble deallocating pages on every boot, this would be a well-known problem. Maybe you have hit a corner case. You could open a bugzilla, but it will be difficult for someone to fix this without your hardware to replicate the crash or the complete crash output.
The 4.20 kernel series is not far away from coming to stable. You could either grab one from koji, https://koji.fedoraproject.org/koji/packageinfo?packageID=8 or use an older kernel until it is released. It might fix the issue as a side effect of other changes.
The list ate my previous reply; it was too long because it included the entire console output.
Ok, broke out the old Keyspan:
This is on 4.20.3-200.fc29.x86_64
Probing EDD (edd=off to disable)... ok
[ 0.000000] microcode: microcode updated early to revision 0x1f, date = 2018-05-08
[ 0.000000] Linux version 4.20.3-200.fc29.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org) (gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC)) #1 SMP Thu Jan 17 15:19:35 UTC 2019
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.20.3-200.fc29.x86_64 root=UUID=f16fcae3-fe27-4314-afa3-42deec5f378c ro rootflags=subvol=btrfsroot1 rd.driver.blacklist=nouveau rd.lvm=0 rd.dm=0 rd.md.uuid=bfe3028c:482de62e:e81670f7:c1d008bf rd.luks.uuid=luks-c680a803-db0f-423b-8fb4-5ac67c7e141b rd.luks.allow-discards=luks-c680a803-db0f-423b-8fb4-5ac67c7e141b rd.md.uuid=bdc6f872:46939f0e:7e802862:035eb1f1 rd.luks.uuid=luks-ab55575a-5048-445b-830a-3cdcb78222b6 rd.luks.allow-discards=luks-ab55575a-5048-445b-830a-3cdcb78222b6 rd.luks.uuid=luks-2c2515c2-e6a3-438c-9514-0aa9ddd2a1b3 rd.luks.allow-discards=luks-2c2515c2-e6a3-438c-9514-0aa9ddd2a1b3 vconsole.keymap=us crashkernel=128M usbcore.autosuspend=-1 console=tty0 console=ttyS0,115200
...
[ 2.073219] page 0xc24000 outside node 1 zone Normal [ 0x100000 - 0xc24000 ]
[ 2.080074] page:fffff8ebf0900000 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 2.088036] flags: 0x57fffe00000000()
[ 2.091671] raw: 0057fffe00000000 fffff8ebf0900008 fffff8ebf0900008 0000000000000000
[ 2.099373] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 2.107075] page dumped because: VM_BUG_ON_PAGE(bad_range(zone, page))
[ 2.113573] ------------[ cut here ]------------
[ 2.118154] kernel BUG at mm/page_alloc.c:798!
[ 2.122570] invalid opcode: 0000 [#1] SMP PTI
[ 2.126895] CPU: 0 PID: 0 Comm: swapper Not tainted 4.20.3-200.fc29.x86_64 #1
[ 2.133991] Hardware name: Dell Inc. Precision WorkStation T7500 /06FW8P, BIOS A17 03/11/2018
[ 2.142563] RIP: 0010:free_one_page+0x50e/0x540
[ 2.147060] Code: 08 16 03 00 0f 0b 48 89 da be 0c 00 00 00 4c 89 ff e8 56 07 02 00 e9 9c fb ff ff 48 c7 c6 18 f8 0d 89 4c 89 f7 e8 e2 15 03 00 <0f> 0b 48 c7 c6 40 f8 0d 89 48 89 df e8 d1 15 03 00 0f 0b 31 d2 e9
[ 2.165754] RSP: 0000:ffffffff89203e20 EFLAGS: 00010046
[ 2.170946] RAX: 000000000000003a RBX: 0000000000000400 RCX: ffffffff89254668
[ 2.178043] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
[ 2.185140] RBP: 0000000000c24000 R08: 6d75642065676170 R09: 6163656220646570
[ 2.192236] R10: 7375616365622064 R11: 55425f4d56203a65 R12: 000000000000000a
[ 2.199334] R13: 00000000000003ff R14: fffff8ebf0900000 R15: ffffa04423fd5d00
[ 2.206430] FS: 0000000000000000(0000) GS:ffffa04ff3400000(0000) knlGS:0000000000000000
[ 2.214480] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.220191] CR2: ffffa04e49e01000 CR3: 000000164920a001 CR4: 00000000000206b0
[ 2.227288] Call Trace:
[ 2.229715] __free_pages_ok+0x15c/0x440
[ 2.233610] memblock_free_all+0x127/0x192
[ 2.237676] mem_init+0x1b/0xb9
[ 2.240792] start_kernel+0x293/0x528
[ 2.244427] secondary_startup_64+0xa4/0xb0
[ 2.248579] Modules linked in:
[ 2.251625] ---[ end trace ffc919177d0487be ]---
[ 2.256195] RIP: 0010:free_one_page+0x50e/0x540
[ 2.260695] Code: 08 16 03 00 0f 0b 48 89 da be 0c 00 00 00 4c 89 ff e8 56 07 02 00 e9 9c fb ff ff 48 c7 c6 18 f8 0d 89 4c 89 f7 e8 e2 15 03 00 <0f> 0b 48 c7 c6 40 f8 0d 89 48 89 df e8 d1 15 03 00 0f 0b 31 d2 e9
[ 2.279388] RSP: 0000:ffffffff89203e20 EFLAGS: 00010046
[ 2.284581] RAX: 000000000000003a RBX: 0000000000000400 RCX: ffffffff89254668
[ 2.291678] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
[ 2.298775] RBP: 0000000000c24000 R08: 6d75642065676170 R09: 6163656220646570
[ 2.305872] R10: 7375616365622064 R11: 55425f4d56203a65 R12: 000000000000000a
[ 2.312968] R13: 00000000000003ff R14: fffff8ebf0900000 R15: ffffa04423fd5d00
[ 2.320066] FS: 0000000000000000(0000) GS:ffffa04ff3400000(0000) knlGS:0000000000000000
[ 2.328114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.333826] CR2: ffffa04e49e01000 CR3: 000000164920a001 CR4: 00000000000206b0
[ 2.340924] Kernel panic - not syncing: Attempted to kill the idle task!
[ 2.347618] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]
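
For anyone reading along, the line that seems most telling is "page 0xc24000 outside node 1 zone Normal [ 0x100000 - 0xc24000 ]". As far as I can tell the kernel treats a zone's pfn range as half-open (start inclusive, end exclusive), so a page whose pfn equals the printed end value counts as outside the zone, which is what bad_range() is complaining about. The following is only a sketch of that check under that assumption, not the kernel's code; the helper names are made up, and the 4 KiB page size is the usual x86_64 value:

```python
import re

# Matches the "page ... outside node ... zone ..." complaint in the log above.
ZONE_MSG = re.compile(
    r"page (0x[0-9a-f]+) outside node (\d+) zone (\w+) "
    r"\[ (0x[0-9a-f]+) - (0x[0-9a-f]+) \]"
)

PAGE_SIZE = 4096  # bytes per page, assuming standard 4 KiB x86_64 pages


def parse_zone_complaint(line):
    """Extract the pfn and zone bounds from the complaint line, or None."""
    m = ZONE_MSG.search(line)
    if m is None:
        return None
    return {
        "pfn": int(m.group(1), 16),
        "node": int(m.group(2)),
        "zone": m.group(3),
        "start": int(m.group(4), 16),
        "end": int(m.group(5), 16),
    }


def zone_spans_pfn(start, end, pfn):
    """Half-open range test: start is inside the zone, end is not (assumption)."""
    return start <= pfn < end


LINE = "[ 2.073219] page 0xc24000 outside node 1 zone Normal [ 0x100000 - 0xc24000 ]"
info = parse_zone_complaint(LINE)

print(zone_spans_pfn(info["start"], info["end"], info["pfn"]))  # False
print(info["pfn"] - info["end"])                                # 0
print(info["end"] * PAGE_SIZE / 2**30)                          # 48.5625
```

So the page being freed starts exactly at the zone's end, at a physical address of about 48.56 GiB. It may be worth comparing that boundary against the memory map lines printed earlier in the boot log (the part elided above) on both the working and the failing kernel.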