Dell T7500 won’t boot on 4.19.13 or later from Fedora updates for F29. 4.19.10 works. 4.19.11 and 4.19.12 from Koji boot as well. Starting with 4.19.13 and through the latest today, 4.19.15, there is no output on the VGA console after "Probing EDD".
On 1/20/19 11:00 AM, Nate Pearlstein wrote:
Dell T7500 won’t boot on 4.19.13 or later from Fedora updates for F29. 4.19.10 works. 4.19.11 and 4.19.12 from Koji boot as well. Starting with 4.19.13 and through the latest today, 4.19.15, there is no output on the VGA console after "Probing EDD".
Remove "quiet" and "rhgb" from the kernel command line when you boot to see if there is more output. When you boot a working kernel is there journal output for the failed boot? Use "sudo journalctl -b-1" to check. See if the times match and if so, where does the log end?
What graphics device do you have?
On Jan 20, 2019, at 4:21 PM, Samuel Sieb samuel@sieb.net wrote:
Remove "quiet" and "rhgb" from the kernel command line when you boot to see if there is more output. When you boot a working kernel is there journal output for the failed boot? Use "sudo journalctl -b-1" to check. See if the times match and if so, where does the log end?
What graphics device do you have?
I normally run w/o quiet and rhgb anyway. I added earlyprintk=vga and it’s clear the system panics early. I tried adding boot_delay=500 and also boot_delay=10 to try to capture the spew with my phone camera capturing at 60fps. Only leaving off boot_delay can I see the panic but the output is coming faster than 60fps.
From what I can piece together without using a serial console and capturing from another host:
kernel BUG at mm/page_alloc.c:791!
Invalid opcode: 0000 [#10 SMP PTI] (not sure about this, too jumbled)
I can’t really see the stack trace either:
__free_page_ok
free_all_bootmem
mem_init
start_kernel
secondary_startup_64
[1.860030] free_one_page RIP: 0010:free_one_page
[1.863221] Code: 08 0e 03 00 0f 0b 48 89 da be 0c 00 00 00 4c 89 ff e8 56 02 00 e9 9c fb ff ff 48 c7 c6 08 86 0d 92 4c 89 f7 e8 e2 0d 03 00 <0f> ob 48 c6 30 86 0d 92 48 89 df e8 d1 0d 03 00 0f 0b 31 d2 e9
[1.872806] RSP: 0000:ffffffff92203e20 EFLAGS: 00010046
.
.
[1.923827] Kernel panic - not syncing
On Mon, 21 Jan 2019 18:48:04 -0500 Nate Pearlstein darknater@gmail.com wrote:
kernel BUG at mm/page_alloc.c:791! Invalid opcode: 0000 [#10 SMP PTI] (not sure about this too jumbled)
Samuel might be able to decipher this, but I have an off the wall idea. Kernels get bigger with each release. I wonder if there is a memory problem, that the earlier kernels don't trigger, but the larger kernels do. Run a memory test?
The other thing to try is re-installing the kernel. A really long shot, but worth a try.
And maybe it is a kernel bug. The line you are referring to is
    VM_BUG_ON_PAGE(bad_range(zone, page), page);
and it occurs when trying to deallocate a page, in
    static inline void __free_one_page(struct page *page, unsigned long pfn, struct zone *zone, unsigned int order, int migratetype) {
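I interpret the errors as saying that the kernel is trying to deallocate a page and trips that check. The "invalid opcode: 0000" part is just how x86 reports a BUG: the check executes a deliberately invalid instruction (ud2), and 0000 is the trap's error code, not an opcode the CPU fetched. That still leaves the question of where the bad page comes from: is the kernel computing it wrongly, or is the kernel reading a bad location?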
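On the "invalid opcode" part specifically: on x86 the kernel's BUG() is normally compiled to a ud2 instruction (bytes 0f 0b), and the byte in angle brackets on the Code: line is where RIP pointed when the trap fired. A rough Python sketch of checking that; the helper name faulting_bytes is made up for illustration, and it assumes the "ob" in the hand-copied output is really "0b":

```python
import re

def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the <..>-marked byte of an oops Code: line."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if m:
            rest = tokens[i + 1 : i + count]
            return [m.group(1).lower()] + [t.lower() for t in rest]
    return []

# Bytes around the marker from the transcription, with "ob" read as "0b" (assumption)
code = "4c 89 f7 e8 e2 0d 03 00 <0f> 0b 48 c6 30 86"
UD2 = ["0f", "0b"]  # x86 ud2 instruction
is_bug_trap = faulting_bytes(code) == UD2
print("ud2 at RIP:", is_bug_trap)
```

If the marked bytes are 0f 0b, the trap is most likely the deliberate one raised by the VM_BUG_ON_PAGE check, and the interesting question becomes why the page failed that check.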
I think it has to be something about your hardware, because if the kernel was actually having trouble deallocating pages for all boots, this would be a well known problem. Maybe you have hit a corner case. You could open a bugzilla, but it will be difficult for someone to fix this without your hardware to replicate the crash or the complete crash output.
The 4.20 kernel series is not far away from coming to stable. You could either grab one from koji, https://koji.fedoraproject.org/koji/packageinfo?packageID=8 or use an older kernel until it is released. It might fix the issue as a side effect of other changes.
On Jan 22, 2019, at 10:43, stan stanl-fedorauser@vfemail.net wrote:
The 4.20 kernel series is not far away from coming to stable. You could either grab one from koji, https://koji.fedoraproject.org/koji/packageinfo?packageID=8 or use an older kernel until it is released. It might fix the issue as a side effect of other changes.
The list ate my reply; it was too long because it included the entire console output.
Ok, broke out the old Keyspan:
This is on 4.20.3-200.fc29.x86_64
Probing EDD (edd=off to disable)... ok
[ 0.000000] microcode: microcode updated early to revision 0x1f, date = 2018-05-08
[ 0.000000] Linux version 4.20.3-200.fc29.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org) (gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC)) #1 SMP Thu Jan 17 15:19:35 UTC 2019
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.20.3-200.fc29.x86_64 root=UUID=f16fcae3-fe27-4314-afa3-42deec5f378c ro rootflags=subvol=btrfsroot1 rd.driver.blacklist=nouveau rd.lvm=0 rd.dm=0 rd.md.uuid=bfe3028c:482de62e:e81670f7:c1d008bf rd.luks.uuid=luks-c680a803-db0f-423b-8fb4-5ac67c7e141b rd.luks.allow-discards=luks-c680a803-db0f-423b-8fb4-5ac67c7e141b rd.md.uuid=bdc6f872:46939f0e:7e802862:035eb1f1 rd.luks.uuid=luks-ab55575a-5048-445b-830a-3cdcb78222b6 rd.luks.allow-discards=luks-ab55575a-5048-445b-830a-3cdcb78222b6 rd.luks.uuid=luks-2c2515c2-e6a3-438c-9514-0aa9ddd2a1b3 rd.luks.allow-discards=luks-2c2515c2-e6a3-438c-9514-0aa9ddd2a1b3 vconsole.keymap=us crashkernel=128M usbcore.autosuspend=-1 console=tty0 console=ttyS0,115200
...
[ 2.073219] page 0xc24000 outside node 1 zone Normal [ 0x100000 - 0xc24000 ]
[ 2.080074] page:fffff8ebf0900000 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 2.088036] flags: 0x57fffe00000000()
[ 2.091671] raw: 0057fffe00000000 fffff8ebf0900008 fffff8ebf0900008 0000000000000000
[ 2.099373] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 2.107075] page dumped because: VM_BUG_ON_PAGE(bad_range(zone, page))
[ 2.113573] ------------[ cut here ]------------
[ 2.118154] kernel BUG at mm/page_alloc.c:798!
[ 2.122570] invalid opcode: 0000 [#1] SMP PTI
[ 2.126895] CPU: 0 PID: 0 Comm: swapper Not tainted 4.20.3-200.fc29.x86_64 #1
[ 2.133991] Hardware name: Dell Inc. Precision WorkStation T7500 /06FW8P, BIOS A17 03/11/2018
[ 2.142563] RIP: 0010:free_one_page+0x50e/0x540
[ 2.147060] Code: 08 16 03 00 0f 0b 48 89 da be 0c 00 00 00 4c 89 ff e8 56 07 02 00 e9 9c fb ff ff 48 c7 c6 18 f8 0d 89 4c 89 f7 e8 e2 15 03 00 <0f> 0b 48 c7 c6 40 f8 0d 89 48 89 df e8 d1 15 03 00 0f 0b 31 d2 e9
[ 2.165754] RSP: 0000:ffffffff89203e20 EFLAGS: 00010046
[ 2.170946] RAX: 000000000000003a RBX: 0000000000000400 RCX: ffffffff89254668
[ 2.178043] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
[ 2.185140] RBP: 0000000000c24000 R08: 6d75642065676170 R09: 6163656220646570
[ 2.192236] R10: 7375616365622064 R11: 55425f4d56203a65 R12: 000000000000000a
[ 2.199334] R13: 00000000000003ff R14: fffff8ebf0900000 R15: ffffa04423fd5d00
[ 2.206430] FS: 0000000000000000(0000) GS:ffffa04ff3400000(0000) knlGS:0000000000000000
[ 2.214480] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.220191] CR2: ffffa04e49e01000 CR3: 000000164920a001 CR4: 00000000000206b0
[ 2.227288] Call Trace:
[ 2.229715] __free_pages_ok+0x15c/0x440
[ 2.233610] memblock_free_all+0x127/0x192
[ 2.237676] mem_init+0x1b/0xb9
[ 2.240792] start_kernel+0x293/0x528
[ 2.244427] secondary_startup_64+0xa4/0xb0
[ 2.248579] Modules linked in:
[ 2.251625] ---[ end trace ffc919177d0487be ]---
[ 2.256195] RIP: 0010:free_one_page+0x50e/0x540
[ 2.260695] Code: 08 16 03 00 0f 0b 48 89 da be 0c 00 00 00 4c 89 ff e8 56 07 02 00 e9 9c fb ff ff 48 c7 c6 18 f8 0d 89 4c 89 f7 e8 e2 15 03 00 <0f> 0b 48 c7 c6 40 f8 0d 89 48 89 df e8 d1 15 03 00 0f 0b 31 d2 e9
[ 2.279388] RSP: 0000:ffffffff89203e20 EFLAGS: 00010046
[ 2.284581] RAX: 000000000000003a RBX: 0000000000000400 RCX: ffffffff89254668
[ 2.291678] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
[ 2.298775] RBP: 0000000000c24000 R08: 6d75642065676170 R09: 6163656220646570
[ 2.305872] R10: 7375616365622064 R11: 55425f4d56203a65 R12: 000000000000000a
[ 2.312968] R13: 00000000000003ff R14: fffff8ebf0900000 R15: ffffa04423fd5d00
[ 2.320066] FS: 0000000000000000(0000) GS:ffffa04ff3400000(0000) knlGS:0000000000000000
[ 2.328114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.333826] CR2: ffffa04e49e01000 CR3: 000000164920a001 CR4: 00000000000206b0
[ 2.340924] Kernel panic - not syncing: Attempted to kill the idle task!
[ 2.347618] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]
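A note on reading the "page 0xc24000 outside node 1 zone Normal" line: as far as I understand mm/page_alloc.c, the two numbers in brackets are the zone's start pfn and start pfn plus spanned pages, and a pfn counts as inside the zone only if start <= pfn < end. The Python below is just a sketch of that half-open check; the function names are mine, not the kernel's.

```python
import re

ZONE_RE = re.compile(
    r"page (0x[0-9a-f]+) outside node (\d+) zone (\w+) "
    r"\[ (0x[0-9a-f]+) - (0x[0-9a-f]+) \]"
)

def parse_zone_report(line):
    """Extract pfn, node, zone name and the printed range from the report line."""
    m = ZONE_RE.search(line)
    if m is None:
        raise ValueError("not a zone-boundary report")
    pfn, node, zone, start, end = m.groups()
    return int(pfn, 16), int(node), zone, int(start, 16), int(end, 16)

def zone_spans_pfn(start, end, pfn):
    # Half-open interval: the end pfn itself is NOT part of the zone.
    return start <= pfn < end

line = "[ 2.073219] page 0xc24000 outside node 1 zone Normal [ 0x100000 - 0xc24000 ]"
pfn, node, zone, start, end = parse_zone_report(line)
print(zone_spans_pfn(start, end, pfn))  # False
print(pfn - end)                        # 0
```

So the pfn being freed is exactly the zone's end value, i.e. the first pfn past node 1's Normal zone, which is why bad_range() fires.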
On Thu, 24 Jan 2019 23:33:29 -0500 Nate Pearlstein darknater@gmail.com wrote:
The list ate my reply; it was too long because it included the entire console output.
Ok, broke out the old Keyspan:
This is on 4.20.3-200.fc29.x86_64
So, still failing with the new kernel, though the line number in the error has changed, indicating some change in the source file.
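To make that comparison concrete, here is a small Python sketch that pulls the file and line out of the two "kernel BUG at" messages in this thread (791 from the 4.19.x photo transcription, 798 from the 4.20.3 serial capture). The helper name bug_location is hypothetical.

```python
import re

BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")

def bug_location(log_text):
    """Return (file, line) from the first 'kernel BUG at file:line!' message, or None."""
    m = BUG_RE.search(log_text)
    return (m.group(1), int(m.group(2))) if m else None

old = bug_location("kernel BUG at mm/page_alloc.c:791! Invalid opcode: 0000")
new = bug_location(
    "[ 2.118154] kernel BUG at mm/page_alloc.c:798! "
    "[ 2.122570] invalid opcode: 0000 [#1] SMP PTI"
)

same_file = old[0] == new[0]
line_shift = new[1] - old[1]
print(same_file, line_shift)  # True 7
```

Same file, seven lines further down, which would fit lines being added above the check between 4.19 and 4.20 rather than a different failure.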
I can't see any obvious issue. I suggest you go to https://bugzilla.redhat.com/ and enter a bug against the kernel, describing the problem you are seeing, and putting the above output either as additional information or as an attachment. https://fedoraproject.org/wiki/How_to_file_a_bug_report
I looked for the error on the web and in fedora bugzilla, and there were no reports of this error. It must be something specific to your system.
In the meantime you should probably use an older kernel, and lock the kernel version.
If you are up to it, you could try building a custom kernel to see if that would fix the issue. https://fedoraproject.org/wiki/Building_a_custom_kernel https://fedoraproject.org/wiki/Building_a_custom_kernel/Source_RPM
On Sat, 26 Jan 2019 12:42:31 -0700 stan stanl-fedorauser@vfemail.net wrote:
If you are up to it, you could try building a custom kernel to see if that would fix the issue. https://fedoraproject.org/wiki/Building_a_custom_kernel https://fedoraproject.org/wiki/Building_a_custom_kernel/Source_RPM
A further thought. Before you build a kernel, try rebuilding the initramfs using dracut.
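/usr/bin/dracut -f -H --kver [numerical part of the vmlinuz file in boot]
e.g. /usr/bin/dracut -f -H --kver 4.20.3-200.fc29.x86_64
This has to be done as root. Without --kver, dracut treats the first bare argument as the name of the image to write, not as the kernel version. With --kver the image goes to the default location for that kernel, so the current directory does not matter. It might pick up something that the kernel needs that is missing from the current initramfs.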
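The kernel version dracut wants is just the vmlinuz file name with the vmlinuz- prefix removed. The Python below is a sketch that only builds the command line and does not run it; the function names are made up, and passing the version with --kver is my assumption about the most explicit way to name the kernel, so check dracut(8) on your system.

```python
from pathlib import Path

def kver_from_vmlinuz(path):
    """Strip the 'vmlinuz-' prefix from a kernel image file name."""
    name = Path(path).name
    prefix = "vmlinuz-"
    if not name.startswith(prefix):
        raise ValueError(f"not a vmlinuz image: {name}")
    return name[len(prefix):]

def dracut_command(vmlinuz_path):
    """Build (but do not run) a host-only initramfs rebuild command."""
    kver = kver_from_vmlinuz(vmlinuz_path)
    # -f overwrites the existing image, -H builds a host-only initramfs
    return ["dracut", "-f", "-H", "--kver", kver]

cmd = dracut_command("/boot/vmlinuz-4.20.3-200.fc29.x86_64")
print(" ".join(cmd))
```

The printed line is what you would then run as root.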
If you end up building a custom kernel, be sure to use make localmodconfig from a running system (an older kernel that boots) so that the configuration is trimmed to the modules currently loaded on your system. The Fedora kernel is a general kernel, with options that should support the broadest set of hardware, but it might be that there is an option you need that isn't in the stock kernel. Installing the custom kernel rebuilds the initramfs automatically.