Hi all:
I've got an odd problem that I was hoping for some help on.
The laptop was previously working fine. I am running F30, updated regularly.
I've got an Acer laptop that has been shutting down suddenly. (I suspect a bad battery, working on that.)
After a sudden shutdown last night, I now get a kernel panic on boot right after "Starting Switch Root..."
I was able to boot from a USB stick, and was able to read the journal, but I didn't see anything obvious to help.
I was able to get all of my data off of the (encrypted) disks, so that's not a problem, but I don't want to just give up, and wash and reload too quickly.
Any suggestions?
Thanks, --murph
On Tue, 3 Sep 2019 22:31:52 -0400 murph nj murphnj+fedora@gmail.com wrote:
I've got an Acer laptop that has been shutting down suddenly. (I suspect a bad battery, working on that.)
After a sudden shutdown last night, I now get a kernel panic on boot right after "Starting Switch Root..."
The switch root is changing from the initramfs to the installed OS. If there is a power problem, it is possible that memory is not working properly, and disk reads are erroneous. Fix the power supply before you go any further. Any testing will yield questionable results until that is resolved.
On Wed, Sep 4, 2019 at 11:45 AM stan via users users@lists.fedoraproject.org wrote:
The switch root is changing from the initramfs to the installed OS. If there is a power problem, it is possible that memory is not working properly, and disk reads are erroneous. Fix the power supply before you go any further. Any testing will yield questionable results until that is resolved.
Thanks for the response.
The power supply was not the problem, it was the battery. All subsequent testing has been done plugged into power. It also boots just fine from a USB stick with either Mint or Fedora.
There are three kernels available from the boot menu, along with rescue mode. All of them kernel panic.
--murph
On Wed, 4 Sep 2019 12:33:59 -0400 murph nj murphnj+fedora@gmail.com wrote:
The power supply was not the problem, it was the battery. All subsequent testing has been done plugged into power. It also boots just fine from a USB stick with either Mint or Fedora.
There are three kernels available from the boot menu, along with rescue mode. All of them kernel panic.
Does it get far enough along to put messages in the journal? You could run a boot that fails, then boot from the USB, mount the installed filesystem (usually under /mnt/sysimage), chroot to /mnt/sysimage, and run journalctl -b to look at boot messages.
Before you do that, boot from the USB, and then run a file system check on the drives in the system while they are not mounted. For ext4, that would be e2fsck -n -v /dev/sd??, where -n answers no to any queries (makes no changes) and -v gives verbose messages. If it wants to fix things, and the changes look OK, run with -p instead of -n. See man e2fsck; this is your filesystem, so you want to know what you are doing. A mistake can be fatal.
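A minimal sketch of the read-only check described above, assuming a POSIX shell on the live USB. The helper name `fsck_readonly_cmd` and the device names are hypothetical. It only prints the command so it can be reviewed before anything is run. Since e2fsck handles ext2/3/4 only, other filesystem types are mapped to their own no-change checkers.

```shell
#!/bin/sh
# Print (do not run) a read-only check command for a filesystem type.
# $1 = filesystem type, $2 = device (must not be mounted when checked).
fsck_readonly_cmd() {
    fstype=$1
    dev=$2
    case "$fstype" in
        ext2|ext3|ext4) echo "e2fsck -n -v $dev" ;;          # -n: answer no, change nothing
        vfat)           echo "fsck.vfat -n $dev" ;;          # e.g. the EFI system partition
        btrfs)          echo "btrfs check --readonly $dev" ;;
        xfs)            echo "xfs_repair -n $dev" ;;         # -n: no-modify mode
        *)              echo "no read-only checker known for $fstype" >&2
                        return 1 ;;
    esac
}

# Hypothetical usage (device name is an assumption):
#   fstype=$(lsblk -no FSTYPE /dev/sda2)
#   fsck_readonly_cmd "$fstype" /dev/sda2
```

On an encrypted install, the device to check is the unlocked mapping under /dev/mapper, not the raw partition.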
I was able to mount the drives, and check them, no problems. I was also able to move all of my important data off of the drives to an external, so I'm not worried about any loss to the data on the drives. No worries of fatal mistakes anymore.
I also took a look at the journals that were left, there was nothing that seemed terribly indicative of the problem. It was all from the last time that it was running normally, nothing was getting written on the failed boot attempts, unfortunately.
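For reference, the journal of the installed system can also be read from the live USB without a chroot. The following is a sketch only, assuming the installed root is mounted at a path such as /mnt/sysimage and that persistent journaling is enabled; the function names are hypothetical. An empty result for the failed boots would be consistent with the panic happening before journald can write to the real root.

```shell
#!/bin/sh
# Locate the persistent journal of an installed system mounted at $1.
# Prints the directory and returns 0 if it exists, returns 1 otherwise.
journal_dir_for_root() {
    dir="${1%/}/var/log/journal"
    if [ -d "$dir" ]; then
        echo "$dir"
    else
        return 1
    fi
}

# List the boots recorded in that journal, then show the tail of the
# most recent recorded boot (-b -0 means "last boot in this journal").
show_installed_journal() {
    dir=$(journal_dir_for_root "$1") || {
        echo "no persistent journal under $1" >&2
        return 1
    }
    journalctl -D "$dir" --list-boots
    journalctl -D "$dir" -b -0 --no-pager | tail -n 50
}

# Hypothetical usage (mount point is an assumption):
#   show_installed_journal /mnt/sysimage
```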
On Wed, Sep 4, 2019 at 2:56 PM stan via users users@lists.fedoraproject.org wrote:
Does it get far enough along to put messages in the journal? You could run a boot that fails, then boot from the USB, mount the installed filesystem (usually under /mnt/sysimage), chroot to /mnt/sysimage, and run journalctl -b to look at boot messages.
Before you do that, boot from the USB, and then run a file system check on the drives in the system while they are not mounted. For ext4, that would be e2fsck -n -v /dev/sd??, where -n answers no to any queries (makes no changes) and -v gives verbose messages. If it wants to fix things, and the changes look OK, run with -p instead of -n. See man e2fsck; this is your filesystem, so you want to know what you are doing. A mistake can be fatal.
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
On Wed, 4 Sep 2019 15:05:36 -0400 murph nj murphnj+fedora@gmail.com wrote:
I was able to mount the drives, and check them, no problems. I was also able to move all of my important data off of the drives to an external, so I'm not worried about any loss to the data on the drives. No worries of fatal mistakes anymore.
Did that include the boot partition, where the kernels and initramfs images are?
I also took a look at the journals that were left, there was nothing that seemed terribly indicative of the problem. It was all from the last time that it was running normally, nothing was getting written on the failed boot attempts, unfortunately.
Did you remove rhgb and quiet from the kernel boot parameters? That should show you the stream of messages from systemd as the system boots, and the error should be near the end of that stream, so it should still be on screen when the crash happens. I think you hit space or escape during boot to get the boot menu, and then you can edit the line. I have a timeout, so I always see it.
Does it drop to a dracut or grub prompt? If it does, it is also possible to view the temporary boot logs from there (I think they are under /tmp (/var/tmp?)). It's painful because the shell is very limited. Use ls /bin to see the available commands. I think less is there, as is vi.
On Wed, Sep 4, 2019 at 4:21 PM stan via users users@lists.fedoraproject.org wrote:
On Wed, 4 Sep 2019 15:05:36 -0400 murph nj murphnj+fedora@gmail.com wrote:
I was able to mount the drives, and check them, no problems.
Did that include the boot partition, where the kernels and initramfs images are?
I had not, but on your suggestion, I did. And....
I also took a look at the journals that were left, there was nothing that seemed terribly indicative of the problem. It was all from the last time that it was running normally, nothing was getting written on the failed boot attempts, unfortunately.
Did you remove rhgb and quiet from the kernel boot parameters? That should show you the stream of messages from systemd as the system boots, and the error should be near the end of that stream, so it should still be on screen when the crash happens. I think you hit space or escape during boot to get the boot menu, and then you can edit the line. I have a timeout, so I always see it.
I had not, but I edited from the grub menu, and eliminated them. I was then able to see the messages. (I was able to see them previously by hitting the escape key right after entering my password for the disks.)
Everything left on the screen after the kernel panic is (too much to type)
Does it drop to a dracut or grub prompt? If it does, it is also possible to view the temporary boot logs from there (I think they are under /tmp (/var/tmp?)). It's painful because the shell is very limited. Use ls /bin to see the available commands. I think less is there, as is vi.
Unfortunately, it goes right to a kernel panic, so no prompts at all to work with. (I'd have chewed on it more before asking for help if I had more to work with)
On Wed, Sep 4, 2019 at 5:42 PM murph nj murphnj+fedora@gmail.com wrote:
On Wed, Sep 4, 2019 at 4:21 PM stan via users users@lists.fedoraproject.org wrote:
On Wed, 4 Sep 2019 15:05:36 -0400 murph nj murphnj+fedora@gmail.com wrote:
I was able to mount the drives, and check them, no problems.
Did that include the boot partition, where the kernels and initramfs images are?
I had not, but on your suggestion, I did. And....
Sorry, responded prematurely.
I meant to say that both the volumes that are mounted as /boot and /boot/efi check out OK.
thanks, --murph
On Thu, Sep 5, 2019 at 11:08 AM sixpack13 sixpack13@online.de wrote:
e2fsck is for ext partitions. AFAIK /boot/efi is FAT!
You are correct. I already checked, all the partitions are fine.
On 19-09-04 17:42:40, murph nj wrote: ...
Everything left on the screen after the kernel panic is (too much to type)
...
Unfortunately, it goes right to a kernel panic, so no prompts at all to work with. (I'd have chewed on it more before asking for help if I had more to work with)
It dies trying to pivot-root, so add to the kernel command line
rd.break=pre-pivot
(See `man dracut.cmdline`, from `man kernel-command-line`.) It will drop into a shell just before disaster strikes (if I read it right) and you'll get a chance to look around. The current root will be the init ramdisk, and the new root (hard drive) should be /sysroot IIRC.
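To make the edit concrete, here is a small sketch of how the kernel command line changes for this kind of debugging boot: rhgb and quiet are dropped so messages stay visible, and rd.break=pre-pivot is appended. The function name `debug_cmdline` and the sample arguments are hypothetical. In practice the same edit is made by hand in the GRUB menu (press e on the entry, edit the line that starts with linux, then boot with Ctrl-x).

```shell
#!/bin/sh
# Rewrite a kernel command line for a debugging boot.
# $1 = the original command line as a single string.
# Note: the unquoted $1 relies on word splitting, so this assumes the
# arguments contain no shell glob characters.
debug_cmdline() {
    out=""
    for arg in $1; do
        case "$arg" in
            rhgb|quiet) ;;              # these hide boot messages: drop them
            rd.break|rd.break=*) ;;     # any existing breakpoint is replaced below
            *) out="$out $arg" ;;
        esac
    done
    out="$out rd.break=pre-pivot"
    echo "${out# }"                     # strip the leading space
}

# Hypothetical usage:
#   debug_cmdline "root=/dev/mapper/fedora-root ro rhgb quiet"
```

The change made in the GRUB editor applies to that one boot only, so nothing needs to be undone afterwards.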
On Wed, Sep 4, 2019 at 9:56 PM Tony Nelson tonynelson@georgeanelson.com wrote:
It dies trying to pivot-root, so add to the kernel command line
rd.break=pre-pivot
Interesting. I was able to drop to a shell. I took a look at the journal, and it looked like a possible hibernation problem, so I took that out of the kernel line. Still panicked. I'll have to pick this up tomorrow afternoon. I'll try to get a dump of the journal then and send it along.
Thanks for the help so far!
Tony:
Can you give me some insight as to what the boot process does next AFTER that pivot point? The output I'm getting isn't enlightening me, and I'm not quite sure where to look next.
I can just reinstall, but I'm trying to learn more about how to recover from errors, instead of taking the easy way out.
Thanks, --murph
On 19-09-09 14:10:24, murph nj wrote:
Tony:
Can you give me some insight as to what the boot process does next AFTER that pivot point? The output I'm getting isn't enlightening me, and I'm not quite sure where to look next.
...
Usually, when it all goes wrong just after pivot-root, there is a problem with the new (real) root. Exactly how it goes wrong is not useful. I would look hard at whatever is mounted at /sysroot. It should be what you expect / (root) to be, but mounted ro at that point (`cat /proc/mounts` should show it mounted at /sysroot (see `man 5 fstab` for format); `ls -l /sysroot` should show the contents you expect). (All off the top of my head; I haven't rebooted to test.)
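The /proc/mounts check mentioned above can be sketched with shell builtins only, which matters because the dracut shell is limited and tools such as awk may not be present. The function names are hypothetical, and the optional second argument exists only so the parsing can be tried against a saved copy of the file.

```shell
#!/bin/sh
# Print the mount options recorded for a mount point.
# $1 = mount point, $2 = mounts file (defaults to /proc/mounts).
# Line format: device mountpoint fstype options dump pass
mount_opts() {
    while read -r dev mp fstype opts rest; do
        if [ "$mp" = "$1" ]; then
            echo "$opts"
        fi
    done < "${2:-/proc/mounts}"
}

# Succeed if the mount point is currently mounted read-only.
is_mounted_ro() {
    case ",$(mount_opts "$1" "$2")," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical usage inside the rd.break shell:
#   mount_opts /sysroot
#   is_mounted_ro /sysroot && echo "/sysroot is read-only, as expected"
#   ls -l /sysroot
```

If mount_opts prints nothing, the real root was never mounted at /sysroot, which would point at the device or its unlocking instead of the contents of the filesystem.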
On Mon, Sep 9, 2019 at 9:07 PM Tony Nelson tonynelson@georgeanelson.com wrote:
Usually, when it all goes wrong just after pivot-root, there is a problem with the new (real) root. Exactly how it goes wrong is not useful. I would look hard at whatever is mounted at /sysroot. It should be what you expect / (root) to be, but mounted ro at that point (`cat /proc/mounts` should show it mounted at /sysroot (see `man 5 fstab` for format); `ls -l /sysroot` should show the contents you expect). (All off the top of my head; I haven't rebooted to test.)
Thanks. That gives me something to examine. I'm working tonight, will probably dig back in to this tomorrow.
Possible causes: symbolic links missing on / (/bin->/usr/bin and such); /bin/bash missing or corrupted; /lib64/ libraries missing or corrupted; systemd itself missing. Pivot root attempts to run systemd, and that may be what is failing, but you don't really know what the underlying cause of the failure is.
What exactly was the "improper shutdown"?
The easiest way to debug is to boot a live CD, mount the old filesystems on, say, /oldroot (the root filesystem), mount usr on /oldroot/usr and so forth, and then cd /oldroot and do a "chroot .".
I would expect the chroot . to fail with some sort of error that may hint at what is broken.
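The items listed above can be checked mechanically before trying the chroot. This is a sketch under assumptions: the installed root is mounted at a path such as /oldroot, the layout is the merged-/usr one Fedora uses (top-level bin, sbin, lib and lib64 are relative symlinks into usr/), and the function name `check_root_tree` is hypothetical.

```shell
#!/bin/sh
# Report obvious damage in an installed root mounted at $1.
# Prints one line per problem; returns non-zero if any problem was found.
check_root_tree() {
    root=${1%/}
    bad=0
    # Top-level directories that should be symlinks into usr/.
    for link in bin sbin lib lib64; do
        if [ ! -L "$root/$link" ]; then
            echo "not a symlink: /$link"
            bad=1
        elif [ ! -d "$root/$link" ]; then
            echo "dangling symlink: /$link"
            bad=1
        fi
    done
    # Files that must exist and be non-empty for init to start.
    for file in usr/bin/bash usr/lib/systemd/systemd; do
        if [ ! -s "$root/$file" ]; then
            echo "missing or empty: /$file"
            bad=1
        fi
    done
    return "$bad"
}

# Hypothetical usage from the live system:
#   check_root_tree /oldroot && chroot /oldroot /bin/bash
```

A clean result here does not prove the root is healthy, since a damaged shared library would still break the chroot, but any line it prints is a concrete place to start.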
On Tue, Sep 10, 2019 at 7:15 AM murph nj murphnj+fedora@gmail.com wrote:
On Mon, Sep 9, 2019 at 9:07 PM Tony Nelson tonynelson@georgeanelson.com wrote:
Usually, when it all goes wrong just after pivot-root, there is a problem with the new (real) root. Exactly how it goes wrong is not useful. I would look hard at whatever is mounted at /sysroot. It should be what you expect / (root) to be, but mounted ro at that point (`cat /proc/mounts` should show it mounted at /sysroot (see `man 5 fstab` for format); `ls -l /sysroot` should show the contents you expect). (All off the top of my head; I haven't rebooted to test.)
Thanks. That gives me something to examine. I'm working tonight, will probably dig back in to this tomorrow.
On Tue, Sep 10, 2019 at 2:52 PM Roger Heflin rogerheflin@gmail.com wrote:
Possible causes: symbolic links missing on / (/bin->/usr/bin and such); /bin/bash missing or corrupted; /lib64/ libraries missing or corrupted; systemd itself missing. Pivot root attempts to run systemd, and that may be what is failing, but you don't really know what the underlying cause of the failure is.
What exactly was the "improper shutdown"?
A failing battery (I think) caused it to shut down when I wasn't ready, and didn't shut down properly. Then I accidentally opened the lid, which started it again, and it again shut down cold. I wouldn't think that would corrupt things, but it's all I have for now.
I'll dig into this tomorrow probably.
The easiest way to debug is to boot a live CD, mount the old filesystems on, say, /oldroot (the root filesystem), mount usr on /oldroot/usr and so forth, and then cd /oldroot and do a "chroot .".
I would expect the chroot . to fail with some sort of error that may hint at what is broken.
I'll try that.
Thanks.
On Tue, Sep 10, 2019 at 2:52 PM Roger Heflin rogerheflin@gmail.com wrote:
Possible causes: symbolic links missing on / (/bin->/usr/bin and such); /bin/bash missing or corrupted; /lib64/ libraries missing or corrupted; systemd itself missing. Pivot root attempts to run systemd, and that may be what is failing, but you don't really know what the underlying cause of the failure is.
What exactly was the "improper shutdown"?
The easiest way to debug is to boot a live CD, mount the old filesystems on, say, /oldroot (the root filesystem), mount usr on /oldroot/usr and so forth, and then cd /oldroot and do a "chroot .".
I would expect the chroot . to fail with some sort of error that may hint at what is broken.
I was able to mount all of the boot drives, and it seemed OK, I couldn't see anything that was out of order.
Unfortunately, I'm going to need this laptop next weekend, so my journey of discovery on this is going to have to end. Since I couldn't fix it, I'll have to reload. We gave it a shot though, and I have some new things to try for other potential issues.
Again thanks for the help.
--murph
On Wed, 4 Sep 2019 17:42:40 -0400 murph nj murphnj+fedora@gmail.com wrote:
I had not, but I edited from the grub menu, and eliminated them. I was then able to see the messages. (I was able to see them previously by hitting the escape key right after entering my password for the disks.)
Everything left on the screen after the kernel panic is (too much to type)
What I pull from that is that the error causing the problem is a page fault when it asks for some memory. But I don't know where to go from that insight, other than opening a bugzilla against the kernel, describing your problem, and attaching your screenshot with its stack trace. I'm stumped. If it boots from USB, then the memory must be working properly, but here you are failing when accessing memory. What's different?
Long shot, reseat the memory, and run a memory check, to be sure memory is OK.
Unfortunately, it goes right to a kernel panic, so no prompts at all to work with. (I'd have chewed on it more before asking for help if I had more to work with)
Too bad. I see that Tony gave you a workaround. That will show the state before the error, but not after the error. I vaguely recall that there is a way to tell the kernel or systemd to dump a core file when this happens. That would also be good to attach to a bugzilla, if it is available.
Two years later, I can say that this error might still be a thing. I concluded that the most likely cause is my mother. Jokes aside, she unplugs the power cord before the PC has switched off completely. Over the years, she has managed to crash f33, f34, and now f35 in the same way. This time I had a look at the hard drive, so I can try to add something to the discussion.
The situation is the same as described above: kernel panic at switch-root. Three kernels were available (up to 5.12), and they all failed in the same way. The partitions look fine: FAT for UEFI (not tested, but hey, I reach boot without problems), ext4 for boot, and btrfs for data (2 sub-volumes with / and /home mount points), without encryption. All btrfs checks run from a live USB key were perfectly clean. I tried to switch root from the live key, and... error. Some executables and libraries had been wiped out (zero size). I have a partial list, but I don't think it will be useful. Core software was all there except for python. Unfortunately, I already reinstalled f35 by nuking the root subvolume and creating a new one. Next time I will be smarter and save it for further analysis.
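For anyone who hits the same state and wants the list of emptied files, a rough sketch follows. It assumes the damaged root is mounted from a live USB at some path such as /oldroot; the function name `find_empty_files` and the choice of directories are assumptions, not something taken from the report above.

```shell
#!/bin/sh
# List regular files of size zero under the binary and library
# directories of an installed root mounted at $1.
find_empty_files() {
    root=${1%/}
    for dir in usr/bin usr/sbin usr/lib64; do
        [ -d "$root/$dir" ] || continue
        find "$root/$dir" -type f -size 0
    done
}

# Hypothetical usage, saving the list for later analysis:
#   find_empty_files /oldroot > /tmp/empty-files.txt
# If the RPM database on the damaged root is still readable, package
# verification against that root is another option:
#   rpm --root /oldroot -Va
```

A few zero-length files can be legitimate, so the list is a starting point for comparison against the owning packages, not proof of damage by itself.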