I decided to log out and log back in to my X11 based KDE session just now, and I saw that 'Discover' was telling me I had updates available. So I said 'go ahead'. Eventually, it said I needed to reboot, so I did. After 4 (or 5) reboots that the machine drove itself through, the last reboot failed to start up to a GUI session.
As a matter of fact, it dropped me down and told me it needed to enter an emergency boot and asked for my root password. The message also told me to look at 'journalctl -xb'. After a few thousand lines of info, I saw nothing of significance other than it hadn't finished.
The other suggestion was 'systemctl default'. That resulted in the following:
Failed to mount /boot/efi
Dependency failed for Local File System
Dependency failed for Mark the need to relabel after reboot
Failed to mount RPC File System
Dependency failed for rpc-pipefs.target
Dependency failed for RPC security service for NFS client and server
Failed to start Load Kernel Modules
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to start Set Up Additional Binary Formats
... and then nothing. I had to cold start and that brings me back to the same issues. Trying to reboot a previous kernel doesn't even result in any boot messages.
I now have a 'non-working' machine. Suggestions are welcome (and needed)!
TIA Fulko
On Sun, 16 Jan 2022 21:07:29 -0500 Fulko Hew fulko.hew@gmail.com wrote:
I decided to log out and log back in to my X11 based KDE session just now, and I saw that 'Discover' was telling me I had updates available. So I said 'go ahead'. Eventually, it said I needed to reboot, so I did. After 4 (or 5) reboots that the machine drove itself through, the last reboot failed to start up to a GUI session.
Suspicious that this is not deterministic. It should either fail identically every time or restart every time (in my opinion).
As a matter of fact, it dropped me down and told me it needed to enter an emergency boot and asked for my root password. The message also told me to look at 'journalctl -xb'. After a few thousand lines of info, I saw nothing of significance other than it hadn't finished.
You could try journalctl -rxb so that the last messages are presented first. It is likely that that is where the error will be.
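A minimal sketch of that suggestion, for anyone who wants less to scroll through. The function name boot_errors is made up for illustration; the journalctl options used (-r newest first, -x explanatory text, -b boot selection, -p priority filter, --no-pager) are standard ones. Filtering to priority "err" is an extra assumption on my part, not something suggested above, and it may hide a relevant warning.

```shell
#!/bin/sh
# Show error-level (and worse) journal messages for one boot, newest first.
# $1 is the boot offset: 0 = current boot (default), -1 = previous boot, etc.
# boot_errors is a hypothetical helper name, not part of systemd.
boot_errors() {
    boot=${1:-0}
    journalctl -r -x -b "$boot" -p err --no-pager
}

# Usage from the emergency shell:
#   boot_errors        # errors from the boot that just failed
#   boot_errors -1     # errors from the boot before it (needs a persistent journal)
```

Looking at an earlier boot only works if the journal is stored persistently on disk.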
The other suggestion was 'systemctl default'. That resulted in the following:
Failed to mount /boot/efi
Dependency failed for Local File System
Dependency failed for Mark the need to relabel after reboot
Failed to mount RPC File System
Dependency failed for rpc-pipefs.target
Dependency failed for RPC security service for NFS client and server
Failed to start Load Kernel Modules
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to start Set Up Additional Binary Formats
... and then nothing. I had to cold start
It seems like a hardware error to me from the symptoms. How can the kernel not mount /boot/efi unless the drive has either power issues or seek errors / bad sectors? This is really basic.
and that brings me back to the same issues. Trying to reboot a previous kernel doesn't even result in any boot messages.
I now have a 'non-working' machine. Suggestions are welcome (and needed)!
Long shots.
It might be software, but it could be just a coincidence that it chose this time for a hardware error to expose itself.
Do you have a list of what was updated? It would be good to see if there are any updates that might have caused this to happen via software. I'm not sure what would stop /boot/efi from being mounted.
Did you power down completely at any point? That will allow components to lose any retained state.
You could, while completely powered down, try reseating internal components, especially drive connectors.
If you can reach the BIOS menu, look at the power supply numbers. Are they at or near spec?
Can you boot a livecd / usb so you can do checks of the drives to see if they are still functioning properly, maybe a smartctl (smartctl -a /dev/[drive designation])? If a live image boots and runs, it will indicate that your memory is (probably) not the issue as well.
On Mon, Jan 17, 2022 at 2:00 PM stan via users < users@lists.fedoraproject.org> wrote:
On Sun, 16 Jan 2022 21:07:29 -0500 Fulko Hew fulko.hew@gmail.com wrote:
I decided to log out and log back in to my X11 based KDE session just now, and I saw that 'Discover' was telling me I had updates available. So I said 'go ahead'. Eventually, it said I needed to reboot, so I did. After 4 (or 5) reboots that the machine drove itself through, the last reboot failed to start up to a GUI session.
Suspicious that this is not deterministic. It should either fail identically every time or restart every time (in my opinion).
I don't think you understand what I was saying. The update process that 'Discover' performed was 'strange' to me. For the last 20 years, I've used rpm, yum and dnf to download and install updates, and if I wanted... I'd reboot to use any new kernel that may have been updated.
This time I chose to use 'Discover', because (for a change) it actually told me there was new stuff. (I'm getting the feeling that 'discover' only runs at login time, something I do only once every few months, i.e. at every power failure.)
So after 'discover' downloaded and (apparently) updated everything, it asked me to reboot. So I used discover's reboot button to proceed. During the first reboot cycle I watched the boot messages go by, and I saw words to the effect that it was doing some post reboot additional updates. It finished them and then said it was rebooting.
On that next boot, I watched again, while it talked about other updates it needed to do, and... and another reboot.
After the n'th reboot, I no longer saw any 'installing' activity, and it went all the way through and then ... nothing. No more boot messages, and no GUI either.
So I DID do a cold reboot and then it went through the standard boot messages until those errors I mentioned and it dropped me into that emergency boot prompt.
As a matter of fact, it dropped me down and told me it needed to enter an emergency boot and asked for my root password. The message also told me to look at 'journalctl -xb'. After a few thousand lines of info, I saw nothing of significance other than it hadn't finished.
You could try journalctl -rxb so that the last messages are presented first. It is likely that that is where the error will be.
journalctl -xb gave me those error messages I provided. So yes, the first error was that it couldn't mount /boot/efi.
The other suggestion was 'systemctl default'.
That resulted in the following:
Failed to mount /boot/efi
Dependency failed for Local File System
Dependency failed for Mark the need to relabel after reboot
Failed to mount RPC File System
Dependency failed for rpc-pipefs.target
Dependency failed for RPC security service for NFS client and server
Failed to start Load Kernel Modules
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to mount Arbitrary Executable File Formats File System
Failed to start Set Up Additional Binary Formats
... and then nothing. I had to cold start
It seems like a hardware error to me from the symptoms. How can the kernel not mount /boot/efi unless the drive has either power issues or seek errors / bad sectors? This is really basic.
I read other postings that people have had issues with missing VFAT support in their kernel, that's needed to mount that filesystem.
and that brings me back to the same issues.
Trying to reboot a previous kernel doesn't even result in any boot messages.
I now have a 'non-working' machine. Suggestions are welcome (and needed)!
Long shots.
It might be software, but it could be just a coincidence that it chose this time for a hardware error to expose itself.
Do you have a list of what was updated? It would be good to see if there are any updates that might have caused this to happen via software. I'm not sure what would stop /boot/efi from being mounted.
Sadly I don't have that list. There were about 60 components including the kernel that were updated.
Did you power down completely at any point? That will allow components to lose any retained state.
You could, while completely powered down, try reseating internal components, especially drive connectors.
If you can reach the BIOS menu, look at the power supply numbers. Are they at or near spec?
Can you boot a livecd / usb so you can do checks of the drives to see if they are still functioning properly, maybe a smartctl (smartctl -a /dev/[drive designation])? If a live image boots and runs, it will indicate that your memory is (probably) not the issue as well.
After a lot of experimentation, I did get the previous kernel to boot all the way to the GUI. (I don't know why that didn't work the first time I tried it.) So I'm back to a working system. My hardware is fine. And that older kernel (5.15.13-200.fc35) IS able to mount /boot/efi. It's just the newer kernel that can't.
What do I see now?
1/ I see that about 30 of those 60 packages that were supposed to be originally installed never were. Mostly wine stuff. I installed them manually with dnf.
2/ I think I'd like to uninstall those latest kernel packages. (5.15.14-200.fc35) kernel, kernel-core, kernel-devel, kernel-modules, kernel-modules-extra and then re-install them. I'm not confident yet on what that actual command line would be, so I haven't done it yet.
3/ I don't think I'll ever use 'discover' again. It seems tedious, doesn't provide any status feedback on what it's doing. And it always seems to want to reboot. What was wrong with the old 'new rpm download/install' procedure/utility?
On Mon, 17 Jan 2022 17:20:17 -0500 Fulko Hew fulko.hew@gmail.com wrote:
After a lot of experimentation, I did get the previous kernel to boot all the way to the GUI. (I don't know why that didn't work the first time I tried it.) So I'm back to a working system. My hardware is fine. And that older kernel (5.15.13-200.fc35) IS able to mount /boot/efi. It's just the newer kernel that can't.
That's great! Wonderful feeling when the system recovers, isn't it?
What do I see now?
1/ I see that about 30 of those 60 packages that were supposed to be originally installed never were. Mostly wine stuff. I installed them manually with dnf.
2/ I think I'd like to uninstall those latest kernel packages. (5.15.14-200.fc35) kernel, kernel-core, kernel-devel, kernel-modules, kernel-modules-extra and then re-install them. I'm not confident yet on what that actual command line would be, so I haven't done it yet.
In /boot, there should be a config file for that latest kernel. Run the command grep -i vfat config[latest kernel text]. If there is no vfat, this will show it, but I think all Fedora kernels are built with drivers for vfat built in: CONFIG_VFAT_FS=y
3/ I don't think I'll ever use 'discover' again. It seems tedious, doesn't provide any status feedback on what it's doing. And it always seems to want to reboot. What was wrong with the old 'new rpm download/install' procedure/utility?
It is for people only familiar with gui interfaces who don't understand, or care to understand, what is going on under the hood. Mac or Windows users, or people who have no computer experience and just want a utility to go on the web, send some emails, maybe do a little spreadsheet or word processing. i.e. Fedora trying to appeal to a broader audience than its original technical base.
The reboot is because, especially from the GUI instead of a virtual console, updates can create crashes when newer libraries are installed that are not backward compatible, or when new applications expect an api that isn't in an older library. The reboot ensures that everything is at the latest versions.
I always update using dnf, without a gui running, from a virtual console, and have never had a problem. I start the GUI after the updates, so it picks up all the latest, greatest. I think it would be rare even if I was running dnf updates from the gui. But, again, for non technically savvy folks, this ensures they don't hit a crash or error, something very frightening to them, and which might give a bad impression that they then spread via social media comments.
On Tue, 18 Jan 2022 09:22:55 -0700 stan upaitag@zoho.com wrote:
In /boot, there should be a config file for that latest kernel. Run the command grep -i vfat config[latest kernel text]. If there is no vfat, this will show it, but I think all Fedora kernels are built with drivers for vfat built in: CONFIG_VFAT_FS=y
If vfat is built in, you could just do a dnf reinstall kernel[latest kernel package name] to get a proper install of the kernel. It seems something went wrong on the original install.
On Tue, 18 Jan 2022 at 12:31, stan via users users@lists.fedoraproject.org wrote:
On Tue, 18 Jan 2022 09:22:55 -0700 stan upaitag@zoho.com wrote:
In /boot, there should be a config file for that latest kernel. Run the command grep -i vfat config[latest kernel text]. If there is no vfat, this will show it, but I think all Fedora kernels are built with drivers for vfat built in: CONFIG_VFAT_FS=y
If vfat is built in, you could just do a dnf reinstall kernel[latest kernel package name] to get a proper install of the kernel. It seems something went wrong on the original install.
One of my systems failed to boot after updating because something had added text to the end of the Linux command line in /etc/default/grub.
On Mon, Jan 17, 2022 at 6:06 PM Joe Zeff joe@zeff.us wrote:
On 1/17/22 11:59 AM, stan via users wrote:
Suspicious that this is not deterministic. It should either fail identically every time or restart every time (in my opinion).
If it's really nondeterministic, it's what's called a mandelbug.
When booting the new kernel, it fails the same way every time. Very deterministic.
mount: /boot/efi: unknown filesystem type 'vfat'.
What I don't remember is what this multi-stage install-reboot was trying to accomplish. Looking back in messages or journalctl, I don't see anything. But it was doing something explicitly.
On Mon, Jan 17, 2022 at 11:29 PM Joe Zeff joe@zeff.us wrote:
On 1/17/22 6:31 PM, Fulko Hew wrote: What I don't remember is what this multi-stage install-reboot was trying to accomplish. Looking back in messages or journalctl, I don't see anything. But it was doing something explicitly.
If you were installing something, try dnf history.
I looked at that, and it tells me in the first transaction (that night) that it updated a bunch of packages and removed kmod-VirtualBox. [I presume this was when Discover insisted I reboot.] The next transaction (10 minutes later), it was trying to install a new version of kmod-VirtualBox. But it failed! [Here, I presume, the system auto-rebooted.] The next transaction (another 10 minutes later), it was trying to install that same new version of kmod-VirtualBox. And again it failed. [And here, I presume, was another auto-reboot. But this time, I was dropped into 'emergency boot mode' due to the "can't mount /boot/efi" issue.]
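For anyone retracing a sequence like this, here is a rough sketch of stepping back through the most recent transactions. The function name recent_transactions is made up for illustration; "dnf history info" with the specs "last" and "last-N" is the standard dnf way to address the newest transaction and the ones before it.

```shell
#!/bin/sh
# Print details of the N most recent dnf transactions, newest first.
# recent_transactions is a hypothetical helper name; N defaults to 3,
# which matches the three transactions described in this thread.
recent_transactions() {
    count=${1:-3}
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ "$i" -eq 0 ]; then
            spec=last          # the newest transaction
        else
            spec="last-$i"     # i transactions before the newest
        fi
        dnf history info "$spec"
        i=$((i + 1))
    done
}

# Usage (as root):
#   recent_transactions      # newest three transactions
#   recent_transactions 5    # newest five
```

Plain "dnf history" gives the one-line-per-transaction overview first, which is usually enough to pick out the IDs worth a closer look.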
On Tue, Jan 18, 2022 at 11:23 AM stan via users < users@lists.fedoraproject.org> wrote:
If you were installing something, try dnf history.
In /boot, there should be a config file for that latest kernel. Run the command grep -i vfat config[latest kernel text]. If there is no vfat, this will show it, but I think all Fedora kernels are built with drivers for vfat built in: CONFIG_VFAT_FS=y
I see the latest version of config is the same as the older versions of config... and they all contain: CONFIG_VFAT_FS=m
(So if I wanted to pursue this, I think I'd have to go into the initramfs to see if the VFAT module was in there. But since I'm back into a working state I'm declaring my problem solved (see below).)
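If someone does want to pursue it, a sketch along these lines might help. Both function names are made up for illustration. The first reads CONFIG_VFAT_FS from a config file; the second looks for a vfat.ko* file under a modules directory. It is my assumption, not something established in this thread, that with CONFIG_VFAT_FS=m the module is normally loaded from /lib/modules/<version> rather than from the initramfs, so both places are worth a look.

```shell
#!/bin/sh
# Report how vfat support is configured in a kernel config file.
# Prints "built-in", "module", or "missing". (Hypothetical helper name.)
vfat_config_state() {
    config_file=$1
    case "$(grep -E '^CONFIG_VFAT_FS=' "$config_file" 2>/dev/null)" in
        CONFIG_VFAT_FS=y) echo "built-in" ;;
        CONFIG_VFAT_FS=m) echo "module" ;;
        *)                echo "missing" ;;
    esac
}

# Succeed if a vfat module file exists under <modules_root>/<kernel_version>.
# (Hypothetical helper name.)
has_vfat_module() {
    modules_root=$1
    kver=$2
    [ -n "$(find "$modules_root/$kver" -name 'vfat.ko*' 2>/dev/null)" ]
}

# Usage on the affected machine, with the version from this thread:
#   vfat_config_state /boot/config-5.15.14-200.fc35
#   has_vfat_module /lib/modules 5.15.14-200.fc35 || echo "vfat module not found"
#   lsinitrd /boot/initramfs-5.15.14-200.fc35.img | grep -i vfat
```

If the config says "module" but no vfat.ko* file exists for the new kernel version, that would be consistent with the "unknown filesystem type 'vfat'" message and with an incomplete kernel install.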
3/ I don't think I'll ever use 'discover' again. It seems tedious, doesn't provide any status feedback on what it's doing. And it always seems to want to reboot. What was wrong with the old 'new rpm download/install' procedure/utility?
It is for people only familiar with gui interfaces who don't understand, or care to understand, what is going on under the hood. Mac or Windows users, or people who have no computer experience and just want a utility to go on the web, send some emails, maybe do a little spreadsheet or word processing. i.e. Fedora trying to appeal to a broader audience than its original technical base.
I knew why it's being done, but it's a BAD design decision. But I'd have to say "Please stop making Linux (or Fedora) as bad as MS Windows". My GF is constantly asking me why Windows is doing this or that while shutting down, and I have to tell her "I don't know. It doesn't tell you." She asks: "How long will it take before it shuts down?" I answer: "We have no idea what it's doing, or when it will be done. For all I know it will probably reboot a few times too, before it actually will be done. It could take minutes or hours... we have no way of knowing."
On a lark, today I saw that Discover was telling me there were updates. I asked it to show me what. After a number of minutes it told me... 7 packages. OK, I did a 'dnf update' instead. After 30 seconds it had fetched and installed all 7 packages. Then I tried 'discover' again. After another few minutes of fetching, it told me it (still) had to update those 7 packages. I asked it to refresh, and then asked again. Another few minutes later, it still thought it needed to install those 7 packages. What! So I killed off 'discover' on the task bar. Here it is, 2 hours later, and I see the icon has re-appeared on the task bar. It tells me it has 22 packages to update. But dnf tells me it is up to date.
Does dnf use a different set of repo servers than 'discover' does? Does discover query a different local RPM db? It seems so.
-----
Anyway, the update to my status is that I removed the 5 kernel related packages, and then did the 'dnf upgrade' to reinstall them, and my system is now in a working state with that latest kernel update that had failed on me 2 days ago. Problem averted.
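For the archives, a hedged sketch of what that command line can look like. This is a dry run that only prints the commands; the function names are made up for illustration, and the package list is the five names mentioned earlier in the thread. It assumes the machine is booted into the older, working kernel, since the kernel being removed should not be the one that is running.

```shell
#!/bin/sh
# Print the versioned names of the five kernel packages named in this thread.
# kernel_packages is a hypothetical helper name.
kernel_packages() {
    kver=$1
    for name in kernel kernel-core kernel-devel kernel-modules kernel-modules-extra; do
        printf '%s-%s\n' "$name" "$kver"
    done
}

# Dry run: echo the remove/upgrade commands instead of executing them.
# The unquoted $(...) is deliberate so the names end up on one line.
reinstall_kernel_dry_run() {
    kver=$1
    echo "dnf remove" $(kernel_packages "$kver")
    echo "dnf upgrade"
}

# Usage, with the version from this thread:
#   reinstall_kernel_dry_run 5.15.14-200.fc35
# Review the printed commands, then run them as root if they look right.
```

Printing first and running by hand keeps a typo in the version string from removing the wrong kernel.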
Thanks for all the input.
Fulko
On Tue, 18 Jan 2022 22:58:28 -0500 Fulko Hew fulko.hew@gmail.com wrote:
On a lark, today I saw that Discover was telling me there were updates. I asked it to show me what. After a number of minutes it told me... 7 packages. OK, I did a 'dnf update' instead. After 30 seconds it had fetched and installed all 7 packages. Then I tried 'discover' again. After another few minutes of fetching, it told me it (still) had to update those 7 packages. I asked it to refresh, and then asked again. Another few minutes later, it still thought it needed to install those 7 packages. What! So I killed off 'discover' on the task bar. Here it is, 2 hours later, and I see the icon has re-appeared on the task bar. It tells me it has 22 packages to update. But dnf tells me it is up to date.
Does dnf use a different set of repo servers than 'discover' does?
They use the same set of repo servers.
Does discover query a different local RPM db?
I think all the package installers use the same rpm db. I haven't used discover, but when I had Packagekit active it seemed to keep its own internal records for reference. I think the problem is that unless the updates are actually run from discover or Packagekit, its internal database is not updated with changes. It doesn't do a cleanup of its internal db from the rpm db at each invocation (probably so it can be faster). So, if dnf is used to update packages, it will never remove the updates that dnf did from its internal database. It doesn't actually consider whether there is another installer running, it presumes that it is the sole package installer on the system. For its target users, that is probably a reasonable assumption, as they will always use it.
It seems so.
Yes.