On Mon, Apr 4, 2022 at 5:12 AM Fabio Valentini <decathorpe(a)gmail.com> wrote:
On Thu, Mar 31, 2022 at 11:39 PM Neal Gompa <ngompa13(a)gmail.com> wrote:
> Hey all,
> Earlier this week, the Fedora Workstation WG discussed a ticket
> brought to us asking for a GUI-based rescue/recovery environment.
> While we all agreed in principle that such a thing would be a very
> good thing to have, we don't really know how to achieve such a thing.
> Additionally, we're not really sure what the scope of things should be
> provided in said recovery environment and what kind of things people
> would expect to be able to fix in there.
> So I come to y'all to ask about this and give us some feedback on the
> idea, how to do it, and what kinds of things you expect people to need
> a recovery environment for.
This sounds interesting.
The only situation in which I would really have needed a "recovery
environment" was when after upgrading my old Fedora installs to Fedora
33 or something (whenever we switched GRUB to use BLS snippets, I
think) some required GRUB modules that were split off into subpackages
didn't get pulled in (I think it was grub2-efi-x64 ?), leaving the
system in an unbootable state, which I was only able to fix by booting
from a Live USB and installing the missing GRUB modules.
On UEFI, the blscfg.mod that brought BLS support is built into
grub$arch.efi on the ESP, and on conventional RPM installations it is
replaced whenever the relevant package (e.g. grub2-efi-x64) is updated.
So I'm not sure what this problem could have been. Meanwhile on BIOS systems,
the embedded "stage 2" bootloader (a.k.a. core.img which goes in the
MBR gap, or in the BIOS boot partition on GPT) is never updated
automatically, even when doing full system upgrades. This embedded
core.img does become stale. We had a problem where an old (circa Fedora
24?) GRUB core.img became confused by a new-generation blscfg.mod
found in /boot/grub2. The fix was to use the GRUB command line to
load the old grub.cfg in order to boot, and then run
grub2-install to get the latest core.img created and embedded.
So yeah, this was esoteric and low-level enough that it
would have prevented booting a rescue environment, at least the kind
that's under discussion. I think it's more the domain of coreos/bootupd, to
perhaps support an A/B approach to updating the bootloader, and thus
there'd be a fallback bootloader.
If I didn't know what to do (or wouldn't have had a Live USB),
that would have basically bricked my system (or locked me into booting
Windows with its own bootloader). Not sure if it would be possible
with the "recovery environment" you would have in mind, but a basic
"are all the components that are required to boot there" check, with
suggestions how to fix them if they're missing, would have saved me
tons of time recovering from the borked grub install.
A fairly simple tool could check the bootloader path:
1. Does NVRAM contain one Fedora entry;
2. Does that NVRAM entry point to a file that exists;
3. Is that file shim, and does its hash match its RPM, and is it the
4. Is grub$arch.efi present, in the correct location, does its hash match RPM;
5. does /boot/efi/EFI/fedora/grub.cfg exist;
6. does the fs-uuid contained in this grub.cfg exist per libblkid;
7. is that fs-uuid mountable, i.e. mount it;
8. does that mounted fs have grub2/ directory, and does grub.cfg exist;
9. does that grub.cfg do insmod blscfg;
10. does the mounted fs have loader/entries, and do snippets exist;
11. minimally parse/check those bls snippets, and do they pass/fail
basic sanity checking;
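
To make the idea concrete, here is a rough Python sketch of the file-level checks only (5, 6, and 8 through 11). It is read-only and just reports pass/fail. Several things in it are my assumptions rather than anything settled: the function names are made up, the fs-uuid is taken to be the last token on the "search ... --fs-uuid" line, the /dev/disk/by-uuid symlinks stand in for a real libblkid lookup, the target filesystem is expected to be mounted already (so check 7 is skipped), and the snippet "sanity check" only looks for a linux or efi key. The NVRAM, shim, and RPM hash checks (1 through 4) are left out entirely.

```python
from pathlib import Path
import re


def find_fs_uuid(esp_cfg_text):
    """Return the UUID from a 'search ... --fs-uuid ... <UUID>' line, or None."""
    for line in esp_cfg_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "search" and "--fs-uuid" in tokens:
            # Assumption: the UUID is the last token on that line.
            return tokens[-1]
    return None


def snippet_ok(text):
    """Minimal sanity check: the snippet has a 'linux' or 'efi' key."""
    keys = {
        line.split(None, 1)[0]
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return bool(keys & {"linux", "efi"})


def check_boot_path(esp_cfg, boot_dir, by_uuid_dir="/dev/disk/by-uuid"):
    """Run the pass/fail checks and return a list of (name, passed) tuples.

    esp_cfg     -- path to the small grub.cfg on the ESP
    boot_dir    -- where the filesystem holding grub2/ and loader/ is mounted
    by_uuid_dir -- directory of UUID symlinks (stand-in for libblkid)
    """
    esp_cfg, boot_dir, by_uuid_dir = Path(esp_cfg), Path(boot_dir), Path(by_uuid_dir)
    results = []

    def record(name, passed):
        results.append((name, bool(passed)))

    # Check 5: the ESP grub.cfg exists.
    have_esp_cfg = esp_cfg.is_file()
    record("ESP grub.cfg exists", have_esp_cfg)

    # Check 6: it names an fs-uuid, and that UUID is known to the system.
    uuid = find_fs_uuid(esp_cfg.read_text(errors="replace")) if have_esp_cfg else None
    record("ESP grub.cfg names an fs-uuid", uuid is not None)
    record("fs-uuid is a known block device",
           uuid is not None and (by_uuid_dir / uuid).exists())

    # Checks 8 and 9: the main grub.cfg exists and loads blscfg.
    main_cfg = boot_dir / "grub2" / "grub.cfg"
    have_main_cfg = main_cfg.is_file()
    record("grub2/grub.cfg exists", have_main_cfg)
    record("grub.cfg loads blscfg",
           have_main_cfg and re.search(r"^\s*insmod\s+blscfg\b",
                                       main_cfg.read_text(errors="replace"),
                                       re.MULTILINE))

    # Checks 10 and 11: BLS snippets exist and look minimally sane.
    snippets = sorted((boot_dir / "loader" / "entries").glob("*.conf"))
    record("BLS snippets exist", snippets)
    record("BLS snippets pass sanity check",
           snippets and all(snippet_ok(p.read_text(errors="replace"))
                            for p in snippets))
    return results
```

On an installed system this would be called as something like check_boot_path("/boot/efi/EFI/fedora/grub.cfg", "/boot"), most likely as root since the ESP is often not world-readable, and the caller would print each (name, passed) pair. Returning plain tuples keeps it purely informational, with nothing on disk touched.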
All of these are just pass/fail tests, not repairing anything. I think
we'd need a more sophisticated tool to handle repairs and make sure
those repairs are reversible when possible. But any kind of repair is
risky because it changes things. So I'd like to emphasize collecting
information and comparing it to what's expected, rather than becoming
invasive and making risky changes. There isn't much hand-holding here,
but the user is on perhaps a sparsely populated island rather than a