Hi,
I have a request for list regulars.
The Fedora Workstation working group is curious if there's any pattern or categorization how Fedora installations typically break. i.e. the installation is successful, the system has been updated multiple times successfully, and then for whatever reason it breaks.
Are most failures hardware related? This could be broken down into hard failure (drive or logic board failed) and soft failure (some hardware configuration change and reverting the change resolves the problem).
What portion of the failures are early boot failures? (Defined as bootloader, kernel, or early initramfs failures. But excludes being landed at a dracut prompt.)
What portion of the failures land the user at a dracut shell?
What portion of the failures does the user get to a graphical shell but can't login?
What portion of the failures can the user login but there's some sort of anomalous behavior?
What portion of all failures are fixable without reinstalling?
Is the GRUB "rescue" menu entry ever useful in resolving problems?
Could everyone reading this try booting the "rescue" menu entry and describe what happens? How does the actual behavior compare to what you thought would happen?
The questions list is not complete, feel free to add your own categorizations / failure patterns that you tend to see.
Thanks!
My impression is that audio has been problematic at install time since F34.
I always do a fresh install rather than update.
Overall, Workstation has been stellar despite its current EOL status.
On Fri, 15 Jul 2022 14:41:09 -0400 Chris Murphy wrote:
Is the GRUB "rescue" menu entry ever useful in resolving problems?
I've never had any use for it. I tend to boot off a USB stick if I want to investigate a problem independent of the current software.
failure patterns that you tend to see
The only consistent failure pattern I see is a new install locking up due to nouveau driver problems when web browsing. I always go with the default nouveau drivers when I install a new Fedora version, and for several years now my system has always experienced these total lock-up failures within a week, usually within 2 hours. They disappear once I install the rpmfusion nvidia drivers. (The failure is always a sudden lock-up, leaving no clues in log files because the whole system is frozen; I'm only sure it is nouveau because it goes away when I switch to nvidia.)
Far less frequently, I'll find that my wireless keyboard isn't talking following a reboot into a new kernel. It only occasionally happens on new kernels and is not consistent. A complete power cycle fixes it, but so does unplugging and plugging back in the USB dongle (which now sits out front in an easily reachable position on a hub :-).
The only other consistent failures I see are user errors, because I never read the release notes until I have to. For example, I can't get sound to work, and then I find that pulse is gone, pipewire is the new thing, and the config info I copied from my previous system is wrong. (That is just a recent example; disappearing ifcfg file support in networking is another one.)
On 2022-07-15 11:41, Chris Murphy wrote:
Is the GRUB "rescue" menu entry ever useful in resolving problems?
I think I've used it once when I moved a hard drive or possibly a disk image to another computer. The problem is that it never gets updated, so by the time you need it, it's several releases old and might not even work with the installed system.
Am 15.07.22 um 20:41 schrieb Chris Murphy:
Hi,
I have a request for list regulars.
The Fedora Workstation working group is curious if there's any pattern or categorization how Fedora installations typically break. i.e. the installation is successful, the system has been updated multiple times successfully, and then for whatever reason it breaks.
Are most failures hardware related?
Some are. They happen, but only once or twice a year.
What portion of the failures are early boot failures? (Defined as bootloader, kernel, or early initramfs failures. But excludes being landed at a dracut prompt.)
On BIOS systems I experience these once or twice a year: kernel bugs.
By far most often (about once per month): networking issues. In recent years, I have several times found myself having to tweak details of network configurations because Fedora's "vision" of network configuration did not harmonize well with mine.
(One recent such incident: Something in Fedora broke "domainname expansion".)
What portion of all failures are fixable without reinstalling?
The only occasion on which I found myself reinstalling was when something screwed up the grub configuration on a BIOS multiboot system after a Fedora upgrade.
Is the GRUB "rescue" menu entry ever useful in resolving problems?
It's probably ten years or more since I tried to use it. Nowadays, "rescue" is among the things I usually remove first from installations.
Ralf
On Fri, 15 Jul 2022 14:41:09 -0400 Chris Murphy lists@colorremedies.com wrote:
Hi,
I have a request for list regulars.
The Fedora Workstation working group is curious if there's any pattern or categorization how Fedora installations typically break. i.e. the installation is successful, the system has been updated multiple times successfully, and then for whatever reason it breaks.
I find Gnome 3 unusable, so I use Mate instead of Workstation.
Are most failures hardware related? This could be broken down into hard failure (drive or logic board failed) and soft failure (some hardware configuration change and reverting the change resolves the problem).
I have had my share of hardware failures, but it is far more common for hardware to reach end of support or to be replaced by an upgrade.
What portion of the failures are early boot failures? (Defined as bootloader, kernel, or early initramfs failures. But excludes being landed at a dracut prompt.)
Almost none, and those are usually self inflicted.
What portion of the failures land the user at a dracut shell?
Almost none, and those are usually self inflicted. I did have one case where LVM did not find the root volume group and another with a corrupt root partition, but those were years ago.
What portion of the failures does the user get to a graphical shell but can't login?
Never. (Except when a network failure happens and NIS is unhappy.)
What portion of the failures can the user login but there's some sort of anomalous behavior?
Most.
I have recently started seeing kernel panics in the nouveau kernel driver.
Many are new “features” that aren’t ready for prime-time. Two recent examples are systemd-resolved with bridged networks and pipewire with bluetooth.
There were also a few upgrades that changed the RPM database format, and the migration from authconfig to authselect. I feel that the Fedora Change process and QA are helping here.
(I missed the great a.out to ELF transition).
What portion of all failures are fixable without reinstalling?
All except drive failures.
Is the GRUB "rescue" menu entry ever useful in resolving problems?
No. I have a bootable USB drive I can use for this purpose, or I can boot the installer into rescue mode. I remove dracut-config-rescue and the rescue image from /boot.
The questions list is not complete, feel free to add your own categorizations / failure patterns that you tend to see.
I have never had hibernation work right.
Applications designed for Gnome 3 often do not play nicely with other window managers; they show Gnome 3 window decorations instead of what I have configured marco to show. (Blueberry and evince are two examples.)
I set a delay in the BIOS so I can access the BIOS menus, and a delay in the boot manager so I can access its menus.
Jim
On 7/16/22 14:00, Joe Zeff wrote:
On 7/16/22 13:55, James Szinger wrote:
I find Gnome 3 unusable, so I use Mate instead of Workstation.
When I read a description of what Gnome 3 would be, I started hunting around and ended up with Xfce, before Gnome 3 was released and never looked back.
MATE is a bit more stable than Xfce, but Xfce has more features and I find it more usable. Xfce does have a few annoying bugs, but I have learned to deal with them. I can boot to both but have not booted to MATE in about three months.
On Sat, 16 Jul 2022 15:00:45 -0600 Joe Zeff joe@zeff.us wrote:
On 7/16/22 13:55, James Szinger wrote:
I find Gnome 3 unusable, so I use Mate instead of Workstation.
When I read a description of what Gnome 3 would be, I started hunting around and ended up with Xfce, before Gnome 3 was released and never looked back.
I have used XFCE and liked it well enough that I would use it if MATE were not available. I still prefer MATE. For example, the MATE keyboard settings let me customize the layout, but I had to use setxkbmap or xmodmap with XFCE.
KDE looks slick and polished, but has enough problems under the hood to remain a plaything. (Just look at the blocker bugs from the last few Fedora releases).
Jim
On Sun, 2022-07-17 at 08:02 -0600, James Szinger wrote:
KDE looks slick and polished, but has enough problems under the hood to remain a plaything. (Just look at the blocker bugs from the last few Fedora releases).
I've used KDE for years. The only issue that bothers me right now is that session restore still isn't working on Wayland. On X11 it's fine.
I find Gnome essentially incomprehensible. Every now and again I give it a try to see if it has improved, and rapidly return to KDE. This isn't a "right or wrong" thing, it's just a matter of taste.
poc
On Sun, 17 Jul 2022 15:21:42 +0100 Patrick O'Callaghan pocallaghan@gmail.com wrote:
This isn't a "right or wrong" thing, it's just a matter of taste.
Hammer, nail, head. It isn't denigrating Gnome or KDE or ... to say that I don't prefer them. They obviously turn a lot of people's crank, since there are people putting in effort to enhance and maintain them.
Even given that, I still don't criticize them for what I perceive to be shortcomings, as that can be taken as a personal attack by the people who are working on them, and we need all the people working on enhancing software that we can get. And who knows? There might come a day when I see the light and want to use one of them. I do install all the desktops, and occasionally try them, and use their applications even if not using their desktop. So, choice is win-win from my perspective.
Thanks to all the developers implementing their vision. And all the maintainers purveying that vision to us.
On Sat, 2022-07-16 at 13:55 -0600, James Szinger wrote:
I find Gnome 3 unusable, so I use Mate instead of Workstation.
+1
On 7/15/22 11:41, Chris Murphy wrote:
Hi,
I have a request for list regulars.
The Fedora Workstation working group is curious if there's any pattern or categorization how Fedora installations typically break. i.e. the installation is successful, the system has been updated multiple times successfully, and then for whatever reason it breaks.
It is usually hardware failure(s). Typically fans (over heat cooking things), data drives (especially mechanical drives), and power supplies. Occasionally, a monitor will go out or a printer will die.
I will do upgrades on Fedora, but on Windows, I like to wipe and reinstall. Too many problems to remove with Windows. I do come across issues upgrading Fedora, but with the help of the guys on this group, I have always got them fixed.
I am pretty tickled with Fedora 36.
On Fri, 15 Jul 2022 14:41:09 -0400 Chris Murphy lists@colorremedies.com wrote:
What portion of the failures are early boot failures? (Defined as bootloader, kernel, or early initramfs failures. But excludes being landed at a dracut prompt.)
One thing that Fedora Linux REALLY needs is the file /usr/share/doc/grub2-common/README.Fedora which describes how to customize grub, how to set the default boot arguments, how kernel upgrades work, and so on. I find that Fedora’s grub is sufficiently different from upstream and other distros that guides written for them fall short when working with Fedora. I think this is somewhere on the Fedora website, but it is hard to find when needed. Another advantage of a local file is that it should be specific for the current installation.
Jim
On Sun, 17 Jul 2022 08:22:32 -0600 James Szinger wrote:
describes how to customize grub
After fighting with the "convenient" /etc/default/grub parameters for years to make grub do what I want, I finally gave up and now have a perl script that runs after any dnf update to hit the grub.cfg file with a big hammer and make it do what I want :-).
This is the current Fedora GRUB doc. https://fedoraproject.org/wiki/GRUB_2
This doc needs updating but skimming it I'm not finding outright bad advice. https://docs.fedoraproject.org/en-US/fedora/latest/system-administrators-gui...
The wiki doc contains grubby examples for modifying the kernel command line. grubby is the preferred tool for such modifications because they're applied correctly universally: regardless of BIOS or UEFI, or Fedora version, or architecture, or whether the user might have at one time opted out of BootLoaderSpec conversion.
In particular the section https://fedoraproject.org/wiki/GRUB_2#Instructions_for_UEFI-based_systems contains more thorough instructions resulting in a more complete reinstallation that should be equivalent to a clean installed system.
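As a rough illustration of the kind of grubby call the wiki describes, here is a minimal Python sketch. It is not taken from the wiki: the helper names grubby_command and apply are hypothetical, and removing "rhgb quiet" is only an example. It assumes grubby's --update-kernel, --args and --remove-args options, and it only prints the command unless dry_run is turned off.

```python
import shlex
import subprocess


def grubby_command(add=(), remove=(), kernel="ALL"):
    """Build a grubby argv that edits kernel command-line arguments.

    Hypothetical helper: `add` and `remove` are sequences of kernel
    arguments; `kernel` is "ALL" or the path of one installed kernel.
    """
    cmd = ["grubby", f"--update-kernel={kernel}"]
    if add:
        cmd.append("--args=" + " ".join(add))
    if remove:
        cmd.append("--remove-args=" + " ".join(remove))
    return cmd


def apply(cmd, dry_run=True):
    """Print the command; run it (root is required) only if dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Dry run: shows the command that would drop "rhgb quiet" from every entry.
apply(grubby_command(remove=["rhgb", "quiet"]))
```

Keeping the dry run as the default lets you read the exact command before anything touches the boot entries.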
-- Chris Murphy
On Fri, Jul 15, 2022 at 3:43 PM Chris Murphy lists@colorremedies.com wrote:
Hi,
I have a request for list regulars.
The Fedora Workstation working group is curious if there's any pattern or categorization how Fedora installations typically break. i.e. the installation is successful, the system has been updated multiple times successfully, and then for whatever reason it breaks.
[...]
I work with large packages from NASA and ESA that rely heavily on open source libraries. It is not unusual to have problems with these packages on new OS installs because they tend to rely on older libraries (gfortran is the most common example) which the developers have carried over from an earlier OS release, but aren't installed by default in fresh OS installs.
I generally do upgrades a couple of times, then a fresh install to try a new filesystem (xfs a while ago, and recently btrfs) and because linux accumulates old libraries over time. This has resulted in cases where my system was able to run 3rd party packages that required older libraries and would not run on colleagues' systems with a fresh install. If time permits, I do update before the fresh install to check for issues in mission critical apps.
The issues I have seen:
On an elderly system with an old Nvidia card, the new kernel had a bug in nouveau affecting my card. I had to use an older kernel until the bug was fixed. With other distros and the same machine I have had issues when drivers were removed from the kernel (they were eventually restored).
On newer hardware with UEFI, upgrading from F34 to F35 failed to boot. I used the rescue kernel to discover that the kernel command line generated by the installer had misplaced quotes adding back "rhgb quiet" that I had removed. Since I was planning a fresh install anyway, I didn't try very hard to fix the problem, just did a fresh install with a new filesystem.
On Fri, 15 Jul 2022 14:41:09 -0400 Chris Murphy lists@colorremedies.com wrote:
Caveat: Running rawhide since it was F35
The Fedora Workstation working group is curious if there's any pattern or categorization how Fedora installations typically break. i.e. the installation is successful, the system has been updated multiple times successfully, and then for whatever reason it breaks.
I don't use Workstation, though I have it installed, so I'm not sure how meaningful my answers will be.
Are most failures hardware related? This could be broken down into hard failure (drive or logic board failed) and soft failure (some hardware configuration change and reverting the change resolves the problem).
The last failure I had was a power supply. Very rare.
What portion of the failures are early boot failures? (Defined as bootloader, kernel, or early initramfs failures. But excludes being landed at a dracut prompt.)
Don't have these, but I use custom compiled kernels tuned to my hardware.
Spoke too soon, just compiled the latest 5.19 rc6 kernel, and it doesn't boot, hangs. Previous rc4 boots and runs normally. The only difference in the configuration is a new automatic timing remediation that they had default Y, and I turned off. I'll be mentioning this on the kernel list.
What portion of the failures land the user at a dracut shell?
Only when installs fail. I usually use the small netinstall image to do fresh minimal installs, and once they boot, do the rest of the install. The failures are in the fresh install, not the rest.
What portion of the failures does the user get to a graphical shell but can't login?
NA I always boot to runlevel 3, and start X from there.
What portion of the failures can the user login but there's some sort of anomalous behavior?
NA
What portion of all failures are fixable without reinstalling?
NA
Is the GRUB "rescue" menu entry ever useful in resolving problems?
It has helped me precisely once. I cloned an existing fedora install to a new partition, but for new hardware. It would not boot until I used the rescue image, because it needed drivers that weren't available on the cloned image.
I do keep it up to date by deleting the rescue kernel and initramfs, and then running /usr/lib/kernel/install.d/51-dracut-rescue.install add $(uname -r) "" /lib/modules/$(uname -r)/vmlinuz in the /boot directory.
Could everyone reading this try booting the "rescue" menu entry and describe what happens? How does the actual behavior compare to what you thought would happen?
I tried booting the rescue entry described above, and it failed. It looked like it was going to start, then the drive clunked, and the system hung. I checked permissions and mode: it was 700 vs 755 for regular kernels, and unconfined vs system_u for regular kernels. I still had the issue on a reboot.
The one I used successfully before was generated with a different command, since it was years ago.
The questions list is not complete, feel free to add your own categorizations / failure patterns that you tend to see.
By far the most common error for me is packaging conflicts. New packages will be blocked because packages that depend on their old version are installed, and not being updated. Right now there are thousands of packages in rawhide affected by this. I'm not investigating because I see from the chatter that there are some issues being worked on.
The other problem is that packages get orphaned, and bit rot sets in, causing issues with older packages that are no longer supported, but are installed.
The above two paragraphs aren't a criticism of package maintainers, but rather of the package system. I think the build system should have automatic checks to determine that all dependent packages are being rebuilt as part of a package build. Can that be done automatically? Look at the binary to determine what it uses? I'm not sure how difficult that would be, but that's what computers are for: to remove repetitive, routine work that is error prone. A couple of databases: every package has all of its dependencies in one, and every package has all of the packages that depend on it in the other. These are redundant, but it makes lookup simpler. A program runs periodically and updates them to the current situation.
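To make the two-database idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the package names are made up, the dependency data is a plain in-memory dict rather than anything read from the real build system, and the helper names build_indexes and needs_rebuild are hypothetical.

```python
from collections import defaultdict


def build_indexes(requires):
    """Return (forward, reverse) lookups from a package -> dependencies map.

    forward[pkg] is everything pkg depends on; reverse[pkg] is everything
    that depends directly on pkg. The two are redundant by design.
    """
    forward = {pkg: set(deps) for pkg, deps in requires.items()}
    reverse = defaultdict(set)
    for pkg, deps in forward.items():
        for dep in deps:
            reverse[dep].add(pkg)
    return forward, dict(reverse)


def needs_rebuild(reverse, changed):
    """Collect every package that depends on `changed`, directly or not."""
    seen = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen


# Made-up packages, purely for illustration.
requires = {
    "app": {"libfoo", "libbar"},
    "libbar": {"libfoo"},
    "tool": {"libbar"},
    "libfoo": set(),
}
forward, reverse = build_indexes(requires)
print(sorted(needs_rebuild(reverse, "libfoo")))
```

With the reverse lookup in hand, a periodic job could flag any package in the returned set that was not rebuilt alongside the changed one.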
On Sun, 17 Jul 2022 11:27:48 -0700 stan via users wrote:
I think the build system should have automatic checks to determine that all dependent packages are being rebuilt as part of a package build.
I have suggested in the past that the repos should have an extra layer of checks before being published: A virtual machine with as close to "everything" installed as possible, if it gets package errors doing a dnf update from the pre-published repos, then mail all the package maintainers for every package with errors and postpone publishing.
No one seems to like the idea because it isn't absolutely perfect, it would only solve 99% of the package dependency problems :-).
On Sun, 17 Jul 2022 14:42:09 -0400 Tom Horsley horsley1953@gmail.com wrote:
I have suggested in the past that the repos should have an extra layer of checks before being published: A virtual machine with as close to "everything" installed as possible, if it gets package errors doing a dnf update from the pre-published repos, then mail all the package maintainers for every package with errors and postpone publishing.
No one seems to like the idea because it isn't absolutely perfect, it would only solve 99% of the package dependency problems :-).
That would work, but I think it is better to have it as part of the build system rather than the delivery system. And I wonder if they have the hardware to run that for every deliverable. My understanding is that it isn't possible to install all the packages on a single system. I could see having one system with all the desktops, with all their packages and dependencies, games, office, etc., one system with all the development tools, and one system with all the server / cloud packages. It would have to be automated as it's too much to expect a person to run this all the time.
Maybe have both, if there are the resources. I infer, from comments on the devel list, that there are resource constraints in infrastructure, so it might be hard to find someone to actually design and build it, and then maintain it, not even considering hardware. Probably lots of other demands on the existing resources, too.
Another way to take care of this would be to enhance fedora to have support for multilib. But that is an even bigger project than both of the above combined. And has its own issues; container stale libraries on steroids. Would have to always use the latest available library api for every function, garbage collection for no longer used libraries, frozen library api function call signatures and functionality, gets more and more complex.
On 7/17/22 12:27, stan via users wrote:
NA I always boot to runlevel 3, and start X from there.
When I first started using Linux as a secondary OS, I did that too. Then, I realized that I was doing almost everything in X and decided that it was silly not to boot into runlevel 5. Is there a particular reason that you don't, or is it just habit?
On Sun, 17 Jul 2022 14:28:00 -0600 Joe Zeff joe@zeff.us wrote:
When I first started using Linux as a secondary OS, I did that too. Then, I realized that I was doing almost everything in X and decided that it was silly not to boot into runlevel 5. Is there a particular reason that you don't, or is it just habit?
I use the consoles for development, low overhead, and with screen lots of alternate screens. Also, I do dnf updates from a console without X running. Same idea as only doing updates when the system is going down, though not as robust. And it allows me to see the messages from X as it runs (I start it on a different console than 1), if there are problems.
All that said, I think the consoles can be considered as deprecated, as they receive little love from developers, and are slowly being whittled down to inconsequential.
Chris, I have been using Fedora as my D2D corporate workstation since F32. I have had very few issues with any of it. My biggest issue and daily pita is systemd-resolved. I have to consistently "systemctl restart systemd-resolved" in order to get my VPN to restart. In my home network, systemd-resolved is so bad that I can't get freeipa \ DNS to work at all. I messed with it for quite a while; it seems to work for 5 minutes, then not. I have a F36 KVM host with these guests:
  F36 freeipa
  F36 Server with GUI
  Alma 8 Server with GUI
  CentOS 8 server with GUI
External hosts:
  F36 Laptop
I get no stable DNS resolution from any of them. I suspect that the FreeIPA host is arguing with the systemd-resolved proxy....