Drawing lessons from fatal SELinux bug #1054350

Kevin Kofler kevin.kofler at chello.at
Thu Jan 23 23:55:23 UTC 2014


Hi,

it is time to analyze the fallout from the following catastrophic Fedora 20 
regression:
https://bugzilla.redhat.com/show_bug.cgi?id=1054350
"rpm scriptlets are exiting with status 127"

The impact:
* EVERYONE who has Fedora 20 installed with SELinux enabled in enforcing 
mode, and who applied the current stable updates, was hit by this bug.
* The bug completely breaks upgrading any package through both GUI and CLI 
tools. Even the fix itself cannot be installed correctly.
* The only possible workaround requires use of the command line. It is 
IMPOSSIBLE to fix this using the GUI tools installed by default. The 
system-config-selinux tool, which can be used to fix this purely through 
the GUI, is NOT installed by default in Fedora 20 for some stupid reason 
(because somebody decided to make it as painful as possible to disable that 
SELinux junk? Now I have to install system-config-selinux as the first thing 
after installation just so I can disable the dreaded thing), and of course it 
cannot be installed after the fact because of the bug. Normal users do not 
use terminals, so they can only reinstall Fedora or (more likely) switch to a 
competing distribution (or even operating system)!
* The only possible workaround also requires root access to the machine. 
PolicyKit policy allows all users to install official updates by default, 
but those users then cannot fix the breakage without bothering an 
administrator.
* As a consequence of the above, several installations can be considered 
BRICKED.
* We are losing users to Ubuntu because of this issue. People are explicitly 
saying they are switching to Ubuntu because of this bug (e.g. 
https://bugzilla.redhat.com/show_bug.cgi?id=1054312#c5 , later confirmed: 
https://bugzilla.redhat.com/show_bug.cgi?id=1054312#c10 ), and I am sure 
there are many more who are silently doing it without telling us.
* The bug now has 38 (!) duplicates in Bugzilla, plus many complaints on 
IRC, on mailing lists, in comments on other, unrelated bugs (whose fixes 
cannot be installed due to the SELinux bug), etc.

So it is time to draw some lessons from this issue to prevent such a bug 
from ever occurring again!

So, what happened:
* We are shipping SELinux, a tool designed to prevent anything it does not 
like from happening, enabled (enforcing) by default. (Reread this carefully: 
The ONLY thing that tool is designed to do at all is PREVENT things. It does 
not have a SINGLE feature other than being a roadblock and an annoyance.)
* SELinux works by shipping a "policy" that effectively tries to specify in 
one single place (read: single point of failure!) everything any program in 
Fedora (scalability disaster!) ever wants to do (second-guessing its actual 
code, i.e., duplication of all logic!). (Note the 3 (!) major antipatterns 
in a single-sentence (!) description of how SELinux works!)
* An update to that SELinux policy was shipped that BREAKS the most critical 
tools in Fedora, the ones required to update the system and thus install the 
fixes for any regressions, including the very regression that caused the 
breakage. Any automated workarounds are also blocked by design.
* That update made it out to the stable updates! In other words, the 
draconian Update Policies that were enacted in a vain attempt to prevent 
such issues from happening utterly failed at catching this bug.

Meanwhile, SELinux is also causing similarly fatal issues in Rawhide:
https://bugzilla.redhat.com/show_bug.cgi?id=1052317
"selinux-policy preventing login through sddm and ssh"
which are still NOT fixed! At least in that case, RPM is apparently not 
affected, but if you cannot log in to your system (SDDM is the default 
display manager for KDE in Rawhide), it is totally unusable ("bricked")!

So, what needs to happen:
* SELinux must be disabled (or preferably, not installed in the first place, 
to avoid wasting space for nothing) by default! Just consider the benefits 
(none!) vs. the risks (what you are seeing now: bricked systems in both F20 
and Rawhide, the users switching to other distributions). If we want to have 
any users left, SELinux needs to go away NOW!
* The Update Policies must be repealed. This regression has shown us that 
not only did they totally fail at preventing it, but they are actively 
contributing to exposing MORE users to broken updates by delaying regression 
fixes. (Fixes for this kind of regression need to go out DIRECTLY to stable!)

Last time an issue like that happened (the D-Bus regression that broke 
updates), a big drama was made that ultimately led to the (flawed) Update 
Policies. And even a "catastrophe" that hit only a very small portion of our 
users (those running the server part of bind) was used as a(n additional) 
justification for the Update Policies, whereas this one now hits ALL users 
who merely had the mishap of sticking to our flawed defaults (SELinux 
enforcing). Why would we stick our heads in the sand this time?

DISABLE/DROP SELINUX NOW!

Thank you for your consideration,
        Kevin Kofler

PS: I still recommend that ALL Fedora users disable SELinux immediately 
after installing Fedora. That is the most effective way to avoid ever being 
hit by catastrophic breakage such as bug #1054350 or bug #1052317. But we 
should not ship with a broken default in the first place!
