Drawing lessons from fatal SELinux bug #1054350

Fri Jan 24 03:43:26 UTC 2014

On Fri, 2014-01-24 at 00:55 +0100, Kevin Kofler wrote:

> Last time an issue like that happened (the D-Bus regression that broke 
> updates),

That was my fault.  It left an impact on me, you can be sure.  Like
SELinux, D-Bus impacts nearly everything in early userspace.

In fact, this particular regression is exactly one of the reasons I made
OSTree.  With both this and the SELinux bug, knowing you can *always*
reboot into the previous system state and recover makes things
fundamentally better for a fast-moving system like Fedora.

Note OSTree is fully capable of *atomic* upgrades to SELinux - where
your running system is untouched, with the old policy.  When you reboot,
you have the new policy, with the new daemon code.  

With RPM's default live-update model, you run the old policy until rpm
reloads it in the middle of a "transaction" and restarts some daemons,
but not all of them.  That leaves you with some *old* code running
under *new* policy - a state that is hard to test because you have to
go out of your way to reproduce it.  You can't boot into that state
directly.
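
To make the difference concrete, here is a minimal toy model of the two
update styles.  It is only a sketch: the names (SystemState,
live_update, atomic_update) and the daemon list are hypothetical and
have nothing to do with the real rpm or OSTree code.  It just shows
that a live update can land in a mixed state, while an atomic update
only ever produces states you could have booted into.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemState:
    """Hypothetical snapshot of a running system (illustration only)."""
    policy: str      # version label of the loaded SELinux policy
    daemons: tuple   # (name, code version) pairs for running daemons

    def is_bootable(self) -> bool:
        # A state you can boot into directly has every daemon running
        # code from the same version as the loaded policy.
        return all(version == self.policy for _, version in self.daemons)


def live_update(state, new, restarted):
    """Live-update model: the policy is reloaded mid-transaction and
    only the daemons named in `restarted` pick up the new code."""
    daemons = tuple(
        (name, new if name in restarted else version)
        for name, version in state.daemons
    )
    return SystemState(policy=new, daemons=daemons)


def atomic_update(state, new):
    """Atomic model: the running system is untouched; after a reboot,
    policy and all daemon code switch together."""
    after_reboot = SystemState(
        policy=new,
        daemons=tuple((name, new) for name, _ in state.daemons),
    )
    return state, after_reboot
```

Starting from a state where the policy and two daemons are all "old",
live_update(state, "new", {"sshd"}) gives new policy with one daemon
still on old code, so is_bootable() is false.  atomic_update(state,
"new") returns the unchanged running state plus a post-reboot state
where everything is "new", and both are bootable.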

Secondarily, on the server side, as many people have noted - this type
of thing can be caught by automated testing.  It's not hardware
specific.

OSTree pairs extremely well with automated testing, because it allows
fast incremental updates for offline VMs.  I've had this working for
about a year in the gnome-continuous context, and I can bring it to
rpm-ostree too.  As in, within "a week or two".

So no, SELinux doesn't need to be disabled.  We can make it much better
- if we step beyond the current philosophy of "test a package" to "test
many system states as atomic units".