Drawing lessons from fatal SELinux bug #1054350

Fri Jan 24 21:17:20 UTC 2014

On Fri, 2014-01-24 at 21:38 +0100, Michael Schwendt wrote:
> On Fri, 24 Jan 2014 21:06:29 +0100, Dominick Grift wrote:
> 
> > Agreed, The testers did not fail. Their issues were solved.
> 
> That doesn't match what one can read here:
> 
>   https://admin.fedoraproject.org/updates/FEDORA-2014-0806/selinux-policy-3.12.1-116.fc20
> 

I just had a quick look at the above URL. Of all the testers, only one
noticed the anomaly, and most of the events reported there weren't even
related to the RPM issue.

The RPM issue did not cause the normal AVC (type=AVC) denials (AFAIK)
that one would expect. Instead there were some SELINUX_ERR events
(type=SELINUX_ERR) that one might not notice if one is looking for AVC
(type=AVC) denials. (not sure if setroubleshoot would have reported
those)
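
To illustrate the point about event types, here is a minimal Python
sketch that counts audit records per type, so that SELINUX_ERR records
show up next to the usual AVC ones. The list of types and the sample
lines are my own assumptions for illustration (the sample lines are
placeholders, not real log output). On a typical system the records
would come from /var/log/audit/audit.log instead.

```python
import re
from collections import Counter

# Record types treated as SELinux-related in this sketch. This set is an
# assumption for illustration, not an exhaustive list.
SELINUX_TYPES = {"AVC", "USER_AVC", "SELINUX_ERR"}

# Audit records begin with "type=<NAME>".
TYPE_RE = re.compile(r"^type=(\S+)")


def count_selinux_records(lines):
    """Count audit records per type, keeping only SELinux-related types."""
    counts = Counter()
    for line in lines:
        match = TYPE_RE.match(line)
        if match and match.group(1) in SELINUX_TYPES:
            counts[match.group(1)] += 1
    return counts


# Made-up placeholder records; the message bodies are deliberately elided.
sample = [
    "type=AVC msg=audit(0.000:1): <placeholder>",
    "type=SELINUX_ERR msg=audit(0.000:2): <placeholder>",
    "type=SELINUX_ERR msg=audit(0.000:3): <placeholder>",
    "type=SYSCALL msg=audit(0.000:4): <placeholder>",
]

counts = count_selinux_records(sample)
print(dict(counts))
```

Someone filtering only on type=AVC would see one record here and miss
the two SELINUX_ERR ones. Reading the real audit log normally requires
root.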

The person who did notice the anomaly did some thorough testing, and
maybe there was also a little bit of luck involved.

> > They could not have found this issue in reason. 
> 
> Why not? Please explain.
> 

Because you would need to run RPM to notice it, and then be able to
correlate the issue to SELinux. If you are waiting for a package that
has your fixes, you test your own issues and give karma. It may take a
while before you actually run yum again, and by then the update may
already have ended up in the repository.

And this is just in the case of RPM. There can be bugs in policy for
many components, but those are often not fatal.

> > There was no change log entry for it, 
> 
> You make it sound as if the testers have tried to skim over the several
> dozen bugzilla ticket descriptions linked at
>   https://admin.fedoraproject.org/updates/FEDORA-2014-0806/selinux-policy-3.12.1-116.fc20
> in an attempt at trying to find out _what_ the update touches.
> 
> A fundamental problem here is that even if a tester confirms that the
> update fixes a _single_ bug, the other several dozens of changes could
> cause regression -> reason to be careful and test this thing a bit longer.
> 

Sure, what I am saying is that this could have been prevented if the
team had just put a little more passion into it and also done some
proofreading/coordination. The team knows what's going on. They know the
issues and they can quickly and effortlessly identify issues like these
if only they would take some time to watch each other's commits.

> > and even if there was they would still need to be able to trace
> > the bug to SELinux.
> 
> That has been easy once the update arrived here on the nearby mirror.
> "setenforce 0 && repeat previous command that caused strange behaviour"
> is a very common troubleshooting thing, even if there haven't been any
> AVC denied messages.

If it were as common as you make it sound, then it might not have
come this far. It did. Again, one would have first had to identify the
issue (e.g. run RPM). There was no indication of any change related to
RPM (no change log entry).

But sure, I give you that: yes, thorough testing could also have
prevented this. (I still think it's pretty unlikely, but it could, so I
will take that back.)

Nevertheless, I think this issue could have been prevented even before
a package was spun.