Bad file access on the rise

Doug Ledford dledford at
Sun Jun 9 15:05:44 UTC 2013

On 06/09/2013 10:34 AM, Adam Williamson wrote:
> On Sun, 2013-06-09 at 10:03 -0400, Steve Grubb wrote:
>>> I don't think anyone wants these accesses to generate audit records. The
>>> question is whether the right way to fix that is to avoid those accesses
>>> in the first place or to provide a mechanism so that legitimate accesses
>>> don't generate audit records.
>> There isn't a mechanism to allow these to slip through. Over the years I have 
>> come to realize that the audit system can be a great resource for debugging 
>> user space. It was while sitting through one of Dave Jones' "Why Userspace Sucks"
>> lectures and afterwards poring through audit logs that I saw that we can find 
>> some of these problems. If part of the goals when writing software is 
>> correctness and efficiency, then wouldn't failing syscalls be of interest? Not 
>> just in the case of EPERM, but also for example EINVAL? 
> Well what I'm trying to say is that you're acting as if the entire
> 'audit system' was carved on stone tablets and handed down from God. It
> wasn't. It's just a set of checks, the logic behind _each of which_ is
> as open to question as anything else. Just because a test for all EPERM
> syscall failures is a part of 'the audit system' does not make it an
> unquestionable totem. Instead of answering the question "do we actually
> believe that all cases of EPERM should be 'fixed', or in some cases
> would the cure be worse than the disease?" you seem to just keep saying
> "The Holy Audit System told me there's a problem!"
> I don't know who's right, in this case. But looking at the debate, I see
> one side raising what looks like a legitimate line of inquiry, and you
> just batting it back with 'The Holy Audit System has no flaws'.
> "There isn't a mechanism", okay, point taken. But that can be a flaw of
> the audit system as much as anything else.

Not necessarily.  The audit system is part of a security verification
system.  Those have always had different rules than, as you bring up
later, a test suite.  I remember the days when it was considered safe
practice to plug a dot-matrix line printer into your server and pipe the
output of syslog directly to the line printer so that if you got hacked,
the hackers could not erase their log trail.

The audit system is just a more modern version of that same thing.  And
the second you put any sort of exception into the audit rules, then you
have to verify that the exception can never be used to circumvent the
legitimate purpose of the rule.  So, if we put an exception into the
system so that PulseAudio can open these shm files and not be audited,
we would have to prove conclusively that no other application can ever
use that exception to hide their tracks (a rather difficult task) or
risk losing some of our security certifications.  At least, that's my
understanding of how these security certifications work, Steve can
verify this.

So is the audit system the "Holy Audit System"?  Not in so many words,
but in some ways, yes, it is.  It is not the same as the rest of the
system.  Fedora has its concept of critical path packages for the
release, and for security certifications there is an entirely different
critical path set and the audit subsystem is right up top on that list,
and the kernel is right up there with it.  So if your first thought
about any conflict between the audit subsystem and normal packages is
"Should we fix this in the audit subsystem or in the package?", then a
necessary part of evaluating whether or not it should be fixed in the
audit subsystem *must* be adding the burden of proving that your fix in
the audit subsystem cannot be abused by malicious hackers to help
subvert your system (or to hide the evidence of their
subversion/attempts).  If you aren't immediately considering that burden
when thinking about where to fix things, then you aren't considering the
entirety of the work that must be done to solve the problem in the audit
subsystem.

>> Why would anyone write software that is incorrect enough the OS spits it back 
>> as EINVAL?
> This is entirely irrelevant. From a QA monkey perspective, I'm comparing
> this with the case where we have a suite of tests, and someone raises
> the question of whether one of them is a sensible test. Talking about how good
> one of the others is is entirely out of scope. The fact we put them all
> together and called them a 'test suite' is really neither here nor
> there. The question here is not 'is auditing useful?', it's 'is this
> particular audit check one which always indicates a genuine bug that
> must be fixed?'
>> I'll leave it here for anyone curious enough to dig out the details of how 
>> each syscall is wrong. But it's my belief that these are not intentionally 
>> written to fail and people didn't know they were issuing syscalls that will 
>> never work.
> Well, that's clearly not the case in the situation we're actually
> discussing: the author of one of the pieces of software you audited says
> he knows about the failed syscalls and does not think they're a problem.

To answer Steve's question: because the kernel is the ultimate arbiter
of what's allowed and what isn't, so it's easier and quicker to not
bother with checking the legitimacy of your options and simply allow the
kernel to do it for you.  It's lazy, but not wrong.  It is, however,
inefficient in some cases.  It can cause extra overhead that the user
may not know about, but might not appreciate if they did.  If it just
means wasted syscalls, it doesn't cost much CPU time, but if
it generates audit events, the waste is much higher as Steve's tests showed.
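To make the lazy-versus-defensive trade-off concrete, here is a small hypothetical C sketch; it is not taken from any package discussed in this thread. lseek() is picked arbitrarily as a syscall that rejects a bad whence value with EINVAL, and the names seek_lazy() and seek_checked() are illustrative only. The lazy version leaves validation to the kernel; the checked version filters out values it already knows will fail, so no failing syscall is issued for them.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Lazy (hypothetical helper): pass `whence` straight through and let the
 * kernel reject a bad value with EINVAL. Correct, but every bad value
 * costs a failed syscall and, under an audit rule that logs failing
 * syscalls, possibly an audit record as well. */
static off_t seek_lazy(int fd, off_t off, int whence)
{
    return lseek(fd, off, whence);
}

/* Defensive (hypothetical helper): refuse values already known to fail,
 * so no syscall is made for them. The cost is duplicating the kernel's
 * rules: this sketch only accepts the three POSIX values, so it would
 * also refuse SEEK_DATA/SEEK_HOLE, which Linux's lseek() supports. */
static off_t seek_checked(int fd, off_t off, int whence)
{
    if (whence != SEEK_SET && whence != SEEK_CUR && whence != SEEK_END) {
        errno = EINVAL;
        return (off_t)-1;
    }
    return lseek(fd, off, whence);
}
```

Both functions report a bad whence the same way (-1 with errno set to EINVAL); the difference is only whether the kernel had to be asked.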

But, putting that sysadmin hat back on, once this level of auditing is
enabled, and I as a sysadmin can see in black and white which
programmers are being lazy and which ones are properly implementing
defensive programming in their code instead of just implementing a
"throw it at the kernel and see if it breaks" approach to programming, I
start removing the lazy programmer's code from my system so my logs are
clean and I have more faith in the code on my system.

And really, we've spent more time on this thread than it would take
Lennart to fix PA.  Just a quick stat and check of uid before trying to
remove the stale files and this would all go away.  Sure, your stat and
remove could race, but this is nothing more than a garbage collection
process anyway, so who cares?  We'll just get it next time.
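A minimal C sketch of that "stat and check uid first" idea, under stated assumptions: the stale files sit in one directory (such as /dev/shm) and share a name prefix. The function name cleanup_own_stale_files() and the mtime-based max_age test are hypothetical stand-ins, not PulseAudio's actual code or its actual staleness check. The point is the order of operations: stat first, skip anything not owned by the effective UID, and only then remove.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: remove regular files in `dir` whose names start
 * with `prefix`, that are owned by the effective UID, and that have not
 * been modified for at least `max_age` seconds (a stand-in staleness
 * test). Returns the number removed, or -1 if `dir` can't be opened. */
static int cleanup_own_stale_files(const char *dir, const char *prefix,
                                   time_t max_age)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    const int dfd = dirfd(d);
    if (dfd < 0) {
        closedir(d);
        return -1;
    }

    const uid_t me = geteuid();
    const size_t plen = strlen(prefix);
    const time_t now = time(NULL);
    int removed = 0;
    struct dirent *de;

    while ((de = readdir(d)) != NULL) {
        if (strncmp(de->d_name, prefix, plen) != 0)
            continue;

        /* stat only needs search permission on the directory, not any
         * permission on the file, so it doesn't fail on others' files. */
        struct stat st;
        if (fstatat(dfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;                 /* vanished meanwhile: skip it */

        /* Someone else's file: skip it rather than letting the kernel
         * refuse the operation with EACCES or EPERM. */
        if (!S_ISREG(st.st_mode) || st.st_uid != me)
            continue;

        if (now - st.st_mtime < max_age)
            continue;                 /* recently touched: maybe in use */

        /* The file may change between fstatat() and unlinkat(). For
         * garbage collection that race is harmless: anything missed is
         * picked up on the next pass. */
        if (unlinkat(dfd, de->d_name, 0) == 0)
            removed++;
    }

    closedir(d);
    return removed;
}
```

For example, cleanup_own_stale_files("/dev/shm", "pulse-shm-", 3600) would never touch another user's segments at all; the "pulse-shm-" prefix and the one-hour threshold there are assumptions for illustration.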
