Bad file access on the rise

Mon Jun 10 15:00:28 UTC 2013

On 06/10/2013 04:43 AM, Lennart Poettering wrote:
> On Sun, 09.06.13 11:05, Doug Ledford (dledford at redhat.com) wrote:
> 
>> The audit system is just a more modern version of that same thing.  And
>> the second you put any sort of exception into the audit rules, then you
>> have to verify that the exception can never be used to circumvent the
>> legitimate purpose of the rule.  So, if we put an exception into the
>> system so that PulseAudio can open these shm files and not be audited,
>> we would have to prove conclusively that no other application can ever
>> use that exception to hide their tracks (a rather difficult task) or
>> risk losing some of our security certifications.  At least, that's my
>> understanding of how these security certifications work, Steve can
>> verify this.
> 
> You know, audit has so many holes anyway, it's super easy to circumvent
> it entirely for any attacker.

That depends entirely on the attack in question.

> For example, if I never want to show up in
> audit logs about forbidden file accesses I could write a tiny
> preloadable library that replaces open()/fopen() by stat() or access()
> right before open()/fopen().

Not necessarily.  Let's assume that the attack in question is a local
exploit, that it involves a buffer overflow, and that the item you have
to overflow is, tada, a filename.  Because of stack randomization the
exact location of the overflow changes from run to run, so you brute
force the attack by hitting the program over and over again with a known
offset that will work eventually when the right stack offset is
selected.  But for each failed run, you still generate an audit event
because the actual exploit itself involves actually calling open() on
that overflowed variable.

Obviously, this is a contrived example, but my point is that your claim
that audit events can be trivially avoided simply is not always true.
Whether or not it is possible depends entirely on where the exploit
actually lies.

> Since stat()/access() are generally not
> audited this allows me to completely evade any auditing (of course it
> would still be ugly and racy as hell, but why would a hacker
> care?). You admit yourself that this is possible by suggesting that I
> do this for the special case of PA. Now, audit of course logs more
> than just failed open()s, but I'd be almost willing to bet you that I
> can easily find a way to circumvent generation of almost any audit
> message in the system.
> 
> audit hooks into various subsystems of the OS. It makes assumptions
> about why people call certain interfaces, but these assumptions are
> frequently wrong. And it assumes that people won't hide their intentions
> when using these APIs. It assumes bad file accesses were something
> unexpected in all cases, and it expects that people who want to steal
> all data they can will always try to open the files directly. And in
> both cases it is wrong. As the case of PA shows.
> 
> Now, this fuzziness of audit doesn't really make it a useless tool, far
> from that, but it does make clear that our APIs are not designed with
> audit and only audit in mind. Our APIs are usually designed with
> race-freedom, speed, simplicity, atomicity, security and a lot of other
> things in mind, but auditability is really something that never was on
> the table. Because for that they'd have to declare the intention why
> people call these functions, and our APIs could not have been this
> redundant.
> 
> What to make of this? Well, audit has to deal with the fact that its
> data is incomplete in some areas, and includes too much information in
> others. Hence its emphasis should be on making the best of its dataset
> but not assume too much about it. However, Steve is kinda assuming he
> could rearrange his dataset instead. But that's simply not feasible. Not
> feasible because of the size of our codebase, not feasible with the
> current APIs, and simply because for developers correctness,
> race-freeness and simplicity are more important.
> 
>> audit subsystem *must* be adding the burden of proving that your fix in
>> the audit subsystem can not be abused by malicious hackers to help
>> subvert your system (or to hide the evidence of their
>> subversion/attempts).  
> 
> Nice idea, but that's precisely the problem. Audit can be circumvented
> too easily anyway (see above),

And see my reply above, where I point out that the circumventability is
constrained by the locality of the fault being (or attempting to be)
taken advantage of.

> this is definitely not something to
> check. In our current OS it's the job of the audit guys to make their
> reporting tools useful to deal with its incomplete/redundant dataset
> rather than the one of the rest of the OS developers to generate audit
> data that is perfect in the eyes of the audit guys, at the expense of
> code correctness, race-freeness and simplicity.
> 
>>> Well, that's clearly not the case in the situation we're actually
>>> discussing: the author of one of the pieces of software you audited says
>>> he knows about the failed syscalls and does not think they're a problem.
>>
>> To answer Steve's question: because the kernel is the ultimate arbiter
>> of what's allowed and what isn't, so it's easier and quicker to not
>> bother with checking the legitimacy of your options and simply allow the
>> kernel to do it for you.  It's lazy, but not wrong.  It is, however,
> 
> It's not lazy. It's the only correct thing to do. Reimplementing the
> kernel's security checks is nearly impossible. Capabilities, file ACLs,
> multiple uids, security frameworks make it incredibly hard to correctly
> guess from userspace whether the kernel will grant or deny file
> access. And even if you write complex code for this that covers all
> current security mechanisms in place, you can bet that this will be
> out-of-date in a year or two when the next security technology comes along.

It is lazy.  It is not the only correct thing to do.  An attempt to open
a file may fail with EACCES or EPERM, and if all you are doing is calling
perror() afterwards, your error reporting is terse and in some cases
downright unhelpful.  A bit of pre-checking in your code can allow you to
detect some of these permission failures and not only report that the
permissions are off, but why and how to fix them.  This isn't second
guessing the kernel, and you don't have to implement all of the checks
that the kernel does in order to still provide useful debugging
information to the user in the event of failure.

And you don't even have to let these pre-checks alter your actual
attempt one iota.  For example, in this particular case, you could stat
the file ahead of time.  When you see that it's owned by another uid,
you could log to syslog at level info something like "Shared memory
segment is stale, attempting to remove, but uids don't match" and then
go ahead and try anyway, knowing you are likely to fail (but maybe there
is a special circumstance that will allow it to succeed, who knows).
The attempt does fail and it creates an audit event, but now you have a
corresponding info level log event to match the audit event.  For a
sysadmin trying to track down the source of audit events, this just went
from being a headache to an automatic pass and something that can be
safely ignored.
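
A rough sketch of that sequence, written in Python for brevity rather
than the C that PulseAudio uses.  The function name, the logger name and
the extra detail in the message are made up for illustration, and the
standard logging module stands in for syslog here.  This is not
PulseAudio code, just the shape of the idea:

```python
import logging
import os

# Hypothetical logger name; attach a syslog handler to it if desired.
log = logging.getLogger("shm-cleanup")


def remove_stale_segment(path):
    """Try to remove a stale file, leaving an info-level note first
    when the owner is someone else.

    Returns True if the file was removed, False otherwise.
    """
    try:
        owner = os.stat(path).st_uid
    except OSError as err:
        owner = None
        log.info("cannot stat %s: %s", path, err.strerror)

    me = os.geteuid()
    if owner is not None and owner != me:
        # The pre-check changes nothing about what happens next.
        # It only leaves a trail that can be matched to the audit event.
        log.info("Shared memory segment %s is stale, attempting to "
                 "remove, but uids don't match (owner %d, us %d)",
                 path, owner, me)

    try:
        os.unlink(path)  # the kernel stays the final arbiter
    except OSError as err:
        log.info("removing %s failed: %s", path, err.strerror)
        return False
    return True
```

The stat and the unlink can race, but since the stat result is only used
to decide whether to log, the worst outcome is a missing or spurious
info message.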

This all stems from the fact that the kernel may check 20 different
things about your syscall request, but only have 5 suitable error codes
it can use on return.  The kernel will not print out a verbose message
about why your attempt failed (to do so would create a user space DoS
against the kernel by filling log buffers trivially).  The kernel cannot
pass back finer grained error messages (that API has been set in stone
for years).  So, as a user space programmer, if you want more detailed
causes for failures, it is up to you to check for them.  This has value
to an end user, even if it isn't a requirement of all user space code.
So I stand by what I said: just passing the call to the kernel and
bailing without doing any checks on your own is lazy.  You provide
something of value to end users by pre-checking your inputs to the
kernel and getting more fine grained failure analysis, even if you still
want to let the kernel be the final arbiter of that failure.
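
To make the "few error codes, many causes" point concrete, here is a
small sketch, again in Python and again with made-up helper names
(explain_failure, open_or_explain).  It runs a handful of cheap checks
and turns a bare errno into a message that says a bit more about why.
As an assumption of the sketch, the checks run after the failed attempt
rather than before it, so the success path costs nothing extra; the same
checks could just as well run up front.  It covers only ownership, mode
bits and a missing parent directory, and makes no attempt at ACLs,
capabilities or security frameworks:

```python
import errno
import os
import stat


def explain_failure(path, err):
    """Return a message for a failed open() with whatever extra detail
    a few cheap checks can provide."""
    hints = []
    parent = os.path.dirname(path) or "."
    if err.errno == errno.ENOENT:
        if not os.path.isdir(parent):
            hints.append("parent directory %s does not exist" % parent)
    elif err.errno in (errno.EACCES, errno.EPERM):
        try:
            st = os.stat(path)
        except OSError:
            hints.append("cannot stat the file; check search "
                         "permission on %s" % parent)
        else:
            if st.st_uid != os.geteuid():
                hints.append("owned by uid %d, running as uid %d"
                             % (st.st_uid, os.geteuid()))
            if not st.st_mode & (stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH):
                hints.append("no read bit set in mode %s"
                             % stat.filemode(st.st_mode))
    text = "%s: %s" % (path, err.strerror)
    if hints:
        text += " (" + "; ".join(hints) + ")"
    return text


def open_or_explain(path, flags=os.O_RDONLY):
    """Let the kernel decide, then explain a failure if there is one.

    Returns (fd, None) on success or (None, message) on failure.
    """
    try:
        return os.open(path, flags), None
    except OSError as err:
        return None, explain_failure(path, err)
```

The kernel's verdict is never second guessed here; the extra checks only
feed the message shown to the user.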

>> And really, we've spent more time on this thread than it would take
>> Lennart to fix PA.  Just a quick stat and check of uid before trying to
>> remove the stale files and this would all go away.  Sure, your stat and
>> remove could race, but this is nothing more than a garbage collection
>> process anyway, so who cares?  We'll just get it next time.
> 
> Yeah, but I don't do hacks like that.

What I suggested above is hardly a hack.