Please stop apps going into state D uninterrupted sleep !!

Patrick O'Callaghan pocallaghan at gmail.com
Wed May 9 13:01:34 UTC 2012


On Wed, 2012-05-09 at 13:52 +0930, Tim wrote:
> On Tue, 2012-05-08 at 16:02 -0430, Patrick O'Callaghan wrote:
> > As I tried to explain, rewriting a couple of apps is not going to hack
> > it. The apps don't *know* they're using a networked filesystem,
> > they're just accessing files. They could find out and try to take
> > measures, but then what about all the other apps that also write
> > files? Rewrite tar, cpio, dd, cat, ...?
> >  
> > The price of treating a networked fs as equivalent to a local one is
> > that you get screwed when it doesn't behave like a local one. Dealing
> > with this in a coherent and consistent way is hard. See the literature
> > on distributed filesystems. The semantics of an NFS system are *not*
> > the same as a local system. We brush this under the carpet most of the
> > time because it usually works, but sometimes the differences bite.
> 
> And thinking out loud...  In Linux, when anything wants file system
> access, does it directly access the file system, or does it ask the
> system to access it?

How can a program possibly access the file system without being
mediated by the system? "File systems" are an abstraction maintained
by the system, and programs have no direct access to the media that
store them (using /dev/whatever is just a lower-level abstraction).
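
To make that concrete, here's a trivial sketch (the file name is made
up). Whether a program goes through buffered stdio or calls the
syscall wrappers directly, every byte still ends up passing through
the kernel's VFS layer on its way to the disk or the NFS server:

/* Both paths are mediated by the kernel: stdio buffers in user space
 * but flushes via the write(2) system call, and the "low-level"
 * wrappers are just thinner coats over the same calls. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    FILE *f = fopen("/tmp/example", "w");   /* buffered: libc -> write(2) */
    if (f) { fputs("hello\n", f); fclose(f); }

    int fd = open("/tmp/example", O_WRONLY | O_APPEND); /* raw syscall wrapper */
    if (fd >= 0) { write(fd, "world\n", 6); close(fd); }
    return 0;
}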

> If it's direct access, then I can see that you'd need to change every
> program that wants access.  But if everything asks the system to access
> the drive, then you have the potential to change how the system works,
> solving the problem in (mostly) one place.

Yes, hypothetically you could rewrite the kernel to support a
different set of abstractions (not as simple as adding a new file
system, since you have to deal with the VFS layer, which covers all of
them), then rewrite the application programs so they understand the
new abstractions, i.e. a new API. It's not clear that what you'd end
up with could still be called Linux, and you'd better make the new API
an extension of the old one or you won't get any users, but sure.

> i.e. The ability to set more reasonable timeout periods (seconds, not
> minutes or hours).

The overwhelming majority of programs don't deal in timeouts of any
length. Programmers prefer a simple file access abstraction: if I can
open the file, I can access it until I close it. It's worth noting that
the clean file model (no "access methods", no "fixed versus variable
length record structure", no "character versus binary" files, no "end of
file mark", etc.) was an important reason for the original success of
Unix, without which we wouldn't even be here.
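
In code that model is as simple as it sounds. A minimal cat-style
sketch (illustrative only): note that there is no timeout parameter
anywhere to set, so if the file lives on a hung NFS mount, read() or
write() simply never returns and the process sits in state D.

/* Open, copy, close: the model most Unix tools are built on.
 * Nothing here knows or cares whether the files are local or remote. */
#include <fcntl.h>
#include <unistd.h>

int copy(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t n;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0)   /* may block forever */
        write(out, buf, n);                       /* ditto */
    close(in);
    close(out);
    return 0;
}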

> And for the system to report access success or
> failure to whatever wanted to access the drives, and that accessing
> program would have to accept failure (this part being a problem that has
> to be implemented in each application - though they should already have
> failure handling built in, unless programmed by a fool).

Most commonly used programs deal with simple failures such as
non-existent or protected files, running out of disk space, etc. Few
if any are written to deal with a file suddenly disappearing in the
middle of an access.
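
A hedged sketch of what that typical error handling looks like: the
program copes with the failures it can see at the call site, but has
no branch for "the server went away", because in that case the call
usually doesn't fail, it just never returns.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Handles the everyday errors (ENOENT, EACCES, ENOSPC, ...) but has
 * nothing to say about a write() that blocks indefinitely. */
int save(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }
    if (write(fd, data, len) < 0)
        fprintf(stderr, "write %s: %s\n", path, strerror(errno));
    close(fd);
    return 0;
}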

> That'd prevent the infinite waits for a non-available file system, and
> deal with programs thinking they're writing files when they're really
> dumping data nowhere.

How do you know they're infinite? The answer is that you don't; you're
just guessing (see the recent "down vs disconnected" discussion). In
practice, "infinite" means "until the user's patience runs out". One
user's "infinite" wait is another user's slow system. The low-level
network operations are already timing out and retrying, and at some
point they may succeed, but there's no way to know in advance.

I strongly recommend Jerry Saltzer's classic paper "End-to-End
Arguments in System Design"
(http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf).
In some circumstances, the lower levels *cannot* resolve every
problem. No matter how cleverly you design it, there will always be
cases where the abstraction of a reliable network breaks down and you
have to deal with messy reality at the only level where it's possible
to make an intelligent decision. NFS soft mounts are an example where
the user can intervene and deal with the consequences, but D-state
hangs are not confined to network file systems.

poc


