Please stop apps going into state D uninterrupted sleep !!

Patrick O'Callaghan pocallaghan at gmail.com
Tue May 8 19:39:31 UTC 2012


On Tue, 2012-05-08 at 19:42 +0100, Andrew Gray wrote:
> Hi 
> 
> Either give us a way to kill a hung cp or rsync when the VPN goes down
> and they end up in state D uninterrupted sleep, or stop apps being able
> to go into uninterrupted sleep !!

It is *not possible* to kill a process in D state. D state can be
defined as "the state which cannot be interrupted". It originally
applied to fast operations which were guaranteed to succeed, e.g. a DMA
transfer. If the DMA interrupt didn't happen, you had problems much more
serious than a hanging app, and there was no reasonable way for the
system to recover automatically without user intervention.
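
For illustration, here is a minimal Python sketch of how one might spot
processes currently in D state on Linux. It assumes /proc is mounted and
relies on the state letter being the first field after the parenthesised
command name in /proc/<pid>/stat. The function names are made up for
this example.

```python
import os


def parse_state(stat_line):
    """Return the one-letter state from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the fields.
    fields_after_comm = stat_line.rsplit(")", 1)[1].split()
    return fields_after_comm[0]


def d_state_pids(proc_root="/proc"):
    """List PIDs whose state is 'D' (uninterruptible sleep)."""
    try:
        names = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    pids = []
    for name in names:
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, name, "stat"),
                      errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # process exited while we were scanning
        if parse_state(line) == "D":
            pids.append(int(name))
    return pids
```

Calling d_state_pids() on a machine with a dead network mount would be
expected to list the stuck cp or rsync. It only reports them; it cannot
get them out of that state.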

With the introduction of networked filesystems, the desire for
transparency at the app level disguised the fact that networks actually
do fail from time to time, and not in nice ways. I think it was Lamport
who said that you can't tell 'down' from 'disconnected' (did the remote
server crash, or is there a network disconnection? maybe the network is
just congested, there's no way to tell).

Apps which access resources in the real world, including networked
devices, can be written to allow for these suddenly disappearing, or
just not bother. In the former case, the app becomes very much more
complex without ever completely solving the problem, just reducing the
probability of it happening. Note that "the app" doesn't just mean cp or
rsync, it means anything which accesses the filesystem, which can mean
virtually any program under Linux or similar systems. Making resource
failures completely transparent is a seriously hard problem, and doing
it in such a way that the applications programmer never needs to worry
about it is probably unsolvable, given that the right thing to do in each
circumstance depends on the semantics that the app is trying to
preserve.
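
As a rough sketch of the "allow for it" approach, a program can push the
risky I/O into a child process and simply stop waiting after a deadline.
This is an assumption-laden example, not a fix: the function name is
hypothetical, and a child stuck in D state stays stuck. Only the caller
gets to move on.

```python
import subprocess


def run_with_deadline(argv, timeout):
    """Run argv in a child process; return its exit code, or None if it
    did not finish within `timeout` seconds."""
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Request termination, but do NOT wait for it: a child in D state
        # only acts on SIGKILL once its kernel call returns, and waiting
        # here would hang the parent as well.
        proc.kill()
        return None
```

A caller might use run_with_deadline(["cp", src, dst], timeout=60), with
src and dst standing in for real paths, and treat None as "the mount is
probably gone". Deciding what to do next (retry, report, give up) is
exactly the part that depends on what the app is trying to preserve.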

Take a look at the literature on fault-tolerant computing to see how
complex and expensive it is to even approximate this level of
reliability. General purpose systems such as Linux take the view that
transparency and a clean file access model are easier for programmers to
deal with, and in any case many such problems are better resolved by
direct user intervention. If that means a reboot, then so be it. I don't
like it either, but there it is.

> It is unacceptable for a Linux system to have to be CRASH rebooted as the
> mounted CIFS mount can't be umounted as it is in use by cp or rsync in
> state D uninterrupted sleep !!
> 
> Thus there should be NO uninterrupted sleep !!

I agree. Also, it shouldn't rain on public holidays.

poc


