Can't kill hung remote CP copy

Patrick O'Callaghan pocallaghan at gmail.com
Fri Apr 13 19:31:03 UTC 2012


On Fri, 2012-04-13 at 20:09 +0100, Andrew Gray wrote:
> On Fri, 2012-04-13 at 15:22 +0100, Andrew Gray wrote:
> > On Fri, 2012-04-13 at 15:59 +0200, suvayu ali wrote:
> > > On Fri, Apr 13, 2012 at 15:22, Kevin Martin <kevintm at ameritech.net> wrote:
> > > >
> > > >
> > > > On 04/13/2012 05:55 AM, suvayu ali wrote:
> > > >> On Fri, Apr 13, 2012 at 11:47, Reindl Harald <h.reindl at thelounge.net> wrote:
> > > >>>> How can I kill the broken cp operation ?
> > > >>> killall -s SIGKILL cp
> > > >> This might not always work. I have faced similar issues with processes
> > > >> waiting to access a filesystem over the network. In these cases if there
> > > >> is a problem with the network it might get into an UNINTERRUPTIBLE SLEEP
> > > >> since it is waiting for I/O. The only way to get rid of these processes
> > > >> is to wait or reboot. In my case this was a tape drive over a network
> > > >> filesystem.
> > > >>
> > > >> The OP can check if this is indeed the case by doing
> > > >>
> > > >>   $ ps uf
> > > >>
> > > >> If the "cp" process is in UNINTERRUPTIBLE SLEEP, the STAT column for
> > > >> the process should show D. If not then you can ignore my comment.
> > > >>
> > > > What happens if you ifdown the NIC (if you are on the console,
> > > > obviously)? Would that allow the CIFS mount and/or the cp to become
> > > > available for umount/kill?
> > > 
> > > That is a good question. I don't know what would happen then. I guess if
> > > the filesystem implementation is smart enough to return an error when
> > > the network goes down, then the I/O wait is over and the application
> > > gets file read error of some kind and "wakes up" from its
> > > UNINTERRUPTIBLE SLEEP. But then, this is just a hypothesis which I
> > > cannot test (I do not have admin privileges to test this).
> > > 
> > > -- 
> > > Suvayu
> > > 
> > > Open source is the future. It sets us free.
> > 
> > Thanks for your help
> > 
> > I will try all the suggestions next time, including disabling the NIC
> > to see if it frees the cp or allows umount.
> > 
> > 
> I have had another cp hang, with the NIC disabled:
> 
> killall -s SIGKILL cp     doesn't kill the hung cp
> 
> ps uf     shows the cp is in STAT D, hence UNINTERRUPTIBLE SLEEP
> 
> So the only thing to do is restart, though it will hang going down trying
> to umount the CIFS mount held by the hung cp.
> The only thing then is to hit the system reset and FORCE a restart.
> 
> Again, how do you kill a cp in UNINTERRUPTIBLE SLEEP?

You said it yourself: force restart. Uninterruptible means what it says.
It doesn't mean "high priority" or "only killable by root". It means
"the process is waiting on something which has to happen and there's no
Plan B if it doesn't happen". Needless to say, network disconnection or
latency is a fact of life, but if it wasn't anticipated by the designer
then you're screwed.
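
For what it's worth, here is a rough sketch for spotting the stuck
processes and seeing what they are waiting on. It assumes the procps
version of ps, and filter_dstate is just a name I made up for the helper:

```shell
# Keep the header line plus any row whose STAT field starts with D
# (uninterruptible sleep). Reads ps-style output on stdin.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# PID, state, kernel wait channel and command name for every process.
ps -eo pid,stat,wchan:32,comm | filter_dstate
```

If cp shows up there, no signal will reach it until the I/O it is
waiting on completes or fails.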

NFS allows you to "soft-mount" remote filesystems to avoid this sort of
problem. Not sure about CIFS though.
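
As an illustration only, an NFS soft mount might look something like
the following. The server name, export and mount point are invented,
and the timeout values are arbitrary picks rather than recommendations:

```shell
# Hypothetical server, export and mount point; substitute your own.
SERVER=fileserver.example.com
EXPORT=/export/data
MNT=/mnt/data

# soft:    return an I/O error once the retries are exhausted, instead
#          of retrying forever as a hard mount does.
# timeo:   time to wait before retrying, in tenths of a second.
# retrans: number of retries before giving up.
OPTS="soft,timeo=100,retrans=3"

# Printed rather than executed; run the result as root once it looks right.
echo mount -t nfs -o "$OPTS" "$SERVER:$EXPORT" "$MNT"
```

Bear in mind that a soft mount trades the hang for I/O errors, and the
nfs(5) man page warns that it can cause silent data corruption in some
cases, so it is safest on mounts you mostly read from.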

Some background: http://lwn.net/Articles/288056/

poc


