df hangs on down NFS server mounted with hard,intr; can't kill it

Wade Hampton wade.hampton at nsc1.net
Mon Mar 8 17:01:38 UTC 2004


I have a Fedora server running kernel 2.4.22-1-2163 SMP that mounts a
filesystem from a remote Solaris server (hence the choice of options):

   rsize=32768,ro,hard,intr,tcp,nfsvers=3

When the remote server is down or disconnected, "df" hangs (as expected),
but I can't kill it, even as root or with kill -9.  The mount documentation
says the INTR option should allow processes using a filesystem mounted
with HARD to be killed.

I also wrote a test program that calls statvfs(2), and it hangs in the
statvfs(2) call when run against a down NFS server.  It, too, can't be
interrupted or killed.
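The test program itself isn't included in the post. A rough Python equivalent, offered only as a sketch, is below. `os.statvfs()` is a thin wrapper around statvfs(2), so if it is pointed at a hard-mounted NFS path whose server is down, it should block in the kernel in the same way. The name `fs_usage` and the example path are hypothetical.

```python
import os
import sys


def fs_usage(path: str) -> tuple[int, int]:
    """Return (total_bytes, available_bytes) for the filesystem holding `path`.

    os.statvfs() wraps statvfs(2). On a hard-mounted NFS filesystem whose
    server is unreachable, this call can block inside the kernel, which is
    the hang described above.
    """
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    available = st.f_bavail * st.f_frsize  # space usable by non-root users
    return total, available


if __name__ == "__main__":
    # Hypothetical usage: python statvfs_test.py /mnt/solaris
    for path in sys.argv[1:] or ["/"]:
        total, available = fs_usage(path)
        print(f"{path}: {available} of {total} bytes available")
```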

My questions are:

1)  Is there a safe and reliable way to check for a down NFS server?
     For example, is showmount -e <server> safe enough?  It is
     interruptible, so one could wrap it with a timer, and if it times
     out, treat the server as down.
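The timer idea in question 1 can be sketched as follows. `showmount -e` asks the server's mount daemon for its export list over RPC instead of going through the mounted filesystem, which is presumably why it stays interruptible. A reply only shows that mountd is answering, which is a reasonable but imperfect proxy for the NFS service itself. `probe_with_timeout` and the default host name are hypothetical.

```python
import subprocess
import sys


def probe_with_timeout(cmd: list[str], timeout_s: float) -> bool:
    """Run `cmd`; return True only if it exits with status 0 within `timeout_s`.

    On timeout, subprocess.run() kills the child and waits for it, then
    raises TimeoutExpired. That is only safe for commands that do not touch
    the hung mount: a child stuck in uninterruptible sleep would ignore the
    kill, and the wait would block as well.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # no answer in time: treat the server as down
    except FileNotFoundError:
        return False  # the command itself is not installed
    return result.returncode == 0


if __name__ == "__main__":
    server = sys.argv[1] if len(sys.argv) > 1 else "solaris-server"  # hypothetical host name
    up = probe_with_timeout(["showmount", "-e", server], timeout_s=10)
    print(f"{server}: {'up' if up else 'DOWN'}")
```

The same wrapper would be a poor fit for `df`. If `df` is already stuck on the mount when the timer fires, the kill has no effect and the wait blocks as well.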

2)  Is the non-interruptible behavior (even with the INTR option)
     a bug or a feature?

3)  Is there a simple kernel call, /proc entry, or similar that can
     be used for this purpose?
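One /proc entry that helps without going near the server is /proc/mounts. Reading it returns the kernel's mount table and does not send NFS requests, so it should not hang on a dead server. It cannot tell you whether a server is up, but it does give a monitor the list of servers to probe by some other means. Below is a minimal parser sketch. `nfs_mounts` is a hypothetical name, and IPv6 server literals are not handled.

```python
import re

# Fields in /proc/mounts: device, mount point, fs type, options, dump, pass.
# Spaces, tabs, newlines and backslashes in paths appear as octal escapes (\040 etc.).
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field: str) -> str:
    """Turn octal escapes such as \\040 back into the characters they stand for."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def nfs_mounts(mounts_text: str) -> list[tuple[str, str, str]]:
    """Return (server, export, mount_point) for each NFS line of /proc/mounts text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] not in ("nfs", "nfs4"):
            continue
        server, sep, export = fields[0].partition(":")
        if not sep:
            continue  # device is not in the usual host:/path form
        found.append((server, _unescape(export), _unescape(fields[1])))
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for server, export, mount_point in nfs_mounts(f.read()):
            print(f"{mount_point} <- {server}:{export}")
```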

4)  Is there a Perl module to accomplish this?

This would be very useful for network monitoring, e.g., generating an
SNMP trap and writing to a log file when the server goes down and stays
down for more than 1 minute.  That is especially useful when you can't
put an SNMP agent on the server, only on the client.  It would also help
in writing a highly reliable client application.
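The "down for more than 1 minute" rule can be kept separate from the probing method. The sketch below makes one assumption: that the client can reach the server's NFS service on TCP port 2049, the usual NFS port (the mount above uses tcp). A successful TCP connect counts as "up". `probe_tcp`, `OutageTracker`, `check_once` and the `send_snmp_trap` hook are hypothetical names. The trap itself is not implemented; only a log line is written.

```python
import logging
import socket
import time
from typing import Optional


def probe_tcp(host: str, port: int = 2049, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    This never touches the NFS mount, so it cannot get stuck in the kernel
    the way df or statvfs() can. (Name lookup is not bounded by timeout_s.)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, timed out, or name lookup failed
        return False


class OutageTracker:
    """Decide when a run of failed probes has lasted long enough to report."""

    def __init__(self, threshold_s: float = 60.0) -> None:
        self.threshold_s = threshold_s
        self.down_since: Optional[float] = None
        self.reported = False

    def update(self, is_up: bool, now: float) -> bool:
        """Record one probe result; return True once per outage, when it passes threshold_s."""
        if is_up:
            self.down_since = None
            self.reported = False
            return False
        if self.down_since is None:
            self.down_since = now
        if not self.reported and now - self.down_since >= self.threshold_s:
            self.reported = True
            return True
        return False


def check_once(tracker: OutageTracker, host: str) -> bool:
    """Probe `host` once, log when the outage threshold is crossed, return the result."""
    up = probe_tcp(host)
    if tracker.update(up, time.monotonic()):
        logging.warning("NFS server %s unreachable for at least %.0f s", host, tracker.threshold_s)
        # send_snmp_trap(host)  # hypothetical hook: the trap sender is not shown here
    return up
```

The tracker keeps state between calls, so it belongs in a long-running loop that calls `check_once` every few seconds, not in a script started fresh each time. Each outage produces one log line, and the next one fires only after the server has answered again.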

As I have no control over the remote system, when it went down I had
to hard-reboot my Linux box to stop the hung apps.  That is a Windows
solution, not a Linux solution.
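Once processes are wedged this way, they can at least be identified. A process blocked in uninterruptible sleep shows state D in /proc/<pid>/stat, the same letter ps prints in its STAT column. While in that state it ignores even kill -9. Short D sleeps are normal during disk I/O, so only processes that stay in D across repeated checks point at the dead mount. The sketch below lists them. The function names are hypothetical.

```python
import os


def parse_stat_state(stat_line: str) -> tuple[str, str]:
    """Return (command_name, state_letter) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain spaces
    or ')', so split on the last ')' rather than on whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    name = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 2]  # layout is "pid (name) S ..."
    return name, state


def uninterruptible_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, command_name) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                name, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unexpected
        if state == "D":
            found.append((int(entry), name))
    return found


if __name__ == "__main__":
    for pid, name in uninterruptible_processes():
        print(pid, name)
```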

Note: I found this while writing some scripts for MRTG to check the disk
utilization of partitions.  My df's hung, so I didn't even get the proper
values for my local partitions.  After a few days, I had LOTS of hung
MRTG apps.
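For MRTG-style disk checks, one way to keep a dead NFS server from blocking the local numbers is to call statvfs() only on filesystems that are not network mounts. The candidates come from /proc/mounts, which lists mounts without contacting any server. The sketch below assumes that approach. The set of remote filesystem types is illustrative, not exhaustive, and the function names are hypothetical.

```python
import os

# Illustrative, not exhaustive: filesystem types served over the network.
REMOTE_FS_TYPES = {"nfs", "nfs4", "smbfs", "cifs", "afs"}


def local_mount_points(mounts_text: str) -> list[str]:
    """Return mount points from /proc/mounts-format text whose type is not remote.

    Octal escapes in paths (e.g. \\040 for a space) are left undecoded here.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] not in REMOTE_FS_TYPES:
            points.append(fields[1])
    return points


def percent_used(mount_point: str) -> float:
    """Percent used as used / (used + available), counting blocks via statvfs()."""
    st = os.statvfs(mount_point)
    used = st.f_blocks - st.f_bfree
    denominator = used + st.f_bavail
    return 100.0 * used / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point in local_mount_points(f.read()):
            try:
                print(f"{mount_point}: {percent_used(mount_point):.1f}% used")
            except OSError as err:
                print(f"{mount_point}: {err}")
```

Pseudo filesystems such as /proc report zero blocks, so they come out as 0.0% rather than raising.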

Thanks
--
Wade Hampton





More information about the users mailing list