NFS mounts on Fedora 17 hang when reading lots of data
Guido Winkelmann
guido-fedora-users at unknownsite.de
Fri Sep 28 16:33:06 UTC 2012
Hi,
I am experiencing problems with NFS mounts locking up after some use on a
Fedora 17 machine. The symptoms are that all processes trying to access any
file will hang indefinitely. In ps, these processes are listed as being in
status D. There are no error messages, not even in dmesg.
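For illustration, the stuck processes can also be listed without ps by reading /proc directly. This is only a minimal sketch, assuming Python 3 on a Linux client; the function names parse_state and blocked_processes are made up for this example. It reports each process in state D together with the contents of /proc/<pid>/wchan, which may hint at where in the kernel the process is sleeping.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the last ')' instead of splitting on whitespace.
    left, _, right = stat_line.rpartition(")")
    pid_str, _, comm = left.partition("(")
    return int(pid_str), comm, right.split()[0]


def blocked_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm, wchan) for processes in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat"), errors="replace") as f:
                pid, comm, state = parse_state(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan"), errors="replace") as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # wchan not readable; report the process anyway
        found.append((pid, comm, wchan))
    return sorted(found)
```

Calling blocked_processes() and printing the result while the share is hung should give roughly the same list as filtering the ps output for state D.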
When I unmount the NFS share with umount -l (it won't work without the -l),
the processes will stay stuck but can be killed. In at least one case, I could
not even mount the share again after doing that. (The mount command would hang
indefinitely.)
(Indefinitely here means "I have waited several hours, and nothing has
happened".)
I cannot reliably reproduce the problem, but it seems to be happening more
often when reading a very large number (100000+) of 1 MB-sized files using a
large number (1000+) of concurrent requests over the NFS share. I have written
a script (originally for testing filesystem consistency using SHA1 checksums)
that does just that.
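For illustration only (this is not the script mentioned above), a minimal Python 3 sketch of that access pattern could look like the following. The names sha1_of_file and hash_tree, the chunk size, and the default worker count are assumptions made for the example. It walks a directory tree and computes the SHA1 checksum of every file using a pool of threads, so that many reads are in flight at the same time.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def sha1_of_file(path, chunk_size=1 << 20):
    """Read a file in chunks and return its SHA1 checksum as a hex string."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root, workers=64):
    """Return {path: sha1} for every regular file found below root.

    Reads are issued from `workers` threads; blocking file I/O releases
    the interpreter lock, so the reads can overlap on the NFS mount.
    """
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map() yields results in the same order as the input paths.
        return dict(zip(paths, pool.map(sha1_of_file, paths)))
```

Something like hash_tree("/mnt/nfs/testdata", workers=1000) would come close to the load described above; the mount point here is hypothetical. Comparing two runs of the returned dictionary gives the consistency check.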
When the NFS lockup happens, the rest of the client machine keeps working,
except of course for processes trying to access the share. (That includes
things like df.)
The described problem has happened against two very different NFS servers:
another Fedora 17 based server running on a Dell R210 II, and an EMC VNX 5300.
The NFS protocol has been NFSv3 over TCP in all cases. I haven't yet tried to
reproduce the problem with UDP. NFSv4 is not really an option, because it
shows weird problems with chown and chmod.
The kernel version on the client machine is 3.5.4-1.fc17.x86_64.
Does anybody have any idea what might be happening here or what I could do to
further debug the problem? Should I file a bug report?
Regards,
Guido