NFS mounts on Fedora 17 hang when reading lots of data

Guido Winkelmann guido-fedora-users at unknownsite.de
Fri Sep 28 16:33:06 UTC 2012


Hi,

I am experiencing problems with NFS mounts locking up after some use on a 
Fedora 17 machine. The symptom is that every process trying to access any 
file on the share hangs indefinitely. In ps, these processes are listed in 
state D (uninterruptible sleep). There are no error messages, not even in 
dmesg.
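
To make the symptom easier to capture, here is a minimal Python sketch that 
lists processes in state D together with the kernel function they are 
sleeping in. It assumes a Linux /proc filesystem and that /proc/<pid>/wchan 
is available on the running kernel; the function names are made up for this 
example.

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]

def blocked_processes(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were reading it
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # wchan is not readable on every system
        yield pid, comm, wchan

if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(pid, comm, wchan)
```

Running this while the share is hung should show whether the stuck processes 
are all waiting in the same place.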

When I unmount the NFS share with umount -l (it won't work without the -l), 
the processes stay stuck but can be killed. In at least one case, I could 
not even mount the share again after doing that: the mount command itself 
would hang indefinitely.
(Indefinitely here means "I have waited several hours, and nothing has 
happened".)

I cannot reliably reproduce the problem, but it seems to be happening more 
often when reading a very large number (100000+) of 1 MB-sized files using a 
large number (1000+) of concurrent requests over the NFS share. I have written 
a script (originally for testing filesystem consistency using SHA1 checksums) 
that does just that.
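
The following is not the script mentioned above, only a minimal Python 
sketch of the same access pattern: walk a directory tree and compute SHA1 
checksums of all files with many concurrent readers. The function names and 
the use of a thread pool are assumptions made for this example.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

def sha1_of_file(path, chunk_size=1 << 20):
    """Read the file in 1 MiB chunks and return its SHA1 hex digest."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_tree(root, workers=1000):
    """Hash every file below root using up to `workers` threads at once."""
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _, names in os.walk(root)
        for name in names
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as paths.
        return dict(zip(paths, pool.map(sha1_of_file, paths)))
```

Calling hash_tree() on the mount point of the share, with a tree of 100000+ 
files of about 1 MB each, would approximate the load described above.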

When the NFS lockup happens, the rest of the client machine keeps working, 
except, of course, for processes trying to access the share. (That includes 
things like df.)
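
Because even df blocks, any check of the share has to run in a separate 
process with a timeout. A minimal Python sketch of such a probe follows; the 
name mount_responds and the 10 second default are assumptions for this 
example.

```python
import subprocess
import sys

def mount_responds(path, timeout=10.0):
    """Return True if statvfs() on path completes within timeout seconds."""
    # Run the check in a child process so a hang cannot block the caller.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # A child stuck in state D may not react to SIGKILL, so we send the
        # signal but deliberately do not wait for the child to exit.
        child.kill()
        return False
```

A return value of False means either that the call failed or that it did 
not finish in time; the sketch does not tell the two cases apart.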

The described problem has happened against two very different NFS servers: 
another Fedora 17-based server running on a Dell R210 II and an EMC VNX 5300.

The NFS protocol has been NFSv3 over TCP in all cases. I haven't yet tried to 
reproduce the problem with UDP. NFSv4 is not really an option, because it 
shows weird problems with chown and chmod.
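
To confirm which protocol version and transport a mount is really using, 
the options can be read from /proc/mounts. The Python sketch below parses 
that format; the function name is an assumption for this example, and the 
sample line in the usage note is made up, not taken from the affected 
machine.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to a dict of its mount options."""
    result = {}
    for line in mounts_text.splitlines():
        # Fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for item in fields[3].split(","):
            # Flags such as "rw" or "hard" have no value and map to "".
            key, _, value = item.partition("=")
            options[key] = value
        result[fields[1]] = options
    return result

if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point, options in nfs_mount_options(f.read()).items():
            print(mount_point, options.get("vers"), options.get("proto"))
```

For a hypothetical line such as 
"server:/export /mnt/nfs nfs rw,vers=3,proto=tcp,hard 0 0", the entry for 
/mnt/nfs would contain vers "3" and proto "tcp".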

The kernel version on the client machine is 3.5.4-1.fc17.x86_64.

Does anybody have any idea what might be happening here or what I could do to 
further debug the problem? Should I file a bug report?

Regards,
	Guido


