Problem with mount.nfs4 on latest Fedora 10 updates
howard at cohtech.com
Fri Aug 14 07:20:22 UTC 2009
Chuck Lever wrote:
> On Aug 13, 2009, at 12:50 PM, Howard Wilkinson wrote:
>> I have just upgraded a couple of servers from FC9 to FC10 and I am
>> seeing a major problem with mount.nfs4. This occurs when autofs calls
>> the mount program. It then runs at 100% CPU and never terminates.
>> I have VMs that are running similar configuration successfully, so
>> this is something driven by being on bare metal.
>> Kernel is 22.214.171.124-170.2.78.fc10.i686.PAE
>> nfs-utils is nfs-utils-1.1.4-8.fc10.i386
>> autofs is autofs-5.0.3-41.i386
>> Command running is
>> /sbin/mount.nfs4 battleaxe:/ /hosts/battleaxe -s -o
>> The autofs mount has worked and the directories under
>> /hosts/battleaxe have been successfully accessed prior to the problem
>> occurring - I suspect this is a remount after an expire has occurred.
>> Anybody seen this before?
>> Anybody know what I can do to get round this? [I am on the way to
>> FC11 but will have to live with FC10 for a while (a week or so)]
>> Any extra information I can acquire to diagnose this?
>> There is nothing in the log files to indicate anything going wrong, I
>> could turn debug on if I knew what to set and which messages to strip
>> once I do.
> You could start with "sudo rpcdebug -m nfs -s mount" and look in
> /var/log/messages, or you can strace the running mount command.
> Chuck Lever
The mount.nfs4 involvement is a red herring! It would seem that the
problem is in the kernel - probably in the NFS4 code path. I have now
seen bash, df, and cfagent all exhibit the same failure. The processes
go to 100% CPU and hang, probably in a kernel thread. This happens some
time after the kernel has booted, so it may still involve autofs
timing out the mount.
If I revert the kernel (and nothing else) to the latest FC9 version then
everything goes back to working as it was.
Does anybody recognise these symptoms?
I am going to see if an strace will work, but once the system has failed
it is difficult to get other processes to run to completion.
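
If strace will not attach, one rough way to test the "stuck in the
kernel" guess is to sample the user and system CPU counters for the
spinning process from /proc/<pid>/stat. The following is only a minimal
sketch under stated assumptions, not a tested diagnostic: the helper
names (parse_stat, cpu_ticks) are made up for illustration, the 5 second
sample window is arbitrary, and it assumes the field layout documented
in proc(5), where utime is field 14 and stime is field 15. It also
assumes the box is still responsive enough to start one more process.

```python
import sys
import time


def parse_stat(line):
    """Return (pid, comm, state, utime, stime) from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' and then on the first '('.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(pid_text), comm, fields[0], int(fields[11]), int(fields[12])


def cpu_ticks(pid):
    """Read the current (state, utime, stime) for a pid (hypothetical helper)."""
    with open("/proc/%d/stat" % pid) as handle:
        _, _, state, utime, stime = parse_stat(handle.read())
    return state, utime, stime


def main():
    if len(sys.argv) != 2:
        print("usage: cputicks.py PID")
        return
    pid = int(sys.argv[1])
    _, user_before, sys_before = cpu_ticks(pid)
    time.sleep(5)  # arbitrary sample window
    state, user_after, sys_after = cpu_ticks(pid)
    print("state=%s user_ticks=+%d system_ticks=+%d"
          % (state, user_after - user_before, sys_after - sys_before))


if __name__ == "__main__":
    main()
```

If the system tick count climbs while the user tick count stays flat,
the time is being spent in kernel mode on behalf of that process, which
would be consistent with the kernel being at fault rather than
mount.nfs4.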