Problem with mount.nfs4 on latest Fedora 10 updates
Howard Wilkinson
howard at cohtech.com
Fri Aug 14 07:20:22 UTC 2009
Chuck Lever wrote:
>
> On Aug 13, 2009, at 12:50 PM, Howard Wilkinson wrote:
>
>> I have just upgraded a couple of servers from FC9 to FC10 and I am
>> seeing a major problem with mount.nfs4. This occurs when autofs calls
>> the mount program. It then runs at 100% CPU and never terminates.
>>
>> I have VMs that are running similar configuration successfully, so
>> this is something driven by being on bare metal.
>>
>> Kernel is 2.6.27.29-170.2.78.fc10.i686.PAE
>> nfs-utils is nfs-utils-1.1.4-8.fc10.i386
>> autofs is autofs-5.0.3-41.i386
>>
>> Command running is
>>
>> /sbin/mount.nfs4 battleaxe:/ /hosts/battleaxe -s -o
>> rw,nosuid,nodev,tcp,rsize=32768,wsize=32768,hard,intr
>>
>> The autofs mount has worked and the directories under
>> /hosts/battleaxe have been successfully accessed prior to the problem
>> occurring - I suspect this is a remount after an expire has occurred.
>>
>> Anybody seen this before?
>> Anybody know what I can do to get round this? [I am on the way to
>> FC11 but will have to live with FC10 for a while (a week or so)]
>> Any extra information I can acquire to diagnose this?
>>
>> There is nothing in the log files to indicate anything going wrong, I
>> could turn debug on if I knew what to set and which messages to strip
>> once I do.
>
> You could start with "sudo rpcdebug -m nfs -s mount" and look in
> /var/log/messages, or you can strace the running mount command.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com

The mount.nfs4 involvement is a red herring! It would seem that the
problem is in the kernel, probably in the NFS4 code path. I have now
seen bash, df, and cfagent all exhibit the same failure. The processes
go to 100% CPU and hang, probably in a kernel thread. This happens some
time after the kernel has booted, so it may still have something to do
with autofs timing out the mount.

If I revert the kernel (and nothing else) to the latest FC9 version then
everything goes back to working as it was.

Does anybody recognise these symptoms?

I am going to see if an strace will work, but once the system has failed
it is difficult to get other processes to run to completion.

Howard.
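
A rough sketch of how the stuck processes might be inspected, given that
strace may not get far once the system has failed. The helper name
inspect_pid is made up for illustration, and it assumes a Linux /proc
that exposes status and wchan for the process. It only reads /proc, so
it should stay cheap to run on a struggling machine.

```shell
#!/bin/sh
# inspect_pid is a hypothetical helper, not part of nfs-utils or autofs.
# It prints the scheduler state and kernel wait channel of one process.
inspect_pid() {
    pid="$1"
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # State: R means running (e.g. spinning on a CPU),
    #        D means uninterruptible sleep inside the kernel.
    grep '^State:' "/proc/$pid/status"
    # wchan names the kernel function the task is sleeping in;
    # it may be 0 or empty while the task is running or if unreadable.
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
}

# Other steps that need root, left as comments on purpose:
#   rpcdebug -m nfs -s mount      # enable the NFS mount debug flag
#   rpcdebug -m nfs -c mount      # clear it again afterwards
#   strace -p <pid>               # attach to the already-running process
#   echo t > /proc/sysrq-trigger  # dump all task states to the kernel log
```

For example, `inspect_pid 1234` for the PID of the spinning mount.nfs4,
bash, or df process. If strace shows no system calls at all while the
process sits in state R, that would be consistent with the loop being
inside the kernel rather than in user space.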