Thanks for the feedback everyone.
This is a very lightly loaded system with just 3 users at the moment and very little going on across the network (just editing code files etc). The problem occurred again yesterday. For about 10 minutes my KDE desktop locked up in 20-second bursts, and then the problem went away for the rest of the day. During that time the desktop and server were both 98.5% idle and pings continued fine. A Konsole window doing an "ls /home" every 5 seconds was stuck in the ls. I had Konsole windows open running the pings, top and ls, and although I couldn't operate the desktop (switch virtual desktops etc) the ping and top windows kept updating fine. There were no error messages in /var/log/messages on either system and the sar stats showed nothing out of the ordinary.
I am pretty sure the Ethernet network is fine, including cables, switches, Ethernet adapters etc. Pings are fine. It just appears that the client programs get a hugely delayed response (> 20 secs) to accesses to /home every now and then, which points to NFS issues. Most of the system stats counters just give the amount of access, not the latency of an access, which is what I need to track down the problem as there are few disk and network accesses going on.
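A minimal sketch of how per-access latency could be logged from the client side, assuming Python 3 is available there. The function names, the 5 second interval and the 1 second threshold are placeholders chosen for illustration. It only times a directory listing as seen by a user process, so it may be answered from the NFS client's cache and is not a measurement of NFS round trips as such.

```python
import os
import time


def time_listing(path):
    """Return the seconds taken to list `path` once."""
    start = time.monotonic()
    os.listdir(path)
    return time.monotonic() - start


def probe(path, interval=5.0, threshold=1.0, samples=None):
    """Time a listing every `interval` seconds and return the peak latency.

    Listings that take `threshold` seconds or longer are printed with a
    timestamp. With samples=None the loop runs until interrupted.
    """
    peak = 0.0
    count = 0
    while samples is None or count < samples:
        elapsed = time_listing(path)
        peak = max(peak, elapsed)
        if elapsed >= threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} slow listing of {path}: {elapsed:.3f}s", flush=True)
        count += 1
        if samples is None or count < samples:
            time.sleep(interval)
    return peak
```

Calling probe("/home") from a terminal and redirecting the output to a file on local disk (not on /home) would leave a timestamped record of each stall to line up against the sar data.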
As I said, all has been fine on this system until about a month ago and the only obvious changes are the Fedora updates, so I wondered if anyone knew if there had been changes to the NFS stack recently and/or how to log peak NFS latencies?
Terry

On 26/09/2021 18:06, Roger Heflin wrote:
Make sure you have sar/sysstat enabled and changed to do 1 minute samples.
sar -d will show disk perf. If one of the disks "blips" at the firmware level (working on a hard to read block maybe), the util% on that device will be significantly higher than all other disks so will stand out. Then you can look deeper at the smart data.
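A rough sketch of the "stands out" check, assuming the %util figures for one sample interval have already been copied out of the sar -d report into a dictionary. The function name, the factor of 3 and the 10% floor are arbitrary illustrative choices, and the device figures at the bottom are made up for the example, not taken from the system discussed here.

```python
from statistics import median


def busy_outliers(util_by_device, factor=3.0, floor=10.0):
    """Return devices whose %util is well above the median of all devices.

    A device is flagged when its %util is at least `floor` and more than
    `factor` times the median %util across the devices in the sample.
    """
    if not util_by_device:
        return []
    typical = median(util_by_device.values())
    return sorted(
        dev
        for dev, util in util_by_device.items()
        if util >= floor and util > factor * typical
    )


# Made-up %util values for one interval, for illustration only.
sample = {"sda": 1.2, "sdb": 0.9, "sdc": 87.5, "sdd": 1.1}
print(busy_outliers(sample))  # prints ['sdc']
```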
sar generically will show your cpu/system time and sar -n DEV will show detailed network traffic, sar -n EDEV will show network errors.
With it set to 1 minute you should be able to detect most blips.
On Sun, Sep 26, 2021 at 10:26 AM Jamie Fargen jamie@fargenable.com wrote:
Are there network switches under your control? It sounds similar to what happens when the MTU settings on the systems do not match, or one system's MTU is set above the value on the switch ports.
Next time the issue occurs use ping with the do-not-fragment flag. For example: $ ping -M do -s 8972 ip.address
With an MTU of 9000, 8972 should be the largest payload size that works, because there are 28 bytes of overhead for IPv4 packets (a 20-byte IPv4 header plus an 8-byte ICMP header).
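The arithmetic behind that payload size, as a small sketch. The function name is made up for illustration, and it assumes an IPv4 header without options.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header with no options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu):
    """Largest ping -s value that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo packet")
    return payload


for mtu in (1500, 9000):
    print(mtu, max_ping_payload(mtu))
# prints:
# 1500 1472
# 9000 8972
```

So on a standard 1500-byte MTU network the equivalent test would use -s 1472.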
Second, are you sure no one is attaching to the network and duplicating the MAC address of your NFS server, or perhaps of the system that is stalled? If the switches are manageable you would have to ensure that the MAC addresses are being learned on the correct ports.
-Jamie
On Sun, Sep 26, 2021 at 10:24 AM Tom Horsley horsley1953@gmail.com wrote:
On Sun, 26 Sep 2021 10:26:19 -0300 George N. White III wrote:
If you have cron jobs that use a lot of network bandwidth it may work fine until some network issue causing lots of retransmits bogs it down.
Which is why you should check the dumb stuff first! Has a critter chewed on the ethernet cable to the server?
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure