Here's a weird one: A system at work has (God knows why) a gazillion symlinks directly under / pointing to NFS mountpoints for filesystems (some of which might well have high latency).
If I run "df -l", using the -l option in the apparently vain hope that it might not timeout forever on some NFS mount, it hangs for a long time.
If I try to discover which mountpoint it hangs at by running "strace df -l", it no longer hangs. All the stat calls run fast, and df prints the info on the local filesystems.
There isn't supposed to be any difference running under strace (except maybe for setuid and such), any clues what weird rabbit hole this is going down?
I'm tempted to alias df to strace -o /dev/null df :-).
On 03/04/2015 05:01 PM, Tom Horsley wrote:
> Here's a weird one: A system at work has (God knows why) a gazillion symlinks directly under / pointing to NFS mountpoints for filesystems (some of which might well have high latency).
> If I run "df -l", using the -l option in the apparently vain hope that it might not timeout forever on some NFS mount, it hangs for a long time.
> If I try to discover which mountpoint it hangs at by running "strace df -l", it no longer hangs. All the stat calls run fast, and df prints the info on the local filesystems.
> There isn't supposed to be any difference running under strace (except maybe for setuid and such), any clues what weird rabbit hole this is going down?
> I'm tempted to alias df to strace -o /dev/null df :-).
Methinks the "df -l" still walks down non-local filesystems but limits the _display_ to local systems. I haven't tried it, but that may be what it's doing.
Odds are that while df is traversing one of those NFS beasties, it is in a "D" state (verifiable by running top and looking at it). If so, it's waiting for some sort of signal (generally "I/O Complete" which may never come if the NFS server isn't answering the bell).
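For anyone who would rather check the "D" state without top, here is a minimal Python sketch that reads the state letter from /proc/<pid>/stat. The function names are made up for illustration, and it assumes a Linux /proc where the state is the first field after the parenthesised command name.

```python
def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The second field (the command name) is wrapped in parentheses and may
    # itself contain spaces or ')', so split after the LAST ')'.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read /proc/<pid>/stat and return the state letter (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())
```

Calling process_state() with the pid of the stuck df should return "D" if it really is in uninterruptible sleep.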
Since strace requires it to report what it's doing, it's getting interrupted a lot, so it doesn't hang. I mean, it's still waiting on the I/O to complete from NFS, but rather than waiting an indeterminate time, it's getting interrupts (signals) from strace rather than hanging on the one from the NFS system.
Just a wild guess. I stay out of rabbit holes (I'm claustrophobic).
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ricks@alldigital.com -
- AIM/Skype: therps2        ICQ: 22643734            Yahoo: origrps2 -
-                                                                    -
-              If the enemy's in range...so are you!                 -
----------------------------------------------------------------------
On 04Mar2015 18:27, Rick Stevens ricks@alldigital.com wrote:
> On 03/04/2015 05:01 PM, Tom Horsley wrote:
> > Here's a weird one: A system at work has (God knows why) a gazillion symlinks directly under / pointing to NFS mountpoints for filesystems (some of which might well have high latency).
> > If I run "df -l", using the -l option in the apparently vain hope that it might not timeout forever on some NFS mount, it hangs for a long time.
"Walks"? This is df, not du. Consult mount table, do fstats.
[...]
> Since strace requires it to report what it's doing, it's getting interrupted a lot, so it doesn't hang. I mean, it's still waiting on the I/O to complete from NFS, but rather than waiting an indeterminate time, it's getting interrupts (signals) from strace rather than hanging on the one from the NFS system.
I'm fairly certain that strace does not work that way. The traced process is not doing work for the tracer.
> Just a wild guess.
I think so too.
Tom:
- _after_ a fast straced df, is an unstraced df slow again? (thinking about cached answers to calls, caches in the OS, possibly quite briefly)
- see if the result of strace's -T option is informative.
- since df's output is line buffered on a terminal, the presentation of the lines should tell you where it is hanging.
df makes pleasingly few system calls on a handy RHEL5 host. It opens /etc/mtab and essentially just calls statfs() on each name. This implies that determining localness is done based entirely on the contents of /etc/mtab.
Also, this shows that there are no other system calls between the write() reporting the prior filesystem and the statfs() inquiring about the next, so watching an unstraced on-a-terminal df should pinpoint the place of stallness.
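A rough Python sketch of the same pinpointing idea, for anyone who wants per-mount timings without strace -T: it prints each mount point before calling statvfs() on it, so a stall shows which one is responsible. The function names and the /proc/mounts default are assumptions for illustration, octal escapes such as \040 in mount-point names are not decoded, and the script blocks on a hung mount just as df does.

```python
import os
import time


def mount_points(table_text):
    """Return the mount-point column (2nd field) of an mtab-style table."""
    points = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and not line.startswith("#"):
            points.append(fields[1])
    return points


def timed_statvfs(path):
    """Call statvfs() on path and return the elapsed time in seconds."""
    start = time.monotonic()
    try:
        os.statvfs(path)
    except OSError:
        pass  # e.g. permission denied; the timing is still of interest
    return time.monotonic() - start


def probe(table_path="/proc/mounts"):
    """Time statvfs() on every entry of the mount table, one line each."""
    with open(table_path) as f:
        for mp in mount_points(f.read()):
            # Print the name BEFORE the call so a stall shows which mount it is.
            print(mp, end=" ", flush=True)
            print(f"{timed_statvfs(mp):.3f}s")
```

Running probe() on the affected box should leave the cursor sitting after the name of the slow mount point.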
Is it similar on your fedora box?
Cheers, Cameron Simpson cs@zip.com.au
Against stupidity....the Gods themselves contend in vain!
On Fri, 6 Mar 2015 08:27:03 +1100 Cameron Simpson wrote:
> - _after_ a fast straced df, is an unstraced df slow again? (thinking about cached answers to calls, caches in the OS, possibly quite briefly)
It was yesterday, but today the strace'ed version hung as well and I was able to find and unmount some slow filesystems.
I don't know why "df -l" even stat()s an NFS mountpoint at all, it could certainly look at /proc/mounts and find the local only filesystems and utterly ignore the network systems, but it apparently doesn't do that (because I certainly see the stat calls when I strace it).
Judging from the strace it gathers all the info first, then formats it for output, so when it hangs, it prints nothing.
On 05Mar2015 16:52, Tom Horsley horsley1953@gmail.com wrote:
> On Fri, 6 Mar 2015 08:27:03 +1100 Cameron Simpson wrote:
> > - _after_ a fast straced df, is an unstraced df slow again? (thinking about cached answers to calls, caches in the OS, possibly quite briefly)
> It was yesterday, but today the strace'ed version hung as well and I was able to find and unmount some slow filesystems.
> I don't know why "df -l" even stat()s an NFS mountpoint at all, it could certainly look at /proc/mounts and find the local only filesystems and utterly ignore the network systems, but it apparently doesn't do that (because I certainly see the stat calls when I strace it).
Interesting. The RHEL5 host I tested definitely did _not_ statfs() its NFS mounts when invoked with -l. Its df comes from coreutils-5.97.
NB: the RHEL5 one reads /etc/mtab (a regular file), not /proc/mounts.
> Judging from the strace it gathers all the info first, then formats it for output, so when it hangs, it prints nothing.
Charming. Sounds buggy to me.
[...] Ok, fetched coreutils 5.97 (RHEL5 version) and 8.23 (latest version). The diff in df.c is huge, so let's just look at the 8.23 version, which should be close to, if not identical to, Fedora's.
The get_dev() call honours the -l flag, returning immediately without work for a remote filesystem if -l is supplied. HOWEVER, the caller, filter_mount_list(), stat()s _every_ mount point regardless. That will be where your slowness is coming from. There's even a comment near the top of filter_mount_list suggesting that they know it does excessive work.
So, yes, modern GNU df unconditionally stat()s all your mounted filesystems.
Suggestion: see if this:
df /
stats only /. Repeat for other local filesystems as a test. Then write a tiny shell script for "df -l" that reads /etc/mtab and reports only non-NFS mounts.
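A minimal sketch of that local-only df idea, written in Python rather than shell. The set of network filesystem types, the function names and the output layout are assumptions for illustration, not anything taken from coreutils. The point is that the filter runs on the mount table text alone, so statvfs() is never called on an NFS mount.

```python
import os

# Assumed list of network filesystem types; extend it for your site.
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs", "smbfs", "ncpfs", "afs", "9p"}


def local_mounts(table_text):
    """Return (device, mount_point, fstype) for non-network entries."""
    result = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or line.startswith("#"):
            continue
        device, mount_point, fstype = fields[:3]
        if fstype in NETWORK_FSTYPES:
            continue  # skipped on the table entry alone, no system call made
        result.append((device, mount_point, fstype))
    return result


def usage_kib(path):
    """Return (total, used, available) in KiB for one mounted filesystem."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize // 1024
    used = (st.f_blocks - st.f_bfree) * st.f_frsize // 1024
    avail = st.f_bavail * st.f_frsize // 1024
    return total, used, avail


def df_local(table_path="/etc/mtab"):
    """Print df-like lines for local filesystems only."""
    with open(table_path) as f:
        entries = local_mounts(f.read())
    for device, mount_point, _fstype in entries:
        try:
            total, used, avail = usage_kib(mount_point)
        except OSError:
            continue  # unreadable mount point; skip it
        if total == 0:
            continue  # pseudo filesystems such as proc or sysfs
        print(f"{device:<24} {total:>12} {used:>12} {avail:>12} {mount_point}")
```

Call df_local() to get the report; pass "/proc/mounts" instead if /etc/mtab is not maintained on your system.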
Cheers, Cameron Simpson cs@zip.com.au
Share your knowledge. It is a way to achieve immortality. - The Dalai Lama
On Fri, 6 Mar 2015 19:08:44 -0500 Tom Horsley wrote:
Yep, giving it an argument definitely only looks at that one filesystem. I've checked older systems and the -l option really did work once upon a time, avoiding all the stats of everything.
And now I've submitted this bug: