My home router machine was upgraded to FC3 in late December. Since then it has had periods where the load average sits at a constant 1.00, yet top shows no process using any CPU. The CPU indicator reported 100% in user space, and no process was listed as being in device wait either.
I took the machine down and ran chkrootkit 0.44 and some other checks to make sure it was not compromised. I then rebooted the box and watched the wire to see if anything strange was reaching it. Nothing. Next I turned off various services that had been enabled for small-network use. When I did a 'service nscd off' the box went into a kernel crash that went on for quite some time (it looked like an infinite loop).
Turning off nscd has dropped the load average to a standard 0.00 again so I am guessing that there is something it is doing that is causing issues:
nscd-2.3.4-2.fc3
kernel-2.6.9-1.681_FC3
kernel-2.6.9-1.724_FC3

00:00.0 Host bridge: Intel Corp. 82810 DC-100 GMCH [Graphics Memory Controller Hub] (rev 03)
00:01.0 VGA compatible controller: Intel Corp. 82810 DC-100 CGC [Chipset Graphics Controller] (rev 03)
00:1e.0 PCI bridge: Intel Corp. 82801AA PCI Bridge (rev 02)
00:1f.0 ISA bridge: Intel Corp. 82801AA ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801AA IDE (rev 02)
00:1f.2 USB Controller: Intel Corp. 82801AA USB (rev 02)
00:1f.3 SMBus: Intel Corp. 82801AA SMBus (rev 02)
01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:0c.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
I am having problems getting serial console to work so I do not have a kernel oops at the moment :(.
I have also seen problems with nscd. On a stable FC3 machine with LDAP auth, it refused to log in a user when nscd was turned on, saying the user didn't exist. UID#->username mapping worked, though.
Stopping nscd allowed the user to log in instantaneously.
Using LDAP auth
On Mon, 10.01.2005 at 21:48, Stephen J. Smoogen wrote:
My home router machine was upgraded to FC3 in late December. Since then it has had periods where the load average sits at a constant 1.00, yet top shows no process using any CPU. The CPU indicator reported 100% in user space, and no process was listed as being in device wait either.
I took the machine down and ran chkrootkit 0.44 and some other checks to make sure it was not compromised. I then rebooted the box and watched the wire to see if anything strange was reaching it. Nothing. Next I turned off various services that had been enabled for small-network use. When I did a 'service nscd off' the box went into a kernel crash that went on for quite some time (it looked like an infinite loop).
Turning off nscd has dropped the load average to a standard 0.00 again so I am guessing that there is something it is doing that is causing issues:
nscd-2.3.4-2.fc3
kernel-2.6.9-1.681_FC3
kernel-2.6.9-1.724_FC3

00:00.0 Host bridge: Intel Corp. 82810 DC-100 GMCH [Graphics Memory Controller Hub] (rev 03)
00:01.0 VGA compatible controller: Intel Corp. 82810 DC-100 CGC [Chipset Graphics Controller] (rev 03)
00:1e.0 PCI bridge: Intel Corp. 82801AA PCI Bridge (rev 02)
00:1f.0 ISA bridge: Intel Corp. 82801AA ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801AA IDE (rev 02)
00:1f.2 USB Controller: Intel Corp. 82801AA USB (rev 02)
00:1f.3 SMBus: Intel Corp. 82801AA SMBus (rev 02)
01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:0c.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
I am having problems getting serial console to work so I do not have a kernel oops at the moment :(.
-- Stephen J Smoogen. CSIRT/Linux System Administrator
On Mon, Jan 10, 2005 at 11:14:23PM +0100, Kyrre Ness Sjobak wrote:
I have also seen problems with nscd. On a stable FC3 machine with LDAP auth, it refused to log in a user when nscd was turned on, saying the user didn't exist. UID#->username mapping worked, though.
Stopping nscd allowed the user to log in instantaneously.
Using LDAP auth
Possibly related is what I saw a while back: there was a local user and a NIS+ user with different uid etc. nsswitch.conf had passwd: files nisplus, yet it used the NIS+ user until nscd was stopped. Clearing out the nscd caches ( rm -f /var/db/nscd/* ) made it work again even with nscd running.
Didn't get around to filing a bug, especially since the nis+ is "somewhat" custom, but nscd invalidating its caches on startup (either rm or nscd -i) might be a good idea nevertheless? Apart from that there still might be something funny in nscd that makes it lose information in the cache?
Pekka Pietikainen wrote:
Possibly related is what I saw a while back: there was a local user and a NIS+ user with different uid etc. nsswitch.conf had passwd: files nisplus, yet it used the NIS+ user until nscd was stopped. Clearing out the nscd caches ( rm -f /var/db/nscd/* ) made it work again even with nscd running.
If the local user was only created shortly before that, this is the expected behavior. The whole point is to cache results. Only when they time out (as controlled in nscd.conf) will the entries be reloaded. And then they are searched in the usual way, not by preferring the service which previously provided the result.
Didn't get around to filing a bug, especially since the nis+ is "somewhat" custom, but nscd invalidating its caches on startup (either rm or nscd -i) might be a good idea nevertheless?
Then deselect "persistent" in nscd.conf for the database. This is one of the big new features of nscd in recent times (you haven't read the release notes, I gather). If you cannot use it because your local user organization is so flaky, disable it. It is definitely of benefit to the vast majority of users.
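For readers following along, a minimal sketch of what that change could look like in /etc/nscd.conf. The directive names are the standard nscd.conf ones; the 600-second value is the default passwd timeout mentioned later in this thread, and whether your file already carries these exact lines is an assumption.

```
# /etc/nscd.conf (excerpt, illustrative)
# Keep caching passwd lookups, but do not keep the cache across restarts.
enable-cache            passwd  yes
positive-time-to-live   passwd  600
persistent              passwd  no
```

With "persistent" set to no, the passwd cache should start out empty each time nscd starts, so entries left over from before a restart are not served again. Restart nscd after editing the file.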
On Tue, Jan 11, 2005 at 12:56:17PM -0800, Ulrich Drepper wrote:
Pekka Pietikainen wrote:
Possibly related is what I saw a while back: there was a local user and a NIS+ user with different uid etc. nsswitch.conf had passwd: files nisplus, yet it used the NIS+ user until nscd was stopped. Clearing out the nscd caches ( rm -f /var/db/nscd/* ) made it work again even with nscd running.
If the local user was only created shortly before that, this is the expected behavior. The whole point is to cache results. Only when they time out (as controlled in nscd.conf) will the entries be reloaded. And then they are searched in the usual way, not by preferring the service which previously provided the result.
It was actually a freshly installed system. The user was "named", so it was created during the install; after the install it took some time to figure out that named wasn't starting up (permission problems, since /var/named was set up for the local user). I'm pretty sure the problem persisted for longer than the 10 min timeout period that nscd had for passwd, but in any case the local user should have been there all the time...
Oh well, those cache files were nuked so there's no way of knowing what the thing was doing and whether the cache timeout was ever reached or not. In any case I'll keep an eye on similar issues and debug more heavily if something like this happens again. Could be just a local configuration issue or a real bug, who knows...
Kyrre Ness Sjobak wrote:
I have also seen problems with nscd. On a stable FC3 machine with LDAP auth, it refused to log in a user when nscd was turned on, saying the user didn't exist. UID#->username mapping worked, though.
Stopping nscd allowed the user to log in instantaneously.
We had a few people reporting problems but nobody ever was able or willing to provide any data. I use it on all my machines and never had problems.
So, if the problem is reproducible after
service nscd restart
then stop it again, and start nscd by hand with
/usr/sbin/nscd -d -d -d
After this perform the operation which fails and send the output (or better yet, create a bug).
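The procedure above can be sketched as a small shell function for anyone who wants to capture the output in a file. This is only a sketch: the log path, the function name, and the use of getent as the failing operation are assumptions for illustration, and it has to run as root.

```shell
#!/bin/sh
# Sketch: run nscd in debug mode while reproducing a failing lookup.
# LOG and capture_nscd_debug are hypothetical names, not part of nscd.
LOG=${LOG:-/tmp/nscd-debug.log}

capture_nscd_debug() {
    # $1 is the user name whose lookup fails (an assumption for this sketch).
    service nscd stop
    # -d keeps nscd in the foreground; repeating it raises the debug level.
    /usr/sbin/nscd -d -d -d >"$LOG" 2>&1 &
    nscd_pid=$!
    sleep 2                      # give nscd a moment to open its socket
    getent passwd "$1"           # the operation that fails
    kill "$nscd_pid"
    service nscd start
}
```

Calling `capture_nscd_debug someuser` leaves the debug output in the file named by LOG, which can then be attached to a bug report. If the failure is a login rather than a lookup, replace the getent line with that operation.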
On Tue, Jan 11, 2005 at 12:49:18PM -0800, Ulrich Drepper wrote:
We had a few people reporting problems but nobody ever was able or willing to provide any data. I use it on all my machines and never had problems.
So, if the problem is reproducible after
service nscd restart
then stop it again, and start nscd by hand with
/usr/sbin/nscd -d -d -d
After this perform the operation which fails and send the output (or better yet, create a bug).
Returning to an old issue :) We've been seeing nscd cache expiration problems every now and then, like I mentioned a few months back, with both passwd and hosts. It has happened often enough that it's not just a local configuration issue.
It seems like positive hits just don't time out, and nscd -i is the only thing that clears things up. passwd/hosts "never" get touched (we're using NIS+, customized local install), so the entries don't get invalidated that way, which might be the reason not that many other people are seeing the same problem. Another symptom I've sometimes seen is a DDNS host having an old address for days (until nscd -i hosts). nslookup, which doesn't go through nscd, shows the current address. nscd is running with default timeouts (600 secs for passwd, 3600 for hosts).
Will be happy to get data to track this down. I suppose there's no database dumper that would tell what nscd is actually currently caching and how old the data is?
I've actually run nscd for quite some time in debug mode, and the only situations in which it flushes entries are touch /etc/passwd and touch /etc/hosts.
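One way to spot the stale-hosts symptom described above is to compare the answer from the NSS path (served by nscd when it is running) with a direct DNS query. The following is a minimal sketch, assuming an IPv4-only host name and that the host utility is installed; compare_addr and check_host are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: flag a hosts entry whose cached address differs from DNS.

# Pure comparison helper: prints OK when both addresses match, else STALE.
compare_addr() {
    if [ "$1" = "$2" ]; then
        echo OK
    else
        echo STALE
    fi
}

check_host() {
    # getent goes through glibc NSS, and therefore through nscd if it runs.
    cached=$(getent hosts "$1" | awk '{print $1; exit}')
    # host queries DNS directly and bypasses nscd.
    direct=$(host -t A "$1" | awk '/has address/ {print $4; exit}')
    compare_addr "$cached" "$direct"
}
```

Running `check_host somehost.example.org` from cron and logging the result with a timestamp would show how long an entry stays stale relative to the 3600-second hosts timeout. When it reports STALE, `nscd -i hosts` invalidates the hosts cache as described above.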
Ah, there's a bugzilla #150748 about the same issue. Added some comments there too.