On (23/03/16 15:49), Patrick Coleman wrote:
We run sssd to bind a number of machines to LDAP for auth. On a subset
of these machines, we have software that makes several thousand IPv6
route changes per second.
Recently, we found that on these hosts the sssd_nss responder process
fails several times a day, and will not recover until sssd is
restarted. strace of the main sssd process indicates that sssd is
receiving many, many netlink messages - so many, in fact, that sssd
cannot process them fast enough and is receiving ENOBUFS from
The messages that are received seem to get forwarded to the sssd
responders over the unix socket and flood them until they fail.
From what I can see, the netlink code in
src/monitor/monitor_netlink.c:setup_netlink() subscribes to netlink
notifications with the aim of detecting things like wifi network
changes. This isn't something we'd find useful on our servers and
seems to have performance implications - is there any easy way of
turning off this functionality in sssd that I've missed?
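
For readers unfamiliar with the mechanism described above, here is a minimal sketch of how a process subscribes to rtnetlink notifications on a raw socket and how the overflow shows up. It is not sssd's code: the function names and the exact set of multicast groups are assumptions made for illustration, and sssd's own implementation in monitor_netlink.c may differ.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Multicast groups a link/route monitor might join.
 * ASSUMPTION: illustrative set only, not taken from sssd. */
unsigned int monitor_groups(void)
{
    return RTMGRP_LINK | RTMGRP_IPV4_ROUTE | RTMGRP_IPV6_ROUTE;
}

/* Open a NETLINK_ROUTE socket subscribed to the given groups.
 * Returns the file descriptor, or -1 on failure (errno is set). */
int open_route_monitor(unsigned int groups)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = groups;   /* every change in these groups is delivered */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if a failed recv() means the kernel dropped notifications
 * because the socket's receive buffer was full, 0 otherwise. */
int recv_error_is_overflow(int err)
{
    return err == ENOBUFS;
}
```

With several thousand route changes per second, every subscriber to RTMGRP_IPV6_ROUTE receives one message per change; a reader that falls behind gets ENOBUFS from its receive call, and the dropped notifications are gone for good.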
We see this issue running sssd 1.11.7.
1. The failures look something like this. I have replaced our sssd
domain with "ourdomain".
(Tue Mar 22 02:58:01 2016) [sssd[nss]] [accept_fd_handler] (0x0100):
(Tue Mar 22 02:58:01 2016) [sssd[nss]] [nss_cmd_initgroups] (0x0100):
Requesting info for [systemuser] from [<ALL>]
(Tue Mar 22 02:58:01 2016) [sssd[nss]] [nss_cmd_initgroups_search]
(0x0100): Requesting info for [systemuser@ourdomain]
(Tue Mar 22 02:59:04 2016) [sssd[nss]]
[nss_cmd_initgroups_dp_callback] (0x0040): Unable to get information
from Data Provider
Error: 3, 5, (null)
The real error is in sssd_$domain.log;
neither sssd.log nor sssd_nss.log will help you.
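
If that log is empty or too terse, raising the debug level for the domain section usually helps. A minimal sketch, assuming the domain is named "ourdomain" as in the excerpt above and that the config lives in the usual place; the level shown is just an example value:

```ini
# /etc/sssd/sssd.conf (usual location; adjust if yours differs)
[domain/ourdomain]
# Example value only; higher levels (up to 9) log more detail.
debug_level = 6
```

After restarting sssd, the back end's messages should appear in sssd_ourdomain.log, typically under /var/log/sssd/.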