[SSSD-users] Re: sssd responders fail regularly on busy servers

Wednesday, 23 March 2016

On (23/03/16 15:49), Patrick Coleman wrote:
...
Hi,

We run sssd to bind a number of machines to LDAP for auth. On a subset
of these machines, we have software that makes several thousand IPv6
route changes per second.

Recently, we found that on these hosts the sssd_nss responder process
fails several times a day[1], and will not recover until sssd is
restarted. strace[2] of the main sssd process indicates that sssd is
receiving many, many netlink messages - so many, in fact, that sssd
cannot process them fast enough and is receiving ENOBUFS from
recvmsg(2).
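
To illustrate the ENOBUFS condition described above, here is a minimal Python sketch of a receive loop that counts overruns and keeps reading. It is not sssd's code: the names `drain_netlink` and `open_route_socket` are hypothetical, and the multicast group value is an assumption taken from the Linux rtnetlink headers. On a netlink socket, ENOBUFS means the kernel dropped messages because the receive buffer was full; the socket itself remains usable.

```python
import errno
import socket


def open_route_socket(groups=0x400):
    """Open a non-blocking rtnetlink socket (Linux only).

    0x400 is assumed to be RTMGRP_IPV6_ROUTE, the IPv6 route-change group.
    """
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, groups))  # pid 0 lets the kernel assign the port id
    sock.setblocking(False)
    return sock


def drain_netlink(sock, handle, bufsize=65536, max_reads=1000):
    """Read up to max_reads datagrams, counting ENOBUFS overruns.

    Returns (messages_handled, overruns).
    """
    handled = 0
    overruns = 0
    for _ in range(max_reads):
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            break  # nothing left to read on a non-blocking socket
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                # Messages were lost, but the socket can still be read.
                overruns += 1
                continue
            raise
        if not data:
            break
        handle(data)
        handled += 1
    return handled, overruns
```

A consumer that sees a non-zero overrun count knows its view of the routing state may be stale; whether to resynchronise or ignore the loss depends on what the notifications are used for.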

The messages that are received seem to get forwarded[3] to the sssd
responders over the unix socket and flood them until they fail.

From what I can see, the netlink code in
src/monitor/monitor_netlink.c:setup_netlink() subscribes to netlink
notifications with the aim of detecting things like wifi network
changes. This isn't something we'd find useful on our servers and
seems to have performance implications - is there any easy way of
turning off this functionality in sssd that I've missed?

We see this issue running sssd 1.11.7.

Cheers,

Patrick

1. The failures look something like this. I have replaced our sss
domain with "ourdomain"
/var/log/sssd/sssd_nss.log

(Tue Mar 22 02:58:01 2016) [sssd[nss]] [accept_fd_handler] (0x0100):
Client connected!
(Tue Mar 22 02:58:01 2016) [sssd[nss]] [nss_cmd_initgroups] (0x0100):
Requesting info for [systemuser] from [<ALL>]
(Tue Mar 22 02:58:01 2016) [sssd[nss]] [nss_cmd_initgroups_search]
(0x0100): Requesting info for [systemuser@ourdomain]
(Tue Mar 22 02:59:04 2016) [sssd[nss]]
[nss_cmd_initgroups_dp_callback] (0x0040): Unable to get information
from Data Provider
Error: 3, 5, (null)

The real error is in sssd_$domain.log;
neither sssd.log nor sssd_nss.log will help you.

@see https://fedorahosted.org/sssd/wiki/Troubleshooting

LS
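
As a rough aid for following the advice above, the sketch below scans a log in the format shown in footnote 1 and keeps only the entries that look like failures. The function name `failures` is hypothetical. Two assumptions are made: each entry sits on one line in the actual file (the excerpt above is wrapped by the mail client), and lower hex levels are more severe, which matches the excerpt (0x0040 for the failure, 0x0100 for informational messages) but should be checked against the sssd documentation.

```python
import re

# Matches lines such as:
# (Tue Mar 22 02:58:01 2016) [sssd[nss]] [accept_fd_handler] (0x0100): Client connected!
LINE_RE = re.compile(
    r"^\((?P<when>[^)]+)\) "
    r"\[(?P<proc>.+?)\] "
    r"\[(?P<func>\w+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): "
    r"(?P<msg>.*)$"
)


def failures(lines, max_level=0x0080):
    """Yield (when, func, level, msg) for entries at or below max_level."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # continuation line or unrecognised format
        level = int(m.group("level"), 16)
        if level <= max_level:
            yield m.group("when"), m.group("func"), level, m.group("msg")


# Example use against the domain log (path assumed from the thread):
# with open("/var/log/sssd/sssd_ourdomain.log") as f:
#     for entry in failures(f):
#         print(entry)
```

If the domain log shows nothing at these levels, the domain section's logging may need to be made more verbose first, as the troubleshooting page linked above describes.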
