On Wed, Nov 16, 2016 at 09:33:44AM -0500, Justin Stephenson wrote:
On 11/16/2016 06:19 AM, richard.y.collins@aib.ie wrote:
We are still seeing random, intermittent stoppages of the SSSD service.
Following Justin's advice, I set up a stap script to catch what was killing sssd and its related processes.
It turns out sssd is killing itself; see the stap output below. Is there any reason it would do this?
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd (pid:13831) by sssd uid:0
I found it a bit strange that sssd sends a signal to itself here. It almost looks like a graceful shutdown...
I agree with Justin that sssd debug logs would provide a bit more insight here as well.
Knowing what version you run might help, too.
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd_sudo (pid:13835) by sssd uid:0
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd_sudo (pid:13835) by sssd uid:0
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd_pam (pid:13834) by sssd uid:0
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd_pam (pid:13834) by sssd uid:0
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd_nss (pid:13833) by sssd uid:0
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd_nss (pid:13833) by sssd uid:0
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd_be (pid:13832) by sssd uid:0
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd_be (pid:13832) by sssd uid:0
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd (pid:13831) by sssd uid:0
[Wed Nov 16 10:39:34 2016] SIGTERM was sent to oddjobd (pid:10391) by oddjobd uid:0
[Wed Nov 16 10:39:35 2016] SIGTERM was sent to oddjobd (pid:22422) by oddjobd uid:0
_______________________________________________
sssd-users mailing list -- sssd-users@lists.fedorahosted.org
To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
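When the stap output gets long, it can help to filter it mechanically. Below is a minimal sketch that assumes every line has exactly the format shown above ("[date] SIGNAL was sent to TARGET (pid:N) by SENDER uid:N"); the stap script itself isn't in this thread, so that format is the only assumption, and the names `parse_line`, `same_name_signals` and `SignalEvent` are hypothetical. It picks out lines where the sender and target have the same process name, which is the pattern that stood out here.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Line format assumed from the stap output quoted above, e.g.
# [Wed Nov 16 10:39:34 2016] SIGTERM was sent to sssd (pid:13831) by sssd uid:0
LINE_RE = re.compile(
    r"\[(?P<when>[^\]]+)\] (?P<sig>\w+) was sent to (?P<target>\S+) "
    r"\(pid:(?P<pid>\d+)\) by (?P<sender>\S+) uid:(?P<uid>\d+)"
)


class SignalEvent(NamedTuple):
    when: str
    sig: str
    target: str
    pid: int
    sender: str
    uid: int


def parse_line(line: str) -> Optional[SignalEvent]:
    """Parse one stap output line; return None if it doesn't match the format."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return SignalEvent(m["when"], m["sig"], m["target"], int(m["pid"]),
                       m["sender"], int(m["uid"]))


def same_name_signals(lines: Iterable[str]) -> List[SignalEvent]:
    """Events where the sender's process name equals the target's name.

    The output records only the sender's name, not its pid, so a match
    suggests, but does not prove, that a process signalled itself.
    """
    events = (parse_line(line) for line in lines)
    return [e for e in events if e is not None and e.target == e.sender]
```

Fed the output above, this flags the `sssd (pid:13831)` line and both `oddjobd` lines, and skips the `sssd_*` children, which were signalled by a differently named process (`sssd`).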
The SSSD debug logs should give some indication of what's happening around the Nov 16 10:39:34 timeframe.
The primary SSSD service sends heartbeat pings to the other SSSD services; if a service fails to respond to 3 pings, SSSD will attempt to send it a SIGTERM.
Note: https://fedorahosted.org/sssd/wiki/DevelTips#WhenIdebuganSSSDprocessinadebug...
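To make the ping-pong logic concrete, here is a toy model of how the monitor treats one service, assuming only what is described above: one ping per interval, and a SIGTERM once 3 pings go unanswered. I've also assumed that any reply clears the count (i.e. the 3 misses are consecutive). The class name `PingMonitor` is hypothetical; the real monitor is C code driven by an event loop, not this.

```python
import signal
from typing import Optional

MAX_MISSED_PINGS = 3  # "no response from 3 pings", per the description above


class PingMonitor:
    """Toy model of the pre-1.14 monitor's view of one service (hypothetical).

    Call record() once per ping interval with whether a reply came back.
    """

    def __init__(self, name: str, max_missed: int = MAX_MISSED_PINGS):
        self.name = name
        self.max_missed = max_missed
        self.missed = 0

    def record(self, responded: bool) -> Optional[int]:
        """Return signal.SIGTERM when the service should be terminated, else None."""
        if responded:
            self.missed = 0  # assumption: any reply resets the miss count
            return None
        self.missed += 1
        if self.missed >= self.max_missed:
            return signal.SIGTERM
        return None
```

In this model, a service that is briefly busy but answers one ping in between is never killed; only an unbroken run of misses triggers the SIGTERM.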
The 'timeout' value in sssd.conf sets the interval between pings. It defaults to 10 seconds but can be increased, and it can be set in each section of sssd.conf.
For example:
[sssd]
timeout = 60

[nss]
timeout = 60

[pam]
timeout = 60

[sudo]
timeout = 60

[domain/MYDOMAIN]
timeout = 60
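As a rough worked example, assuming one ping per 'timeout' interval and termination after 3 unanswered pings as described above, a hung service survives about 3 × timeout seconds before it gets the SIGTERM. The helper name is hypothetical, and the real figure can differ by up to one interval depending on when the hang starts relative to the ping schedule.

```python
def approx_kill_window(timeout_s: int = 10, max_missed: int = 3) -> int:
    """Rough seconds a hung service survives before being sent SIGTERM.

    Assumes one ping every `timeout_s` seconds and termination after
    `max_missed` unanswered pings (3, per the description above).
    """
    return max_missed * timeout_s


for t in (10, 60):
    print(f"timeout = {t}: roughly {approx_kill_window(t)} s")
```

So raising 'timeout' from the default 10 to 60 stretches the window from roughly 30 s to roughly 180 s.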
Kind regards, Justin Stephenson
This is correct and useful information, but it is only valid up to and including sssd-1.13. In 1.14, we switched to a 'watchdog', which means the services are no longer monitored with ping-pongs from the monitor; instead, each service has a built-in timer that resets periodically. If the timer doesn't reset within the 'timeout' interval, the service kills itself.
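A minimal sketch of the watchdog idea as described above, which also shows how a process can end up sending SIGTERM to itself. It is a deterministic toy, not SSSD's implementation: `Watchdog`, `on_timer_tick` and `on_event_loop_alive` are hypothetical names, and the tick threshold (3 here) is an assumption; check the SSSD source for the real mechanism and value. A self-directed SIGTERM like the `sssd ... by sssd` line in the stap output fits this design, though only the debug logs can confirm that is what happened.

```python
import os
import signal
from typing import Callable, Optional


class Watchdog:
    """Toy model of a per-service watchdog (hypothetical, not SSSD code).

    Two things drive it:
      * on_timer_tick(): a periodic timer firing every 'timeout' seconds,
        which keeps firing even if the service's main loop is stuck;
      * on_event_loop_alive(): called whenever the main loop gets to run,
        which resets the tick counter.
    """

    def __init__(self, max_ticks: int = 3,
                 kill: Optional[Callable[[], None]] = None):
        self.max_ticks = max_ticks  # assumed threshold, not SSSD's constant
        self.ticks = 0
        # By default the service signals *itself*; injectable for testing.
        self.kill = kill or (lambda: os.kill(os.getpid(), signal.SIGTERM))

    def on_event_loop_alive(self) -> None:
        self.ticks = 0

    def on_timer_tick(self) -> bool:
        """Count one tick; run the kill action once more than max_ticks pile up."""
        self.ticks += 1
        if self.ticks > self.max_ticks:
            self.kill()
            return True
        return False
```

The design point is that no separate process has to notice the hang: the timer keeps ticking even when the main loop is blocked, so the stuck service is the one that ends up signalling itself.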
I amended the DevelTips page to include this information.