Hi guys.
One of the masters has recently started to find SSSD dead, and
it says the killer is the WATCHDOG, but I'm not sure about that.
From sssd.log:
...
********************** BACKTRACE DUMP ENDS HERE
*********************************
(2022-07-21 7:11:01): [sssd] [svc_child_info] (0x0020):
Child [991] ('pac':'pac') was terminated by own WATCHDOG
* ... skipping repetitive backtrace ...
(2022-07-21 7:11:14): [sssd] [svc_child_info] (0x0020):
Child [984] ('abba.xx.priv.yy':'%BE_abba.xx.priv.yy') was
terminated by own WATCHDOG
* ... skipping repetitive backtrace ...
(2022-07-21 7:11:14): [sssd] [svc_child_info] (0x0040):
Child [9744] ('nss':'nss') exited with code [3]
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE
FOLLOWING BACKTRACE:
* (2022-07-21 7:11:14): [sssd]
[sbus_dispatch_reconnect] (0x0400): Connection lost.
Terminating active requests.
* (2022-07-21 7:11:14): [sssd]
[sbus_dispatch_reconnect] (0x4000): Remote client terminated
the connection. Releasing data...
* (2022-07-21 7:11:14): [sssd] [sbus_connection_free]
(0x4000): Connection 0x5576314d9180 will be freed during
next loop!
* (2022-07-21 7:11:14): [sssd] [mt_svc_restart]
(0x0400): Scheduling service abba.xx.priv.yy for restart 1
* (2022-07-21 7:11:14): [sssd] [get_provider_config]
(0x0100): Formed command '/usr/libexec/sssd/sssd_be --domain
abba.xx.priv.yy --uid 0 --gid 0 --logger=files' for provider
'%BE_abba.xx.priv.yy'
* (2022-07-21 7:11:14): [sssd] [start_service]
(0x0100): Queueing service abba.xx.priv.yy for startup
* (2022-07-21 7:11:14): [sssd] [mt_svc_exit_handler]
(0x1000): SIGCHLD handler of service nss called
* (2022-07-21 7:11:14): [sssd] [svc_child_info]
(0x0040): Child [9744] ('nss':'nss') exited with code [3]
********************** BACKTRACE DUMP ENDS HERE
*********************************
(2022-07-21 7:11:14): [sssd] [svc_child_info] (0x0040):
Child [9758] ('pac':'pac') exited with code [3]
* ... skipping repetitive backtrace ...
(2022-07-21 7:11:16): [sssd] [svc_child_info] (0x0040):
Child [9876] ('nss':'nss') exited with code [3]
* ... skipping repetitive backtrace ...
(2022-07-21 7:11:16): [sssd] [svc_child_info] (0x0040):
Child [9877] ('pac':'pac') exited with code [3]
* ... skipping repetitive backtrace ...
(2022-07-21 7:11:20): [sssd] [svc_child_info] (0x0040):
Child [9903] ('nss':'nss') exited with code [3]
* ... skipping repetitive backtrace ...
(2022-07-21 7:11:20): [sssd] [monitor_restart_service]
(0x0010): Process [nss], definitely stopped!
(2022-07-21 7:11:20): [sssd] [monitor_quit] (0x3f7c0):
Returned with: 1
(2022-07-21 7:11:20): [sssd] [monitor_quit] (0x3f7c0):
Terminating [pac][9904]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Child [pac] terminated with a signal
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Terminating [abba.xx.priv.yy][9875]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Child [abba.xx.priv.yy] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Terminating [sudo][990]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Child [sudo] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Terminating [ssh][989]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Child [ssh] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Terminating [ifp][988]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Child [ifp] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Terminating [pam][987]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Child [pam] exited gracefully
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Terminating [implicit_files][983]
(2022-07-21 7:11:21): [sssd] [monitor_quit] (0x3f7c0):
Child [implicit_files] exited gracefully
This "death" happens randomly, or at least it looks random to
me. It can occur just after a reboot or after several hours of
uptime.
There is more in the log files under /var/log/sssd, but before
I clutter the thread with more log snippets I was hoping some
expert could share some thoughts.
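
To get an overview before digging through more of /var/log/sssd,
the svc_child_info messages can be tallied per service. Below is a
minimal sketch in Python, assuming the message wording matches the
excerpt above. The regular expression and the summarize() helper are
illustrative only and are not part of SSSD. Copies of a message that
the backtrace dump prints again are dropped by timestamp and PID.

```python
import re
from collections import Counter

# Matches the monitor's svc_child_info messages. Whitespace is matched
# loosely so that lines wrapped by a mail client still parse.
CHILD_RE = re.compile(
    r"\((?P<ts>\d{4}-\d{2}-\d{2}\s+\d{1,2}:\d{2}:\d{2})\):\s*\[sssd\]\s*"
    r"\[svc_child_info\]\s*\(0x[0-9a-f]+\):\s*"
    r"Child\s+\[(?P<pid>\d+)\]\s+\('(?P<name>[^']+)':'[^']*'\)\s+"
    r"(?:was\s+terminated\s+by\s+own\s+WATCHDOG"
    r"|exited\s+with\s+code\s+\[(?P<code>\d+)\])"
)


def summarize(text):
    """Count child terminations per (service, reason) in sssd.log text."""
    seen = set()
    counts = Counter()
    for m in CHILD_RE.finditer(text):
        # Normalise whitespace in the timestamp so duplicates compare equal.
        key = (" ".join(m.group("ts").split()), m.group("pid"))
        if key in seen:
            continue  # same event repeated inside a backtrace dump
        seen.add(key)
        code = m.group("code")
        reason = "watchdog" if code is None else "exit " + code
        counts[(m.group("name"), reason)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python3 tally.py < /var/log/sssd/sssd.log
    for (name, reason), n in sorted(summarize(sys.stdin.read()).items()):
        print(f"{name:30s} {reason:10s} {n}")
```

The per-service counts should make it easier to see whether watchdog
terminations or "exited with code [3]" come first and which service
is affected most.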
Many thanks, L.