Hi,

> (2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0020): Child [984] ('abba.xx.priv.yy':'%BE_abba.xx.priv.yy') was terminated by own WATCHDOG
  --  this means the corresponding process (`sssd_be --domain abba.xx.priv.yy` in this case) was blocked on something for too long: longer than 3*timeout, see the `timeout` option in `man sssd.conf`.

You need to figure out what that operation is. To do so, set `debug_level = 9` in the [domain/$domain] section of sssd.conf, restart SSSD, and wait for the problem to happen again.
Then take the timestamp of the '... was terminated by own WATCHDOG' message in sssd.log and look for the last operation logged before that timestamp in sssd_$domain.log.
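
If the logs are large, the lookup above can be scripted. Below is a minimal sketch, not an SSSD tool: it assumes every log line starts with a "(YYYY-MM-DD  H:MM:SS)" stamp like the lines quoted in this thread, and the function names, as well as the domain-log lines in the demo, are made up for illustration.

```python
import re
from datetime import datetime

# Leading "(YYYY-MM-DD  H:MM:SS" stamp as seen in the quoted sssd.log lines.
# The hour may be space-padded and backtrace lines carry a leading "*".
STAMP = re.compile(r"^\s*(?:\*\s+)?\((\d{4}-\d{2}-\d{2})\s+(\d{1,2}:\d{2}:\d{2})")


def parse_stamp(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H:%M:%S")


def watchdog_times(monitor_lines, service):
    """Timestamps of 'terminated by own WATCHDOG' messages for one service."""
    times = []
    for line in monitor_lines:
        if "was terminated by own WATCHDOG" in line and f"('{service}'" in line:
            ts = parse_stamp(line)
            if ts is not None:
                times.append(ts)
    return times


def last_lines_before(domain_lines, cutoff, count=5):
    """Last `count` timestamped lines logged strictly before `cutoff`."""
    kept = []
    for line in domain_lines:
        ts = parse_stamp(line)
        if ts is not None and ts < cutoff:
            kept.append(line.rstrip("\n"))
    return kept[-count:]


if __name__ == "__main__":
    # Real message from this thread (sssd.log).
    monitor = [
        "(2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0020): Child [984] "
        "('abba.xx.priv.yy':'%BE_abba.xx.priv.yy') was terminated by own WATCHDOG",
    ]
    # Invented placeholder lines, NOT real SSSD output (sssd_$domain.log).
    domain = [
        "(2022-07-21  7:10:40): [placeholder_a] step A",
        "(2022-07-21  7:10:43): [placeholder_b] step B",
        "(2022-07-21  7:11:14): [placeholder_c] backend restarted",
    ]
    for when in watchdog_times(monitor, "abba.xx.priv.yy"):
        print(f"watchdog fired at {when}")
        for line in last_lines_before(domain, when, count=2):
            print("  " + line)
```

In practice you would pass the opened files instead of the sample lists, e.g. /var/log/sssd/sssd.log and /var/log/sssd/sssd_abba.xx.priv.yy.log. The comparison is strict (`<`) so that lines written by the restarted backend in the same second as the watchdog message are left out.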



On Thu, Jul 21, 2022 at 2:27 PM Rob Crittenden <rcritten@redhat.com> wrote:
>
> cc'ing the sssd users list.
>
> rob
>
> lejeczek via FreeIPA-users wrote:
> > Hi guys.
> >
> > One of the masters started recently to find SSSD dead and says the
> > killer is the WATCHDOG - but I'm not sure about that.
> > From sssd.log:
> > ...
> > ********************** BACKTRACE DUMP ENDS HERE
> > *********************************
> >
> > (2022-07-21  7:11:01): [sssd] [svc_child_info] (0x0020): Child [991]
> > ('pac':'pac') was terminated by own WATCHDOG
> >    *  ... skipping repetitive backtrace ...
> > (2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0020): Child [984]
> > ('abba.xx.priv.yy':'%BE_abba.xx.priv.yy') was terminated by own WATCHDOG
> >    *  ... skipping repetitive backtrace ...
> > (2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0040): Child [9744]
> > ('nss':'nss') exited with code [3]
> > ********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING
> > BACKTRACE:
> >    *  (2022-07-21  7:11:14): [sssd] [sbus_dispatch_reconnect] (0x0400):
> > Connection lost. Terminating active requests.
> >    *  (2022-07-21  7:11:14): [sssd] [sbus_dispatch_reconnect] (0x4000):
> > Remote client terminated the connection. Releasing data...
> >    *  (2022-07-21  7:11:14): [sssd] [sbus_connection_free] (0x4000):
> > Connection 0x5576314d9180 will be freed during next loop!
> >    *  (2022-07-21  7:11:14): [sssd] [mt_svc_restart] (0x0400):
> > Scheduling service abba.xx.priv.yy for restart 1
> >    *  (2022-07-21  7:11:14): [sssd] [get_provider_config] (0x0100):
> > Formed command '/usr/libexec/sssd/sssd_be --domain abba.xx.priv.yy --uid
> > 0 --gid 0 --logger=files' for provider '%BE_abba.xx.priv.yy'
> >    *  (2022-07-21  7:11:14): [sssd] [start_service] (0x0100): Queueing
> > service abba.xx.priv.yy for startup
> >    *  (2022-07-21  7:11:14): [sssd] [mt_svc_exit_handler] (0x1000):
> > SIGCHLD handler of service nss called
> >    *  (2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0040): Child
> > [9744] ('nss':'nss') exited with code [3]
> > ********************** BACKTRACE DUMP ENDS HERE
> > *********************************
> >
> > (2022-07-21  7:11:14): [sssd] [svc_child_info] (0x0040): Child [9758]
> > ('pac':'pac') exited with code [3]
> >    *  ... skipping repetitive backtrace ...
> > (2022-07-21  7:11:16): [sssd] [svc_child_info] (0x0040): Child [9876]
> > ('nss':'nss') exited with code [3]
> >    *  ... skipping repetitive backtrace ...
> > (2022-07-21  7:11:16): [sssd] [svc_child_info] (0x0040): Child [9877]
> > ('pac':'pac') exited with code [3]
> >    *  ... skipping repetitive backtrace ...
> > (2022-07-21  7:11:20): [sssd] [svc_child_info] (0x0040): Child [9903]
> > ('nss':'nss') exited with code [3]
> >    *  ... skipping repetitive backtrace ...
> > (2022-07-21  7:11:20): [sssd] [monitor_restart_service] (0x0010):
> > Process [nss], definitely stopped!
> > (2022-07-21  7:11:20): [sssd] [monitor_quit] (0x3f7c0): Returned with: 1
> > (2022-07-21  7:11:20): [sssd] [monitor_quit] (0x3f7c0): Terminating
> > [pac][9904]
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [pac]
> > terminated with a signal
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating
> > [abba.xx.priv.yy][9875]
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child
> > [abba.xx.priv.yy] exited gracefully
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating
> > [sudo][990]
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [sudo]
> > exited gracefully
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating
> > [ssh][989]
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [ssh]
> > exited gracefully
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating
> > [ifp][988]
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [ifp]
> > exited gracefully
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating
> > [pam][987]
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child [pam]
> > exited gracefully
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Terminating
> > [implicit_files][983]
> > (2022-07-21  7:11:21): [sssd] [monitor_quit] (0x3f7c0): Child
> > [implicit_files] exited gracefully
> >
> > This "death" happens randomly, well, to me at least. Can be just after
> > reboot or several hours of uptime.
> > There is more in log files from /var/log/sssd but before I clutter
> > emails with more logs snippets I was hoping some expert can share some
> > thoughts.
> >
> > many thanks, L.
> > _______________________________________________
> > FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
> > To unsubscribe send an email to freeipa-users-leave@lists.fedorahosted.org
> > Fedora Code of Conduct:
> > https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > List Archives:
> > https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org
> >
> > Do not reply to spam on the list, report it:
> > https://pagure.io/fedora-infrastructure