Hello,
I previously opened an issue on GitHub (https://github.com/SSSD/sssd/issues/7838) regarding this, but I feel the mailing list is more active and I may get a quicker response here. I am investigating an ongoing issue. We are also working with vendor support, but we have not yet been able to find a solution.
SSSD Version: 2.9.4
OS: RHEL 7/8/9
SSSD is connected upstream to a Red Hat IdM (FreeIPA) cluster.
There seem to be two related issues.
1. SSSD is being killed by the watchdog. We think external load from backups is causing this, but we are not yet certain.
2. SSSD is not restarted after being killed by the watchdog.
When this happens, users are unable to log in via SSH. We have tried the following to resolve the issue, but SSSD continues to be killed by the watchdog without being restarted.
- Upgrading SSSD to the latest version available to RHEL.
- Increasing the SSSD timeout.
- Adding 'Restart=on-failure' to the SSSD systemd unit.
- Looking for SELinux alerts and setting SELinux to permissive.
- Disabling third-party security services.
- Validating the configs.
- Reviewing relevant logs.
- As a temporary fix, we added a cron job to restart the service, but this does not work reliably.

I can collect logs or configs on request to further this investigation. I am seeking feedback on known issues or on ways I can continue looking for the root cause.
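For readers following along, a 'Restart=on-failure' override like the one listed above is usually applied as a systemd drop-in rather than by editing the packaged unit. The fragment below is only a sketch: the drop-in file name and the RestartSec value are assumptions for illustration, not the configuration actually used here.

```ini
# /etc/systemd/system/sssd.service.d/override.conf
# (hypothetical drop-in file name; any *.conf in this directory is read)
[Service]
# Restart the service when its main process exits uncleanly
# (non-zero exit code, killed by a signal, timeout, or systemd watchdog).
Restart=on-failure
# Assumed delay between restart attempts; not taken from the original report.
RestartSec=5s
```

After adding or changing a drop-in, `systemctl daemon-reload` is needed for it to take effect. Note that systemd's Restart= setting reacts only to the exit of the service's main process.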
Feedback would be appreciated, thank you in advance.
I've also started seeing this on my Alma9 box.
sssd.log attached for what it is worth... It has a "backtrace"?
________________________________________
From: Mark Jackson via sssd-users sssd-users@lists.fedorahosted.org
Sent: Thursday, February 20, 2025 3:31 PM
To: sssd-users@lists.fedorahosted.org
Cc: Mark Jackson
Subject: [SSSD-users] SSSD Not Restarted After Being Killed By Watchdog
On Mon, Feb 24, 2025 at 9:25 PM Patrick Riehecky via sssd-users sssd-users@lists.fedorahosted.org wrote:
I've also started seeing this as well on my Alma9 box.
sssd.log attached for what it is worth... It has a "backtrace"?
There are no indications in this log that SSSD components fail to restart. You need to enable a higher debug level (9) in the 'nss' and 'domain' sections of sssd.conf to see what it is blocked on.
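A minimal sketch of that change in /etc/sssd/sssd.conf, assuming a domain section named example.com (a placeholder; use the domain name that already appears in your own sssd.conf):

```ini
[nss]
# Most verbose logging for the NSS responder
debug_level = 9

# "example.com" is a hypothetical domain name for illustration only
[domain/example.com]
# Most verbose logging for the domain back end
debug_level = 9
```

SSSD has to be restarted for the new levels to apply. The resulting logs are written under /var/log/sssd/, in sssd_nss.log and in the sssd_<domain>.log file for the domain.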