Hi, folks,
I am trying to force reloads against LDAP and failing terribly with sss_cache -E. I keep getting the same long, long, long out-of-date information.
Is there anything more thorough than sss_cache -E to clear it out?
Thanks,
John A
Hi,
What kind of information (user, group, anything else)?
How do you test? Are you sure the memory cache isn't at play? (You can test this by putting `SSS_NSS_USE_MEMCACHE=NO ...` in front of the command.)
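As a minimal sketch of that check, the helper below runs the same `getent` lookup twice, once normally and once with `SSS_NSS_USE_MEMCACHE=NO`, and reports whether the answers differ. The function name `compare_lookup` and the user `jdoe` in the usage note are illustrative assumptions, not anything defined by SSSD.

```shell
#!/bin/sh
# compare_lookup DB KEY
# Runs `getent DB KEY` with and without the SSSD memory cache and prints
# both answers plus a one-word verdict. The helper name is illustrative.
compare_lookup() {
    db=$1
    key=$2
    # Default lookup: may be answered from the memory-mapped cache.
    with_mc=$(getent "$db" "$key" || true)
    # Same lookup with the memory cache disabled for this one command.
    without_mc=$(SSS_NSS_USE_MEMCACHE=NO getent "$db" "$key" || true)
    printf 'with memcache:    %s\n' "$with_mc"
    printf 'without memcache: %s\n' "$without_mc"
    if [ "$with_mc" = "$without_mc" ]; then
        echo "verdict: same"
    else
        echo "verdict: differs"
    fi
}
```

For example, `compare_lookup passwd jdoe` or `compare_lookup group somegroup`. A verdict of `differs` means the memory cache is serving a different answer than the SSSD responder.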
sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.o... Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
For what it's worth we've had good success with "sssctl cache-remove" to clear things up. SSSD has a very very good cache, sometimes too good. 🙂
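A rough sketch of that approach, assuming a root shell and an SSSD version that ships `sssctl`. The wrapper name `refresh_with_sssctl` and the user `jdoe` are hypothetical. `sssctl cache-remove` backs up local data and removes the cached content, and it may prompt before stopping or starting SSSD.

```shell
#!/bin/sh
# refresh_with_sssctl USER
# Removes the SSSD cache via sssctl, then looks USER up again so the
# result can be compared with LDAP. The wrapper name is illustrative.
refresh_with_sssctl() {
    user=$1
    # Needs root; stop here if sssctl reports a failure.
    sssctl cache-remove || return 1
    # Skip the memory cache so the reply comes from the SSSD responder.
    SSS_NSS_USE_MEMCACHE=NO getent passwd "$user"
}
```

Usage would be `refresh_with_sssctl jdoe`. If SSSD was left stopped after the removal, it has to be started again before the lookup can return anything from LDAP.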
Nik Conwell Boston University
________________________________
From: Alexey Tikhonov via sssd-users sssd-users@lists.fedorahosted.org
Sent: Tuesday, January 21, 2025 3:43 PM
To: End-user discussions about the System Security Services Daemon sssd-users@lists.fedorahosted.org
Cc: Johnnie W Adams jxadams@ualr.edu; Alexey Tikhonov atikhono@redhat.com
Subject: [SSSD-users] Re: Is there anything more effective than sss_cache -E?
Hi,
What kind of information (user, group, anything else)?
How do you test? Are you sure the memory cache isn't at play? (You can test this by putting `SSS_NSS_USE_MEMCACHE=NO ...` in front of the command.)
On Tue, Jan 21, 2025 at 2:44 PM Johnnie W Adams via sssd-users sssd-users@lists.fedorahosted.org wrote:
I am trying to force reloads against LDAP and failing terribly with sss_cache -E. I keep getting the same long, long, long out-of-date information.
Is there anything more thorough than sss_cache -E to clear it out?
We regularly run into a similar issue with the AD provider, where sssd will cease to update the membership list of an already-cached AD group.
When the issue occurs, neither restarting sssd, nor using "sssctl cache-expire" will make sssd discard its stale group information. The only thing that we have found that will solve the problem is to:
1. stop sssd
2. remove all files and directories in /var/lib/sss not contributed by an RPM package
3. restart sssd
The problem is that this operation is only safe to do if one can guarantee that the host is online with the AD provider (network is up, any necessary VPNs are active).
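The three steps above could be scripted roughly as follows. This is a sketch under assumptions: systemd manages the `sssd` service, the host is RPM-based, and the script runs as root. The function names and the `DRY_RUN` and `SSS_DIR` variables are illustrative; with the default `DRY_RUN=1` the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch of the stop / wipe / restart procedure. Only run it for real
# (DRY_RUN=0) while the host can reach its identity provider.
SSS_DIR=${SSS_DIR:-/var/lib/sss}
DRY_RUN=${DRY_RUN:-1}

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Print every path under SSS_DIR that no RPM package claims.
unowned_paths() {
    # -depth lists children before their parent directory.
    find "$SSS_DIR" -mindepth 1 -depth | while IFS= read -r path; do
        rpm -qf "$path" >/dev/null 2>&1 || printf '%s\n' "$path"
    done
}

wipe_sssd_cache() {
    run systemctl stop sssd
    unowned_paths | while IFS= read -r path; do
        if [ -d "$path" ] && [ ! -L "$path" ]; then
            run rmdir "$path"
        else
            run rm -f "$path"
        fi
    done
    run systemctl start sssd
}
```

Calling `wipe_sssd_cache` first with the default dry run shows exactly which paths would be removed; `DRY_RUN=0` then performs the removal. Paths containing newlines are not handled by this sketch.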
One time I caught a host in this state, turned on full debugging in the sssd logs, and observed that sssd was failing to store entries with ENOENT (No such file or directory). Maddeningly, it was seeing all of the current members of the group, but any member that hit the ENOENT error was omitted from the list returned to NSS and thus did not appear in the output of "getent group group-name". Unfortunately, I urgently needed to resolve the issue with the host, so I could not debug further. Stopping sssd, nuking all cache files, and restarting sssd made the ENOENT errors go away and caused sssd to return the correct (current) group contents.
We have never seen this issue if sssd starts with a clean cache. This makes me suspicious that corner cases exist in reconciling changes to an AD group with the already-cached version of the group.
(Or else it could just be a plain old cache corruption bug, where the cache becomes corrupted (thus the ENOENT errors), but not severely enough to crash sssd.)
Sumit, if you’re following this thread, if you can provide the specific debugging information you would need to troubleshoot this issue, the next time I catch a host with sssd in this state, I will open a GitHub issue for this and provide the data. (Other than running "getent group problem-group" with full debugging enabled, I’m not sure what specific commands you would want to have executed to help debug the issue.)
On Tue, Jan 21, 2025 at 08:44:12PM -0500, James Ralston via sssd-users wrote:
We have never seen this issue if sssd starts with a clean cache. This makes me suspicious that corner cases exist in reconciling changes to an AD group with the already-cached version of the group.
(Or else it could just be a plain old cache corruption bug, where the cache becomes corrupted (thus the ENOENT errors), but not severely enough to crash sssd.)
Sumit, if you’re following this thread, if you can provide the specific debugging information you would need to troubleshoot this issue, the next time I catch a host with sssd in this state, I will open a GitHub issue for this and provide the data. (Other than running "getent group problem-group" with full debugging enabled, I’m not sure what specific commands you would want to have executed to help debug the issue.)
Hi,
I think you are right that the reason might be an issue in the way SSSD determines whether a group read from AD needs an update in the cache or not. Especially for large groups this might be quite time consuming, and this was one of the reasons to split out the timestamp data into a separate cache file. To debug this, it would be good to have the logs with 'debug_level = 9', the ldapsearch output of 'problem-group' (ideally after a kinit with /etc/krb5.keytab, to have the same permissions as SSSD), and the data and timestamp cache files, ideally collected with the following steps:
- save cache files
- call 'getent group problem-group'
- call 'sss_cache -E'
- save cache files
- call 'getent group problem-group'
- save cache files
- call 'sss_cache -E'
- restart SSSD
- call 'getent group problem-group'
- save cache files
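The sequence above could be captured with a script along these lines. It is only a sketch: it assumes the cache files live in /var/lib/sss/db, that systemd manages `sssd`, and that it runs as root. `DB_DIR`, `OUT_DIR`, the function names, and the snapshot naming are illustrative choices.

```shell
#!/bin/sh
# Sketch of the capture sequence: snapshot the cache files around each
# lookup, cache invalidation, and restart.
DB_DIR=${DB_DIR:-/var/lib/sss/db}
OUT_DIR=${OUT_DIR:-./sssd-debug}
step=0

# Copy the data and timestamp cache files into a numbered directory.
save_cache() {
    step=$((step + 1))
    dest="$OUT_DIR/snapshot-$step"
    mkdir -p "$dest"
    cp -p "$DB_DIR"/*.ldb "$dest"/
}

# Record the group as NSS reports it after the latest snapshot.
lookup() {
    getent group "$1" > "$OUT_DIR/getent-$step.txt" 2>&1 || true
}

capture_group_debug() {
    group=$1
    mkdir -p "$OUT_DIR"
    save_cache
    lookup "$group"
    sss_cache -E
    save_cache
    lookup "$group"
    save_cache
    sss_cache -E
    systemctl restart sssd
    lookup "$group"
    save_cache
}
```

Running `capture_group_debug problem-group` leaves four snapshot directories and three `getent` outputs under `OUT_DIR`. The snapshots are copies of the data cache, so they should be handled as sensitive files.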
Please note that the data cache might contain password hashes, so you might prefer to send the cache file to me directly.
Thanks for your help.
bye, Sumit
On 1/21/2025 5:44 PM, James Ralston via sssd-users wrote:
When the issue occurs, neither restarting sssd, nor using "sssctl cache-expire" will make sssd discard its stale group information. The only thing that we have found that will solve the problem is to:
- stop sssd
- remove all files and directories in /var/lib/sss not contributed by an RPM package
- restart sssd
The problem is that this operation is only safe to do if one can guarantee that the host is online with the AD provider (network is up, any necessary VPNs are active).
You also have to have a root password (bad practice IMO) and know it, as sudo will most often fail once the SSSD cache is borked.
I'm sad to say that I've seen this at every client I've had that has used SSSD. I can't recommend SSSD for this reason. It would be great software except for this problem.