We are running SUSE 12 SP3, which ships SSSD 1.13.4; I believe this is an LTM (long-term maintenance) release.
Due to the large number of users and groups in our LDAP directory, and the limitations of some legacy Unix systems, we have some large groups that have been broken into “sub-groups” with the same GID but an incremental suffix. I don’t believe this is an uncommon solution, and it has worked fine for many years. There are efforts underway to patch some older systems such that they can handle very large groups so that we can collapse these sub-groups, but it is a slow process and there are a lot of servers.
Recently we upgraded some Linux systems to SUSE 12 SP3, which required us to switch to SSSD instead of configuring LDAP in /etc/ldap/conf. In the last few weeks we have encountered an issue with these groups that share a GID. Most of the time everything works as before; for instance, “getent group” commands using either the GID or the (sub-group) name return results. However, at times those commands return an empty list and the following error appears in the system log:
sssd[nss]: More groups have the same GID [nnnn] in directory server. SSSD will not work correctly.
(group ID elided in this email per company policy)
Using sss_cache to expire the entire cache, the group cache, or the specific group has no effect. I understand that this expires the entries rather than removing them, but subsequent getent calls do not overwrite what was there and the error persists. Stopping SSSD, removing the cache DB and restarting was effective, but this is not a viable solution in production. Since the problem eventually clears itself (only to come back later), I tried various strategies, one of which was to do a “getent group” on every sub-group; this does clear the problem (until it returns).
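A minimal sketch of that last workaround (a lookup on every sub-group), assuming Python 3 and the standard getent command are available on the host. The function names and the sub-group names in the usage note are hypothetical, not taken from our directory. It also reports which of the returned entries share a GID:

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Optional


def getent_group(name: str) -> Optional[str]:
    """Run `getent group NAME`; return the entry line, or None if the lookup fails."""
    try:
        result = subprocess.run(
            ["getent", "group", name],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # getent is not installed on this host
        return None
    line = result.stdout.strip()
    return line if result.returncode == 0 and line else None


def refresh_subgroups(
    names: Iterable[str],
    lookup: Callable[[str], Optional[str]] = getent_group,
) -> Dict[str, Optional[str]]:
    """Look up every sub-group by name and collect the raw entry (or None)."""
    return {name: lookup(name) for name in names}


def shared_gids(results: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Map each GID held by more than one returned entry to the sorted group names."""
    by_gid: Dict[str, List[str]] = {}
    for name, line in results.items():
        if line is None:
            continue
        fields = line.split(":")  # entry format: name:passwd:gid:members
        if len(fields) >= 3:
            by_gid.setdefault(fields[2], []).append(name)
    return {gid: sorted(names) for gid, names in by_gid.items() if len(names) > 1}
```

Called as refresh_subgroups(["biggroup1", "biggroup2"]) with the real sub-group names substituted, a None value marks a sub-group whose lookup came back empty. This only reproduces the manual workaround; it does not address why the cache gets into that state.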
Since I discovered this issue on SUSE, others in the company have verified that it also appears on RH 6 and 7. RH 7 is running SSSD 1.16.0, so the problem is still present up to that release, though the above error message does not appear in the messages log. Instead there is an error in sssd_nss.log:
[sssd[nss]] [cache_req_search_cache] (0x0020): CR #1122: Multiple objects were found when only one was expected!
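To track when the problem comes and goes, the two messages quoted above could be picked out of a log with something like the following sketch. It assumes Python 3; the function name is hypothetical, and the matched strings are only the two message texts quoted in this email:

```python
import re
from typing import Iterable, List

# Message texts quoted above: the syslog form seen with 1.13.4 and the
# sssd_nss.log form seen with 1.16.0.
DUPLICATE_GID_PATTERNS = [
    re.compile(r"More groups have the same GID"),
    re.compile(r"Multiple objects were found when only one was expected"),
]


def find_duplicate_gid_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain either duplicate-GID message."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in DUPLICATE_GID_PATTERNS)
    ]
```

It takes any iterable of lines, so an open file handle on the system log or on sssd_nss.log can be passed directly; the location of those files depends on the distribution and is not assumed here.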
Gareth Beale (bemsid: 45600)
Enterprise High Performance Computing Service
Application Infrastructure Services
Global Information Technology Infrastructure Services
Need help? http://iticket.web.boeing.com/secure/create.aspx?id=serverhpc / 425-234-0911