We are running SUSE 12 SP3, which uses SSSD 1.13.4; I believe that is an LTM version.
Due to the large number of users and groups in our LDAP directory, and the limitations of
some legacy Unix systems, we have some large groups that have been broken into
"sub-groups" with the same GID but an incremental suffix. I don't believe
this is an uncommon solution, and it has worked fine for many years. There are efforts
underway to patch some older systems such that they can handle very large groups so that
we can collapse these sub-groups, but it is a slow process and there are a lot of
servers.
Recently we upgraded some Linux systems to SUSE 12 SP3, which required us to switch to
using SSSD instead of configuring LDAP in /etc/ldap/conf. In the last few weeks we have
encountered an issue related to these groups with the same GID. Most of the time,
everything works as before, and for instance "getent group" commands using
either GID or (sub-group) name return results. However, at times those commands return an
empty list and the following error appears in the system log:
sssd[nss]: More groups have the same GID [nnnn] in directory server. SSSD will not work correctly.
(group ID elided in this email per company policy)
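To make the condition behind that message concrete: SSSD is reporting that more than one group entry maps to a single GID, which is exactly what the sub-group scheme produces. The following is a minimal Python sketch, not part of SSSD, that takes lines in group(5) `name:password:gid:members` format (as printed by `getent group <name>` for each sub-group) and reports every GID shared by several names. The function name, the group names and GID 5000 in the usage are hypothetical.

```python
from collections import defaultdict


def find_shared_gids(group_lines):
    """Return {gid: [names]} for every GID used by more than one group name.

    Each input line is expected in group(5) format: name:password:gid:members.
    Blank or malformed lines are skipped.
    """
    by_gid = defaultdict(list)
    for line in group_lines:
        fields = line.strip().split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # not a usable group(5) entry
        by_gid[int(fields[2])].append(fields[0])
    # Keep only the GIDs that appear under several names.
    return {gid: names for gid, names in by_gid.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical sub-groups "bigteam1".."bigteam3" sharing GID 5000.
    sample = [
        "bigteam1:*:5000:alice,bob",
        "bigteam2:*:5000:carol",
        "bigteam3:*:5000:dave",
        "other:*:6000:erin",
    ]
    print(find_shared_gids(sample))
    # -> {5000: ['bigteam1', 'bigteam2', 'bigteam3']}
```

Every GID this reports is one where a lookup by GID alone cannot pick a single group, which is the situation SSSD warns about.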
Using sss_cache to expire the entire cache, the group cache, or a specific group has
no effect. I understand that this expires the entries rather than removing them, but
subsequent getent calls do not overwrite the existing entries, and the error persists.
Stopping SSSD, removing the cache DB, and restarting was effective, but this is not a
viable solution in production. Since the problem eventually clears itself (only to come
back later), I tried various strategies. One of them, running "getent group" on every
sub-group, does clear the problem (until it returns).
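The workaround described above can be scripted. The following is a minimal Python sketch, assuming the sub-groups follow a hypothetical "base name plus numeric suffix" naming scheme; `subgroup_names`, `warm_subgroups`, and the base name "bigteam" are illustrative, not real tools. It uses Python's `grp.getgrnam`, which resolves names through the same NSS stack as `getent group <name>`, so with `sss` listed for `group` in nsswitch.conf the lookups should go through SSSD. It only automates the observed workaround; it is not a fix for the underlying cache behaviour.

```python
import grp


def subgroup_names(base, count):
    # Hypothetical naming scheme: base name plus an incremental suffix 1..count.
    return [f"{base}{i}" for i in range(1, count + 1)]


def warm_subgroups(names, lookup=grp.getgrnam):
    """Look up each sub-group by name via NSS, like `getent group NAME`.

    Returns {name: gid}, with None for names that no NSS source resolved.
    `lookup` is injectable so the function can be exercised without SSSD.
    """
    results = {}
    for name in names:
        try:
            results[name] = lookup(name).gr_gid
        except KeyError:
            results[name] = None  # grp.getgrnam raises KeyError when not found
    return results


if __name__ == "__main__":
    # Hypothetical base name and sub-group count.
    print(warm_subgroups(subgroup_names("bigteam", 3)))
```

Injecting `lookup` keeps the loop testable on a machine without the directory; on a real client the default `grp.getgrnam` is used.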
Since I discovered this issue on SUSE, others in the company have verified that it also
appears in RH 6 and 7. RH 7 is running 1.16.0, so the problem is still present as of that
release, though the above error message does not appear in the messages log. Instead, the
following error appears in sssd_nss.log:
[sssd[nss]] [cache_req_search_cache] (0x0020): CR #1122: Multiple objects were found when only one was expected!
Gareth
Gareth Beale (bemsid: 45600)
Enterprise High Performance Computing Service
Application Infrastructure Services
Global Information Technology Infrastructure Services