Hi,
this proposal might be controversial, but I think a little discussion wouldn’t hurt :-)
tl;dr: I propose we switch the default value of ignore_group_members from False to True.
For anyone not intimate with SSSD options, this would make all groups appear to be
effectively empty. The effect the end user sees is that group resolution is very fast,
because only the group object, not its members, has to be processed. As part of my day job,
I’m involved in triaging RH customer cases and many of them are about performance.
Ignoring the group members is quite often the first step, and quite a few people are using
that option in production.
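For readers who have not used the option, this is roughly what people set by hand today.
It is only a sketch: the domain name example.com is a placeholder, and the rest of the
domain section is omitted.

```ini
[domain/example.com]
# Return group objects without their member lists.
# Currently defaults to False; the proposal is to make True the default.
ignore_group_members = True
```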
The reason I’m proposing this is that many getgr* calls issued by different command
line utilities (id, ls -l, …) typically only care about the GID-to-name translation. At
the same time, the list of group members is both inaccurate (because we stop following
nested groups beyond a certain depth) and not really needed, except for an admin to
quickly see which members belong to a group. For access control, what is really used is
the result of initgroups/getgrouplist. This is the call where we care about precision,
because the list of groups the user is a member of can only be set during login. getgr*,
on the other hand, returns the list of users who are members of a group and is not really
required to be precise.
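To make the distinction concrete, here is a small sketch of the two kinds of lookup,
written in Python only for brevity (the standard grp, pwd and os modules wrap the same
libc calls). The helper names gid_to_name and groups_of_user are made up for this
example and are not part of SSSD or libc.

```python
import grp
import os
import pwd


def gid_to_name(gid):
    """GID-to-name translation, which is all that ls -l or id need from getgrgid."""
    try:
        # The returned entry also carries gr_mem (the member list),
        # but it is never looked at here.
        return grp.getgrgid(gid).gr_name
    except KeyError:
        # No group entry for this GID: fall back to printing the number.
        return str(gid)


def groups_of_user(username):
    """GIDs the user is a member of: the getgrouplist-style lookup."""
    primary_gid = pwd.getpwnam(username).pw_gid
    return os.getgrouplist(username, primary_gid)


if __name__ == "__main__":
    # Print the groups of root, similar in spirit to what `id root` shows.
    for gid in groups_of_user("root"):
        print(gid, gid_to_name(gid))
```

Note that nothing above ever reads the member list; with ignore_group_members set to
True the output should be the same, only the group lookups get cheaper.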
Returning the full list of group members from getgr* by default is very time consuming and
inefficient. We did a lot of work on the SSSD side to speed things up, like using the
timestamps cache and only actually saving something to the database when the object
changes, and we have even more ideas floating around (e.g. don’t parse the whole LDAPResult
at once, but parse lazily and return the attributes on demand). Still, just fetching the
large group objects and traversing the group hierarchy is bound to be very expensive. Even
if SSSD is smart and throws away the data it doesn’t need without a lot of processing, the
directory server side will incur a heavy load when returning the group members.
In an ideal world, I would prefer to follow up on a suggestion made by some SUSE engineers
several years ago (I’m sorry, I can no longer find the link, nor do I remember their
names), which was to add a getgrnam2/getgrgid2 call to libc that would only provide the
GID-to-name translation, and to patch popular applications in distributions to use that
instead of getgrnam/getgrgid. But this is such an uphill battle that I don’t think it’s
very realistic to implement.
Of course, we would have to both document the change and make it obvious from the debug
logs why SSSD groups are coming back empty, but I think the performance benefit would
outweigh the confusion.
For some additional details and discussion, please see e.g.
https://pagure.io/389-ds-base/issue/49951