On Fri, Jul 24, 2015 at 03:26:33PM +0200, Sumit Bose wrote:
Hi,
The attached two patches should fix https://fedorahosted.org/sssd/ticket/2731 . If an object is looked up by a POSIX UID or GID we always assume a multi-domain search, and we only have a global negative cache, i.e. one shared by all domains.
Doing a multi-domain search makes sense because there is no such thing as a fully-qualified UID, and using fixed ID ranges for every domain might not be possible in the general case, e.g. in AD forests where the POSIX IDs are managed by AD. We might want to use some information we have in the IPA case, but I think that would leak backend data into the responders, and there are better ways to fix the general case.
For the typical POSIX calls like getpwuid() and getgrgid() there is already no issue: if a matching object is found it is added to the memory cache, so for some time no further lookups hit the responders.
For other calls, especially the SID-by-ID call which is exposed by libwbclient-sssd to Samba and used regularly there, where there is no memory cache (yet), the current processing is pretty expensive. If the ID is not found in the cache of the first domain, SSSD asks the backend of the first domain, which causes an LDAP request. If the ID is still not found, the second domain (or sub-domain) is checked, first in its cache and, if not found there, via its backend. If sooner or later a matching ID is found, it is saved in the cache of the corresponding domain. But the next request for the same ID causes the same sequence again, because we call a domain's backend before checking the caches of the other domains.
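To make that sequence concrete, here is a toy model of the lookup order; the domain names, struct and helpers are made up for illustration, this is not the actual responder code:

    #include <stdio.h>
    #include <stdbool.h>

    struct domain {
        const char *name;
        bool owns_id;      /* this domain's backend would find the ID */
        bool cached;       /* a previous lookup stored the entry here */
    };

    static bool lookup(struct domain *doms, size_t n, unsigned int id)
    {
        for (size_t i = 0; i < n; i++) {
            if (doms[i].cached) {
                printf("cache hit in %s\n", doms[i].name);
                return true;
            }
            /* Cache miss: ask this domain's backend (an LDAP round
             * trip) before even looking at the caches of the later
             * domains. */
            printf("LDAP request to %s for id %u\n", doms[i].name, id);
            if (doms[i].owns_id) {
                doms[i].cached = true;   /* saved in this domain's cache */
                return true;
            }
        }
        return false;
    }

    int main(void)
    {
        struct domain doms[] = { { "ipa.example", false, false },
                                 { "ad.example",  true,  false } };

        lookup(doms, 2, 10001);   /* two LDAP requests */
        lookup(doms, 2, 10001);   /* still a wasted LDAP request to
                                   * ipa.example before the cached
                                   * entry in ad.example is reached */
        return 0;
    }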
The proper solution would be a memory cache for the SID-related requests, which is tracked in https://fedorahosted.org/sssd/ticket/2727 . As a short-term fix I made the negative cache for UIDs and GIDs domain-aware. Now a second request which comes shortly after the first will see the ID in the negative caches of the other domains and will find the cached entry of the right domain without calling the backends.
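The essence of the change can be modelled as keying the negative cache entries by (domain, ID) instead of by ID alone. Again just a toy model under that assumption, not the patched sss_ncache code:

    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    #define MAX_ENTRIES 64

    struct neg_entry {
        char domain[64];
        unsigned int id;
    };

    static struct neg_entry neg_cache[MAX_ENTRIES];
    static size_t neg_count;

    /* Record a miss for this ID in one specific domain only. */
    static void ncache_set(const char *domain, unsigned int id)
    {
        if (neg_count < MAX_ENTRIES) {
            snprintf(neg_cache[neg_count].domain,
                     sizeof(neg_cache[neg_count].domain), "%s", domain);
            neg_cache[neg_count].id = id;
            neg_count++;
        }
    }

    /* A miss recorded for one domain no longer hides a hit in another. */
    static bool ncache_check(const char *domain, unsigned int id)
    {
        for (size_t i = 0; i < neg_count; i++) {
            if (neg_cache[i].id == id
                    && strcmp(neg_cache[i].domain, domain) == 0) {
                return true;   /* known miss in this domain, skip it */
            }
        }
        return false;
    }

    int main(void)
    {
        /* The first request recorded a miss for uid 10001 in the AD
         * domain only, before finding the entry in the IPA domain. */
        ncache_set("ad.example", 10001);

        printf("skip ad.example:  %s\n",
               ncache_check("ad.example", 10001) ? "yes" : "no");
        printf("skip ipa.example: %s\n",
               ncache_check("ipa.example", 10001) ? "yes" : "no");
        return 0;
    }

So the second request skips straight past the domains that are known not to own the ID and hits the cached entry of the right domain without any backend calls.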
The patches look good to me and also to the automated tools. CI: http://sssd-ci.duckdns.org/logs/job/19/46/summary.html (the Debian failures are unrelated; it's an installation issue on the CI VM)
Coverity didn't find any issues either. Subsequent calls to pysss_nss_idmap.getsidbyid($uid) hit the IPA domain's negcache first and only then the DP.
So ACK to the patches.
I just wonder whether you considered using the negcache for the getgrgid() and getpwnam() requests as well? If not, and we can rely on the memcache there, I would prefer to add a short explanation about this to the commit message.