We're working on transitioning from RHEL5 to RHEL6 and have run into a bit of a problem with sssd and our ldap integration.
We have a number of groups with a very large number of members, which took excessively long with nss_ldap to retrieve. We implemented the nss_getgrent_skipmembers feature for nss_ldap, got it accepted into the PADL upstream, talked Red Hat into backporting it, and have been using it for years. Basically, this feature lets you skip requesting the member attribute for a group lookup, so the group shows up with no members. However, for the purposes of initgroups, membership is still taken into account and users belong to the correct groups. This works perfectly for our needs.
Unfortunately, we have the exact same issue with sssd:
# time getent group members
[...]
real 1m29.589s
user 0m0.006s
sys 0m0.003s

# time getent group students
[...]
real 0m44.735s
user 0m0.007s
sys 0m0.002s

# time id -a cpcrudo
[...]
real 2m14.719s
In addition, any other lookups appear to be blocked during the delay, so the whole system is basically without naming services for minutes.
Is there any way to emulate the behavior of nss_ldap with the nss_getgrent_skipmembers option enabled with sssd? If not, would there be any objection to adding such a feature?
Thanks...
On 10/24/2012 05:49 PM, Paul B. Henson wrote:
> Is there any way to emulate the behavior of nss_ldap with the nss_getgrent_skipmembers option enabled with sssd? If not, would there be any objection to adding such a feature?
I will leave it to the SSSD gurus to reply about the availability of a similar capability in SSSD, but I suspect that the answer is no. SSSD has the ability to use enumeration. In this case you pay the price once in advance and then the cache is updated in the background. That might be a solution for your use case.
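Enumeration is enabled per domain in sssd.conf, and it is off by default. The sketch below writes a minimal example configuration to a temporary file rather than touching /etc/sssd/sssd.conf; the domain name `example`, the `ldap_uri`, and the `ldap_search_base` values are placeholders, not settings from this thread.

```shell
#!/bin/bash
# Sketch only: write an example sssd.conf with enumeration enabled to a temp file.
# The domain name, URI, and search base below are placeholders.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[sssd]
config_file_version = 2
services = nss, pam
domains = example

[domain/example]
id_provider = ldap
ldap_uri = ldap://ldap.example.com
ldap_search_base = dc=example,dc=com
# Fetch all users and groups up front; the cache is then refreshed in the background.
enumerate = true
EOF
echo "example configuration written to $conf"
```

The trade-off is the one discussed in this thread: every host pulls the whole directory into its local cache, instead of caching only the entries it actually looks up.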
Also would you mind trying 1.9.2? It has a bunch of performance improvements and it might be that the results you would get with 1.9.2 are much more acceptable than you see now.
We can consider the feature but where we stand now it is unclear what would be the time frame. You can always get back to nss-ldap but I suspect the version that is available in RHEL6 is different code from PADL and might not have the specific feature you are talking about.
If all options above are exhausted and if you are a RHEL customer, I suggest you file a case with Red Hat support. That would trigger the proper escalation sequence to make sure the issue is addressed in RHEL and you have a clear migration path from the solution you have to some adequate solution on RHEL6.
On 10/24/2012 3:13 PM, Dmitri Pal wrote:
> SSSD has the ability to use enumeration. In this case you pay the price once in advance and then the cache is updated in the background. That might be a solution for your use case.
Honestly, I have zero interest in trying to replicate my entire LDAP directory on each and every server 8-/. It's kinda big ;). Most servers are probably only going to touch pieces of it, improving performance by caching those is good, but trying to replicate it entirely feels wrong. I'm not even sure it would work on our LDAP server, which resource restricts anonymous connections.
> Also would you mind trying 1.9.2?
I could try it out to see if it changes anything, but the only reason we run RHEL is for proprietary products that require it, and not running the stock supported RPMs kind of defeats that purpose :). If 1.9.2 is better, I'd have to get them to backport whatever changes were relevant to the resolution, which would likely be more trouble than getting them to backport a single feature such as skipping group membership lookups.
It's not intractable to handle our groups in a timely fashion, our Solaris 10 boxes do the same getent lookup in about 2/10 of a second. But considering we don't actually need the membership returned, just not including it seems a quicker solution.
> We can consider the feature but where we stand now it is unclear what would be the time frame.
We would implement it ourselves, but I don't want to go to the trouble unless there's some reasonable chance upstream will accept it.
> You can always get back to nss-ldap but I suspect the version that is available in RHEL6 is different code from PADL and might not have the specific feature you are talking about.
RHEL6 includes nss-pam-ldapd rather than stock nss_ldap. While forked from nss_ldap, it does indeed not include the skipmembers feature.
> If all options above are exhausted and if you are a RHEL customer, I suggest you file a case with Red Hat support.
I already opened a case; the initial response was:
-----
Could you please provide us the output of the strace command so that we can check where exactly the command is taking more time?

You will have to execute below commands and provide us the resulting *.out files:

# strace -o /tmp/getent_group_members.out getent group members
-----
Level 1 support has never exactly impressed me 8-/. Disregarding the fact that strace on getent is only going to show a long delay between the connect to /var/lib/sss/pipes/nss and the response (it's not exactly a mystery what's taking so much time), they didn't even request the -t flag so the output won't even include timing information <sigh>.
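A sketch of the trace Paul has in mind, with timing turned on. In strace, `-tt` prefixes each line with a wall-clock timestamp in microseconds, `-T` appends the time spent in each system call as `<seconds>`, and `-f` follows child processes. The helper names `trace_group_lookup` and `slowest_syscalls` are hypothetical, made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: trace one group lookup with timing information.
# Usage: trace_group_lookup members [/tmp/getent_group_members.out]
trace_group_lookup() {
  local group=$1
  local out=${2:-/tmp/getent_group_${group}.out}
  # -f: follow forks, -tt: timestamps with microseconds, -T: time spent in each call
  strace -f -tt -T -o "$out" getent group "$group"
}

# Hypothetical helper: print the N slowest system calls (default 5) from an
# `strace -T` log, slowest first. Lines without a trailing <seconds> are skipped.
slowest_syscalls() {
  local file=$1
  local n=${2:-5}
  LC_ALL=C awk 'match($0, /<[0-9.]+>$/) {
    # Prefix each line with its duration so it can be sorted numerically.
    print substr($0, RSTART + 1, RLENGTH - 2) "\t" $0
  }' "$file" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -rn | head -n "$n" | cut -f2-
}
```

Usage: `trace_group_lookup members`, then `slowest_syscalls /tmp/getent_group_members.out`. Per Paul's description, one would expect the top entry to be the wait on the reply from /var/lib/sss/pipes/nss.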
> That would trigger the proper escalation sequence to make sure the issue is addressed in RHEL and you have a clear migration path from the solution you have to some adequate solution on RHEL6.
I find the most effective method to get something fixed through Red Hat support is to go fix it ourselves, get the fix accepted in the upstream project, hand it to them gift wrapped, and then have a lot of patience for the six months to a year it takes to show up in an officially supported RPM...
I see you have an @redhat address :), please don't interpret my grumbling as derogatory to your company; you guys have some great engineers and do a lot of good stuff -- but your support absolutely drives me up the wall.
Thanks much for your input...
On 10/24/2012 08:09 PM, Paul B. Henson wrote:
> On 10/24/2012 3:13 PM, Dmitri Pal wrote:
>> SSSD has the ability to use enumeration. In this case you pay the price once in advance and then the cache is updated in the background. That might be a solution for your use case.
> Honestly, I have zero interest in trying to replicate my entire LDAP directory on each and every server 8-/. It's kinda big ;). Most servers are probably only going to touch pieces of it, improving performance by caching those is good, but trying to replicate it entirely feels wrong. I'm not even sure it would work on our LDAP server, which resource restricts anonymous connections.
BTW SSSD connects in an authenticated way.
>> Also would you mind trying 1.9.2?
> I could try it out to see if it changes anything, but the only reason we run RHEL is for proprietary products that require it, and not running the stock supported RPMs kind of defeats that purpose :). If 1.9.2 is better, I'd have to get them to backport whatever changes were relevant to the resolution, which would likely be more trouble than getting them to backport a single feature such as skipping group membership lookups.
> It's not intractable to handle our groups in a timely fashion, our Solaris 10 boxes do the same getent lookup in about 2/10 of a second. But considering we don't actually need the membership returned, just not including it seems a quicker solution.
I might not have been clear. 1.9.2 is coming in RHEL 6.4 as a supported RPM, so if it solves the problem for you, it will show up in several months and might save everybody some time and effort.
>> We can consider the feature but where we stand now it is unclear what would be the time frame.
> We would implement it ourselves, but I don't want to go to the trouble unless there's some reasonable chance upstream will accept it.
As was mentioned in other mails, we have this request in our plans, but if 1.9.2 works for you, you might not need to do the work.
>> You can always get back to nss-ldap but I suspect the version that is available in RHEL6 is different code from PADL and might not have the specific feature you are talking about.
> RHEL6 includes nss-pam-ldapd rather than stock nss_ldap. While forked from nss_ldap, it does indeed not include the skipmembers feature.
That is what I thought. Thanks for confirming.
>> If all options above are exhausted and if you are a RHEL customer, I suggest you file a case with Red Hat support.
> I already opened a case; the initial response was:
> Could you please provide us the output of the strace command so that we can check where exactly the command is taking more time?
> You will have to execute below commands and provide us the resulting *.out files:
> # strace -o /tmp/getent_group_members.out getent group members
> Level 1 support has never exactly impressed me 8-/. Disregarding the fact that strace on getent is only going to show a long delay between the connect to /var/lib/sss/pipes/nss and the response (it's not exactly a mystery what's taking so much time), they didn't even request the -t flag so the output won't even include timing information <sigh>.
Sigh.
>> That would trigger the proper escalation sequence to make sure the issue is addressed in RHEL and you have a clear migration path from the solution you have to some adequate solution on RHEL6.
> I find the most effective method to get something fixed through Red Hat support is to go fix it ourselves, get the fix accepted in the upstream project, hand it to them gift wrapped, and then have a lot of patience for the six months to a year it takes to show up in an officially supported RPM...
> I see you have an @redhat address :), please don't interpret my grumbling as derogatory to your company; you guys have some great engineers and do a lot of good stuff -- but your support absolutely drives me up the wall.
Yes, I am from Red Hat, and your filing the ticket would help me help our support team learn how to better handle cases like this in the future. Thank you for your patience.
> Thanks much for your input...
On 10/25/2012 9:41 AM, Dmitri Pal wrote:
> BTW SSSD connects in an authenticated way.
I assume you mean it supports connecting with authentication; considering I have provided it no credentials I would be surprised and disconcerted if it was doing anything other than an anonymous bind in my current deployment :).
> I might not have been clear. 1.9.2 is coming in RHEL 6.4 as a supported RPM, so if it solves the problem for you, it will show up in several months and might save everybody some time and effort.
Ah, I did not realize that. I did try 1.9.2 and it does have drastically improved performance which should be sufficient for our deployment. I'll update our ticket and request early access to the 1.9.2 rpm for our prototyping and testing.
> As was mentioned in other mails, we have this request in our plans, but if 1.9.2 works for you, you might not need to do the work.
For efficiency I'd still prefer just not processing the members, even if the delay to do so isn't unworkable. If no one else is working on that RFE, we might take a crack at it anyway...
> Yes, I am from Red Hat, and your filing the ticket would help me help our support team learn how to better handle cases like this in the future.
The case number is 00727783 if you wanted to take a look at it.
Completely off-topic, but 00728171 is another example of why I bang my head against the wall when I open support tickets -- the mcelog shipped with RHEL6 is broken on AMD family 15 CPUs. In the ticket, I clearly state I'm using a family 15 CPU, that it worked perfectly under RHEL5, and provide a link to an upstream patch to fix the problem. Support's response -- a link to a KB article explaining that AMD family *16* CPUs are not supported by mcelog 8-/.
Anyway, thanks for your help; much appreciated...
On 10/25/2012 06:38 PM, Paul B. Henson wrote:
> On 10/25/2012 9:41 AM, Dmitri Pal wrote:
>> BTW SSSD connects in an authenticated way.
> I assume you mean it supports connecting with authentication; considering I have provided it no credentials I would be surprised and disconcerted if it was doing anything other than an anonymous bind in my current deployment :).
This is strange. By default SSSD prefers strong authentication methods like GSSAPI, and you really need to twist its arm to go with an anonymous bind. It might not be the default for the LDAP provider (the provider is the SSSD component that actually talks to the directory server) though... only for the advanced providers like IPA and AD.
>> I might not have been clear. 1.9.2 is coming in RHEL 6.4 as a supported RPM, so if it solves the problem for you, it will show up in several months and might save everybody some time and effort.
> Ah, I did not realize that. I did try 1.9.2 and it does have drastically improved performance which should be sufficient for our deployment. I'll update our ticket and request early access to the 1.9.2 rpm for our prototyping and testing.
Great. But patches welcome too ;-)
>> As was mentioned in other mails, we have this request in our plans, but if 1.9.2 works for you, you might not need to do the work.
> For efficiency I'd still prefer just not processing the members, even if the delay to do so isn't unworkable. If no one else is working on that RFE, we might take a crack at it anyway...
Perfect!
>> Yes, I am from Red Hat, and your filing the ticket would help me help our support team learn how to better handle cases like this in the future.
> The case number is 00727783 if you wanted to take a look at it.
> Completely off-topic, but 00728171 is another example of why I bang my head against the wall when I open support tickets -- the mcelog shipped with RHEL6 is broken on AMD family 15 CPUs. In the ticket, I clearly state I'm using a family 15 CPU, that it worked perfectly under RHEL5, and provide a link to an upstream patch to fix the problem. Support's response -- a link to a KB article explaining that AMD family *16* CPUs are not supported by mcelog 8-/.
> Anyway, thanks for your help; much appreciated...
Thanks for the info.
On 10/25/2012 06:59 PM, Dmitri Pal wrote:
> On 10/25/2012 06:38 PM, Paul B. Henson wrote:
>> On 10/25/2012 9:41 AM, Dmitri Pal wrote:
>>> BTW SSSD connects in an authenticated way.
>> I assume you mean it supports connecting with authentication; considering I have provided it no credentials I would be surprised and disconcerted if it was doing anything other than an anonymous bind in my current deployment :).
> This is strange. By default SSSD prefers strong authentication methods like GSSAPI, and you really need to twist its arm to go with an anonymous bind. It might not be the default for the LDAP provider (the provider is the SSSD component that actually talks to the directory server) though... only for the advanced providers like IPA and AD.
I just wanted to clarify this, because I think Dmitri is confused about the situation. There is NO requirement for authentication or encryption to perform LDAP id_provider lookups. By default we will use an unencrypted simple bind, because that will work in most cases.
We support using an authenticated connection (and indeed this is the default in the AD and IPA providers), but it is not required unless the LDAP server to which you are connecting has disabled or limited anonymous access. In this situation, you can use simple bind authentication to a known bind DN with a password or you can use a GSSAPI SASL bind to connect to the LDAP server.
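For the restricted-anonymous case, a minimal sketch of the two alternatives in the domain section: a simple bind as a known DN with a password, or a GSSAPI SASL bind. Every DN, host name, and secret here is a placeholder, and the file is written to a temporary location for illustration only.

```shell
#!/bin/bash
# Sketch only: example [domain] section for authenticated LDAP identity lookups.
# All names, DNs, and secrets below are placeholders.
conf=$(mktemp)
chmod 600 "$conf"   # the file can hold a bind password, so keep it private
cat > "$conf" <<'EOF'
[domain/example]
id_provider = ldap
ldap_uri = ldap://ldap.example.com
ldap_search_base = dc=example,dc=com

# Option 1: simple bind as a known DN with a password
ldap_default_bind_dn = cn=sssd-reader,ou=services,dc=example,dc=com
ldap_default_authtok_type = password
ldap_default_authtok = placeholder-secret

# Option 2 (instead of option 1): GSSAPI SASL bind using the host keytab
# ldap_sasl_mech = GSSAPI
# ldap_krb5_keytab = /etc/krb5.keytab
EOF
```

Leaving out both options gives the default described above: lookups without authenticating, which works as long as the server allows anonymous reads.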
This is in contrast to when the SSSD is using LDAP as an *authentication* provider. In this situation, we mandate that the LDAP connection be protected by encryption (one of LDAPS, LDAP+TLS or LDAP+GSSAPI) before we will allow it to perform a simple-bind auth for a user. This is done because the LDAP protocol will transport the simple-bind password in plaintext over the network, thereby making it very easy to snoop passwords. pam_ldap allowed this behavior but SSSD has forbidden it and will simply refuse to even attempt the authentication if the communication channel is not encrypted.
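When LDAP is also the authentication provider, the channel has to be encrypted first. A sketch using an ldaps:// URI, assuming the placeholder host below presents a certificate signed by a CA in the RHEL system bundle; the LDAP+TLS (StartTLS) and LDAP+GSSAPI routes Stephen mentions are alternatives not shown here.

```shell
#!/bin/bash
# Sketch only: LDAP as both identity and authentication provider over LDAPS.
# Host name and search base are placeholders.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[domain/example]
id_provider = ldap
auth_provider = ldap
# LDAPS; with a plain ldap:// URI the connection would need StartTLS instead
ldap_uri = ldaps://ldap.example.com
ldap_search_base = dc=example,dc=com
# CA used to validate the server certificate; fail if validation fails
ldap_tls_cacert = /etc/pki/tls/certs/ca-bundle.crt
ldap_tls_reqcert = demand
EOF
```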
Obviously, if you are using Kerberos or another auth provider, the above is academic.
On 10/24/2012 05:49 PM, Paul B. Henson wrote:
> We're working on transitioning from RHEL5 to RHEL6 and have run into a bit of a problem with sssd and our ldap integration.
> We have a number of groups with a very large number of members, which took excessively long with nss_ldap to retrieve. We implemented the nss_getgrent_skipmembers feature for nss_ldap, got it accepted into the PADL upstream, talked Red Hat into backporting it, and have been using it for years. Basically, this feature lets you skip requesting the member attribute for a group lookup, so the group shows up with no members. However, for the purposes of initgroups, membership is still taken into account and users belong to the correct groups. This works perfectly for our needs.
Paul, this has been proposed as https://fedorahosted.org/sssd/ticket/1376 which is currently slated for inclusion in SSSD 1.10. You're not the first person to request this functionality, but it just hasn't been implemented yet.
Also, as Dmitri has stated, in the case of initgroups (which can be tested with 'id -G username'), SSSD 1.9.x has implemented several significant performance improvements.
Please test with 'id -G' and not just 'id', as the latter doesn't just get the user's group memberships but also retrieves the full contents of each of the groups.
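The difference Stephen points out can be measured directly. A sketch with a hypothetical helper name, `compare_id_lookups`; the second command benefits from anything the first one already cached, so for a fair comparison clear the cache between the two.

```shell
#!/bin/bash
# Hypothetical helper: time the initgroups-only lookup against the full `id` output.
# Usage: compare_id_lookups cpcrudo
compare_id_lookups() {
  local user=$1
  # Numeric GIDs only: needs the user's memberships (initgroups), not each group's members.
  time id -G "$user"
  # Also resolves every GID to a name, i.e. a full group lookup for each group.
  time id "$user"
}
```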
On Thu, Oct 25, 2012 at 05:43:12AM -0400, Stephen Gallagher wrote:
> Paul, this has been proposed as https://fedorahosted.org/sssd/ticket/1376 which is currently slated for inclusion in SSSD 1.10. You're not the first person to request this functionality, but it just hasn't been implemented yet.
> Also, as Dmitri has stated, in the case of initgroups (which can be tested with 'id -G username'), SSSD 1.9.x has implemented several significant performance improvements.
> Please test with 'id -G' and not just 'id', as the latter doesn't just get the user's group memberships but also retrieves the full contents of each of the groups.
There have also been many performance improvements made during 1.9 development. I would suggest that you try the 1.9 packages to see if the performance is acceptable for you.
On 10/25/2012 4:03 AM, Jakub Hrozek wrote:
> There have also been many performance improvements made during 1.9 development. I would suggest that you try the 1.9 packages to see if the performance is acceptable for you.
I compiled the latest 1.9.2 source release on a test RHEL6 system, and it does indeed have a dramatic performance improvement:
# time getent group members
1.8.0 -- 1m29.589s
1.9.2 -- 0m5.968s

# time getent group students
1.8.0 -- 0m44.735s
1.9.2 -- 0m4.543s

# time id -a cpcrudo
1.8.0 -- 2m14.719s
1.9.2 -- 0m12.508s
While still not as efficient as simply not processing the memberships, that's definitely a usable time, particularly as the delay is only incurred when the entry is not cached; once cached, it returns in a fraction of a second.
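To put the cold-cache numbers above in ratio form, a small sketch that converts bash's `time` format to seconds and divides. The function names are made up for this example, and the parser assumes the minutes field is always present, as it is in these measurements.

```shell
#!/bin/bash
# Convert a bash `time` value such as 1m29.589s into seconds.
# Assumes the "<minutes>m<seconds>s" form used in the measurements above.
to_seconds() {
  echo "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of the old (1.8.0) time to the new (1.9.2) time, to one decimal place.
speedup() {
  local old new
  old=$(to_seconds "$1")
  new=$(to_seconds "$2")
  LC_ALL=C awk -v old="$old" -v new="$new" 'BEGIN { printf "%.1f\n", old / new }'
}

speedup 1m29.589s 0m5.968s    # getent group members: 15.0
speedup 0m44.735s 0m4.543s    # getent group students: 9.8
speedup 2m14.719s 0m12.508s   # id -a cpcrudo: 10.8
```

So on a cold cache, 1.9.2 is roughly 10x to 15x faster than 1.8.0 for these three lookups.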
And I see in a different message I have not replied to yet that 1.9.2 is scheduled to be released with RHEL 6.4, so officially supported relief for this issue is coming...
Thanks much...
On Thu, Oct 25, 2012 at 03:21:26PM -0700, Paul B. Henson wrote:
> On 10/25/2012 4:03 AM, Jakub Hrozek wrote:
>> There have also been many performance improvements made during 1.9 development. I would suggest that you try the 1.9 packages to see if the performance is acceptable for you.
> I compiled the latest 1.9.2 source release on a test RHEL6 system, and it does indeed have a dramatic performance improvement:
> # time getent group members
> 1.8.0 -- 1m29.589s
> 1.9.2 -- 0m5.968s
> # time getent group students
> 1.8.0 -- 0m44.735s
> 1.9.2 -- 0m4.543s
> # time id -a cpcrudo
> 1.8.0 -- 2m14.719s
> 1.9.2 -- 0m12.508s
I assume you ran these with a cold cache?
The other feature of the 1.9.x series is a new in-memory cache which should return results pretty much instantly.
On 10/26/2012 2:25 AM, Jakub Hrozek wrote:
> I assume you ran these with a cold cache?
Yes; after each run I shut down the sssd service, deleted the cache files from /var/lib/sss/db, and restarted the sssd service.
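The cold-cache procedure Paul describes, written out as a sketch. `cold_time` is a hypothetical helper name; it must run as root on a RHEL6 host with SysV-style `service` scripts, and it assumes the on-disk caches are the per-domain `cache_*.ldb` files in /var/lib/sss/db.

```shell
#!/bin/bash
# Hypothetical helper: run a command against an empty SSSD cache and time it.
# Requires root. Mirrors the procedure above: stop sssd, delete cache, start sssd.
# Usage: cold_time getent group members
cold_time() {
  service sssd stop || return 1
  # Assumption: per-domain cache databases are named cache_<domain>.ldb.
  rm -f /var/lib/sss/db/cache_*.ldb
  service sssd start || return 1
  time "$@"
}
```

Running it once per command and per SSSD version gives numbers comparable to the measurements above.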
On 10/25/2012 2:43 AM, Stephen Gallagher wrote:
> Paul, this has been proposed as https://fedorahosted.org/sssd/ticket/1376 which is currently slated for inclusion in SSSD 1.10. You're not the first person to request this functionality, but it just hasn't been implemented yet.
Cool. Is anybody actively working/planning to work on this? I notice it is currently owned by "somebody" :). We're fairly hands on, if nobody else is currently working on this we might take a look at it.
> Please test with 'id -G' and not just 'id', as the latter doesn't just get the user's group memberships but also retrieves the full contents of each of the groups.
initgroups() isn't a problem; there's no noticeable delay logging in. But I don't think I can reasonably prevent people from running 'id -a' (-G only provides less than informative gids), or even just 'ls -l' on an object owned by one of the large groups...
On 10/25/2012 06:13 PM, Paul B. Henson wrote:
> On 10/25/2012 2:43 AM, Stephen Gallagher wrote:
>> Paul, this has been proposed as https://fedorahosted.org/sssd/ticket/1376 which is currently slated for inclusion in SSSD 1.10. You're not the first person to request this functionality, but it just hasn't been implemented yet.
> Cool. Is anybody actively working/planning to work on this? I notice it is currently owned by "somebody" :). We're fairly hands on, if nobody else is currently working on this we might take a look at it.
Patches are very welcome indeed :-)
>> Please test with 'id -G' and not just 'id', as the latter doesn't just get the user's group memberships but also retrieves the full contents of each of the groups.
> initgroups() isn't a problem; there's no noticeable delay logging in. But I don't think I can reasonably prevent people from running 'id -a' (-G only provides less than informative gids), or even just 'ls -l' on an object owned by one of the large groups...
On Thu, Oct 25, 2012 at 03:13:17PM -0700, Paul B. Henson wrote:
> On 10/25/2012 2:43 AM, Stephen Gallagher wrote:
>> Paul, this has been proposed as https://fedorahosted.org/sssd/ticket/1376 which is currently slated for inclusion in SSSD 1.10. You're not the first person to request this functionality, but it just hasn't been implemented yet.
> Cool. Is anybody actively working/planning to work on this? I notice it is currently owned by "somebody" :). We're fairly hands on, if nobody else is currently working on this we might take a look at it.
Nobody is working on that as far as I know.
We would certainly welcome patches; feel free to visit the #sssd channel on Freenode. Most of us hang around there -- some of the SSSD developers are in EU time zones, some are in the US, so there is someone to answer a question pretty much all the time. Don't hesitate to ask there or on the sssd-devel mailing list!
A good place to start would be the "Contribute" page: https://fedorahosted.org/sssd/wiki/Contribute
There's also an assorted collection of developer-oriented tips: https://fedorahosted.org/sssd/wiki/DevRes
sssd-users@lists.fedorahosted.org