I'm not sure if artificially trimming the group list is a good idea. It wouldn't work for everyone and I would be wary of breaking access control mechanisms.
Noted. And yes I agree this (non-mandatory) config option wouldn't be useful for everyone, it's just something that fixes my particular problem (reduces ssh login times from 30 seconds to <5).
I may have to write my own patch and apply it to the SRPM as each official version of SSSD is released. It won't be supported by Red Hat obviously but my users won't be complaining about slow login times anymore. So partial win. :)
Just thought I'd contribute my results in case this helps with your investigation of the larger problem. I assume there are other organisations with huge AD/LDAP directories that are having similar issues with ssh authentication times.
I've finished my local patch and added a config option called: ldap_rfc2307bis_initgroups_filter
If not specified, sssd just reverts to normal behaviour (cn=*) during the initgroups run.
With no ldap_rfc2307bis_initgroups_filter:
# time ssh myhost groups xxxxdm xxxxdef xxxxgmt xxxx002 xxxx003 xxxxp xxxx001 xxxx002 xxxxt xxxxp xxxxange xxxxra xxxxb2 xxxxp xxxxd xxxxt xxxxp xxxxp xxxxp xxxxp xxxxd xxxxd xxxxp xxxxd xxxxd xxxxd xxxxp xxxxp xxxxd xxxxd xxxxp xxxxt xxxxd xxxxlemr xxxxp xxxxd xxxxp xxxxp xxxxd xxxxt xxxxd xxxxp xxxxd xxxxd xxxxt xxxxp xxxxt xxxxp xxxxd xxxxd xxxxt xxxxp xxxxd xxxxu xxxxp xxxxp xxxxp xxxxp xxxxd xxxxp xxxxp xxxxu xxxxp xxxxp xxxxt xxxxp xxxxd xxxxd xxxxt xxxxp xxxxd xxxxt xxxxt xxxxd xxxxt xxxxp xxxxp xxxxi xxxxd xxxxd xxxxp xxxxd xxxxp xxxxp xxxxd xxxxd xxxxp xxxxp xxxxd xxxxp xxxxd xxxxp xxxxd xxxxp xxxxp xxxxp xxxxp xxxxd xxxxd xxxxd xxxxd xxxxp xxxxp xxxxp xxxxd xxxxd xxxxd xxxxd xxxxd xxxxp xxxxp xxxxd xxxxd xxxxd xxxxd xxxxd xxxxd xxxxp xxxxp xxxxd xxxxp xxxxd xxxxd xxxxp xxxxd xxxxd xxxxd xxxxd xxxxd xxxxp xxxxd xxxxd xxxxd xxxxd xxxxd xxxxd xxxxd xxxxp xxxxd xxxxd xxxxp xxxxt xxxxp xxxxd xxxxd xxxxp xxxxd xxxxd xxxxd xxxxp xxxxd xxxxd
real 0m48.47s
user 0m0.15s
sys 0m0.02s
With ldap_rfc2307bis_initgroups_filter = (|(cn=xxxrd)(cn=xxxxp)(cn=xxxxd))
# time ssh myhost groups xxxxdm xxxxgmt xxxxd xxxxp xxxxd
real 0m5.11s
user 0m0.15s
sys 0m0.03s
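For anyone wanting to build a similar filter from a list of group names, here is a minimal Python sketch. It only does string construction. How the patch combines the option with the membership lookup is not shown in this thread, so `initgroups_search_filter` is a guess at the general shape (an AND of a `member=` clause with the configured filter), and the helper names and the example DN are made up for illustration.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that are special inside an LDAP filter value (RFC 4515)."""
    special = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(special.get(ch, ch) for ch in value)


def or_filter(attr: str, values: list[str]) -> str:
    """Build an OR filter such as (|(cn=a)(cn=b)) from a list of values."""
    clauses = "".join(f"({attr}={escape_filter_value(v)})" for v in values)
    return f"(|{clauses})"


def initgroups_search_filter(user_dn: str, extra_filter: str = "(cn=*)") -> str:
    """Hypothetical: AND a membership clause with the configured group filter.

    The default (cn=*) mirrors the behaviour described when the option is unset.
    """
    return f"(&(member={escape_filter_value(user_dn)}){extra_filter})"


if __name__ == "__main__":
    wanted = or_filter("cn", ["xxxrd", "xxxxp", "xxxxd"])
    print(wanted)
    # The DN below is a placeholder, not taken from the thread.
    print(initgroups_search_filter("uid=someuser,dc=example,dc=com", wanted))
```

The point of the narrower filter is simply that the server returns, and the client has to cache, far fewer group entries.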
This hack will have to do until a better solution is found. I'm hoping the fixes coming in 1.7.0 will do the trick. :)
Thanks to everyone who helped me get to this point.
Best regards,
Tim Gollschewsky.
This e-mail is sent by Suncorp Group Limited ABN 66 145 290 124 or one of its related entities "Suncorp". Suncorp may be contacted at Level 18, 36 Wickham Terrace, Brisbane or on 13 11 55 or at suncorp.com.au. The content of this e-mail is the view of the sender or stated author and does not necessarily reflect the view of Suncorp. The content, including attachments, is a confidential communication between Suncorp and the intended recipient. If you are not the intended recipient, any use, interference with, disclosure or copying of this e-mail, including attachments, is unauthorised and expressly prohibited. If you have received this e-mail in error please contact the sender immediately and delete the e-mail and any attachments from your system.
On Mon, 7 Nov 2011, GOLLSCHEWSKY, Tim wrote:
> Just thought I'd contribute my results in case this helps with your investigation of the larger problem. I assume there are other organisations with huge AD/LDAP directories that are having similar issues with ssh authentication times.
Out of interest, what performance do you get in those two cases if you symlink cache_default.ldb to /dev/shm/cache_default.ldb?
jh
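A rough Python sketch of that experiment follows, for anyone who wants to script it. It assumes sssd is stopped first and that the cache lives somewhere like /var/lib/sss/db/ (an assumption, check your own install). The function name is made up. Note that /dev/shm is a tmpfs, so the cache will not survive a reboot and the symlink will dangle until the file is recreated.

```python
import shutil
from pathlib import Path


def relocate_with_symlink(cache_file: Path, target_dir: Path) -> Path:
    """Move cache_file into target_dir and leave a symlink at the old location.

    Returns the new location of the file. Run this only while the daemon
    that owns the file is stopped.
    """
    target = target_dir.resolve() / cache_file.name
    shutil.move(str(cache_file), str(target))
    # The original path now points at the relocated file.
    cache_file.symlink_to(target)
    return target


# Example call (paths are assumptions, not taken from the thread):
# relocate_with_symlink(Path("/var/lib/sss/db/cache_default.ldb"), Path("/dev/shm"))
```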
I've re-run my tests, with and without the symlink to /dev/shm:
No symlink:
                                            Time     Size of cache.ldb
With ldap_rfc2307bis_initgroups_filter      6.43s    7553024
WithOUT ldap_rfc2307bis_initgroups_filter   43.95s   15974400

With symlink:
                                            Time     Size of cache.ldb
With ldap_rfc2307bis_initgroups_filter      4.86s    7528448
WithOUT ldap_rfc2307bis_initgroups_filter   26.82s   15962112
So the symlink does provide some improvement. But 26 seconds is still not acceptable for an ssh login time.
This is with sssd v1.5.15 BTW.
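To put the four timings above side by side, here is a small Python sketch that works out the relative savings. The numbers are copied from the results above; the dictionary layout and function name are just for illustration.

```python
# Login times in seconds, copied from the measurements above.
timings = {
    ("no symlink", "with filter"): 6.43,
    ("no symlink", "without filter"): 43.95,
    ("symlink", "with filter"): 4.86,
    ("symlink", "without filter"): 26.82,
}


def percent_saved(before: float, after: float) -> float:
    """Percentage of the original time that was saved."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for variant in ("with filter", "without filter"):
        saved = percent_saved(timings[("no symlink", variant)],
                              timings[("symlink", variant)])
        print(f"symlink, {variant}: {saved:.1f}% faster")
    for location in ("no symlink", "symlink"):
        saved = percent_saved(timings[(location, "without filter")],
                              timings[(location, "with filter")])
        print(f"filter, {location}: {saved:.1f}% faster")
```

By this arithmetic the filter accounts for most of the saving in both cases, and the symlink helps the unfiltered case more than the filtered one.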
Best regards,
Tim Gollschewsky.
Tim, could you take a network trace while performing the initgroups operations? I'd like to assess whether there is a problem with network operations.
Simo.
On Tue, 8 Nov 2011, GOLLSCHEWSKY, Tim wrote:
> So the symlink does provide some improvement. But 26 seconds is still not acceptable for an ssh login time.
Sure. Just for info, picking on a user I know to be in a lot of groups (92) against AD with a reasonable amount of nested groups and no cache, id takes 5-10 seconds (varying with load on ldap servers I guess), and final cache.ldb size is 13627392 so not far off yours. That was tested with 1.6.1.
jh
On Tue, Nov 08, 2011 at 10:31:56AM +0000, John Hodrien wrote:
> Sure. Just for info, picking on a user I know to be in a lot of groups (92) against AD with a reasonable amount of nested groups and no cache, id takes 5-10 seconds (varying with load on ldap servers I guess), and final cache.ldb size is 13627392 so not far off yours. That was tested with 1.6.1.
>
> jh
Just for the record, running plain "id" is not the best way to measure how long the initgroups() operation takes; "id -G" is much better.
The difference is that without -G, id would also run getgrgid() on all the returned GID numbers.
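The same distinction can be seen from Python, as a rough sketch: `os.getgrouplist()` only fetches the list of GIDs (comparable to what `id -G` needs), while resolving each GID to a name with `grp.getgrgid()` is the extra work plain `id` does. The function name is made up, and the timings depend on whatever caches are warm, so treat the numbers as indicative only.

```python
import grp
import os
import pwd
import time


def time_group_resolution(username: str):
    """Time the GID list lookup separately from the per-GID name lookups."""
    primary_gid = pwd.getpwnam(username).pw_gid

    start = time.perf_counter()
    gids = os.getgrouplist(username, primary_gid)  # GID numbers only
    list_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for gid in gids:
        try:
            grp.getgrgid(gid)  # one extra lookup per group
        except KeyError:
            pass  # GID with no matching group entry
    names_seconds = time.perf_counter() - start

    return gids, list_seconds, names_seconds


if __name__ == "__main__":
    user = pwd.getpwuid(os.getuid()).pw_name
    gids, list_s, names_s = time_group_resolution(user)
    print(f"{len(gids)} groups: list {list_s:.3f}s, names {names_s:.3f}s")
```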
sssd-devel@lists.fedorahosted.org