I've been running some benchmarks, timing how long nss_ldap and sssd-ldap configurations take to map uids to user names. The goal is an sssd.conf that performs more or less on par with the existing nss_ldap configuration in terms of system responsiveness and network utilization. For better or worse, I work in an environment where it's not uncommon for folks to run ls -l on directories where hundreds or thousands of subdirectories have unique owners. These tests were run on SLES12sp5 with the following relevant packages:
# rpm -q kernel-default sssd sssd-ldap nss_ldap
kernel-default-4.12.14-122.136.1.x86_64
sssd-1.16.1-7.44.1.x86_64
sssd-ldap-1.16.1-7.44.1.x86_64
nss_ldap-265-35.12.x86_64
With the system configured for nss_ldap, a "stat -c%U" on a parent directory containing 34735 uniquely owned subdirectories takes 15 to 30 seconds to complete and generates approximately 30MB of ldap traffic. This is consistent, barring any account caching, e.g. nscd. I know the number of subdirectories is extreme, but I find it useful for "worst case" benchmark purposes.
With the system configured for sssd-ldap, a "stat -c%U" on the same directory initially takes 8+ minutes and generates ~53MB of ldap traffic. Subsequent runs of the stat command, with the benefit of the sssd cache, complete in as little as 5 seconds or as long as 45 seconds.
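For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how the timing could be scripted. The function name time_owner_lookup and the example path are hypothetical, and it assumes GNU find and stat. It only measures wall-clock time; the ldap traffic would have to be captured separately.

```shell
#!/bin/sh
# Sketch: resolve the owner name of every entry directly under a parent
# directory and print "<unique owner names> <elapsed seconds>".
# time_owner_lookup is a hypothetical helper, not part of any package.
time_owner_lookup() {
    parent=$1
    start=$(date +%s)
    # "-exec stat ... {} +" batches the arguments, so tens of thousands of
    # entries do not overflow the command line the way a shell glob could.
    owners=$(find "$parent" -mindepth 1 -maxdepth 1 -exec stat -c %U {} + \
        | sort -u | wc -l | tr -d ' ')
    end=$(date +%s)
    echo "$owners $((end - start))"
}
```

Calling it as time_owner_lookup /path/to/parent once on a cold cache and again afterwards gives the two numbers to compare.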
To achieve system and network performance similar to nss_ldap, I'm considering increasing entry_cache_timeout from 90 minutes to 4 hours, with refresh_expired_interval set to 3 hours, in the hope of not exceeding the amount of network traffic I see with nss_ldap. I'm also contemplating pre-populating the sssd cache when provisioning new systems, to avoid the initial delay in mapping uids to user names. If that is a reasonable approach, does it make sense to copy recent copies of the files under /var/lib/sss/db to newly deployed systems?
Can I attribute the variance in the time it takes to run the stat command, anywhere from 5 to 45 seconds, to the background refresh process being in the "middle" of its refresh? Should I accept that this will always be the case, depending on when the "stat -c%U" is run? That is, if the sssd cache is near expiry and the refresh triggered by refresh_expired_interval is in progress, will stat take longer to complete than if the cache is mostly current? Is there anything I can do to avoid this, e.g. run a cron job that does something similar to the stat command?
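To make the cron idea concrete, here is a rough sketch of what such a job might run. The function name warm_uid_cache is hypothetical, it assumes GNU find, and whether it actually evens out the lookup times is exactly the open question above.

```shell
#!/bin/sh
# Sketch: look up every distinct numeric owner id found directly under the
# given directories once through NSS, so the lookups happen before a user
# runs ls -l. warm_uid_cache is a hypothetical name.
warm_uid_cache() {
    # %U prints the numeric uid, so collecting the ids triggers no name
    # lookups; sort -un reduces them to one line per distinct uid.
    find "$@" -mindepth 1 -maxdepth 1 -printf '%U\n' | sort -un |
    while read -r uid; do
        # getent goes through nsswitch, so with sss configured for passwd
        # the request is answered by sssd rather than by ldap directly.
        getent passwd "$uid" >/dev/null || echo "unresolved uid: $uid" >&2
    done
}
```

A cron entry could call this against the large parent directories at an interval shorter than entry_cache_timeout. Resolving each uid once, rather than once per directory entry, keeps the job itself cheap.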
Are there other options I should consider in order to get sssd-ldap to perform similarly to nss_ldap?
NSS_LDAP
/etc/ldap.conf
base dc=example,dc=com
SIZELIMIT 0
scope sub
uri ldaps://ldap1 ldaps://ldap2
ldap_version 3
bind_policy soft
binddn cn=bind_dn,ou=profile,dc=example,dc=com
bindpw PASSWORD
DEREF never
timelimit 30
bind_timelimit 15
idle_timelimit 3600
nss_base_passwd ou=People,dc=example,dc=com?one
nss_base_group ou=group,dc=example,dc=com?one
nss_base_netgroup ou=netgroup,dc=example,dc=com?one
nss_schema rfc2307
tls_ciphers ECDHE-RSA-AES256-SHA:DHE-RSA-AES256-SHA
nss_initgroups_ignoreusers root,bin,daemon,nobody
SSSD
/etc/sssd/sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = LDAP

[nss]

[pam]

[domain/LDAP]
id_provider = ldap
auth_provider = krb5
krb5_realm = EXAMPLE.COM
ldap_schema = rfc2307
ldap_uri = ldaps://ldap1, ldaps://ldap2
ldap_search_base = dc=example,dc=com
ldap_user_search_base = ou=People,dc=example,dc=com?onelevel?
ldap_group_search_base = ou=group,dc=example,dc=com?onelevel?
ldap_netgroup_search_base = ou=netgroup,dc=example,dc=com?onelevel?
ldap_default_bind_dn = cn=bind_dn,ou=profile,dc=example,dc=com
ldap_default_authtok = PASSWORD
lookup_family_order = ipv4_only
# following set for testing, otherwise 14400 and 10800 respectively
entry_cache_timeout = 600
refresh_expired_interval = 450
Thanks, Mark
On 11/30/22 21:46, Christian, Mark wrote:
Are there other options I should consider in order to get sssd-ldap to perform similarly to nss_ldap?
NSS_LDAP
/etc/ldap.conf
It seems you're using PADL's classic nss_ldap. If so, then I guess you've also enabled nscd. While I dislike nscd for historic reasons, its NSS map query performance is the maximum you can expect. (I took this as a reference when benchmarking my custom NSS/PAM daemon for Æ-DIR.)
IIRC it's not supported to cache passwd and group maps served by libnss_sss with nscd.
The first thing I'd try is to enable full enumeration of the maps in sssd.conf. IIRC this can lead to other problems if you have several tens of thousands of users and groups. YMMV.
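As a minimal sketch, assuming the [domain/LDAP] section name from the sssd.conf quoted earlier in the thread, enabling enumeration would look roughly like this:

```ini
[domain/LDAP]
# enumerate is off by default; setting it to true makes sssd download and
# cache all users and groups for this domain instead of fetching on demand.
enumerate = true
```

sssd has to be restarted to pick up the change, and the first full enumeration may itself take a while and generate a burst of ldap traffic, so it is worth measuring against the same test directory.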
Ciao, Michael.
sssd-users@lists.fedorahosted.org