I've been running some benchmarks, timing how long it takes nss_ldap
and sssd-ldap configurations to map uids to user names, trying to find an
sssd.conf configuration that performs more or less on par with the existing
nss_ldap configuration in terms of system responsiveness and network
utilization. For better or worse I work in an environment where it's
not uncommon for folks to do ls -l on directories where hundreds or
thousands of subdirectories have unique ids. These tests are run on
SLES12sp5, with the following relevant pkgs:
# rpm -q kernel-default sssd sssd-ldap nss_ldap
kernel-default-4.12.14-122.136.1.x86_64
sssd-1.16.1-7.44.1.x86_64
sssd-ldap-1.16.1-7.44.1.x86_64
nss_ldap-265-35.12.x86_64
With the system configured for nss_ldap a "stat -c%U" on a parent
directory containing 34735 uniquely owned subdirectories, takes 15 to
30 seconds to complete, and generates approximately 30MB of ldap
traffic. This is consistent barring any account caching, e.g. nscd. I
know the number of subdirectories is extreme, but I find it useful for
"worst case" benchmark purposes.
With the system configured for sssd-ldap a "stat -c%U" on the same
directory initially takes 8+ minutes, and generates ~53MB of ldap
traffic. Subsequent runs of the stat command complete in as little as 5
seconds or as long as 45 seconds with the benefit of the sssd cache.
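
A minimal sketch of how a timing run like the ones above could be scripted. The function name time_owner_lookup and its directory argument are hypothetical, and it assumes GNU find and stat; it only measures wall-clock time in whole seconds and does not capture the ldap traffic.

```shell
#!/bin/sh
# Hypothetical helper: resolve the owner name of every entry directly
# under a directory and report how long that took.
time_owner_lookup() {
    dir=$1
    start=$(date +%s)
    # stat -c%U prints the owner *name*, which forces a uid -> name
    # lookup through NSS (nss_ldap or sssd, whichever is configured).
    # The first output line is the number of distinct owner names seen.
    find "$dir" -mindepth 1 -maxdepth 1 -exec stat -c%U {} + | sort -u | wc -l
    end=$(date +%s)
    echo "elapsed_seconds=$((end - start))"
}
```

Running it twice in a row against the same directory would show the cold-cache and warm-cache cases separately.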
To achieve similar system and network performance to nss_ldap, I'm
considering increasing entry_cache_timeout from 90 minutes to 4 hours,
with refresh_expired_interval set at 3 hours, in the hope of not
exceeding the amount of network traffic I see with nss_ldap. I'm also
contemplating pre-populating the sssd cache when provisioning new
systems, in order to avoid the initial delay in mapping uids to
user names. If that is reasonable, does it make sense to copy recent
copies of the files under /var/lib/sss/db to newly deployed systems?
Can I attribute the variance in the time it takes to run the stat command,
anywhere from 5 to 45 seconds, to the background refresh process being
in the "middle" of its refresh? Should I accept that this will always
be the case depending on when the "stat -c%U" is run? That is, if the
sssd cache is near expiring and the background refresh driven by
refresh_expired_interval is in the middle of its run, am I going to see
stat take longer to complete than if the cache is mostly current? Is
there anything I can do to avoid this on systems, e.g. run a cron job
that does something similar to the stat command?
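
To make the cron idea concrete, here is a rough sketch of what such a warm-up job might look like. The function name warm_cache, the script path and the schedule in the comment are all placeholders, it assumes GNU find (for -printf) and getent, and whether it actually evens out the variance is exactly what I'd still need to measure.

```shell
#!/bin/sh
# Hypothetical warm-up job: look up each distinct numeric owner found
# directly under a directory once, so the entries are requested from
# the configured NSS backend before a user runs ls -l or stat.
#
# Hypothetical cron entry (path and schedule are placeholders):
#   */30 * * * * root /usr/local/sbin/warm_cache.sh /path/to/parent
warm_cache() {
    dir=$1
    # %U prints the numeric uid; sort -un keeps one line per uid.
    find "$dir" -mindepth 1 -maxdepth 1 -printf '%U\n' | sort -un |
    while read -r uid; do
        # getent passwd with a numeric key performs a lookup by uid.
        getent passwd "$uid" >/dev/null || echo "unresolved uid: $uid" >&2
    done
}
```

Looking up each uid once, rather than running stat on every subdirectory, keeps the job to one request per unique owner.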
Are there other options I should consider in order to get sssd-ldap to
perform similarly to nss_ldap?
NSS_LDAP
/etc/ldap.conf
base dc=example,dc=com
SIZELIMIT 0
scope sub
uri ldaps://ldap1 ldaps://ldap2
ldap_version 3
bind_policy soft
binddn cn=bind_dn,ou=profile,dc=example,dc=com
bindpw PASSWORD
DEREF never
timelimit 30
bind_timelimit 15
idle_timelimit 3600
nss_base_passwd ou=People,dc=example,dc=com?one
nss_base_group ou=group,dc=example,dc=com?one
nss_base_netgroup ou=netgroup,dc=example,dc=com?one
nss_schema rfc2307
tls_ciphers ECDHE-RSA-AES256-SHA:DHE-RSA-AES256-SHA
nss_initgroups_ignoreusers root,bin,daemon,nobody
SSSD
/etc/sssd/sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = LDAP
[nss]
[pam]
[domain/LDAP]
id_provider = ldap
auth_provider = krb5
krb5_realm = EXAMPLE.COM
ldap_schema = rfc2307
ldap_uri = ldaps://ldap1, ldaps://ldap2
ldap_search_base = dc=example,dc=com
ldap_user_search_base = ou=People,dc=example,dc=com?onelevel?
ldap_group_search_base = ou=group,dc=example,dc=com?onelevel?
ldap_netgroup_search_base = ou=netgroup,dc=example,dc=com?onelevel?
ldap_default_bind_dn = cn=bind_dn,ou=profile,dc=example,dc=com
ldap_default_authtok = PASSWORD
lookup_family_order = ipv4_only
# following set for testing, otherwise 14400 and 10800 respectively
entry_cache_timeout = 600
refresh_expired_interval = 450
Thanks,
Mark