New subject: sssd: AD service discovery and invalidating cache

Friday, 4 January 2019

(re-sending as I initially sent to ssd-users-owners in error)

For an AD environment using service discovery.

Periodically sssd will invalidate its cache at unexpected times.  Digging
around debug logs and sources leads me to understand the following:

Every 15 minutes (or as defined by ldap_connection_expire_timeout) sssd
re-establishes the connection to LDAP, closing the exiting collection.
When sssd is configured to auto discover (via DNS _srv_ records, where the
priority is the same for each server); auto-discovery might return a
different LDAP server, at which point sssd's stored uSNChanged values are
invalid (as these are unique to each server), the cached values are
cleared, and enumeration is run - essentially afresh - against the new LDAP
server.

Is this outcome expected by design?

This behaviour is rather unfortunate as sssd_be will become CPU hog as it
rebuilds the cache again.

It is possible to work around the behaviour e.g.:

1) by not using service discovery, i.e.

ad_server = server1
ad_backup = server2

which is fairly tiresome to maintain across an estate - separate
configurations for different sites etc, faking load balancing by swapping
configurations.

2) having different priorities for each AD server in a given site, losing
load balancing - unless DNS gave out different priorities depending on the
source of the request, but this seems messy.

A better approach might be to patch sssd's auto discovery to "stick" to the
previously bound LDAP server, currently the first server in the list of
primary servers returned by ad_sort_servers_by_dns().  I have a proof of
concept patch that is straight forward, and fairly well contained, the
behaviour is controlled by an ad_sticky option in sssd.conf.

Is there a better solution to this problem?   Would a patch - as vaguely
outlined above - likely gain acceptance?

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

sssd: AD service discovery and invalidating cache