I expected this kind of answer, but it's good to have it confirmed.
Have a nice day,
From: Michael Ströder <michael(a)stroeder.com>
Sent: Monday, 23 November 2020 11:33
To: End-user discussions about the System Security Services Daemon
<sssd-users(a)lists.fedorahosted.org>; Jochen Schaefer
Subject: Re: [SSSD-users] primary LDAP server reconnect timeout after failover to backup
On 11/23/20 10:23 AM, Jochen Schaefer wrote:
I have the following design problem regarding the primary LDAP server
reconnect timeout value:
From time to time we need to recreate the databases of the primary LDAP
server via syncrepl. To do so, we stop the primary LDAP server, delete
its database files and start it again.
The SSSD client behaves as expected:
* it fails over to the backup LDAP server
* after an internal timeout of 31 seconds, it checks whether the primary is available again
* it switches back to the primary LDAP server
The problem here is that the primary has still not finished its sync
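To put a number on the problem described in the question, here is a minimal, hypothetical model of the timeline. It is not SSSD's implementation; it simply assumes, as described above, that the client re-checks the primary every 31 seconds after failing over and switches back on the first retry at which the primary accepts connections, whether or not its syncrepl refresh has finished. The function names and the example timings are illustrative only.

```python
# Hypothetical model of the failover timeline described above; not SSSD code.
# Assumption: after failing over, the client re-checks the primary every
# `retry_interval` seconds and switches back as soon as it is reachable,
# without knowing whether the primary's syncrepl refresh is complete.


def switch_back_time(failover_at: float, primary_back_at: float,
                     retry_interval: float = 31.0) -> float:
    """Time of the first retry at which the restarted primary is reachable."""
    t = failover_at + retry_interval
    while t < primary_back_at:
        t += retry_interval
    return t


def unsynced_exposure(failover_at: float, primary_back_at: float,
                      refresh_done_at: float, retry_interval: float = 31.0) -> float:
    """Seconds during which clients query a primary still in its refresh phase."""
    back = switch_back_time(failover_at, primary_back_at, retry_interval)
    return max(0.0, refresh_done_at - back)


if __name__ == "__main__":
    # Illustrative numbers only: primary stopped at t=0, restarted with an
    # empty database at t=60, refresh phase finishes at t=600.
    print(switch_back_time(0, 60))        # 62.0
    print(unsynced_exposure(0, 60, 600))  # 538.0
```

Shortening the retry interval does not help here: the exposure is driven by the length of the refresh phase, which is why the reply below focuses on keeping clients away from the replica until it is in sync.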
This is a general problem: OpenLDAP takes some time in the refresh
phase, just like any other database server that has a significant number
of entries to replicate during initialization.
You could also try to reduce the time needed to initialize the replica
(maybe you already have), but the refresh phase will never take zero
time.
I'd recommend solving this with an operational procedure that blocks
LDAP access from regular LDAP clients (e.g. with a temporary host-based
firewall rule) until monitoring shows that the replica is in sync again.
More sophisticated approaches would involve one or more load balancers
with sophisticated replica health checks.
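Both the monitoring step and a load-balancer health check need some rule for "the replica is in sync". One common approach with OpenLDAP syncrepl is to compare the `contextCSN` values of the database suffix entry on provider and consumer (for example, fetched with a base-scope search for that attribute). The sketch below is a hypothetical, pure-Python comparison under these assumptions: CSNs follow the usual `<timestamp>#<count>#<sid>#<mod>` layout with fixed-width fields, so plain string comparison is chronological, and the consumer counts as in sync once, for every server ID the provider reports, its own CSN is at least as new. Fetching the values and toggling the firewall rule are left out.

```python
# Hypothetical syncrepl sync check based on contextCSN values.
# Assumption: CSN layout <timestamp>#<count>#<sid>#<mod> with fixed-width
# fields, so lexicographic comparison orders CSNs chronologically.
from collections.abc import Iterable


def csns_by_sid(csns: Iterable[str]) -> dict[str, str]:
    """Map each server ID to the newest CSN seen for it."""
    newest: dict[str, str] = {}
    for csn in csns:
        sid = csn.split("#")[2]
        if sid not in newest or csn > newest[sid]:
            newest[sid] = csn
    return newest


def replica_in_sync(provider_csns: Iterable[str],
                    consumer_csns: Iterable[str]) -> bool:
    """True if the consumer has caught up with every server ID of the provider."""
    consumer = csns_by_sid(consumer_csns)
    for sid, csn in csns_by_sid(provider_csns).items():
        # A missing server ID compares as "" and therefore counts as behind.
        if consumer.get(sid, "") < csn:
            return False
    return True


if __name__ == "__main__":
    provider = ["20201123103300.000000Z#000000#001#000000"]
    stale = ["20201123090000.000000Z#000000#001#000000"]
    print(replica_in_sync(provider, stale))     # False
    print(replica_in_sync(provider, provider))  # True
```

In the firewall-based procedure, the temporary rule would only be removed once such a check returns True; a load balancer would instead run it periodically and keep a replica out of the pool while it returns False.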