Hi everyone, I'm stuck with a broken replica. I had a setup with two IPA servers replicating each other (ipa-server-4.6.4 on CentOS 7.6), let's say "idc01" and "idc02".
Due to heavy load, idc01 crashed many times and eventually stopped working.
So I tried to rebuild the replica. First I tried "ipa-replica-manage re-initialize", without success.
Now I'm trying to redo the replica setup from scratch: on idc02 I removed the segments (ipa topologysegment-del, for both the ca and domain suffixes); on idc01 I removed everything (ipa-server-install --uninstall), then joined the domain again (ipa-client-install). Everything worked up to that point.
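For reference, the sequence above corresponds roughly to the commands below. This is only a sketch: it prints the commands instead of running them, the segment name "idc01-to-idc02" is a placeholder rather than the real one (the real names can be listed with "ipa topologysegment-find domain" and "ipa topologysegment-find ca"), and no extra options are shown.

```shell
#!/bin/sh
# Sketch of the rebuild sequence; prints each command rather than executing it.
# The segment name "idc01-to-idc02" is a placeholder, not the actual name.
run() { echo "+ $*"; }   # change the body to "$@" to really execute

# On idc02 (the surviving server): remove the segments towards idc01
run ipa topologysegment-del domain idc01-to-idc02
run ipa topologysegment-del ca idc01-to-idc02

# On idc01 (the broken replica): wipe the server, re-enroll as a client,
# then attempt promotion to replica (this last step is the one that fails)
run ipa-server-install --uninstall
run ipa-client-install
run ipa-replica-install
```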
When running "ipa-replica-install" on idc01 I get:

[...]
  [28/41]: setting up initial replication
Starting replication, please wait until this has completed.
Update in progress, 22 seconds elapsed
[ldap://idc02.my.dom.ain:389] reports: Update failed! Status: [Error (-11) connection error: Unknown connection error (-11) - Total update aborted]
And on idc02 (the working server), in /var/log/dirsrv/slapd-MY-DOM-AIN/errors, I find these lines:

[20/Mar/2019:09:28:06.545187923 +0100] - INFO - NSMMReplicationPlugin - repl5_tot_run - Beginning total update of replica "agmt="cn=meToidc01.my.dom.ain" (idc01:389)".
[20/Mar/2019:09:28:26.528046160 +0100] - ERR - NSMMReplicationPlugin - perform_operation - agmt="cn=meToidc01.my.dom.ain" (idc01:389): Failed to send extended operation: LDAP error -1 (Can't contact LDAP server)
[20/Mar/2019:09:28:26.530763939 +0100] - ERR - NSMMReplicationPlugin - repl5_tot_log_operation_failure - agmt="cn=meToidc01.my.dom.ain" (idc01:389): Received error -1 (Can't contact LDAP server): for total update operation
[20/Mar/2019:09:28:26.532678072 +0100] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=meToidc01.my.dom.ain" (idc01:389): Unable to send endReplication extended operation (Can't contact LDAP server)
[20/Mar/2019:09:28:26.534307539 +0100] - ERR - NSMMReplicationPlugin - repl5_tot_run - Total update failed for replica "agmt="cn=meToidc01.my.dom.ain" (idc01:389)", error (-11)
[20/Mar/2019:09:28:26.561763168 +0100] - INFO - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToidc01.my.dom.ain" (idc01:389): Replication bind with GSSAPI auth resumed
[20/Mar/2019:09:28:26.582389258 +0100] - WARN - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=meToidc01.my.dom.ain" (idc01:389): The remote replica has a different database generation ID than the local database. You may have to reinitialize the remote replica, or the local replica.
It seems that idc02 still remembers something about the old replica.
Any hints?
Thank you in advance,
Giulio