Apologies for the length/verbosity of the last message.

I've read that on IPA startup there can be a window where LDAP is up but the KDC isn't fully up yet.  During that window LDAP can get swamped, causing failures to connect to the KDC.  This appears to be my problem.

https://pagure.io/freeipa/issue/8544
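
One rough way to check for that window is to poll the KDC port right after "ipactl start" and see when it begins accepting connections. The sketch below is only an illustration: the helper name kdc_reachable is made up here, it only tests TCP on the standard Kerberos port 88 (the KDC also answers on UDP, which this does not check), and an open port does not prove the KDC is fully ready to issue tickets.

```python
import socket


def kdc_reachable(host, port=88, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; port 88 is the standard
    Kerberos port, everything else here is an assumption.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when the port is not accepting connections.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling kdc_reachable("ipa1.hpc.example.com") in a short loop during startup would show roughly when the KDC comes up relative to the "slapd started" line in the errors log.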

Please, anyone, let me know if this is the wrong conclusion.

Thank you to the IPA folks answering questions on this list.

Scott


On 7/28/21 2:58 PM, Scott Serr via FreeIPA-users wrote:
I'm running 5 IPA servers on 4.9.2 (the latest on CentOS 8).

Synchronization stopped yesterday and also 3 days ago.  Yesterday's stop actually came right after I stopped / modified / started "ipa1" to keep rotated logs around longer, so that I could track down what happened 3 days ago.

2021-07-27 17:22:46 ipactl stop
2021-07-27 17:22:59 emacs dse.ldif    # Modify access and error log rotation values
2021-07-27 17:23:45 ipactl start

Below seems to be what kicked off the bad behavior.  I've seen a few posts about removing the keys from dse.ldif when this happens.  I'm a bit leery of doing this, as I don't fully understand what is going on.  (Is it comparable to clearing out known_hosts entries when using ssh?)

[27/Jul/2021:17:23:49.818525015 -0600] - ERR - attrcrypt_unwrap_key - Failed to unwrap key for cipher AES
[27/Jul/2021:17:23:49.820422259 -0600] - ERR - attrcrypt_cipher_init - Symmetric key failed to unwrap with the private key; Cert might have been renewed since the key is wrapped.  To recover the encrypted contents, keep the wrapped symmetric key value.
[27/Jul/2021:17:23:50.040967207 -0600] - ERR - attrcrypt_unwrap_key - Failed to unwrap key for cipher 3DES
[27/Jul/2021:17:23:50.043074553 -0600] - ERR - attrcrypt_cipher_init - Symmetric key failed to unwrap with the private key; Cert might have been renewed since the key is wrapped.  To recover the encrypted contents, keep the wrapped symmetric key value.
[27/Jul/2021:17:23:50.044268421 -0600] - ERR - attrcrypt_init - All prepared ciphers are not available. Please disable attribute encryption.
[27/Jul/2021:17:23:50.263786473 -0600] - ERR - attrcrypt_unwrap_key - Failed to unwrap key for cipher AES
[27/Jul/2021:17:23:50.266090934 -0600] - ERR - attrcrypt_cipher_init - Symmetric key failed to unwrap with the private key; Cert might have been renewed since the key is wrapped.  To recover the encrypted contents, keep the wrapped symmetric key value.
[27/Jul/2021:17:23:50.470918523 -0600] - ERR - attrcrypt_unwrap_key - Failed to unwrap key for cipher 3DES
[27/Jul/2021:17:23:50.472915669 -0600] - ERR - attrcrypt_cipher_init - Symmetric key failed to unwrap with the private key; Cert might have been renewed since the key is wrapped.  To recover the encrypted contents, keep the wrapped symmetric key value.
[27/Jul/2021:17:23:50.474282471 -0600] - ERR - attrcrypt_init - All prepared ciphers are not available. Please disable attribute encryption.
[27/Jul/2021:17:23:50.891048127 -0600] - ERR - schema-compat-plugin - scheduled schema-compat-plugin tree scan in about 5 seconds after the server startup!

Then ipa1 can't talk to the replicas (ipa2, ipa3, ipa5, ipa6), as shown below:

[27/Jul/2021:17:23:51.081696109 -0600] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/ipa1.hpc.example.com@HPC.EXAMPLE.COM] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
[27/Jul/2021:17:23:51.086755379 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToipa4.hpc.example.com" (ipa4:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[27/Jul/2021:17:23:51.091748474 -0600] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/ipa1.hpc.example.com@HPC.EXAMPLE.COM] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
[27/Jul/2021:17:23:51.093430455 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=ipa1.hpc.example.com-to-ipa6.hpc.example.com" (ipa6:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[27/Jul/2021:17:23:51.094725291 -0600] - ERR - schema-compat-plugin - schema-compat-plugin tree scan will start in about 5 seconds!
[27/Jul/2021:17:23:51.096059194 -0600] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/ipa1.hpc.example.com@HPC.EXAMPLE.COM] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
[27/Jul/2021:17:23:51.097152619 -0600] - INFO - slapd_daemon - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[27/Jul/2021:17:23:51.098356748 -0600] - INFO - slapd_daemon - Listening on All Interfaces port 636 for LDAPS requests
[27/Jul/2021:17:23:51.099577958 -0600] - INFO - slapd_daemon - Listening on /var/run/slapd-HPC-EXAMPLE-COM.socket for LDAPI requests
[27/Jul/2021:17:23:51.100701349 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=caToipa3.hpc.example.com" (ipa3:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[27/Jul/2021:17:23:51.101782194 -0600] - ERR - set_krb5_creds - Could not get initial credentials for principal [ldap/ipa1.hpc.example.com@HPC.EXAMPLE.COM] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
[27/Jul/2021:17:23:51.103848248 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=caToipa5.hpc.example.com" (ipa5:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[27/Jul/2021:17:23:58.152621025 -0600] - ERR - schema-compat-plugin - Finished plugin initialization.
[27/Jul/2021:17:24:21.201225830 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToipa2.hpc.example.com" (ipa2:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[27/Jul/2021:17:24:21.203158794 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=ipa1.hpc.example.com-to-ipa6.hpc.example.com" (ipa6:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[27/Jul/2021:17:24:21.204833314 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToipa3.hpc.example.com" (ipa3:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[27/Jul/2021:17:24:21.206099975 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToipa5.hpc.example.com" (ipa5:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
[27/Jul/2021:17:54:03.675297221 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=caToipa2.hpc.example.com" (ipa2:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
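
To see at a glance which agreements are failing and how often, the errors log can be tallied with a short script. This is a minimal sketch, not an official tool: the function name summarize and the regular expression are assumptions built only from the log lines quoted above, so other 389-ds versions or message variants may not match.

```python
import re
from collections import Counter

# Pattern derived from the "Replication bind with GSSAPI auth failed"
# lines quoted above; it is an assumption, not a documented log format.
BIND_FAIL = re.compile(
    r'^\[(?P<ts>[^\]]+)\] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - '
    r'agmt="cn=(?P<agmt>[^"]+)" \((?P<peer>[^:)]+):(?P<port>\d+)\) - '
    r'Replication bind with GSSAPI auth failed'
)
KDC_FAIL = "Cannot contact any KDC for requested realm"


def summarize(lines):
    """Return (Counter of bind failures per peer, count of KDC-unreachable errors)."""
    per_peer = Counter()
    kdc_errors = 0
    for line in lines:
        match = BIND_FAIL.match(line)
        if match:
            per_peer[match.group("peer")] += 1
        elif KDC_FAIL in line:
            kdc_errors += 1
    return per_peer, kdc_errors
```

Passing an open file handle for the instance's errors log to summarize() gives one failure count per peer, which makes it easier to tell a single broken agreement from every agreement failing at once, as happens here right after startup.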

After realizing I had a problem this morning, I rebooted ipa1, but that did not restore syncing.  I then re-initialized ipa1 from ipa3, which got them all authenticating to each other and back in sync.

[28/Jul/2021:08:09:10.347094254 -0600] - INFO - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=caToipa3.hpc.inl.gov" (ipa3:389): Replication bind with GSSAPI auth resumed
[28/Jul/2021:08:09:10.449170075 -0600] - INFO - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToipa3.hpc.inl.gov" (ipa3:389): Replication bind with GSSAPI auth resumed
[....]

I changed the Directory Manager password with "dsconf" -- but that was between the first failure and the second.  Could that be causing problems?  Which direction should I go from here?  Thank you!

Scott

