Hi,
This is in an IPA deployment. We have three masters/replicas in a triangular topology: A-B,
B-C, C-A.
The systems are called: rotte, linge and iparep4.
rotte is CentOS 7, with 389-ds-base-1.3.9.1-13.el7_7.x86_64
linge and iparep4 are CentOS 8 Stream, with
389-ds-base-1.4.3.23-2.module_el8.5.0+835+5d54734c.x86_64
Yesterday I removed some members from a user group on rotte. This caused the following
errors on linge (and on iparep4):
Jul 26 11:44:37 linge.example.com ns-slapd[282944]: [26/Jul/2021:11:44:37.947738548 +0200] - ERR - NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn - retry (49) the transaction (csn=60fe8535001000030000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
Jul 26 11:44:38 linge.example.com ns-slapd[282944]: [26/Jul/2021:11:44:38.000964611 +0200] - ERR - NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn - Failed to write entry with csn (60fe8535001000030000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
Jul 26 11:44:38 linge.example.com ns-slapd[282944]: [26/Jul/2021:11:44:38.025996273 +0200] - ERR - NSMMReplicationPlugin - write_changelog_and_ruv - Can't add a change for cn=vpn_users,cn=groups,cn=accounts,dc=example,dc=com (uniqid: 31283c01-a16511e9-93cf90e8-ab7c8ee8, optype: 8) to changelog csn 60fe8535001000030000
Jul 26 11:44:38 linge.example.com ns-slapd[282944]: [26/Jul/2021:11:44:38.062640602 +0200] - ERR - NSMMReplicationPlugin - process_postop - Failed to apply update (60fe8535001000030000) error (1). Aborting replication session(conn=53596 op=65)
On rotte, at the same time:
jul 26 11:44:39 rotte.example.com ns-slapd[2705]: [26/Jul/2021:11:44:39.055890736 +0200] - WARN - NSMMReplicationPlugin - repl5_inc_update_from_op_result - agmt="cn=meTolinge.example.com" (linge:389): Consumer failed to replay change (uniqueid 31283c01-a16511e9-93cf90e8-ab7c8ee8, CSN 60fe8535001000030000): Operations error (1). Will retry later.
jul 26 11:44:39 rotte.example.com ns-slapd[2705]: [26/Jul/2021:11:44:39.058198988 +0200] - WARN - NSMMReplicationPlugin - repl5_inc_update_from_op_result - agmt="cn=meTolinge.example.com" (linge:389): Consumer failed to replay change (uniqueid 31283c01-a16511e9-93cf90e8-ab7c8ee8, CSN 60fe8535003300030000): Operations error (1). Will retry later.
jul 26 11:44:39 rotte.example.com ns-slapd[2705]: [26/Jul/2021:11:44:39.069825407 +0200] - ERR - NSMMReplicationPlugin - release_replica - agmt="cn=meTolinge.example.com" (linge:389): Unable to send endReplication extended operation (Operations error)
jul 26 11:44:46 rotte.example.com ns-slapd[2705]: [26/Jul/2021:11:44:46.561562313 +0200] - INFO - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meTolinge.example.com" (linge:389): Replication bind with GSSAPI auth resumed
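To correlate the entries on both sides, the CSNs can be split into their fields. The following is only a rough sketch, assuming the usual 389-ds CSN layout of 20 hex digits (8 for the timestamp in seconds since the epoch, 4 for the sequence number, 4 for the replica ID, 4 for the sub-sequence number). The function name decode_csn is made up for this illustration and is not part of any 389-ds tool.

```python
from datetime import datetime, timezone


def decode_csn(csn: str) -> dict:
    """Split a 20-hex-digit CSN into its fields.

    Assumed layout: timestamp (8) + seqnum (4) + replica ID (4) + subseqnum (4).
    """
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    int(csn, 16)  # raises ValueError if the string is not valid hex
    return {
        "timestamp": datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        "seqnum": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subseqnum": int(csn[16:20], 16),
    }


if __name__ == "__main__":
    # The two CSNs that appear in the log messages above.
    for csn in ("60fe8535001000030000", "60fe8535003300030000"):
        fields = decode_csn(csn)
        print(csn, fields["timestamp"].isoformat(),
              "seq", fields["seqnum"],
              "rid", fields["replica_id"],
              "subseq", fields["subseqnum"])
```

Under that assumption both CSNs carry the same timestamp and replica ID and differ only in the sequence number, which would fit several changes generated by the same group modification.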
As far as I can see, the user group is correctly modified on all replicas. But it
doesn't look healthy to me.
Is there anything I can do to see what went wrong? Is there something to improve
in the configuration?
--
Kees