Hi,
I've continued testing 389ds v.1.3.2.24 on CentOS 7. I really have an impression that everything works fine (plugins etc) but the replication seems to be a little fragile. Both of the tickets i've already opened concern replication partially or completely (https://fedorahosted.org/389/ticket/47942 and https://fedorahosted.org/389/ticket/47950).
Here is another issue with replication : i have two servers with multi-master agreements on each of them (the same configuration as in ticket https://fedorahosted.org/389/ticket/47942).
We add/delete a lot of groups (943, to be exact). Each group may contain a large number of referenced entries, up to ~250 (uniqueMember: dn). MemberOf plugin is activated and works fine. Referential integrity plugin is also activated but of course it is of any sense only when deleting groups (or renaming them). It goes on for a long time (20-30 minutes or more). Some time after the beginning of the operations (typically 5-8 minutes) we have replication erros and inconsistency of the replica concerning the entries mentioned in error log.
When adding and deleting groups the supplier is ok. Howevere the consumer has several (from one to four or five) groupe deletions/adds that are not replicated. The error on the supplier:
[12/Nov/2014:16:46:42 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid fa90219d-6a8211e4-a42c901a-94623bee, CSN 546380d6000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:47:55 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 1e5367ae-6a8311e4-a42c901a-94623bee, CSN 54638125000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:53:14 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid f4e70b85-6a8311e4-a42c901a-94623bee, CSN 54638260000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:55:12 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 3c6d978a-6a8411e4-a42c901a-94623bee, CSN 546382d6000400020000): Operations error (1). Will retry later. [12/Nov/2014:16:56:31 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 6030dd93-6a8411e4-a42c901a-94623bee, CSN 54638325000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:57:22 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 83f42395-6a8411e4-a42c901a-94623bee, CSN 5463835d000000020000): Operations error (1). Will retry later.
The corresponding errors on the consumer seem to hint deadlocks in these cases: [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=546380d6000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (546380d6000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=LAN452ESP-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: fa90219d-6a8211e4-a42c901a-94623bee, optype: 16) to changelog csn 546380d6000000020000 [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638125000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638125000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=LAN472EFLE-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 1e5367ae-6a8311e4-a42c901a-94623bee, optype: 16) to changelog csn 54638125000000020000 [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638260000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638260000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=MAT471-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: f4e70b85-6a8311e4-a42c901a-94623bee, optype: 16) to changelog csn 54638260000000020000 [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=546382d6000400020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (546382d6000400020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=MEC592-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 3c6d978a-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 546382d6000400020000 [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638325000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638325000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=PHY566-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 6030dd93-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 54638325000000020000 [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=5463835d000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (5463835d000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=PHY651K-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 83f42395-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 5463835d000000020000
Each time the failed group entries are different.
The replication agreements look liek this : 0 cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr,cn=replica,cn=dc\3Did\2Cdc\3Dpolytechnique\2Cdc\3Dedu,cn=mapping tree,cn=config objectClass: top objectClass: nsDS5ReplicationAgreement cn: Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr description: Replication agreement from server ldap-edev.polytechnique.fr to server ldap-model.polytechnique.fr nsDS5ReplicaHost: ldap-model.polytechnique.fr nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu nsDS5ReplicaPort: 636 nsDS5ReplicaTransportInfo: SSL nsDS5ReplicaBindDN: cn=RepliX,cn=config nsDS5ReplicaBindMethod: simple nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn nsds5ReplicaStripAttrs: modifiersName modifyTimestamp internalModifiersName internalModifyTimestamp internalCreatorsname nsds5replicaBusyWaitTime: 3 nsds5replicaTimeout: 30 nsDS5ReplicaCredentials: {DES}... nsds5ReplicaEnabled: on nsds50ruv: {replicageneration} 54636df3000000020000 nsds50ruv: {replica 1 ldap://ldap-model.polytechnique.fr:389} 54637b3c000000010000 54637dce000000010000 nsds50ruv: {replica 2 ldap://ldap-edev.polytechnique.fr:389} 54636ffe000000020000 54637d2d000000020000 nsruvReplicaLastModified: {replica 1 ldap://ldap-model.polytechnique.fr:389} 00000000 nsruvReplicaLastModified: {replica 2 ldap://ldap-edev.polytechnique.fr:389} 00000000 nsds5replicareapactive: 0 nsds5replicaLastUpdateStart: 20141112173922Z nsds5replicaLastUpdateEnd: 20141112173922Z nsds5replicaChangesSentSinceStartup: 2:1899/21484 nsds5replicaLastUpdateStatus: 0 Replica acquired successfully: Incremental update succeeded nsds5replicaUpdateInProgress: FALSE nsds5replicaLastInitStart: 0 nsds5replicaLastInitEnd: 0
The DNA plug-in is disabled (excluding https://fedorahosted.org/389/ticket/47410)
I've made a ticket concerning this problem (https://fedorahosted.org/389/ticket/47410)
Another problem when doing these tests, with replication agreements suspended is the following message in the error log:
12/Nov/2014:16:15:35 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:38 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:40 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:43 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:45 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:47 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:50 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:52 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:55 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:57 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:59 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:02 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:04 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:06 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:09 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:11 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:14 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:17 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:19 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:22 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:24 +0100] - Retry count exceeded in modify ... (it goes on for approx 5 minutes) [12/Nov/2014:16:20:31 +0100] - Retry count exceeded in modify [12/Nov/2014:16:20:33 +0100] - Retry count exceeded in modify [12/Nov/2014:16:20:36 +0100] - Retry count exceeded in modify [12/Nov/2014:16:20:42 +0100] - Retry count exceeded in modify
During these error messages the server doe snot consume CPU but it is does not allow any changes to continue. Do i need to open another ticket for this second problem?
Thank you!
Regards, Andrey
----- Mail original -----
De: "Ivanov Andrey (M.)" andrey.ivanov@polytechnique.fr À: "General discussion list for the 389 Directory server project." 389-users@lists.fedoraproject.org Envoyé: Mercredi 12 Novembre 2014 18:52:44 Objet: [389-users] 389ds v.1.3.2.24 replication deadlocks/retry count exceeded
Hi,
I've continued testing 389ds v.1.3.2.24 on CentOS 7. I really have an impression that everything works fine (plugins etc) but the replication seems to be a little fragile. Both of the tickets i've already opened concern replication partially or completely (https://fedorahosted.org/389/ticket/47942 and https://fedorahosted.org/389/ticket/47950).
Here is another issue with replication : i have two servers with multi-master agreements on each of them (the same configuration as in ticket https://fedorahosted.org/389/ticket/47942).
We add/delete a lot of groups (943, to be exact). Each group may contain a large number of referenced entries, up to ~250 (uniqueMember: dn). MemberOf plugin is activated and works fine. Referential integrity plugin is also activated but of course it is of any sense only when deleting groups (or renaming them). It goes on for a long time (20-30 minutes or more). Some time after the beginning of the operations (typically 5-8 minutes) we have replication erros and inconsistency of the replica concerning the entries mentioned in error log.
When adding and deleting groups the supplier is ok. Howevere the consumer has several (from one to four or five) groupe deletions/adds that are not replicated. The error on the supplier:
[12/Nov/2014:16:46:42 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid fa90219d-6a8211e4-a42c901a-94623bee, CSN 546380d6000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:47:55 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 1e5367ae-6a8311e4-a42c901a-94623bee, CSN 54638125000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:53:14 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid f4e70b85-6a8311e4-a42c901a-94623bee, CSN 54638260000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:55:12 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 3c6d978a-6a8411e4-a42c901a-94623bee, CSN 546382d6000400020000): Operations error (1). Will retry later. [12/Nov/2014:16:56:31 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 6030dd93-6a8411e4-a42c901a-94623bee, CSN 54638325000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:57:22 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 83f42395-6a8411e4-a42c901a-94623bee, CSN 5463835d000000020000): Operations error (1). Will retry later.
The corresponding errors on the consumer seem to hint deadlocks in these cases: [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=546380d6000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (546380d6000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=LAN452ESP-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: fa90219d-6a8211e4-a42c901a-94623bee, optype: 16) to changelog csn 546380d6000000020000 [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638125000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638125000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=LAN472EFLE-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 1e5367ae-6a8311e4-a42c901a-94623bee, optype: 16) to changelog csn 54638125000000020000 [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638260000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638260000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=MAT471-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: f4e70b85-6a8311e4-a42c901a-94623bee, optype: 16) to changelog csn 54638260000000020000 [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=546382d6000400020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (546382d6000400020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=MEC592-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 3c6d978a-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 546382d6000400020000 [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638325000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638325000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=PHY566-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 6030dd93-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 54638325000000020000 [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=5463835d000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (5463835d000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=PHY651K-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 83f42395-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 5463835d000000020000
Each time the failed group entries are different.
The replication agreements look liek this : 0 cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr,cn=replica,cn=dc\3Did\2Cdc\3Dpolytechnique\2Cdc\3Dedu,cn=mapping tree,cn=config objectClass: top objectClass: nsDS5ReplicationAgreement cn: Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr description: Replication agreement from server ldap-edev.polytechnique.fr to server ldap-model.polytechnique.fr nsDS5ReplicaHost: ldap-model.polytechnique.fr nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu nsDS5ReplicaPort: 636 nsDS5ReplicaTransportInfo: SSL nsDS5ReplicaBindDN: cn=RepliX,cn=config nsDS5ReplicaBindMethod: simple nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn nsds5ReplicaStripAttrs: modifiersName modifyTimestamp internalModifiersName internalModifyTimestamp internalCreatorsname nsds5replicaBusyWaitTime: 3 nsds5replicaTimeout: 30 nsDS5ReplicaCredentials: {DES}... nsds5ReplicaEnabled: on nsds50ruv: {replicageneration} 54636df3000000020000 nsds50ruv: {replica 1 ldap://ldap-model.polytechnique.fr:389} 54637b3c000000010000 54637dce000000010000 nsds50ruv: {replica 2 ldap://ldap-edev.polytechnique.fr:389} 54636ffe000000020000 54637d2d000000020000 nsruvReplicaLastModified: {replica 1 ldap://ldap-model.polytechnique.fr:389} 00000000 nsruvReplicaLastModified: {replica 2 ldap://ldap-edev.polytechnique.fr:389} 00000000 nsds5replicareapactive: 0 nsds5replicaLastUpdateStart: 20141112173922Z nsds5replicaLastUpdateEnd: 20141112173922Z nsds5replicaChangesSentSinceStartup: 2:1899/21484 nsds5replicaLastUpdateStatus: 0 Replica acquired successfully: Incremental update succeeded nsds5replicaUpdateInProgress: FALSE nsds5replicaLastInitStart: 0 nsds5replicaLastInitEnd: 0
The DNA plug-in is disabled (excluding https://fedorahosted.org/389/ticket/47410)
I've made a ticket concerning this problem (https://fedorahosted.org/389/ticket/47410)
The ticket for this problem is of course https://fedorahosted.org/389/ticket/47954, not the one i've written in the original mail.
Another problem when doing these tests, with replication agreements suspended is the following message in the error log:
12/Nov/2014:16:15:35 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:38 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:40 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:43 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:45 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:47 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:50 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:52 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:55 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:57 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:59 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:02 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:04 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:06 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:09 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:11 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:14 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:17 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:19 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:22 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:24 +0100] - Retry count exceeded in modify ... (it goes on for approx 5 minutes) [12/Nov/2014:16:20:31 +0100] - Retry count exceeded in modify [12/Nov/2014:16:20:33 +0100] - Retry count exceeded in modify [12/Nov/2014:16:20:36 +0100] - Retry count exceeded in modify [12/Nov/2014:16:20:42 +0100] - Retry count exceeded in modify
During these error messages the server doe snot consume CPU but it is does not allow any changes to continue. Do i need to open another ticket for this second problem?
Thank you!
Regards, Andrey
-- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
On 11/12/2014 10:55 AM, Ivanov Andrey (M.) wrote:
*De: *"Ivanov Andrey (M.)" <andrey.ivanov@polytechnique.fr> *À: *"General discussion list for the 389 Directory server project." <389-users@lists.fedoraproject.org> *Envoyé: *Mercredi 12 Novembre 2014 18:52:44 *Objet: *[389-users] 389ds v.1.3.2.24 replication deadlocks/retry count exceeded Hi, I've continued testing 389ds v.1.3.2.24 on CentOS 7. I really have an impression that everything works fine (plugins etc) but the replication seems to be a little fragile. Both of the tickets i've already opened concern replication partially or completely (https://fedorahosted.org/389/ticket/47942 and https://fedorahosted.org/389/ticket/47950). Here is another issue with replication : i have two servers with multi-master agreements on each of them (the same configuration as in ticket https://fedorahosted.org/389/ticket/47942). We add/delete a lot of groups (943, to be exact). Each group may contain a large number of referenced entries, up to ~250 (uniqueMember: dn). MemberOf plugin is activated and works fine. Referential integrity plugin is also activated but of course it is of any sense only when deleting groups (or renaming them). It goes on for a long time (20-30 minutes or more). Some time after the beginning of the operations (typically 5-8 minutes) we have replication erros and inconsistency of the replica concerning the entries mentioned in error log. When adding and deleting groups the supplier is ok. Howevere the consumer has several (from one to four or five) groupe deletions/adds that are not replicated. The error on the supplier: [12/Nov/2014:16:46:42 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid fa90219d-6a8211e4-a42c901a-94623bee, CSN 546380d6000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:47:55 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 1e5367ae-6a8311e4-a42c901a-94623bee, CSN 54638125000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:53:14 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid f4e70b85-6a8311e4-a42c901a-94623bee, CSN 54638260000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:55:12 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 3c6d978a-6a8411e4-a42c901a-94623bee, CSN 546382d6000400020000): Operations error (1). Will retry later. [12/Nov/2014:16:56:31 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 6030dd93-6a8411e4-a42c901a-94623bee, CSN 54638325000000020000): Operations error (1). Will retry later. [12/Nov/2014:16:57:22 +0100] NSMMReplicationPlugin - agmt="cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to replay change (uniqueid 83f42395-6a8411e4-a42c901a-94623bee, CSN 5463835d000000020000): Operations error (1). Will retry later. The corresponding errors on the consumer seem to hint deadlocks in these cases: [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=546380d6000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (546380d6000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=LAN452ESP-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: fa90219d-6a8211e4-a42c901a-94623bee, optype: 16) to changelog csn 546380d6000000020000 [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638125000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638125000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=LAN472EFLE-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 1e5367ae-6a8311e4-a42c901a-94623bee, optype: 16) to changelog csn 54638125000000020000 [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638260000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638260000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=MAT471-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: f4e70b85-6a8311e4-a42c901a-94623bee, optype: 16) to changelog csn 54638260000000020000 [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=546382d6000400020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (546382d6000400020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=MEC592-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 3c6d978a-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 546382d6000400020000 [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638325000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638325000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=PHY566-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 6030dd93-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 54638325000000020000 [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=5463835d000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock)) [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (5463835d000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=PHY651K-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 83f42395-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 5463835d000000020000
See https://bugzilla.redhat.com/show_bug.cgi?id=979169
Each time the failed group entries are different. The replication agreements look liek this : 0 cn=Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr,cn=replica,cn=dc\\3Did\\2Cdc\\3Dpolytechnique\\2Cdc\\3Dedu,cn=mapping tree,cn=config objectClass: top objectClass: nsDS5ReplicationAgreement cn: Replication from ldap-edev.polytechnique.fr to ldap-model.polytechnique.fr description: Replication agreement from server ldap-edev.polytechnique.fr to server ldap-model.polytechnique.fr nsDS5ReplicaHost: ldap-model.polytechnique.fr nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu nsDS5ReplicaPort: 636 nsDS5ReplicaTransportInfo: SSL nsDS5ReplicaBindDN: cn=RepliX,cn=config nsDS5ReplicaBindMethod: simple nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn nsds5ReplicaStripAttrs: modifiersName modifyTimestamp internalModifiersName internalModifyTimestamp internalCreatorsname nsds5replicaBusyWaitTime: 3 nsds5replicaTimeout: 30 nsDS5ReplicaCredentials: {DES}... nsds5ReplicaEnabled: on nsds50ruv: {replicageneration} 54636df3000000020000 nsds50ruv: {replica 1 ldap://ldap-model.polytechnique.fr:389} 54637b3c000000010000 54637dce000000010000 nsds50ruv: {replica 2 ldap://ldap-edev.polytechnique.fr:389} 54636ffe000000020000 54637d2d000000020000 nsruvReplicaLastModified: {replica 1 ldap://ldap-model.polytechnique.fr:389} 00000000 nsruvReplicaLastModified: {replica 2 ldap://ldap-edev.polytechnique.fr:389} 00000000 nsds5replicareapactive: 0 nsds5replicaLastUpdateStart: 20141112173922Z nsds5replicaLastUpdateEnd: 20141112173922Z nsds5replicaChangesSentSinceStartup: 2:1899/21484 nsds5replicaLastUpdateStatus: 0 Replica acquired successfully: Incremental update succeeded nsds5replicaUpdateInProgress: FALSE nsds5replicaLastInitStart: 0 nsds5replicaLastInitEnd: 0 The DNA plug-in is disabled (excluding https://fedorahosted.org/389/ticket/47410) I've made a ticket concerning this problem (https://fedorahosted.org/389/ticket/47410)
The ticket for this problem is of course https://fedorahosted.org/389/ticket/47954, not the one i've written in the original mail.
Another problem when doing these tests, with replication agreements suspended is the following message in the error log: 12/Nov/2014:16:15:35 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:38 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:40 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:43 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:45 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:47 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:50 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:52 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:55 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:57 +0100] - Retry count exceeded in modify [12/Nov/2014:16:15:59 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:02 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:04 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:06 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:09 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:11 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:14 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:17 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:19 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:22 +0100] - Retry count exceeded in modify [12/Nov/2014:16:16:24 +0100] - Retry count exceeded in modify ... (it goes on for approx 5 minutes) [12/Nov/2014:16:20:31 +0100] - Retry count exceeded in modify [12/Nov/2014:16:20:33 +0100] - Retry count exceeded in modify [12/Nov/2014:16:20:36 +0100] - Retry count exceeded in modify [12/Nov/2014:16:20:42 +0100] - Retry count exceeded in modify During these error messages the server doe snot consume CPU but it is does not allow any changes to continue. Do i need to open another ticket for this second problem? Thank you! Regards, Andrey -- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
-- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
----- Mail original ----- The corresponding errors on the consumer seem to hint deadlocks in these cases:
[12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=546380d6000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
[12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (546380d6000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
[12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=LAN452ESP-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: fa90219d-6a8211e4-a42c901a-94623bee, optype: 16) to changelog csn 546380d6000000020000
[12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638125000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
[12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638125000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
[12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=LAN472EFLE-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 1e5367ae-6a8311e4-a42c901a-94623bee, optype: 16) to changelog csn 54638125000000020000
[12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638260000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
[12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638260000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
[12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=MAT471-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: f4e70b85-6a8311e4-a42c901a-94623bee, optype: 16) to changelog csn 54638260000000020000
[12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=546382d6000400020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
[12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (546382d6000400020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
[12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=MEC592-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 3c6d978a-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 546382d6000400020000
[12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=54638325000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
[12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (54638325000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
[12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=PHY566-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 6030dd93-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 54638325000000020000
[12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: retry (49) the transaction (csn=5463835d000000020000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
[12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog program - _cl5WriteOperationTxn: failed to write entry with csn (5463835d000000020000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
[12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - write_changelog_and_ruv: can't add a change for cn=PHY651K-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu (uniqid: 83f42395-6a8411e4-a42c901a-94623bee, optype: 16) to changelog csn 5463835d000000020000
Ok, thank you Rich, i'll give it a try tomorrow.
389-users@lists.fedoraproject.org