[389-users] 389ds v.1.3.2.24 replication deadlocks/retry count exceeded
Rich Megginson
rmeggins at redhat.com
Wed Nov 12 17:59:25 UTC 2014
On 11/12/2014 10:55 AM, Ivanov Andrey (M.) wrote:
>
>
>
>
>
> ------------------------------------------------------------------------
>
> *De: *"Ivanov Andrey (M.)" <andrey.ivanov at polytechnique.fr>
> *À: *"General discussion list for the 389 Directory server
> project." <389-users at lists.fedoraproject.org>
> *Envoyé: *Mercredi 12 Novembre 2014 18:52:44
> *Objet: *[389-users] 389ds v.1.3.2.24 replication deadlocks/retry
> count exceeded
>
> Hi,
>
> I've continued testing 389ds v.1.3.2.24 on CentOS 7. I really have
> an impression that everything works fine (plugins etc) but the
> replication seems to be a little fragile. Both of the tickets i've
> already opened concern replication partially or completely
> (https://fedorahosted.org/389/ticket/47942 and
> https://fedorahosted.org/389/ticket/47950).
>
> Here is another issue with replication :
> i have two servers with multi-master agreements on each of them
> (the same configuration as in ticket
> https://fedorahosted.org/389/ticket/47942).
>
> We add/delete a lot of groups (943, to be exact). Each group may
> contain a large number of referenced entries, up to ~250
> (uniqueMember: dn). MemberOf plugin is activated and works fine.
> Referential integrity plugin is also activated but of course it is
> of any sense only when deleting groups (or renaming them). It goes
> on for a long time (20-30 minutes or more). Some time after the
> beginning of the operations (typically 5-8 minutes) we have
> replication erros and inconsistency of the replica concerning the
> entries mentioned in error log.
>
> When adding and deleting groups the supplier is ok. Howevere the
> consumer has several (from one to four or five) groupe
> deletions/adds that are not replicated. The error on the supplier:
>
> [12/Nov/2014:16:46:42 +0100] NSMMReplicationPlugin -
> agmt="cn=Replication from ldap-edev.polytechnique.fr to
> ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
> replay change (uniqueid fa90219d-6a8211e4-a42c901a-94623bee, CSN
> 546380d6000000020000): Operations error (1). Will retry later.
> [12/Nov/2014:16:47:55 +0100] NSMMReplicationPlugin -
> agmt="cn=Replication from ldap-edev.polytechnique.fr to
> ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
> replay change (uniqueid 1e5367ae-6a8311e4-a42c901a-94623bee, CSN
> 54638125000000020000): Operations error (1). Will retry later.
> [12/Nov/2014:16:53:14 +0100] NSMMReplicationPlugin -
> agmt="cn=Replication from ldap-edev.polytechnique.fr to
> ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
> replay change (uniqueid f4e70b85-6a8311e4-a42c901a-94623bee, CSN
> 54638260000000020000): Operations error (1). Will retry later.
> [12/Nov/2014:16:55:12 +0100] NSMMReplicationPlugin -
> agmt="cn=Replication from ldap-edev.polytechnique.fr to
> ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
> replay change (uniqueid 3c6d978a-6a8411e4-a42c901a-94623bee, CSN
> 546382d6000400020000): Operations error (1). Will retry later.
> [12/Nov/2014:16:56:31 +0100] NSMMReplicationPlugin -
> agmt="cn=Replication from ldap-edev.polytechnique.fr to
> ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
> replay change (uniqueid 6030dd93-6a8411e4-a42c901a-94623bee, CSN
> 54638325000000020000): Operations error (1). Will retry later.
> [12/Nov/2014:16:57:22 +0100] NSMMReplicationPlugin -
> agmt="cn=Replication from ldap-edev.polytechnique.fr to
> ldap-model.polytechnique.fr" (ldap-model:636): Consumer failed to
> replay change (uniqueid 83f42395-6a8411e4-a42c901a-94623bee, CSN
> 5463835d000000020000): Operations error (1). Will retry later.
>
> The corresponding errors on the consumer seem to hint deadlocks in
> these cases:
> [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: retry (49) the transaction
> (csn=546380d6000000020000) failed (rc=-30993 (BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
> [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: failed to write entry with csn
> (546380d6000000020000); db error - -30993 BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
> [12/Nov/2014:16:46:41 +0100] NSMMReplicationPlugin -
> write_changelog_and_ruv: can't add a change for
> cn=LAN452ESP-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
> (uniqid: fa90219d-6a8211e4-a42c901a-94623bee, optype: 16) to
> changelog csn 546380d6000000020000
> [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: retry (49) the transaction
> (csn=54638125000000020000) failed (rc=-30993 (BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
> [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: failed to write entry with csn
> (54638125000000020000); db error - -30993 BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
> [12/Nov/2014:16:47:54 +0100] NSMMReplicationPlugin -
> write_changelog_and_ruv: can't add a change for
> cn=LAN472EFLE-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
> (uniqid: 1e5367ae-6a8311e4-a42c901a-94623bee, optype: 16) to
> changelog csn 54638125000000020000
> [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: retry (49) the transaction
> (csn=54638260000000020000) failed (rc=-30993 (BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
> [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: failed to write entry with csn
> (54638260000000020000); db error - -30993 BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
> [12/Nov/2014:16:53:13 +0100] NSMMReplicationPlugin -
> write_changelog_and_ruv: can't add a change for
> cn=MAT471-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
> (uniqid: f4e70b85-6a8311e4-a42c901a-94623bee, optype: 16) to
> changelog csn 54638260000000020000
> [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: retry (49) the transaction
> (csn=546382d6000400020000) failed (rc=-30993 (BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
> [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: failed to write entry with csn
> (546382d6000400020000); db error - -30993 BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
> [12/Nov/2014:16:55:11 +0100] NSMMReplicationPlugin -
> write_changelog_and_ruv: can't add a change for
> cn=MEC592-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
> (uniqid: 3c6d978a-6a8411e4-a42c901a-94623bee, optype: 16) to
> changelog csn 546382d6000400020000
> [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: retry (49) the transaction
> (csn=54638325000000020000) failed (rc=-30993 (BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
> [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: failed to write entry with csn
> (54638325000000020000); db error - -30993 BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
> [12/Nov/2014:16:56:29 +0100] NSMMReplicationPlugin -
> write_changelog_and_ruv: can't add a change for
> cn=PHY566-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
> (uniqid: 6030dd93-6a8411e4-a42c901a-94623bee, optype: 16) to
> changelog csn 54638325000000020000
> [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: retry (49) the transaction
> (csn=5463835d000000020000) failed (rc=-30993 (BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
> [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin - changelog
> program - _cl5WriteOperationTxn: failed to write entry with csn
> (5463835d000000020000); db error - -30993 BDB0068
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
> [12/Nov/2014:16:57:20 +0100] NSMMReplicationPlugin -
> write_changelog_and_ruv: can't add a change for
> cn=PHY651K-2014,ou=2014,ou=Cours,ou=Enseignement,ou=Groupes,dc=id,dc=polytechnique,dc=edu
> (uniqid: 83f42395-6a8411e4-a42c901a-94623bee, optype: 16) to
> changelog csn 5463835d000000020000
>
See https://bugzilla.redhat.com/show_bug.cgi?id=979169
>
>
> Each time the failed group entries are different.
>
> The replication agreements look liek this :
> 0 cn=Replication from ldap-edev.polytechnique.fr to
> ldap-model.polytechnique.fr,cn=replica,cn=dc\\3Did\\2Cdc\\3Dpolytechnique\\2Cdc\\3Dedu,cn=mapping
> tree,cn=config
> objectClass: top
> objectClass: nsDS5ReplicationAgreement
> cn: Replication from ldap-edev.polytechnique.fr to
> ldap-model.polytechnique.fr
> description: Replication agreement from server
> ldap-edev.polytechnique.fr to server ldap-model.polytechnique.fr
> nsDS5ReplicaHost: ldap-model.polytechnique.fr
> nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu
> nsDS5ReplicaPort: 636
> nsDS5ReplicaTransportInfo: SSL
> nsDS5ReplicaBindDN: cn=RepliX,cn=config
> nsDS5ReplicaBindMethod: simple
> nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn
> memberOf
> nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn
> nsds5ReplicaStripAttrs: modifiersName modifyTimestamp
> internalModifiersName internalModifyTimestamp internalCreatorsname
> nsds5replicaBusyWaitTime: 3
> nsds5replicaTimeout: 30
> nsDS5ReplicaCredentials: {DES}...
> nsds5ReplicaEnabled: on
> nsds50ruv: {replicageneration} 54636df3000000020000
> nsds50ruv: {replica 1 ldap://ldap-model.polytechnique.fr:389}
> 54637b3c000000010000 54637dce000000010000
> nsds50ruv: {replica 2 ldap://ldap-edev.polytechnique.fr:389}
> 54636ffe000000020000 54637d2d000000020000
> nsruvReplicaLastModified: {replica 1
> ldap://ldap-model.polytechnique.fr:389} 00000000
> nsruvReplicaLastModified: {replica 2
> ldap://ldap-edev.polytechnique.fr:389} 00000000
> nsds5replicareapactive: 0
> nsds5replicaLastUpdateStart: 20141112173922Z
> nsds5replicaLastUpdateEnd: 20141112173922Z
> nsds5replicaChangesSentSinceStartup: 2:1899/21484
> nsds5replicaLastUpdateStatus: 0 Replica acquired successfully:
> Incremental update succeeded
> nsds5replicaUpdateInProgress: FALSE
> nsds5replicaLastInitStart: 0
> nsds5replicaLastInitEnd: 0
>
>
> The DNA plug-in is disabled (excluding
> https://fedorahosted.org/389/ticket/47410)
>
>
> I've made a ticket concerning this problem
> (https://fedorahosted.org/389/ticket/47410)
>
> The ticket for this problem is of course
> https://fedorahosted.org/389/ticket/47954, not the one i've written in
> the original mail.
>
>
>
>
>
>
> Another problem when doing these tests, with replication
> agreements suspended is the following message in the error log:
>
> 12/Nov/2014:16:15:35 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:38 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:40 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:43 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:45 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:47 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:50 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:52 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:55 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:57 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:15:59 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:02 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:04 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:06 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:09 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:11 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:14 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:17 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:19 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:22 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:16:24 +0100] - Retry count exceeded in modify
> ...
> (it goes on for approx 5 minutes)
> [12/Nov/2014:16:20:31 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:20:33 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:20:36 +0100] - Retry count exceeded in modify
> [12/Nov/2014:16:20:42 +0100] - Retry count exceeded in modify
>
>
> During these error messages the server doe snot consume CPU but it
> is does not allow any changes to continue. Do i need to open
> another ticket for this second problem?
>
> Thank you!
>
>
> Regards,
> Andrey
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
>
>
>
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.fedoraproject.org/pipermail/389-users/attachments/20141112/c9e15f4d/attachment.html>
More information about the 389-users
mailing list