[389-users] changelog deadlock replication failures with DNA

thierry bordaz tbordaz at redhat.com
Thu Jun 13 10:59:10 UTC 2013


On 06/12/2013 10:00 PM, Mahadevan, Venkat wrote:
>
> Hello,
>
> While doing multiple adds using POSIX uidNumbers and the DNA plugin,
> I have noticed errors such as the following:
>
> [12/Jun/2013:11:43:24 -0700] NSMMReplicationPlugin - changelog program 
> - _cl5WriteOperationTxn: retry (49) the transaction 
> (csn=51b8c148001e02be0000) failed (rc=-30994 (DB_LOCK_DEADLOCK: Locker 
> killed to resolve a deadlock))
>
> [12/Jun/2013:11:43:24 -0700] NSMMReplicationPlugin - changelog program 
> - _cl5WriteOperationTxn: failed to write entry with csn 
> (51b8c148001e02be0000); db error - -30994 DB_LOCK_DEADLOCK: Locker 
> killed to resolve a deadlock
>
> [12/Jun/2013:11:43:24 -0700] NSMMReplicationPlugin - 
> write_changelog_and_ruv: can't add a change for 
> uid=jmeter429,dc=tst,dc=id,dc=ubc,dc=ca (uniqid: 
> e62c908c-d38f11e2-96fdeacd-f14f05d6, optype: 16) to changelog csn 
> 51b8c148001e02be0000
>
> [12/Jun/2013:11:43:36 -0700] NSMMReplicationPlugin - changelog program 
> - _cl5WriteOperationTxn: retry (49) the transaction 
> (csn=51b8c154004002be0000) failed (rc=-30994 (DB_LOCK_DEADLOCK: Locker 
> killed to resolve a deadlock))
>
> [12/Jun/2013:11:43:36 -0700] NSMMReplicationPlugin - 
> write_changelog_and_ruv: can't add a change for 
> uid=jmeter797,dc=tst,dc=id,dc=ubc,dc=ca (uniqid: 
> e62c9143-d38f11e2-96fdeacd-f14f05d6, optype: 16) to changelog csn 
> 51b8c154004002be0000
>
Hi Mahadevan,

    This means that the server was unable to write the update to the
    changelog: the write kept hitting a deadlock and ran out of retries.
    This causes the operation to fail. If this happens on a consumer,
    the supplier will retry sending the update later. Since the log
    shows two different CSNs, I think updates are still progressing on
    the consumer side, although they may be delayed.
    I do not know why it is occurring. Deadlocks are quite rare because
    in a default deployment threads are synchronized by a backend lock.
    Is the dse.ldif available somewhere?
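
To see which updates were affected, the failed changelog writes can be pulled out of the errors log. The following is only a minimal sketch in Python: it assumes each message sits on a single line in the real log (the mail client wrapped the lines quoted in this thread), and the regular expression and the function name failed_changelog_writes are illustrative, not part of 389-ds.

```python
import re

# Assumed single-line form of the "can't add a change" message quoted in
# this thread; adjust the pattern if your log format differs.
FAILED_WRITE = re.compile(
    r"write_changelog_and_ruv: can't add a change for (?P<dn>.+?) "
    r"\(uniqid: (?P<uniqid>[^,]+), optype: (?P<optype>\d+)\) "
    r"to changelog csn (?P<csn>[0-9a-f]+)"
)


def failed_changelog_writes(lines):
    """Return (csn, dn) pairs, one per distinct CSN, in order of first appearance."""
    seen = {}
    for line in lines:
        match = FAILED_WRITE.search(line)
        if match:
            # Keep only the first DN seen for each CSN.
            seen.setdefault(match.group("csn"), match.group("dn"))
    return list(seen.items())


# Sample input: the two messages from this thread, joined onto single lines.
sample = [
    "[12/Jun/2013:11:43:24 -0700] NSMMReplicationPlugin - "
    "write_changelog_and_ruv: can't add a change for "
    "uid=jmeter429,dc=tst,dc=id,dc=ubc,dc=ca (uniqid: "
    "e62c908c-d38f11e2-96fdeacd-f14f05d6, optype: 16) to changelog csn "
    "51b8c148001e02be0000",
    "[12/Jun/2013:11:43:36 -0700] NSMMReplicationPlugin - "
    "write_changelog_and_ruv: can't add a change for "
    "uid=jmeter797,dc=tst,dc=id,dc=ubc,dc=ca (uniqid: "
    "e62c9143-d38f11e2-96fdeacd-f14f05d6, optype: 16) to changelog csn "
    "51b8c154004002be0000",
]

for csn, dn in failed_changelog_writes(sample):
    print(csn, dn)
```

To run it against a real log, pass an open file object instead of the sample list. The resulting DNs are the entries worth checking on the consumers.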

best regards
thierry


> The net effect of these errors is that an entry is added to the
> replication master but does not sync down to any of the consumers,
> presumably because it is not written to the changelog database
> correctly. Doing a bit of research, I tracked this down:
>
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=907985
>
> There is also an advisory from Red Hat stating that this bug has been
> fixed: https://rhn.redhat.com/errata/RHSA-2013-0742.html
>
> “A problem in the lock timing in the DNA plug-in caused a deadlock if the
> DNA operation was executed with other plug-ins. This update moves the
> release timing of the problematic lock, and the DNA plug-in does not cause
> the deadlock. (BZ#929196)”
>
> I am running RHEL 6.4 with
> 389-ds-base.x86_64 1.2.11.15-14.el6_4 @rhel-x86_64-server-6
>
> So this bug should not be occurring? Should I upgrade to a version of
> 389-ds-base supplied by EPEL instead of the one from Red Hat? Any
> insight is most appreciated. Thank you.
>
> Kind regards,
>
> VM
>
