[389-users] MMR issue

Grzegorz Dwornicki gd1100 at gmail.com
Tue Aug 7 17:11:51 UTC 2012


Hi

I must say these LDAP replication connections look quite unusual. Can you
provide more information about:
- the type of each replication server? I guess some servers are masters and
some are perhaps slaves (read-only consumers)?
- Do the errors occur when you try to initiate replication manually?

Some errors suggest that other replication/LDAP operations may already be in
progress, so the target server reports that the replica is locked:

[19/Jul/2012:07:28:50 -0300] NSMMReplicationPlugin - conn=7835 op=160267
repl="o=BASE": Replica in use locking_purl=conn=7831 id=3

[19/Jul/2012:07:28:50 -0300] NSMMReplicationPlugin - conn=7835 op=160267
replica="o=BASE": Unable to acquire replica: error: replica busy locked by
conn=7831 id=3 for incremental update
[19/Jul/2012:07:28:50 -0300] NSMMReplicationPlugin - conn=7835 op=160267
repl="o=base": StartNSDS90ReplicationRequest: response=1 rc=0
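The lock holder named in these messages can be pulled out of an errors log with a short script. You can then look up that connection number in the target server's access log, which should show the host that opened it. Below is a minimal Python sketch. It assumes the log text is available as a string, and its regular expression is written against the message wording shown above. `lock_holders` is a hypothetical helper name, and the "id" value is reported as printed, without interpreting it.

```python
import re
from collections import Counter

# Written against the "replica busy locked by conn=N id=M" wording shown above.
# \s+ also matches a line break, so entries wrapped by a mail client still match.
BUSY_RE = re.compile(r"replica busy locked by\s+conn=(\d+)\s+id=(\d+)")


def lock_holders(log_text: str) -> Counter:
    """Count how often each (conn, id) pair is reported as holding the replica lock."""
    return Counter((int(conn), int(lock_id)) for conn, lock_id in BUSY_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        '[19/Jul/2012:07:28:50 -0300] NSMMReplicationPlugin - conn=7835 op=160267\n'
        'replica="o=BASE": Unable to acquire replica: error: replica busy locked by\n'
        'conn=7831 id=3 for incremental update\n'
    )
    for (conn, lock_id), n in lock_holders(sample).most_common():
        print(f"locked by conn={conn} id={lock_id}: {n} message(s)")
```

If the same connection number keeps appearing over a long period, that session may be stuck while holding the replica.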

Other errors suggest that there may be no connection between the servers.
Perhaps the target server is too busy to respond, or there is a
network/firewall problem:

[19/Jul/2012:13:28:48 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(A:389): Unable to receive the response for a startReplication extended
operation to consumer (Timed out). Will retry later.
[19/Jul/2012:13:34:17 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
(A:389): Unable to receive the response for a startReplication extended
operation to consumer (Can't contact LDAP server). Will retry later.
(...)
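A plain TCP connection test from one host to the other's LDAP port is a quick first step toward telling a network or firewall problem apart from an overloaded server. Below is a minimal Python sketch. The host name comes from the command line, and the default port 389 is an assumption taken from the "(A:389)" in the log lines. Adjust both to match your agreements.

```python
import socket
import sys


def can_connect(host: str, port: int = 389, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port is established within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused or reset connections, name-resolution failures and timeouts
        # (socket.timeout is a subclass of OSError) all end up here.
        return False


if __name__ == "__main__":
    # Usage: python check_ldap_port.py HOST [PORT]
    if len(sys.argv) > 1:
        host = sys.argv[1]
        port = int(sys.argv[2]) if len(sys.argv) > 2 else 389
        print(f"{host}:{port} reachable: {can_connect(host, port)}")
    else:
        print("usage: check_ldap_port.py HOST [PORT]")
```

A successful connect only shows that the port is reachable. A server that accepts the connection but is too busy to answer an LDAP request will still pass this test, so the result narrows the problem down rather than settling it.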

Please provide information about the replication types. Try initiating
replication manually and monitor the logs carefully; this may provide more
information. If you want to push updates from one server to the others,
please consider using multi-master connections and a hub server (see the
Red Hat documentation for more details).
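While you initiate replication manually and watch the logs, it can help to tally replication failures per agreement instead of reading them one by one. Below is a minimal Python sketch. It assumes each errors-log entry sits on a single line, as the server writes it (the excerpts above were wrapped by the mail client). The log path is passed on the command line; it is typically under /var/log/dirsrv/slapd-<instance>/. The "Unable"/"failed" keyword filter is a simple heuristic, not a complete list of failure messages.

```python
import re
import sys
from collections import Counter

# One errors-log entry about a replication agreement, e.g.
# [19/Jul/2012:13:28:48 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B" (A:389): Unable to ...
AGMT_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] NSMMReplicationPlugin - '
    r'agmt="(?P<agmt>[^"]+)" \((?P<target>[^)]+)\): (?P<msg>.*)$'
)


def summarize(lines) -> Counter:
    """Count failure messages per (agreement, message text); timestamps are ignored."""
    counts = Counter()
    for line in lines:
        m = AGMT_RE.match(line.rstrip("\n"))
        # Simple keyword heuristic: keep "Unable ..." and "... failed" messages,
        # skip informational ones such as "... auth resumed".
        if m and ("Unable" in m["msg"] or "failed" in m["msg"]):
            counts[(m["agmt"], m["msg"])] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for (agmt, msg), n in summarize(log).most_common():
                print(f"{n:6d}  {agmt}: {msg}")
    else:
        print("usage: summarize_repl_errors.py PATH_TO_ERRORS_LOG")
```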

Greg.

2012/8/7 Reinhard Nappert <rnappert at juniper.net>

> Has somebody seen this problem as well?
>
> -Reinhard
>
> From: 389-users-bounces at lists.fedoraproject.org [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Reinhard Nappert
> Sent: Friday, August 03, 2012 2:51 PM
> To: 389-users at lists.fedoraproject.org
> Subject: [389-users] MMR issue
>
> Hi,
>
> I have the following 389 DS version deployed: 389-Directory/1.2.8.2B2011.130.190
>
> I have a 3-box multi-master replication setup in a ring:
>
>     ... C ----- A ----- B ----- C ----- A ...
>
> The replication agreements between “A” and “C” and between “B” and “C”
> work fine, but I have an issue with the agreements for the “A”-“B” connection.
>
> I see the following in the errors file:
>
> Server A:
>
> [19/Jul/2012:07:28:50 -0300] NSMMReplicationPlugin - conn=7835 op=160267
> repl="o=base": Begin incremental protocol
>
> [19/Jul/2012:07:28:50 -0300] - csngen_adjust_time: gen state before
> 5007e1610000:1342693727:0:2
>
> [19/Jul/2012:07:28:50 -0300] - _csngen_adjust_local_time: gen state before
> 5007e1610000:1342693727:0:2
>
> [19/Jul/2012:07:28:50 -0300] - _csngen_adjust_local_time: gen state after
> 5007e1640000:1342693730:0:2
>
> [19/Jul/2012:07:28:50 -0300] NSMMReplicationPlugin - conn=7835 op=160267
> repl="o=BASE": Replica in use locking_purl=conn=7831 id=3
>
> [19/Jul/2012:07:28:50 -0300] NSMMReplicationPlugin - conn=7835 op=160267
> replica="o=BASE": Unable to acquire replica: error: replica busy locked by
> conn=7831 id=3 for incremental update
>
> [19/Jul/2012:07:28:50 -0300] NSMMReplicationPlugin - conn=7835 op=160267
> repl="o=base": StartNSDS90ReplicationRequest: response=1 rc=0
>
> This kind of error is logged at intervals of about one second, each time
> with a differing local_time (for example 5007e1610000:1342693727:0:2).
>
> Server B:
>
> [19/Jul/2012:13:28:48 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
> (A:389): Unable to receive the response for a startReplication extended
> operation to consumer (Timed out). Will retry later.
>
> [19/Jul/2012:13:34:17 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
> (A:389): Unable to receive the response for a startReplication extended
> operation to consumer (Can't contact LDAP server). Will retry later.
>
> [19/Jul/2012:13:44:25 -0300] slapi_ldap_bind - Error: timeout after [0.0]
> seconds reading bind response for [cn=replication,cn=config] mech [SIMPLE]
>
> [19/Jul/2012:13:44:25 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
> (A:389): Replication bind with SIMPLE auth failed: LDAP error 85 (Timed
> out) ((null))
>
> [19/Jul/2012:13:44:25 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
> (A:389): Replication bind with SIMPLE auth resumed
>
> Sometimes I also see the following error:
>
> [20/Jul/2012:11:28:39 -0300] slapi_ldap_bind - Error: could not send bind
> request for id [cn= replication,cn=config] mech [SIMPLE]: error 91 (Can't
> connect to the LDAP server) -5961 (TCP connection reset by peer.) 115
> (Operation now in progress)
>
> [20/Jul/2012:11:28:39 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
> (A:389): Replication bind with SIMPLE auth failed: LDAP error 91 (Can't
> connect to the LDAP server) ((null))
>
> [20/Jul/2012:11:30:30 -0300] NSMMReplicationPlugin - agmt="cn=A-to-B"
> (A:389): Replication bind with SIMPLE auth resumed
>
> I don’t see any indication that Server B was down at that time.
>
> I did see Bug 571677 (https://bugzilla.redhat.com/show_bug.cgi?id=571677),
> but there was no deletion of a replicaconflict object.
>
> Has anybody encountered this kind of issue? The next question would be:
> how can the MMR environment be recovered?
>
> Thanks,
>
> -Reinhard
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
>
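The csngen_adjust_time lines in Server A's log print the CSN generator state as a single string. When checking whether clocks drift between the masters, it can help to split that string and turn its first field into a readable time. Below is a minimal Python sketch. The field layout it uses is an assumption inferred from the values in the log, not taken from documentation: 8 hex digits of sampled time and 4 hex digits of sequence number, followed by the current time, local offset and remote offset in decimal. `decode_gen_state` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def decode_gen_state(state: str) -> dict:
    """Split a csngen 'gen state' string such as '5007e1610000:1342693727:0:2'.

    ASSUMED layout: 8 hex digits of sampled time followed by 4 hex digits of
    sequence number, then current time, local offset and remote offset in
    decimal seconds.
    """
    head, cur_time, local_offset, remote_offset = state.split(":")
    return {
        "sampled_time": int(head[:8], 16),
        "seq_num": int(head[8:], 16),
        "cur_time": int(cur_time),
        "local_offset": int(local_offset),
        "remote_offset": int(remote_offset),
    }


if __name__ == "__main__":
    log_tz = timezone(timedelta(hours=-3))  # the -0300 offset used in the log excerpts
    for state in ("5007e1610000:1342693727:0:2", "5007e1640000:1342693730:0:2"):
        d = decode_gen_state(state)
        sampled = datetime.fromtimestamp(d["sampled_time"], log_tz)
        print(state, "->", sampled.isoformat(), "seq", d["seq_num"],
              "sampled - cur =", d["sampled_time"] - d["cur_time"])
```

Under that assumed layout, 5007e161 decodes to 1342693729, which is 19/Jul/2012 07:28:49 -0300, within a second of the log timestamp. That value is 2 seconds ahead of the decimal field 1342693727, the same as the value in the last field.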