[389-users] Error code 51 and replication errors

Wed Oct 22 17:10:52 UTC 2014

On 10/22/2014 10:58 AM, Shilen Patel wrote:
> 1.2.11.15 is a couple of years old?

Yes and no.  1.2.11.15 was the starting point for EL6.  However, many, 
many features and fixes have been backported from later versions into 
1.2.11.15-47 in EL 6.6.

> I had to upgrade to the latest in copr because of another issue that I 
> think was fixed in 1.2.11.30.

Has that issue been fixed in 1.2.11.15-47 in EL 6.6?  I know a lot of 
389 community members running on EL6 were using fedorapeople/copr repos 
because they could not wait until those fixes/features were available in 
EL 6.6.  Now that EL 6.6 is out, I encourage you (and anyone else in 
this situation) to stop using fedorapeople/copr builds and instead use 
1.2.11.15-47 in EL 6.6.
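
To make the version comparison concrete, here is a rough Python sketch. It assumes the usual name-version-release layout of `rpm -q` output. The helpers `split_nvr` and `version_tuple` are hypothetical names used only for illustration and are not part of any 389 or rpm tooling. The two package strings are the ones quoted in this thread.

```python
def split_nvr(package):
    """Split 'name-version-release[.arch]' on its last two hyphens."""
    name, version, release = package.rsplit("-", 2)
    return name, version, release


def version_tuple(version):
    """Turn a dotted upstream version such as '1.2.11.15' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


copr = split_nvr("389-ds-base-1.2.11.32-1.el6.x86_64")
el66 = split_nvr("389-ds-base-1.2.11.15-47")

# Judged by upstream version alone, the EL 6.6 build looks older ...
looks_older = version_tuple(el66[1]) < version_tuple(copr[1])

# ... but the release field (the part after the last hyphen) counts rebuilds
# of that same upstream version, and backported fixes land there.
print(looks_older, el66[2], copr[2])
```

The upstream version alone cannot tell you whether a given fix is present in the EL build. The package changelog (`rpm -q --changelog 389-ds-base`) is the place to check for a specific ticket.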

> If I’m misunderstanding version numbers in EL vs copr, please let me know.

See above.

> But my main question is the second question regarding best practices 
> for detecting replication failures and I think that applies to all 
> versions?

nsds5replicaLastUpdateStatus is the documented way to get replication 
status.  The fact that this error is not being reported that way seems 
like a bug.
You can also monitor the error logs.
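
As a minimal sketch of log monitoring, the Python below scans log lines for the two symptoms quoted in this thread: an access-log RESULT line with err=51, and the error-log message "Retry count exceeded". The regular expression is modelled only on the single sample line in this thread, and the function names are hypothetical. Whether that message reliably separates updates that will never be retried from transient errors is not established here, so treat it as a starting point.

```python
import re

# Access-log RESULT line, modelled on the one sample quoted in this thread.
RESULT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] conn=(?P<conn>\d+) op=(?P<op>\d+) RESULT "
    r"err=(?P<err>\d+) tag=(?P<tag>\d+)"
)

# Error-log text quoted in this thread; other messages may matter too.
RETRY_MARKER = "Retry count exceeded"


def failed_results(lines, err_codes=(51,)):
    """Yield (timestamp, conn, op, err) for RESULT lines with a watched err code."""
    for line in lines:
        match = RESULT_RE.match(line.strip())
        if match and int(match.group("err")) in err_codes:
            yield (
                match.group("ts"),
                int(match.group("conn")),
                int(match.group("op")),
                int(match.group("err")),
            )


def retry_exceeded(lines):
    """Return the error-log lines that contain the retry marker."""
    return [line.rstrip("\n") for line in lines if RETRY_MARKER in line]


access_sample = [
    "[21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 RESULT "
    "err=51 tag=107 nentries=0 etime=3 csn=5447282c000300050000\n"
]
errors_sample = [
    "[21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete\n"
]

print(list(failed_results(access_sample)))
print(retry_exceeded(errors_sample))
```

Both functions take any iterable of lines, so in practice you would pass open file objects for the access and errors logs of each instance (the log locations depend on your installation).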

As for this particular problem, see 
https://fedorahosted.org/389/ticket/47409

>
> Thanks!
>
> — Shilen
>
> From: Rich Megginson <rmeggins at redhat.com>
> Reply-To: "389-users at lists.fedoraproject.org"
> <389-users at lists.fedoraproject.org>
> Date: Wednesday, October 22, 2014 at 12:14 PM
> To: "389-users at lists.fedoraproject.org"
> <389-users at lists.fedoraproject.org>
> Subject: Re: [389-users] Error code 51 and replication errors
>
>     On 10/22/2014 10:10 AM, Shilen Patel wrote:
>>
>>     389-ds-base-1.2.11.32-1.el6.x86_64
>>
>
>     I would strongly encourage you to use the version provided with EL
>     6.6, which is 389-ds-base-1.2.11.15-47.  It looks like you are
>     using a build from the old rmeggins repo or the newer copr repo. 
>     These are really only for those users who needed critical fixes or
>     features not yet in the "supported" EL6.6 version.  I don't know
>     if that will fix your problem, but it will make it a lot easier to
>     support.
>
>
>>
>>     Thanks!
>>
>>     — Shilen
>>
>>     From: Rich Megginson <rmeggins at redhat.com>
>>     Reply-To: "389-users at lists.fedoraproject.org"
>>     <389-users at lists.fedoraproject.org>
>>     Date: Wednesday, October 22, 2014 at 12:07 PM
>>     To: "389-users at lists.fedoraproject.org"
>>     <389-users at lists.fedoraproject.org>
>>     Subject: Re: [389-users] Error code 51 and replication errors
>>
>>         On 10/22/2014 09:54 AM, Shilen Patel wrote:
>>>         Hi,
>>>
>>>         I’m running 1.2.11.32.
>>
>>         What is the output of rpm -q 389-ds-base?
>>
>>>         I have 6 replicas (two of which are read-only).  I ran into
>>>         an issue where a DELETE operation failed on a server with
>>>         error code 51 (LDAP busy).
>>>
>>>         [21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 RESULT
>>>         err=51 tag=107 nentries=0 etime=3 csn=5447282c000300050000
>>>
>>>
>>>         The application retried the delete several times for a
>>>         couple of hours (while the server wasn’t getting any other
>>>         requests) and the result was always the same (err=51).  Each
>>>         time that happened, the error log had the following:
>>>
>>>         [21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete
>>>
>>>
>>>         My first question is, what would cause a problem like this?
>>>
>>>         I simply restarted that directory and then the update
>>>         succeeded.  However, when the update went to the other 5
>>>         servers, they failed in the same way and the same error was
>>>         logged in their log files.  But the update wasn’t retried.
>>>         It was just skipped and future updates via replication
>>>         succeeded on those 5 servers.
>>>
>>>         My second question is, what’s the best way to monitor for
>>>         these types of replication errors?  In this
>>>         case, nsds5replicaLastUpdateStatus did not indicate a
>>>         problem.  If I had not been looking at the error file on
>>>         those 5 hosts, I’m wondering how I would have known that a
>>>         delete failed to replicate to them.  If the answer is to
>>>         just have something monitoring the error log files, are
>>>         there specific search strings to look for to separate out
>>>         updates that have failed and won’t be retried from other
>>>         errors (e.g. temporary connection issues)?  Just curious if
>>>         there is a best practice here.
>>>
>>>         Thanks!
>>>
>>>         — Shilen
>>>
>>>
>>>         --
>>>         389 users mailing list
>>>         389-users at lists.fedoraproject.org
>>>         https://admin.fedoraproject.org/mailman/listinfo/389-users
>>
>>
>>
>
>
>
