[389-users] Error code 51 and replication errors
Rich Megginson
rmeggins at redhat.com
Wed Oct 22 17:10:52 UTC 2014
On 10/22/2014 10:58 AM, Shilen Patel wrote:
> 1.2.11.15 is a couple of years old?
Yes and no. 1.2.11.15 was the starting point for EL6. However, many,
many features and fixes have been backported from later versions into
1.2.11.15-47 in EL 6.6.
> I had to upgrade to the latest in copr because of another issue that I
> think was fixed in 1.2.11.30.
Has that issue been fixed in 1.2.11.15-47 in EL 6.6? I know a lot of
389 community members running on EL6 were using fedorapeople/copr repos
because they could not wait until those fixes/features were available in
EL 6.6. Now that EL 6.6 is out, I encourage you (and anyone else in
this situation) to stop using fedorapeople/copr builds and instead use
1.2.11.15-47 in EL 6.6.
> If I’m misunderstanding version numbers in EL vs copr, please let me know.
See above.
> But my main question is the second question regarding best practices
> for detecting replication failures and I think that applies to all
> versions?
nsds5replicaLastUpdateStatus is the documented way to get replication
status. The fact that this error is not being reported that way seems
like a bug.
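
As a rough illustration of checking that attribute from a script, here is a minimal Python sketch. It assumes the value has already been read from the replication agreement entry (for example with ldapsearch) and that it starts with a numeric code followed by a message, such as "0 Replica acquired successfully: Incremental update succeeded". The exact wording differs between versions, so the pattern and the helper names (parse_update_status, looks_healthy) are assumptions for illustration, not a documented format. As noted above, a check like this would not have caught the skipped delete in this thread.

```python
import re

# Assumed shape: an optional "Error (" prefix, an integer code, then free text.
# This is a guess at the common layouts, not an official grammar.
_STATUS_RE = re.compile(r"^\s*(?:Error\s*\()?(-?\d+)\)?\s*(.*)$")


def parse_update_status(value):
    """Split a status string into (code, message); code is None if unparseable."""
    m = _STATUS_RE.match(value)
    if m is None:
        return None, value.strip()
    return int(m.group(1)), m.group(2).strip()


def looks_healthy(value):
    """Treat code 0 as success; any other code (or no code) deserves a look."""
    code, _ = parse_update_status(value)
    return code == 0
```

Anything that does not parse is reported as unhealthy on purpose, so that a format change shows up as an alert and not as silence.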
You can also monitor the errors log on each server.
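
A minimal sketch of that kind of log monitoring, in Python. It looks for the "Retry count exceeded" message quoted later in this thread. The line layout is inferred from that one sample, and the function name find_retry_exceeded and the default search string are assumptions; other failure messages would need their own patterns.

```python
import re

# Matches errors-log lines shaped like the sample in this thread:
# [21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete
_LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+-\s+(?P<msg>.*)$")


def find_retry_exceeded(lines, needle="Retry count exceeded"):
    """Return (timestamp, message) pairs for log lines whose message contains needle."""
    hits = []
    for line in lines:
        m = _LINE_RE.match(line.rstrip("\n"))
        if m and needle in m.group("msg"):
            hits.append((m.group("ts"), m.group("msg")))
    return hits
```

The function accepts any iterable of lines, so an open file object can be passed directly, e.g. `with open(path) as f: find_retry_exceeded(f)`, where path points at the instance's errors log (commonly under /var/log/dirsrv/, but check your install).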
As for this particular problem, see
https://fedorahosted.org/389/ticket/47409
>
> Thanks!
>
> — Shilen
>
> From: Rich Megginson <rmeggins at redhat.com>
> Reply-To: "389-users at lists.fedoraproject.org" <389-users at lists.fedoraproject.org>
> Date: Wednesday, October 22, 2014 at 12:14 PM
> To: "389-users at lists.fedoraproject.org" <389-users at lists.fedoraproject.org>
> Subject: Re: [389-users] Error code 51 and replication errors
>
> On 10/22/2014 10:10 AM, Shilen Patel wrote:
>>
>> 389-ds-base-1.2.11.32-1.el6.x86_64
>>
>
> I would strongly encourage you to use the version provided with EL
> 6.6, which is 389-ds-base-1.2.11.15-47. It looks like you are
> using a build from the old rmeggins repo or the newer copr repo.
> These are really only for those users who needed critical fixes or
> features not yet in the "supported" EL6.6 version. I don't know
> if that will fix your problem, but it will make it a lot easier to
> support.
>
>
>>
>> Thanks!
>>
>> — Shilen
>>
>> From: Rich Megginson <rmeggins at redhat.com>
>> Reply-To: "389-users at lists.fedoraproject.org" <389-users at lists.fedoraproject.org>
>> Date: Wednesday, October 22, 2014 at 12:07 PM
>> To: "389-users at lists.fedoraproject.org" <389-users at lists.fedoraproject.org>
>> Subject: Re: [389-users] Error code 51 and replication errors
>>
>> On 10/22/2014 09:54 AM, Shilen Patel wrote:
>>> Hi,
>>>
>>> I’m running 1.2.11.32.
>>
>> What is the output of rpm -q 389-ds-base?
>>
>>> I have 6 replicas (two of which are read-only). I ran into
>>> an issue where a DELETE operation failed on a server with
>>> error code 51 (ldap busy).
>>>
>>> [21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 RESULT
>>> err=51 tag=107 nentries=0 etime=3 csn=5447282c000300050000
>>>
>>>
>>> The application retried the delete several times for a
>>> couple of hours (while the server wasn’t getting any other
>>> requests) and the result was always the same (err=51). Each
>>> time that happened, the error log had the following:
>>>
>>> [21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete
>>>
>>>
>>> My first question is, what would cause a problem like this?
>>>
>>> I simply restarted that directory and then the update
>>> succeeded. However, when the update went to the other 5
>>> servers, they failed in the same way and the same error was
>>> logged in their log files. But the update wasn’t retried.
>>> It was just skipped and future updates via replication
>>> succeeded on those 5 servers.
>>>
>>> My second question is, what’s the best way to monitor for
>>> these types of replication errors? In this
>>> case, nsds5replicaLastUpdateStatus did not indicate a
>>> problem. If I had not been looking at the error file on
>>> those 5 hosts, I’m wondering how I would have known that a
>>> delete failed to replicate to them. If the answer is to
>>> just have something monitoring the error log files, are
>>> there specific search strings to look for to separate out
>>> updates that have failed and won’t be retried from other
>>> errors (e.g. temporary connection issues)? Just curious if
>>> there is a best practice here.
>>>
>>> Thanks!
>>>
>>> — Shilen
>>>
>>>
>>> --
>>> 389 users mailing list
>>> 389-users at lists.fedoraproject.org
>>> https://admin.fedoraproject.org/mailman/listinfo/389-users