Thanks for the information. I'm actually running EL 6.5, not 6.6. The latest version
I see for 6.5 is 1.2.11.15-34.el6_5. Is that build roughly equivalent (in terms of
bug fixes) to 1.2.11.15-47 in 6.6? If so, I'll check out 1.2.11.15-34 in
6.5. Otherwise, I'll upgrade to 6.6 first. Appreciate the help.
Thanks!
- Shilen
From: Rich Megginson <rmeggins@redhat.com>
Reply-To: 389-users@lists.fedoraproject.org
Date: Wednesday, October 22, 2014 at 1:10 PM
To: 389-users@lists.fedoraproject.org
Subject: Re: [389-users] Error code 51 and replication errors
On 10/22/2014 10:58 AM, Shilen Patel wrote:
1.2.11.15 is a couple of years old?
Yes and no. 1.2.11.15 was the starting point for EL6. However, many, many features and
fixes have been backported from later versions into 1.2.11.15-47 in EL 6.6.
I had to upgrade to the latest in copr because of another issue that I think was fixed in
1.2.11.30.
Has that issue been fixed in 1.2.11.15-47 in EL 6.6? I know a lot of 389 community
members running on EL6 were using fedorapeople/copr repos because they could not wait
until those fixes/features were available in EL 6.6. Now that EL 6.6 is out, I encourage
you (and anyone else in this situation) to stop using fedorapeople/copr builds and instead
use 1.2.11.15-47 in EL 6.6.
If I'm misunderstanding version numbers in EL vs copr, please let me know.
See above.
But my main question is the second question regarding best practices for detecting
replication failures and I think that applies to all versions?
nsds5replicaLastUpdateStatus is the documented way to get replication status. The fact
that this error is not being reported that way seems like a bug.
You can also monitor the errors log.
As for this particular problem, see
https://fedorahosted.org/389/ticket/47409
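For anyone who wants to poll nsds5replicaLastUpdateStatus from a script, here is a minimal sketch in Python. It assumes the attribute value starts with an integer code followed by a message, with 0 meaning the last update succeeded. The helper names, the agreement names, and the sample values are all illustrative, not taken from a real server. As noted in this thread, the failure discussed here did not show up in this attribute, so a check like this complements log monitoring rather than replacing it.

```python
import re

# Assumed layout: optional sign, integer code, then free-form message text.
STATUS_RE = re.compile(r"^\s*(-?\d+)\s*(.*)$")


def split_status(value):
    """Split a status value into (code, message); code is None if absent."""
    m = STATUS_RE.match(value)
    if m is None:
        return None, value.strip()
    return int(m.group(1)), m.group(2).strip()


def agreements_needing_attention(statuses):
    """Return (name, code, message) for every agreement whose code is not 0.

    Values that cannot be parsed get code None and are flagged as well,
    so an unexpected format is never silently treated as healthy.
    """
    flagged = []
    for name, value in statuses.items():
        code, message = split_status(value)
        if code != 0:
            flagged.append((name, code, message))
    return flagged


# Hypothetical values, keyed by hypothetical agreement names.
sample_statuses = {
    "agmt-to-replica2": "0 Replica acquired successfully: Incremental update succeeded",
    "agmt-to-replica3": "1 Can't acquire busy replica",
}
flagged = agreements_needing_attention(sample_statuses)
for name, code, message in flagged:
    print(name, code, message)
```

The values would normally come from a search against the replication agreement entries; how they are fetched is left out of the sketch.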
Thanks!
- Shilen
From: Rich Megginson <rmeggins@redhat.com>
Reply-To: 389-users@lists.fedoraproject.org
Date: Wednesday, October 22, 2014 at 12:14 PM
To: 389-users@lists.fedoraproject.org
Subject: Re: [389-users] Error code 51 and replication errors
On 10/22/2014 10:10 AM, Shilen Patel wrote:
389-ds-base-1.2.11.32-1.el6.x86_64
I would strongly encourage you to use the version provided with EL 6.6, which is
389-ds-base-1.2.11.15-47. It looks like you are using a build from the old rmeggins repo
or the newer copr repo. These are really only for those users who needed critical fixes
or features not yet in the "supported" EL6.6 version. I don't know if that
will fix your problem, but it will make it a lot easier to support.
Thanks!
- Shilen
From: Rich Megginson <rmeggins@redhat.com>
Reply-To: 389-users@lists.fedoraproject.org
Date: Wednesday, October 22, 2014 at 12:07 PM
To: 389-users@lists.fedoraproject.org
Subject: Re: [389-users] Error code 51 and replication errors
On 10/22/2014 09:54 AM, Shilen Patel wrote:
Hi,
I'm running 1.2.11.32.
What is the output of rpm -q 389-ds-base?
I have 6 replicas (two of which are read-only). I ran into an issue where a DELETE
operation failed on a server with error code 51 (LDAP_BUSY).
[21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 RESULT err=51 tag=107 nentries=0 etime=3 csn=5447282c000300050000
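The fields in an access log RESULT line like the one above can be pulled apart with a short script. This is a minimal sketch in Python, assuming lines follow the layout of that sample (timestamp in brackets, conn and op, the word RESULT, then key=value pairs). parse_result_line is a hypothetical helper name, not part of 389-ds-base.

```python
import re

# Pattern modeled on the sample RESULT line quoted in this message.
RESULT_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+conn=(?P<conn>\d+)\s+op=(?P<op>\d+)"
    r"\s+RESULT\s+(?P<fields>.*)"
)


def parse_result_line(line):
    """Return a dict of fields for a RESULT line, or None if it is not one."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    record = {
        "timestamp": m.group("timestamp"),
        "conn": int(m.group("conn")),
        "op": int(m.group("op")),
    }
    # Remaining tokens are key=value pairs such as err=51 or csn=...
    for token in m.group("fields").split():
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
    return record


sample = (
    "[21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 RESULT err=51 "
    "tag=107 nentries=0 etime=3 csn=5447282c000300050000"
)
rec = parse_result_line(sample)
print(rec["err"], rec["tag"], rec["csn"])
```

Keeping the csn in the parsed record makes it possible to look for the same change on the other replicas.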
The application retried the delete several times for a couple of hours (while the server
wasn't getting any other requests) and the result was always the same (err=51). Each
time that happened, the error log had the following:
[21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete
My first question is, what would cause a problem like this?
I simply restarted that directory and then the update succeeded. However, when the update
went to the other 5 servers, they failed in the same way and the same error was logged in
their log files. But the update wasn't retried. It was just skipped and future
updates via replication succeeded on those 5 servers.
My second question is, what's the best way to monitor for these types of replication
errors? In this case, nsds5replicaLastUpdateStatus did not indicate a problem. If I had
not been looking at the error file on those 5 hosts, I'm wondering how I would have
known that a delete failed to replicate to them. If the answer is to just have something
monitoring the error log files, are there specific search strings to look for to separate
out updates that have failed and won't be retried from other errors (e.g. temporary
connection issues)? Just curious if there is a best practice here.
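As a starting point for the log-monitoring idea, here is a minimal sketch in Python that scans errors log lines for a list of watched strings. The only search string used is the message quoted earlier in this message. Whether that string alone reliably separates updates that will not be retried from transient errors is an open question here, not something the sketch establishes. find_watched_errors and WATCH_STRINGS are hypothetical names, and the second sample line is made up for illustration.

```python
import re

# Timestamp in brackets, then the rest of the line as the message.
ERROR_LINE_RE = re.compile(r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<message>.*)$")

# Only string taken from this thread; extend as other messages are identified.
WATCH_STRINGS = ("Retry count exceeded in delete",)


def find_watched_errors(lines, watch_strings=WATCH_STRINGS):
    """Return (timestamp, matched_string) for each line containing a watched string."""
    hits = []
    for line in lines:
        m = ERROR_LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not start with a bracketed timestamp
        message = m.group("message")
        for needle in watch_strings:
            if needle in message:
                hits.append((m.group("timestamp"), needle))
                break
    return hits


sample_lines = [
    "[21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete",
    # Hypothetical unrelated line, included only to show it is ignored.
    "[21/Oct/2014:23:45:00 -0400] - some unrelated message",
]
hits = find_watched_errors(sample_lines)
for timestamp, needle in hits:
    print(timestamp, needle)
```

In practice the lines would come from reading the errors log file on each replica, and the hits could feed whatever alerting is already in place.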
Thanks!
- Shilen
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users