[389-users] 1.2.7.5 process disappearing, replication failing

Andrew Kerr andrew.kerr at amdocs.com
Wed Feb 2 17:37:37 UTC 2011


I reinstalled the two replicas that were returning "No such object" and now they work, using exactly the same cut-and-paste procedure that didn't work before.

The good news is that I am back up and running (phew, what a morning!).

I left one replica on 1.2.7.5, disabled behind our load balancer, so it still receives replication updates but no production traffic. The intent is to help figure out what the problem is before others run into it. I'll file a bug report, since this seems like something new.

FYI, these are all virtual machines (on a mix of VMware, KVM, and Xen depending on the datacenter) with very minimal installs, running no other apps, and with no SELinux or anything similar either.

-----Original Message-----
From: 389-users-bounces at lists.fedoraproject.org [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Andrew Kerr
Sent: Wednesday, February 02, 2011 11:44 AM
To: Rich Megginson; General discussion list for the 389 Directory server project.
Subject: Re: [389-users] 1.2.7.5 process disappearing, replication failing

The process is completely gone.  Doesn't show up in ps, and the pid referenced in the pid file doesn't exist.
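
For anyone wanting to script the same check, here is a minimal Python sketch (POSIX only) that reads a pid file and tests whether that pid still exists. The function names and the example path are my own assumptions for illustration, not anything shipped with 389; substitute your instance's actual pid file.

```python
import os


def pid_from_file(path):
    """Return the integer PID stored in a pid file, or None if unreadable."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def process_exists(pid):
    """Return True if a process with this PID currently exists (POSIX)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


if __name__ == "__main__":
    # Hypothetical path; adjust to the pid file of your own instance.
    pid = pid_from_file("/var/run/dirsrv/slapd-example.pid")
    print("pid:", pid, "running:", process_exists(pid))
```

Run from cron every minute or so, something like this would at least narrow down when the process goes away.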

I do have a lot of lines like this in my access log:
[02/Feb/2011:10:05:06 -0500] conn=4479 op=-1 fd=161 closed - B1
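
To put a number on "a lot", a rough Python sketch along these lines tallies the op=-1 "closed" lines by their trailing code (B1, T2, and so on). The regular expression is only inferred from the sample line above, and the function name and log path are assumptions for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the sample line above, e.g.
# "... conn=4479 op=-1 fd=161 closed - B1"
CLOSED_RE = re.compile(r"op=(-?\d+) fd=(\d+) closed - (\w+)")


def count_close_codes(lines):
    """Count 'closed' lines with op=-1, grouped by their trailing code."""
    counts = Counter()
    for line in lines:
        match = CLOSED_RE.search(line)
        if match and match.group(1) == "-1":
            counts[match.group(3)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path; point this at your instance's access log.
    with open("/var/log/dirsrv/slapd-example/access") as fh:
        for code, total in count_close_codes(fh).most_common():
            print(code, total)
```

Comparing the totals per code between a 1.2.7.5 replica and a 1.2.4 replica might show whether these lines are specific to the newer version.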

On the positive side, I was able to get some of the replicas downgraded to 1.2.4.  I had been deleting the server from the site under NetscapeRoot and re-registering it, but I hadn't re-created the replication agreement; I was just re-initializing the existing one.  Deleting the agreement and creating a new one got rid of the error: "Unable to parse the response to the startReplication extended operation.  Replication is aborting".

I put four of the six systems back to 1.2.4 (by removing the RPMs, blowing away all dirsrv relics left behind, reinstalling, and re-configuring).  Two of them initialize and I can see the directory, but when I run ldapsearch remotely I get "result: 32 No such object".  More random/unpredictable behavior...


-----Original Message-----
From: Rich Megginson [mailto:rmeggins at redhat.com] 
Sent: Wednesday, February 02, 2011 11:10 AM
To: General discussion list for the 389 Directory server project.
Cc: Andrew Kerr
Subject: Re: [389-users] 1.2.7.5 process disappearing, replication failing

On 02/02/2011 09:06 AM, Andrew Kerr wrote:
> I'm running a single master with 13 replicas, all CentOS 5.5.  The master, and a few of the slaves, are running 1.2.7.5.  We were previously on 1.2.4, with most replicas still on that version.
You might be running into https://bugzilla.redhat.com/show_bug.cgi?id=668619
The symptom of that bug is your server will just stop responding to 
requests, including server-to-server requests like replication.  Your 
server will still be running.

Does ps -ef|grep slapd show your server process is running?
Do you see the messages like "op=-1 fd=66 closed - T2" in your access log?
> All of a sudden, the slapd processes on the 1.2.7.5 replicas have started to disappear.  Nothing in the error log with the level at 8192.  It's just gone.  I can start it up and it'll last about 5 minutes.  Replication is what seems to be breaking it: the process seems to go away right after an update.
>
> I've tried rolling the replicas back to 1.2.4, but when I initialize the consumers I get "Unable to parse the response to the startReplication extended operation.  Replication is aborting".
>
> Any suggestions on where to go from this point?  It seems 1.2.7.5 is HIGHLY unstable.  But it seems it can't initialize 1.2.4 replicas (??), or maybe it just doesn't work at all.
>
> I'm not sure what the safe way is to roll back the master from 1.2.7.5.  Can I use "yum downgrade" safely?  At least for now my master and the replicas on 1.2.4 are working, and I don't want to risk taking down LDAP completely.
>
> Is there a good stable version I ought to be at?  I upgraded from 1.2.4 because of a number of other bugs, although none of them as bad as 1.2.7.5 seems to be.
>
> Thanks - any help is greatly appreciated.
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users



