Re: [389-users] Master caught in infinite loop

Friday, 18 November 2011

On 11/18/2011 11:46 AM, Daniel Fenert wrote:
...
 On 2011-11-18 14:42, Rich Megginson wrote:
> On 11/18/2011 05:08 AM, Daniel Fenert wrote:
>> Hi,
>>
>> I'm using 389ds 1.2.5 with replication. My current setup:
>>
>> Master
>> |     \
>> L1     L2
>> | \    |  \
>> S1 S2 S3  S4
>>
>> L* - acting as slave to "master" and master to "S*"
>> S* - slaves to L*
>>
>>
>> From time to time (usually a few months between problems) we encounter
>> "master" going into some kind of infinite loop.
>> After analyzing the access log, it looks like it stops processing queries
>> but keeps accepting new connections until it runs out of fds.
>> After that, it won't stop cleanly; only SIGKILL saves the day.
>>
>> Workload:
>> Master is used only for updates, maybe 20 connections/s.
>> L* are used only for replication.
>> All binds and search queries are targeted to S*, which are read-only.
>>
>> With the previous (less complicated) setup, we also saw this problem:
>> Master
>> |  |  |  \
>> S1 S2 S3  S4...
>>
>> Is there a chance that upgrading to the latest version will fix the problem?
>> Were there any fixes in this area? The upgrade will be complex as hell ;)
>>
>> Error log from last problem:
>>   - Not listening for new connections - too many fds open
> Have you tried increasing the number of fds to 8192?

 Yes, but it doesn't make sense - during normal operation the master uses
 no more than 50-60 fds.
Right.  I'm not suggesting this is the root cause of the problem, but
increasing the number of fds could help reduce the occurrence of the problem.
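A minimal sketch, not taken from the thread, for catching the runaway connection growth well before the 4096 ceiling: on Linux each entry in /proc/<pid>/fd is one open descriptor, so counting the entries for the ns-slapd process and alerting at a fraction of the configured limit would flag the problem while the server can still be stopped cleanly. Assumptions: Linux /proc, permission to read that process's fd directory (same user or root), and the function names and 50% threshold are illustrative only. As far as I know, the server-side limit itself is governed by nsslapd-maxdescriptors in cn=config together with the OS open-files limit.

```python
import os
import sys


def count_open_fds(pid: int) -> int:
    """Count open file descriptors of a process via Linux /proc/<pid>/fd."""
    # Each entry in /proc/<pid>/fd is a symlink for one open descriptor.
    return len(os.listdir(f"/proc/{pid}/fd"))


def usage_is_high(used: int, limit: int, warn_ratio: float = 0.5) -> bool:
    """Return True once usage reaches warn_ratio of the fd limit (illustrative threshold)."""
    return used >= limit * warn_ratio


if __name__ == "__main__":
    # Hypothetical usage: python check_fds.py <ns-slapd pid> [limit]
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    limit = int(sys.argv[2]) if len(sys.argv) > 2 else 4096
    used = count_open_fds(pid)
    status = "WARN" if usage_is_high(used, limit) else "ok"
    print(f"{status}: pid {pid} has {used} of {limit} fds open")
```

Run from cron, a normal reading near 50-60 would print "ok", while a leak climbing toward 4096 would start printing "WARN" around 2048.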
...

>>   - slapd shutting down - signaling operation threads
>>   - slapd shutting down - waiting for 120 threads to terminate
> Does the server shutdown on its own, or did you shut it down normally 
> (i.e. service dirsrv stop)?

 We have tried to stop it using the init.d scripts.
120 threads?  Did you increase nsslapd-threadnumber?
If not, then I'm very curious about what all those threads are doing.
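To check the thread count on a running server before digging into what the threads are doing, here is a small sketch (not from the thread) that relies on Linux exposing one directory per thread under /proc/<pid>/task. Assumptions: Linux /proc and a readable ns-slapd pid; the function name is hypothetical. A stack trace of the process (for example with gdb) would be the next step to see what each thread is blocked on.

```python
import os
import sys


def count_threads(pid: int) -> int:
    """Count threads of a process: Linux keeps one directory per thread in /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))


if __name__ == "__main__":
    # Hypothetical usage: python count_threads.py <ns-slapd pid>
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"pid {pid} has {count_threads(pid)} threads")
```

Expect the total to be somewhat higher than nsslapd-threadnumber, since the server likely runs some non-worker threads as well; a count far above that setting is what would be worth investigating.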
...

>> ... SIGKILL ...
>>   - 389-Directory/1.2.5 B2010.012.2034 starting up
>>   - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
>>   - slapd started.  Listening on All Interfaces port 389 for LDAP requests
>>
>> Number of fds: 4096.
> Since 1.2.5 we have fixed a number of bugs around connection 
> handling.  You might find that 1.2.9.9 (current stable version) works 
> much better for you.

 OK, we'll try to upgrade.

 How should we upgrade such a complex setup?
 Should we try a top-to-bottom approach (master first, then L*, then S*)
 or bottom-to-top (S*, L*, master last)?
Bottom to top.
...
 Shutting down all servers is not really an option.
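The "bottom to top" answer amounts to a post-order walk of the replication tree: every consumer is upgraded before the supplier that feeds it. The sketch below is an illustration, not a procedure from the thread; the TOPOLOGY dict is a hypothetical transcription of the diagram above, and upgrading one server at a time is an assumption that fits the constraint that shutting everything down is not an option.

```python
from typing import Dict, List

# Hypothetical transcription of the diagram: supplier -> its consumers.
TOPOLOGY: Dict[str, List[str]] = {
    "Master": ["L1", "L2"],
    "L1": ["S1", "S2"],
    "L2": ["S3", "S4"],
}


def upgrade_order(topology: Dict[str, List[str]], root: str) -> List[str]:
    """Return servers in bottom-to-top order using a post-order traversal.

    A node is appended only after all of its consumers, so every supplier
    comes after the servers it replicates to.
    """
    order: List[str] = []

    def visit(node: str) -> None:
        for consumer in topology.get(node, []):
            visit(consumer)
        order.append(node)

    visit(root)
    return order


if __name__ == "__main__":
    for step, server in enumerate(upgrade_order(TOPOLOGY, "Master"), 1):
        print(f"{step}. upgrade {server}")
```

For the setup above this yields S1, S2, L1, S3, S4, L2 and finally Master, so the master, which takes all the updates, is the last server to be touched.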
