On 11/18/2011 11:46 AM, Daniel Fenert wrote:
On 2011-11-18 14:42, Rich Megginson wrote:
> On 11/18/2011 05:08 AM, Daniel Fenert wrote:
>> Hi,
>>
>> I'm using 389ds 1.2.5 with replication, my current setup:
>>
>> Master
>> | \
>> L1 L2
>> | \ | \
>> S1 S2 S3 S4
>>
>> L* - acting as slave to "master" and master to "S*"
>> S* - slaves to L*
>>
>>
>> From time to time (usually a few months between incidents) we see
>> "master" going into some kind of infinite loop.
>> Judging by the access log, it stops processing queries but keeps
>> accepting new connections until it runs out of fds.
>> After that, it won't stop cleanly; only SIGKILL saves the day.
>>
>> Workload:
>> Master is used only for updates, maybe 20 connections/s.
>> L* are used only for replication.
>> All bind's and search queries are targeted to S* which are read only.
>>
>> With previous setup (less complicated), we've also seen this problem:
>> Master
>> | | | \
>> S1 S2 S3 S4...
>>
>> Is there a chance that upgrading to latest version will fix the
>> problem?
>> Were there any fixes nearby? Upgrade will be complex as hell ;)
>>
>> Error log from last problem:
>> - Not listening for new connections - too many fds open
> Have you tried increasing the number of fds to 8192?
Yes, but it doesn't make sense - during normal operation master uses
no more than 50-60 fd's.
Right. I'm not suggesting this is the root cause of the problem, but
increasing the number of fds could help reduce the occurrence of the problem.
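
To catch the fd growth before the server stops accepting connections, something like the following could be run periodically against the ns-slapd process. This is only a rough sketch: it assumes a Linux host with /proc available, and the function names and the 80% threshold are made up for illustration, not part of 389ds.

```python
import os


def count_open_fds(pid):
    """Count open file descriptors of a process by listing /proc/<pid>/fd.

    Assumption: Linux with /proc mounted, and permission to read the
    target process's fd directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage_warning(open_fds, limit, threshold=0.8):
    """Return True when fd usage has reached the given fraction of the limit.

    The 0.8 default is an arbitrary illustrative threshold.
    """
    return open_fds >= limit * threshold


if __name__ == "__main__":
    # Checks this script's own process as a harmless demonstration;
    # substitute the ns-slapd pid and its configured fd limit in practice.
    used = count_open_fds(os.getpid())
    print(used, fd_usage_warning(used, 4096))
```

With the numbers from this thread (50-60 fds in normal operation, a limit of 4096), any warning would mean the server is already far outside its usual behaviour.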
>> - slapd shutting down - signaling operation threads
>> - slapd shutting down - waiting for 120 threads to terminate
> Does the server shutdown on its own, or did you shut it down normally
> (i.e. service dirsrv stop)?
We have tried to stop it using init.d scripts.
120 threads? Did you increase nsslapd-threadnumber?
If not, then I'm very curious about what all those threads are doing.
>> ... SIGKILL ...
>> - 389-Directory/1.2.5 B2010.012.2034 starting up
>> - Detected Disorderly Shutdown last time Directory Server was
>> running,
>> recovering database.
>> - slapd started. Listening on All Interfaces port 389 for LDAP
>> requests
>>
>> Number of fds: 4096.
> Since 1.2.5 we have fixed a number of bugs around connection
> handling. You might find that 1.2.9.9 (current stable version) works
> much better for you.
OK, we'll try to upgrade.
How should we upgrade such a complex setup?
Should we try top-to-bottom approach (master first, then L*, then S*)
or bottom-to-top (S*, L*, master last)?
bottom to top
Shutting down all servers is not really an option.
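
For the topology described above, the bottom-to-top order can be written out one server at a time, so that the whole deployment never has to be down at once. The sketch below is illustrative only: it assumes S1 and S2 hang off L1 and S3 and S4 off L2 (as the diagram suggests), and the function name is hypothetical.

```python
from collections import deque

# Replication topology from the thread, supplier -> consumers.
# Assumption: S1/S2 replicate from L1 and S3/S4 from L2.
TOPOLOGY = {
    "Master": ["L1", "L2"],
    "L1": ["S1", "S2"],
    "L2": ["S3", "S4"],
}


def bottom_up_order(topology, root):
    """Return servers ordered deepest level first, root last."""
    levels = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == len(levels):
            levels.append([])
        levels[depth].append(node)
        for child in topology.get(node, []):
            queue.append((child, depth + 1))
    # Flatten the levels from the leaves up to the root.
    return [node for level in reversed(levels) for node in level]


if __name__ == "__main__":
    print(bottom_up_order(TOPOLOGY, "Master"))
```

Upgrading in that sequence handles the read-only S* consumers first, then the L* hubs, and the master last.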