On 2011-11-18 14:42, Rich Megginson wrote:
On 11/18/2011 05:08 AM, Daniel Fenert wrote:
> Hi,
>
> I'm using 389ds 1.2.5 with replication, my current setup:
>
> Master
> | \
> L1 L2
> | \ | \
> S1 S2 S3 S4
>
> L* - acting as slave to "master" and master to "S*"
> S* - slaves to L*
>
>
> From time to time (usually a few months between problems) we see the
> "master" go into some kind of infinite loop.
> Judging by the access log, it stops processing queries but keeps
> accepting new connections until it runs out of fds.
> After that it won't stop cleanly; only SIGKILL saves the day.
>
> Workload:
> Master is used only for updates, maybe 20 connections/s.
> L* are used only for replication.
> All binds and search queries are targeted at S*, which are read-only.
>
> With the previous (less complicated) setup, we also saw this problem:
> Master
> | | | \
> S1 S2 S3 S4...
>
> Is there a chance that upgrading to the latest version will fix the problem?
> Were there any fixes in this area? The upgrade will be complex as hell ;)
>
> Error log from last problem:
> - Not listening for new connections - too many fds open
Have you tried increasing the number of fds to 8192?
Yes, but it doesn't make sense: during normal operation the master uses no
more than 50-60 fds.
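
One way to confirm that figure and catch the growth before the limit is hit
would be to sample the descriptor count of the slapd process periodically.
A minimal sketch, assuming a Linux host where /proc/<pid>/fd is readable by
the monitoring user; the function names and the `proc_root` parameter are
illustrative and not part of 389ds:

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count open file descriptors of a process by listing <proc_root>/<pid>/fd.

    Linux-specific: each entry in that directory is one open descriptor.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    return len(os.listdir(fd_dir))


def fd_usage_ratio(pid, limit, proc_root="/proc"):
    """Return the fraction of the descriptor limit currently in use."""
    return count_open_fds(pid, proc_root) / limit
```

Run from cron with the slapd pid and the configured limit (4096 here), a
ratio that keeps climbing away from the usual 50-60 descriptors would show
when the problem starts, well before the server stops accepting connections.
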
> - slapd shutting down - signaling operation threads
> - slapd shutting down - waiting for 120 threads to terminate
Does the server shut down on its own, or did you shut it down normally
(i.e. service dirsrv stop)?
We tried to stop it using the init.d scripts.
> ... SIGKILL ...
> - 389-Directory/1.2.5 B2010.012.2034 starting up
> - Detected Disorderly Shutdown last time Directory Server was running,
> recovering database.
> - slapd started. Listening on All Interfaces port 389 for LDAP
> requests
>
> Number of fds: 4096.
Since 1.2.5 we have fixed a number of bugs around connection
handling. You might find that 1.2.9.9 (current stable version) works
much better for you.
OK, we'll try to upgrade.
How should we upgrade such a complex setup?
Should we take a top-to-bottom approach (master first, then L*, then S*) or
a bottom-to-top one (S*, then L*, master last)?
Shutting down all servers is not really an option.
--
Daniel Fenert