Re: [389-users] Master caught in infinite loop

Friday, 18 November 2011


On 2011-11-18 19:49, Rich Megginson wrote:
...
 On 11/18/2011 11:46 AM, Daniel Fenert wrote:
> On 2011-11-18 14:42, Rich Megginson wrote:
>> On 11/18/2011 05:08 AM, Daniel Fenert wrote:
>>> Hi,
>>>
>>> I'm using 389ds 1.2.5 with replication, my current setup:
>>>
>>> Master
>>> |     \
>>> L1     L2
>>> | \    |  \
>>> S1 S2 S3  S4
>>>
>>> L* - acting as slave to "master" and master to "S*"
>>> S* - slaves to L*
>>>
>>>
>>> From time to time (usually a few months apart) we see the
>>> "master" go into some kind of infinite loop.
>>> Judging by the access log, it stops processing queries but keeps
>>> accepting new connections until it runs out of fd's.
>>> After that it won't shut down cleanly; only SIGKILL saves the day.
>>>
>>> Workload:
>>> Master is used only for updates, maybe 20 connections/s.
>>> L* are used only for replication.
>>> All binds and search queries are directed at S*, which are read-only.
>>>
>>> With our previous, less complicated setup we also saw this problem:
>>> Master
>>> |  |  |  \
>>> S1 S2 S3  S4...
>>>
>>> Is there a chance that upgrading to the latest version will fix the
>>> problem?
>>> Have there been any related fixes? The upgrade will be complex as hell ;)
>>>
>>> Error log from last problem:
>>>   - Not listening for new connections - too many fds open
>> Have you tried increasing the number of fds to 8192?
>
> Yes, but it doesn't make sense - during normal operation master uses 
> no more than 50-60 fd's.
 Right.  I'm not suggesting this is the root cause of the problem, but 
 increasing the number of fds could help reduce the occurrence of the 
 problem. 
By the time the number of fd's in use started to grow, it had already 
stopped running queries.
I think giving it more fd's would just delay, by a few minutes, the log 
message saying that it stopped accepting new connections :)
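
One way to catch this kind of fd growth before the limit is reached is to poll the process's open descriptor count. The following is only a minimal sketch, assuming a Linux host where `/proc/<pid>/fd` is readable; the function names, the `proc_root` parameter, and the 80% threshold are illustrative assumptions, not part of 389ds or of the original setup.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count open file descriptors of a process.

    Assumes a Linux-style procfs, where /proc/<pid>/fd holds one entry
    per open descriptor. proc_root is a hypothetical parameter that
    makes the function easy to test against a fake directory tree.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def fd_usage_alert(open_fds, limit, threshold=0.8):
    """Return True once usage reaches the given fraction of the limit.

    The 0.8 default is an arbitrary illustrative value.
    """
    return open_fds >= limit * threshold


if __name__ == "__main__":
    # Demo against the current process; for a directory server you would
    # pass the PID of its ns-slapd process and its configured fd limit.
    used = count_open_fds(os.getpid())
    print(used, fd_usage_alert(used, limit=4096))
```

With the figures from this thread, 50-60 descriptors against a limit of 4096 is far below any sensible threshold, so an alert would point to the stalled state rather than to normal load.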

...
>
>>>   - slapd shutting down - signaling operation threads
>>>   - slapd shutting down - waiting for 120 threads to terminate
>> Does the server shut down on its own, or did you shut it down 
>> normally (i.e. service dirsrv stop)?
>
> We have tried to stop it using init.d scripts.
 120 threads?  Did you increase nsslapd-threadnumber?
 If not, then I'm very curious about what all those threads are doing. 
Yes, we raised the number of threads a long time ago, when the master was 
also used for queries and we hit performance problems.
Nowadays these threads just sit idle and do nothing; I forgot to bring 
the thread number back down.

...
>
>>> ... SIGKILL ...
>>>   - 389-Directory/1.2.5 B2010.012.2034 starting up
>>>   - Detected Disorderly Shutdown last time Directory Server was 
>>> running,
>>> recovering database.
>>>   - slapd started.  Listening on All Interfaces port 389 for LDAP 
>>> requests
>>>
>>> Number of fds: 4096.
>> Since 1.2.5 we have fixed a number of bugs around connection 
>> handling.  You might find that 1.2.9.9 (current stable version) 
>> works much better for you.
>
> OK, we'll try to upgrade.
>
> How should we upgrade such a complex setup?
> Should we go top-to-bottom (master first, then L*, then S*) 
> or bottom-to-top (S*, L*, master last)?
 bottom to top 
Thanks, we'll try in the next few weeks.
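
The bottom-to-top order can be derived mechanically from the replication tree. This is a small sketch only: the `TOPOLOGY` dictionary transcribes the diagram from the first message, and `upgrade_waves` is a hypothetical helper name, not a 389ds tool. It groups servers by depth below the master and lists the deepest group first.

```python
from collections import deque

# Supplier -> consumers, transcribed from the diagram in this thread.
TOPOLOGY = {
    "Master": ["L1", "L2"],
    "L1": ["S1", "S2"],
    "L2": ["S3", "S4"],
}


def upgrade_waves(topology, root):
    """Return groups of servers to upgrade, deepest consumers first."""
    # Breadth-first walk to record each server's depth below the root.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in topology.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)

    # Bucket servers by depth, then emit the buckets bottom to top.
    waves = {}
    for node, d in depth.items():
        waves.setdefault(d, []).append(node)
    return [sorted(waves[d]) for d in sorted(waves, reverse=True)]


if __name__ == "__main__":
    for wave in upgrade_waves(TOPOLOGY, "Master"):
        print(wave)
```

Servers within one group do not replicate to each other, so they can be upgraded one at a time without taking the whole tier offline.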

...
> Shutting down all servers is not really an option.
>
