On Tue, Feb 14, 2012 at 19:54:39 -0600, Rich Megginson wrote:
On 02/14/2012 06:37 PM, Iain Morgan wrote:
> On a fairly frequent basis, one of my 389 DS servers hangs after certain
> CMP operations. Once this happens, the server cannot be shut down
> gracefully. This has been going on for several weeks, and I have not yet
> found a solution.
> My setup consists of two systems running RHEL 6.2 with 389 DS 18.104.22.168.
> Multimaster replication is enabled between the two servers, but the
> client systems (currently just two test systems) preferentially use the
> same server, ServerA. The second server, ServerB, is the one which is
> experiencing the problem.
> We are using class-of-service entries to set the values for the
> shadowMax, shadowMin, and shadowWarning attributes. And we are
> conditionally setting a pwdPolicySubentry attribute for some entries in
> the same manner.
> If I execute an ldapcompare command, such as the following:
> # ldapcompare uid=imorgan,ou=People,dc=example,dc=com \
> pwdpolicysubentry:"cn=Special Policy,ou=Policies,dc=example,dc=com"
> the command will occasionally hang. Most of the time, the command
> succeeds and indicates that the attribute is not defined for that entry.
> However, once or twice a day it will simply hang.
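Since the hang is intermittent, one way to catch it unattended is to wrap
the compare in a time limit. A minimal sketch, assuming GNU coreutils
`timeout` is available and that ldapcompare follows the OpenLDAP convention
of exiting 6 for compare-true and 5 for compare-false. The function names
and the 30-second limit are hypothetical choices, not anything from your
setup:

```shell
#!/bin/sh
# Map the exit status of `timeout ... ldapcompare` to a short label.
# GNU timeout returns 124 when it had to kill the command, which is
# what a hung CMP operation would look like from the client side.
classify_status() {
    case "$1" in
        124) echo "hang" ;;
        6)   echo "true" ;;
        5)   echo "false" ;;
        *)   echo "other" ;;   # any other result, e.g. an LDAP error
    esac
}

# Run one compare with a 30-second limit; all arguments are passed
# to ldapcompare unchanged. Prints one of the labels above.
probe_once() {
    timeout 30 ldapcompare "$@" >/dev/null 2>&1
    classify_status "$?"
}
```

Calling probe_once in a loop with the same DN and attribute:value arguments
as above, and stopping as soon as it prints "hang", would leave ns-slapd in
the hung state for inspection.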
> The access log shows that the CMP request was received, but no result is
> logged. After this occurs, the server will not shut down gracefully. The
> init script fails to shut down the server and I end up having to send a
> SIGKILL to ns-slapd.
When you get the hang, can you attach to the process with gdb?
ps -ef|grep ns-slapd
gdb /usr/sbin/ns-slapd pid-of-ns-slapd
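If an interactive session is inconvenient, the stacks of all threads can
probably be captured in one shot. This is only a sketch: it assumes gdb is
installed, the function name and default output file are placeholders, and
the traces will be far more readable if the matching debuginfo packages are
present.

```shell
#!/bin/sh
# Attach to a running process by PID, dump a full backtrace of every
# thread, and detach. -batch makes gdb exit once the -ex command has
# run, so the process is not left stopped under the debugger.
dump_stacks() {
    pid="$1"
    out="${2:-stacktrace.txt}"   # placeholder default file name
    gdb -p "$pid" -batch -ex 'thread apply all bt full' > "$out" 2>&1
}
```

With a single instance running, something like
dump_stacks "$(pidof ns-slapd)" would write the traces to stacktrace.txt,
which can then be compressed and attached to a reply.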
> The error log does not report any issues.
> CMP operations against other attributes, such as loginShell, do not seem
> to exhibit this problem. Also, the problem does not occur on ServerA;
> only on ServerB. Once the CMP operation has hung, comparisons against
> other attributes, even shadowMax, continue to work.
> As noted above, most of the time the CMP operation returns normally.
> However, if I reinitialize ServerB from ServerA, the problem occurs with
> the first CMP operation against ServerB.
> Both servers have the same set of RPMs, and the dse.ldif files on the
> two systems do not differ in any significant way.
> Has anyone seen a similar issue? Any suggestions on how to debug or fix it?
> A somewhat simplified and redacted version of the class-of-service
> configuration is listed below.
A gzip'd copy of the 'thread apply all bt full' output is attached.