389 crash 1.2.11.15-22.el6_4
by Michael Gettes
We had a crash early this morning on one of our masters (MMR with 2 servers, 3 replicas connected to each). Nothing in the errors log. The service was restarted and has not crashed since.
From syslog we have:
kernel: ns-slapd[18143]: segfault at 0 ip 00007f43d5eeaad6 sp 00007f437dbedf38 error 4 i
Taking the library load address from the live process:
0x00007f43d5eeaad6 - 0x7f43d5dc3000 = 0x127ad6
0000000000127ac0 <__strcmp_sse42>:
127ac0: 89 f1 mov %esi,%ecx
127ac2: 89 f8 mov %edi,%eax
127ac4: 48 83 e1 3f and $0x3f,%rcx
127ac8: 48 83 e0 3f and $0x3f,%rax
127acc: 83 f9 30 cmp $0x30,%ecx
127acf: 77 3f ja 127b10 <__strcmp_sse42+0x50>
127ad1: 83 f8 30 cmp $0x30,%eax
127ad4: 77 3a ja 127b10 <__strcmp_sse42+0x50>
*127ad6: f3 0f 6f 0f movdqu (%rdi),%xmm1
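For anyone following the arithmetic above, here is a minimal sketch in Python. It subtracts the library load address from the faulting instruction pointer and decodes the page-fault error code using the standard x86 bit meanings (bit 0: page present, bit 1: write, bit 2: user mode). The function names are my own, purely for illustration.

```python
def library_offset(ip: int, base: int) -> int:
    """Offset of the faulting instruction inside the mapped library."""
    return ip - base


def decode_page_fault(error_code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(error_code & 0x1),  # 0 means the page was not mapped
        "write": bool(error_code & 0x2),         # 0 means the access was a read
        "user_mode": bool(error_code & 0x4),     # 1 means the fault came from user space
    }


ip = 0x00007F43D5EEAAD6    # "ip" value from the syslog line
base = 0x7F43D5DC3000      # load address used in the subtraction above
print(hex(library_offset(ip, base)))
print(decode_page_fault(4))
```

Read this way, "segfault at 0 ... error 4" looks like a user-mode read of an unmapped page at address 0, which would be consistent with the marked movdqu (%rdi) load being handed a NULL pointer in %rdi.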
Our environment consists of:
389-admin.x86_64 1.1.29-1.el6 installed
389-admin-console.noarch 1.1.8-1.el6 @epel-x86_64-server-6
389-admin-console-doc.noarch 1.1.8-1.el6 @epel-x86_64-server-6
389-adminutil.x86_64 1.1.15-1.el6 installed
389-console.noarch 1.1.7-3.el5 installed
389-ds.noarch 1.2.2-1.el6 @epel-x86_64-server-6
389-ds-base.x86_64 1.2.11.15-22.el6_4 installed
389-ds-base-libs.x86_64 1.2.11.15-22.el6_4 installed
389-ds-console.noarch 1.2.6-1.el6 @epel-x86_64-server-6
389-ds-console-doc.noarch 1.2.6-1.el6 @epel-x86_64-server-6
389-dsgw.x86_64 1.1.10-1.el6 @epel-x86_64-server-6
389-admin.i686 1.1.29-1.el6 epel-x86_64-server-6
389-adminutil.i686 1.1.15-1.el6 epel-x86_64-server-6
389-adminutil-devel.i686 1.1.15-1.el6 epel-x86_64-server-6
389-adminutil-devel.x86_64 1.1.15-1.el6 epel-x86_64-server-6
389-ds-base-debuginfo.x86_64 1.2.10.26-1.el6_3 389_rhel6_x86_64
389-ds-base-devel.i686 1.2.11.15-22.el6_4 rhel-x86_64-server-optional-6
389-ds-base-devel.x86_64 1.2.11.15-22.el6_4 rhel-x86_64-server-optional-6
389-ds-base-libs.i686 1.2.11.15-22.el6_4 rhel-x86_64-server-6
uname -a
Linux XXX 2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
I've looked around in trac and can't find anything obvious. Happy to report as a bug if this is sufficient info.
/mrg
Problem starting and replicating RHDS9
by Ric
Hello All,
I hope you can forgive a request which I am sure doesn't have enough
information in it. Please let me know what else I can add if you might
be able to help.
I have a problem with our installation of RHDS9 and practically
nothing in the logs to suggest where to look.
We have a multi-master pair, with DNS round robin to load balance.
Due to the problem I have updated DNS to point all traffic to the
working server so I hope I can get this working again without
impacting the users. But until I know the cause, I'm concerned the same
failure may occur on the working server and prevent all logins. :(
We first noticed that replication was not working. Now it seems that I
can't get slapd to start on one of the pair.
I have restarted both dirsrv instances and rebooted both servers.
There is woefully little in the log files, but if there is a way to
increase logging levels I haven't found it yet. If there is, please
advise and I'll do that and post.
This is the info I have gathered so far. Please let me know what else
might help.
/usr/sbin/ns-slapd -v
389 Project
389-Directory/1.2.11.15 B2013.211.1952
dirsrv dir01 is stopped
There is no:
/var/run/dirsrv/slapd-dir01.pid
# service dirsrv start
*** Error: 1 instance(s) failed to start
The start-up runs the wait loop and finally exits with the message above.
The errors log includes these messages:
[01/Oct/2013:12:14:47 +0100] - 389-Directory/1.2.11.15 B2013.211.1952 starting up
[01/Oct/2013:12:14:47 +0100] - WARNING: userRoot: entry cache size 10485760B is less than db size 10739712B; We recommend to increase the entry cache size nsslapd-cachememsize.
The start-up process leaves one slapd running:
# ps -ef |grep slapd
dsuser 12560 1 0 09:51 ? 00:00:03 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-dir01 -i /var/run/dirsrv/slapd-dir01.pid -w /var/run/dirsrv/slapd-dir01.startpid
but no working ns-slapd.
I recognise that we need to tune the cache, but don't believe that it
will cause the start-up failure, just a performance hit. To tune via
the console I suspect I have to get it running first!
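To put a number on the cache warning, here is a minimal sketch in Python that pulls the two sizes out of the WARNING line and reports the gap. The function name and the regular expression are my own, written against the one log line quoted above, so treat them as assumptions rather than anything the server provides.

```python
import re

# The warning text as it appears in the errors log (timestamp omitted).
WARNING = ("WARNING: userRoot: entry cache size 10485760B is less than "
           "db size 10739712B")


def cache_shortfall(line: str):
    """Return (cache_bytes, db_bytes, shortfall_bytes), or None if no match."""
    m = re.search(r"entry cache size (\d+)B is less than db size (\d+)B", line)
    if m is None:
        return None
    cache, db = int(m.group(1)), int(m.group(2))
    return cache, db, db - cache


cache, db, short = cache_shortfall(WARNING)
print(f"cache {cache / 2**20:.2f} MiB, db {db / 2**20:.2f} MiB, "
      f"short by {short / 1024:.0f} KiB")
```

The gap works out to 253952 bytes (248 KiB), so the entry cache is only slightly smaller than the database, which fits my guess that this is a tuning issue and not the reason start-up fails.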
The working server shows the same cache warning, along with:
[01/Oct/2013:12:16:26 +0100] slapi_ldap_bind - Error: could not send bind request for id [cn=repman,cn=config] mech [SIMPLE]: error -1 (Can't contact LDAP server) 0 (unknown) 107 (Transport endpoint is not connected)
Which makes sense.
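To make the bind error easier to read, here is a minimal sketch in Python that splits the message into its parts. The field names are my own labels: I am assuming the first code is the LDAP result (-1, "Can't contact LDAP server") and the last is the system errno (107), and I don't know what the middle "0 (unknown)" value represents, so it is labelled generically.

```python
import re

# The message from the working server's errors log, joined onto one line.
LINE = ("[01/Oct/2013:12:16:26 +0100] slapi_ldap_bind - Error: could not send "
        "bind request for id [cn=repman,cn=config] mech [SIMPLE]: error -1 "
        "(Can't contact LDAP server) 0 (unknown) 107 (Transport endpoint is not "
        "connected)")

# Hypothetical field names; the pattern is written against this one line only.
PATTERN = re.compile(
    r"bind request for id \[(?P<bind_id>[^\]]+)\] mech \[(?P<mech>[^\]]+)\]: "
    r"error (?P<ldap_rc>-?\d+) \((?P<ldap_msg>[^)]*)\) "
    r"(?P<second_rc>-?\d+) \((?P<second_msg>[^)]*)\) "
    r"(?P<sys_errno>\d+) \((?P<sys_msg>[^)]*)\)"
)


def parse_bind_error(line: str):
    """Return the message fields as a dict, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("ldap_rc", "second_rc", "sys_errno"):
        fields[key] = int(fields[key])  # numeric codes as integers
    return fields


for name, value in parse_bind_error(LINE).items():
    print(f"{name}: {value}")
```

Every field points at the peer simply not being reachable, which is what I would expect while slapd on the other master is down.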
The errors and access logs contain nothing else at all, so there is
nothing to indicate what is failing.
Any ideas where I might start will be greatly welcomed.
Many thanks, Ric.