[389-users] Crash with segmentation fault after a database reinitialization

Giovanni Mancuso suuuper at messinalug.org
Fri Feb 5 17:53:28 UTC 2010


Noriko Hosoi wrote:
> On 02/05/2010 08:32 AM, Francesco Fiore wrote:
>>
>>
>> Francesco Fiore wrote:
>>>
>>>
>>> Rich Megginson wrote:
>>>> Francesco Fiore wrote:
>>>>   
>>>>> Hi,
>>>>> I have two directory servers in a multimaster configuration. I had to
>>>>> reinitialize all databases on the 2nd server (B) using the data from the 1st (A).
>>>>> After the synchronization, server B crashes with a segmentation fault.
>>>>> There is no relevant message in the error log.
>>>>> If I restart directory server B, I get the same error.
>>>>> The directory server version is 1.1.3 on Red Hat 5.
>>>>>   
>>>>>     
>>>> rpm -qi fedora-ds-base
>>>>
>>>> 32-bit or 64-bit?
>>>>
>>>> We have fixed quite a few replication bugs since 1.1.3, including a 
>>>> couple of crashes.  I recommend upgrading to the latest.
>>>>   
>>> # rpm -qi 389-ds-base
>>> Name        : 389-ds-base                  Relocations: (not
>>> relocatable)
>>> Version     : 1.2.4                             Vendor: Fedora Project
>>> Release     : 1.el5                         Build Date: Tue 03 Nov
>>> 2009 04:47:39 PM CET
>>> Install Date: Fri 05 Feb 2010 11:49:11 AM CET      Build Host:
>>> x86-6.fedora.phx.redhat.com
>>> Group       : System Environment/Daemons    Source RPM:
>>> 389-ds-base-1.2.4-1.el5.src.rpm
>>> Size        : 5339258                          License: GPLv2 with
>>> exceptions
>>> Signature   : DSA/SHA1, Fri 06 Nov 2009 05:17:38 PM CET, Key ID
>>> 119cc036217521f6
>>> Packager    : Fedora Project
>>> URL         : http://port389.org/
>>> Summary     : 389 Directory Server (base)
>>> Description :
>>>
>>> x86-64
>>>
>>> I updated to the latest stable version but I get the same error.
>>> I traced the running process and discovered that the segmentation
>>> fault is probably related to a futex system call. I attach the tail
>>> of the strace output below.
>>>
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
>>> (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
>>> {fd=-1}, {fd=65, events=POLLIN}], 5, 250) = 1 ([{fd=65,
>>> revents=POLLIN}])
>>> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068,
>>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x145d0850, FUTEX_WAKE_PRIVATE, 1) = 1
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
>>> (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
>>> {fd=-1}], 4, 250) = 1 ([{fd=42, revents=POLLIN}])
>>> read(42, "\0", 200)                     = 1
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
>>> (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
>>> {fd=-1}, {fd=64, events=POLLIN}], 5, 250) = 1 ([{fd=64,
>>> revents=POLLIN}])
>>> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068,
>>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x14550730, FUTEX_WAKE_PRIVATE, 1 <unavailable ...>
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
>>> (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
>>> {fd=-1}, {fd=65, events=POLLIN}], 5, 250) = 1 ([{fd=65,
>>> revents=POLLIN}])
>>> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068,
>>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x145d0850, FUTEX_WAKE_PRIVATE, 1) = 1
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
>>> (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
>>> {fd=-1}], 4, 250) = 1 ([{fd=42, revents=POLLIN}])
>>> read(42, "\0", 200)                     = 1
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN
>>> (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN},
>>> {fd=-1}, {fd=64, events=POLLIN}], 5, 250) = 1 ([{fd=64,
>>> revents=POLLIN}])
>>> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068,
>>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x14550730, FUTEX_WAKE_PRIVATE, 1 <unavailable ...>
>>
>> I debugged the running process and gdb printed this stacktrace after
>> the segmentation fault:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x63b2b940 (LWP 31976)]
>> 0x000000364fa79140 in strcmp () from /lib64/libc.so.6
>> (gdb) bt
>> #0  0x000000364fa79140 in strcmp () from /lib64/libc.so.6
>> #1  0x00002b188041e4fc in ?? () from
>> /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #2  0x00002b188041d8d9 in add_hash () from
>> /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #3  0x00002b188041df27 in ?? () from
>> /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #4  0x00002b188042c273 in id2entry () from
>> /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #5  0x00002b18804594c0 in uniqueid2entry () from
>> /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #6  0x00002b188042b961 in ?? () from
>> /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #7  0x00002b18804445fc in ldbm_back_delete () from
>> /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #8  0x00002b187c4990d4 in ?? () from /usr/lib64/dirsrv/libslapd.so.0
>> #9  0x00002b187c499413 in do_delete () from /usr/lib64/dirsrv/libslapd.so.0
>> #10 0x0000000000412e79 in sasl_map_config_add ()
>> #11 0x0000003590827fad in ?? () from /usr/lib64/libnspr4.so
>> #12 0x00000036506064a7 in start_thread () from /lib64/libpthread.so.0
>> #13 0x000000364fad3c2d in clone () from /lib64/libc.so.6
>>   
>> I hope this information is useful.
> The stacktrace is really useful.  Thanks!  If possible, could you
> install the debuginfo package and take the stacktrace?
> yum install 389-ds-base-debuginfo
Hi,
I'm a colleague of Francesco's, and I am also following this problem.
We have already installed 389-ds-base-debuginfo, and the stacktrace is:

Program received signal SIGSEGV, Segmentation fault.
0x000000364fa79140 in strcmp () from /lib64/libc.so.6
(gdb) bt
#0  0x000000364fa79140 in strcmp () from /lib64/libc.so.6
#1  0x00002b39f5cea4fc in entry_same_dn (e=<value optimized out>,
k=0x2aaab800e860) at ldap/servers/slapd/back-ldbm/cache.c:137
#2  0x00002b39f5ce98d9 in add_hash (ht=0x191b1900, key=0x2aaab800e860,
keylen=<value optimized out>, entry=0x2aaab800ae00, alt=0x64035b68)
    at ldap/servers/slapd/back-ldbm/cache.c:185
#3  0x00002b39f5ce9f27 in cache_add_int (cache=0x19105718,
e=0x2aaab800ae00, state=0, alt=0x64035c18) at
ldap/servers/slapd/back-ldbm/cache.c:1037
#4  0x00002b39f5cf8273 in id2entry (be=0x191aef70, id=1505303, txn=0x0,
err=0x64035d58) at ldap/servers/slapd/back-ldbm/id2entry.c:268
#5  0x00002b39f5d254c0 in uniqueid2entry (be=0x191aef70, uniqueid=<value
optimized out>, txn=0x0, err=0x64035d58)
    at ldap/servers/slapd/back-ldbm/uniqueid2entry.c:86
#6  0x00002b39f5cf7961 in find_entry_internal (pb=0x2aaab8008200,
be=0x191aef70, addr=<value optimized out>, lock=1, txn=0x0,
really_internal=0)
    at ldap/servers/slapd/back-ldbm/findentry.c:201
#7  0x00002b39f5d105fc in ldbm_back_delete (pb=0x2aaab8008200) at
ldap/servers/slapd/back-ldbm/ldbm_delete.c:140
#8  0x00002b39f1d810d4 in op_shared_delete (pb=0x2aaab8008200) at
ldap/servers/slapd/delete.c:318
#9  0x00002b39f1d81413 in do_delete (pb=0x2aaab8008200) at
ldap/servers/slapd/delete.c:116
#10 0x0000000000412e79 in connection_threadmain () at
ldap/servers/slapd/connection.c:548
#11 0x0000003590827fad in ?? () from /usr/lib64/libnspr4.so
#12 0x00000036506064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x000000364fad3c2d in clone () from /lib64/libc.so.6

Thanks
> --noriko
>>>
>>>>> I attach the tail of the error log and of /var/log/messages.
>>>>>
>>>>> [03/Feb/2010:19:20:53 +0100] - import Addressbook2: Workers finished; 
>>>>> cleaning up...
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook1: Workers finished; 
>>>>> cleaning up...
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook2: Workers cleaned up.
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook2: Indexing complete.  
>>>>> Post-processing...
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook1: Workers cleaned up.
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook1: Indexing complete.  
>>>>> Post-processing...
>>>>> [03/Feb/2010:19:21:50 +0100] - import Addressbook2: Flushing caches...
>>>>> [03/Feb/2010:19:22:27 +0100] - import Addressbook1: Flushing caches...
>>>>> [03/Feb/2010:19:22:27 +0100] - import Addressbook2: Closing files...
>>>>> [03/Feb/2010:19:22:27 +0100] - import Addressbook1: Closing files...
>>>>> [03/Feb/2010:19:32:27 +0100] - import Addressbook2: Import complete.  
>>>>> Processed 3820687 entries in 4957 seconds. (770.77 entries/sec)
>>>>> [03/Feb/2010:19:32:28 +0100] NSMMReplicationPlugin - 
>>>>> multimaster_be_state_change: replica o=addressbook2 is coming online; 
>>>>> enabling replication
>>>>> [03/Feb/2010:19:32:29 +0100] - import Addressbook1: Import complete.  
>>>>> Processed 3820339 entries in 4960 seconds. (770.23 entries/sec)
>>>>> [03/Feb/2010:19:32:29 +0100] NSMMReplicationPlugin - 
>>>>> multimaster_be_state_change: replica o=addressbook1 is coming online; 
>>>>> enabling replication
>>>>> [03/Feb/2010:19:32:29 +0100] NSMMReplicationPlugin - replica_reload_ruv: 
>>>>> Warning: new data for replica o=addressbook1 does not match the data in 
>>>>> the changelog.
>>>>>  Recreating the changelog file. This could affect replication with 
>>>>> replica's  consumers in which case the consumers should be reinitialized.
>>>>>
>>>>> Feb  3 19:32:35 mmt-l-al19 kernel: ns-slapd[5575]: segfault at 
>>>>> 0000000000000000 rip 000000364fa79140 rsp 0000000056bd3b18 error 4
>>>>>
>>>>> Do you have any idea?
>>>>>
>>>>> Thanks
>>>>>
>>>>>   
>>>>>     
>>>>
>>>> --
>>>> 389 users mailing list
>>>> 389-users at lists.fedoraproject.org
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>   
>>>
>>> -- 
>>> Francesco Fiore
>>> System Integrator
>>> Babel S.r.l. - http://www.babel.it
>>> P.zza S.Benedetto da Norcia, 33 - 00040 Pomezia (Roma)
>>>
>>>
>>> CONFIDENTIAL: This message and its attachments are confidential and
>>> intended only for the addressees. If you have received this message
>>> in error, you are kindly asked to reply immediately to the sender
>>> and delete all of its contents.
>>>   
>> Thanks
>>
>>
>
