[389-users] 389 directory server crash

Mon Jul 15 15:28:50 UTC 2013

On 07/15/2013 02:57 AM, Mitja Mihelič wrote:
> On 07/12/2013 05:55 PM, Rich Megginson wrote:
>> On 07/12/2013 08:22 AM, Mitja Mihelič wrote:
>>> On 07/09/2013 03:34 PM, Rich Megginson wrote:
>>>> On 07/09/2013 06:43 AM, Mitja Mihelič wrote:
>>>>> Hi!
>>>>>
>>>>> We are having problems with some of our 389-DS instances. They crash 
>>>>> after receiving an update from the provider.
>>>>
>>>> After looking at the stack trace, I think this is 
>>>> https://fedorahosted.org/389/ticket/47391
> Yes, it looks like it might be it. When CONSUMER_ONE crashed for the 
> first time, the last thing replicated was a password change.
> Do you perhaps know where I could get a 389DS version for CentOS 6 
> that has the patch? The ticket says it was pushed to 1.2.11, but it 
> would seem that our 1.2.11.15-14 is still an unpatched one and the 
> repositories do not have any newer versions.

Is that the 389-ds-base that is included with CentOS6?

>>>>
>>>>> The crash happened twice after about a week of running without 
>>>>> problems. The crashes happened on two consumer servers but not at 
>>>>> the same time.
>>>>> The servers are running CentOS 6x with the following 389DS 
>>>>> packages installed:
>>>>> 389-ds-console-doc-1.2.6-1.el6.noarch
>>>>> 389-console-1.1.7-1.el6.noarch
>>>>> 389-adminutil-1.1.15-1.el6.x86_64
>>>>> 389-dsgw-1.1.10-1.el6.x86_64
>>>>> 389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64
>>>>> 389-admin-1.1.29-1.el6.x86_64
>>>>> 389-ds-console-1.2.6-1.el6.noarch
>>>>> 389-admin-console-doc-1.1.8-1.el6.noarch
>>>>> 389-ds-1.2.2-1.el6.noarch
>>>>> 389-ds-base-1.2.11.15-14.el6_4.x86_64
>>>>> 389-ds-base-libs-1.2.11.15-14.el6_4.x86_64
>>>>> 389-admin-console-1.1.8-1.el6.noarch
>>>>>
>>>>> We are in the process of replacing the CentOS 5x based 
>>>>> consumer+provider setup with a CentOS 6x based one. For the time 
>>>>> being, the CentOS 6 machines are acting as consumers for the old 
>>>>> server. They run for a while and then the replicated instances 
>>>>> crash, though not at the same time.
>>>>> One of the servers did not want to start after the crash,
>>>>
>>>> Can you provide the error messages from the errors log?
>>> I have attached error logs from the provider 
>>> (2013-06-27-provider_error) and the consumer 
>>> (2013-06-27-server_two_error) in question.
>>>>
>>>>> so I have run db2index on its database. It's been running for four 
>>>>> days and it has still not finished. 
>>>>
>>>> Try exporting using db2ldif, then importing using ldif2db.
>>> The export process hangs. After an hour strace still shows:
>>> futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL
>>> The error log for this is attached as 
>>> 2013-07-10-server_two-ldif_import_hangs.
>>
>> Are you using db2ldif or db2ldif.pl?  If you are using db2ldif, is 
>> the server running?  If not, please try first shutting down the 
>> server and use db2ldif.
>>
>> If db2ldif still hangs, then please follow the instructions at 
>> http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of 
>> the hung process.
> I was using db2ldif with the server shut down. I tried it again and it 
> hung. The LDIF file was created but its size was zero. The produced 
> stack trace is attached as 
> server_two-db2ldif_hang-stacktrace.1373877200.txt.
>
>>
>>>
>>>>
>>>>> All I get from db2index now are these outputs:
>>>>> [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries 
>>>>> (pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, 
>>>>> hit ratio 0%
>>>>
>>>> How many entries do you have in your database?
>>> The number hovers around 65400. It varies by perhaps 2 user del/add 
>>> operations a month and 20 attribute changes per week, if that.
>>>>
>>>>>
>>>>> The other instance did start up, but the replication process did 
>>>>> not work anymore. I disabled the replication to this host and set 
>>>>> it up again. I chose "Initialize consumer now" and the consumer 
>>>>> crashed every time.
>>>>
>>>> Can you provide a stack trace of the core when the server crashes?  
>>>> This may be different than the stack trace below.
>>> The last provided stack trace was produced at the last server crash. 
>>> I will provide another stack trace when CONSUMER_ONE crashes again. 
>>> Currently it refuses to crash at initialization time and keeps running.
>>>>
>>>>> I have enabled full error logging and could find nothing.
>>>>> I have read a few threads (not all, I admit) on this list and 
>>>>> http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes and 
>>>>> tried to troubleshoot.
>>>>>
>>>>> The crash produced the attached core dump and I could use your 
>>>>> help with understanding it, as well as any help with the crash 
>>>>> itself. If more info is needed I will gladly provide it.
>>>>>
>>>>> Regards, Mitja
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> 389 users mailing list
>>>>> 389-users at lists.fedoraproject.org
>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>
>>>
>>
>
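
For watching a long db2index run, a small script can parse progress lines like the one quoted above and flag when the job appears stuck. This is only a rough sketch: the regular expression is inferred from that single sample line rather than from the 389-ds source, and the names parse_progress and looks_stalled are made up for illustration. It expects each log record on one line, without the mail quoting and wrapping.

```python
import re

# Field layout inferred from the one progress line quoted in this thread;
# it is an assumption, not a documented 389-ds log format.
PROGRESS_RE = re.compile(
    r"Processed (?P<entries>\d+) entries \(pass (?P<pass_no>\d+)\) -- "
    r"average rate (?P<avg>[\d.]+)/sec, recent rate (?P<recent>[\d.]+)/sec, "
    r"hit ratio (?P<hit>\d+)%"
)


def parse_progress(line):
    """Return the numeric fields of a progress line, or None if it does not match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return {
        "entries": int(m.group("entries")),
        "pass_no": int(m.group("pass_no")),
        "avg_rate": float(m.group("avg")),
        "recent_rate": float(m.group("recent")),
        "hit_ratio": int(m.group("hit")),
    }


def looks_stalled(samples):
    """True if at least two samples share one entry count and all recent rates are zero."""
    if len(samples) < 2:
        return False
    same_count = len({s["entries"] for s in samples}) == 1
    no_progress = all(s["recent_rate"] == 0.0 for s in samples)
    return same_count and no_progress


if __name__ == "__main__":
    sample = (
        "[09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries "
        "(pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, "
        "hit ratio 0%"
    )
    print(parse_progress(sample))
```

Feeding it successive progress lines from the errors log and calling looks_stalled on the parsed results gives a quick yes/no on whether the entry count is still moving.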
