<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 07/17/2013 01:52 AM, Mitja Mihelič
wrote:<br>
</div>
<blockquote cite="mid:51E64D31.2030907@arnes.si" type="cite">
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<div class="moz-cite-prefix">On 07/16/2013 04:49 PM, Rich
Megginson wrote:<br>
</div>
<blockquote cite="mid:51E55D7A.50007@redhat.com" type="cite">
<meta content="text/html; charset=UTF-8"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 07/16/2013 01:23 AM, Mitja
Mihelič wrote:<br>
</div>
<blockquote cite="mid:51E4F4DB.7040309@arnes.si" type="cite">
<meta content="text/html; charset=UTF-8"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 07/15/2013 05:28 PM, Rich
Megginson wrote:<br>
</div>
<blockquote cite="mid:51E41532.4060906@redhat.com" type="cite">
<meta content="text/html; charset=UTF-8"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 07/15/2013 02:57 AM, Mitja
Mihelič wrote:<br>
</div>
<blockquote cite="mid:51E3B96C.4000706@arnes.si" type="cite">
<meta content="text/html; charset=UTF-8"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 07/12/2013 05:55 PM, Rich
Megginson wrote:<br>
</div>
<blockquote cite="mid:51E026FE.5030402@redhat.com"
type="cite">
<meta content="text/html; charset=UTF-8"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 07/12/2013 08:22 AM,
Mitja Mihelič wrote:<br>
</div>
<blockquote cite="mid:51E0110C.8080603@arnes.si"
type="cite">
<meta content="text/html; charset=UTF-8"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 07/09/2013 03:34 PM,
Rich Megginson wrote:<br>
</div>
<blockquote cite="mid:51DC1167.6040809@redhat.com"
type="cite">
<meta content="text/html; charset=UTF-8"
http-equiv="Content-Type">
<div class="moz-cite-prefix">On 07/09/2013 06:43 AM,
Mitja Mihelič wrote:<br>
</div>
<blockquote cite="mid:51DC058E.2090202@arnes.si"
type="cite">
<meta content="text/html; charset=UTF-8"
http-equiv="Content-Type">
Hi!<br>
<br>
We are having problems with some of our 389-DS
instances. They crash after receiving an update
from the provider.<br>
</blockquote>
<br>
After looking at the stack trace, I think this is <a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="https://fedorahosted.org/389/ticket/47391">https://fedorahosted.org/389/ticket/47391</a><br>
</blockquote>
</blockquote>
</blockquote>
Yes, it looks like it might be it. When CONSUMER_ONE
crashed for the first time, the last thing replicated was
a password change.<br>
Do you perhaps know where I could get a 389DS version for
CentOS 6 that has the patch? The ticket says it was pushed
to 1.2.11, but it would seem that our 1.2.11.15-14 is still
unpatched, and the repositories do not have any
newer versions.<br>
</blockquote>
<br>
Is that the 389-ds-base that is included with CentOS6?<br>
</blockquote>
Yes, 389-ds-base-1.2.11.15-14.el6_4.x86_64 and
389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 are from the
official CentOS 6 updates repository.<br>
389-ds-base-debuginfo is from <a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://debuginfo.centos.org/6/">http://debuginfo.centos.org/6/</a><br>
The rest are from epel.<br>
</blockquote>
<br>
Looking at the stack trace you sent earlier - there is only 1
thread? You ran <br>
<pre>gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof ns-slapd` > stacktrace.`date +%s`.txt 2>&1</pre>
? If so, I have no idea what's going on. I've never seen the server deadlock itself with only 1 thread.<br>
</blockquote>
I ran<br>
gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread
apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof -o 49171
ns-slapd` > stacktrace.`date +%s`.txt 2>&1<br>
The "-o 49171" is to exclude the pid of the config server
instance, so only the problematic pid was looked at.<br>
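<br>
A rough sketch of that invocation, assuming Python is available on the host: it filters out the excluded pid the way <code>pidof -o</code> does and assembles the same gdb arguments for the one remaining pid. The helper names <code>build_gdb_command</code> and <code>pick_pid</code> and the pid 50210 are hypothetical; only the gdb arguments and the pid 49171 come from the command above.<br>
<pre>
```python
import shlex
import time


def build_gdb_command(pid, binary="/usr/sbin/ns-slapd"):
    """Return the argv list for the gdb invocation quoted above, attached to one pid."""
    return [
        "gdb",
        "-ex", "set confirm off",
        "-ex", "set pagination off",
        "-ex", "thread apply all bt full",
        "-ex", "quit",
        binary,
        str(pid),
    ]


def pick_pid(all_pids, excluded):
    """Mimic `pidof -o`: drop the excluded pids and require exactly one survivor."""
    remaining = [p for p in all_pids if p not in excluded]
    if len(remaining) != 1:
        raise ValueError("expected exactly one ns-slapd pid, got %r" % remaining)
    return remaining[0]


# Hypothetical pid list: 49171 is the config server instance, 50210 is made up.
pid = pick_pid([49171, 50210], excluded={49171})

# Same naming scheme as stacktrace.`date +%s`.txt in the shell command.
outfile = "stacktrace.%d.txt" % int(time.time())

command = build_gdb_command(pid)
print(" ".join(shlex.quote(arg) for arg in command))
print("output file:", outfile)
```
</pre>
The list returned by <code>build_gdb_command</code> could be passed to <code>subprocess.run</code> with stdout and stderr redirected to the output file; that step is left out here so the sketch runs without gdb installed.<br>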
If you get any more information regarding this crash it would be
very much appreciated.<br>
<br>
It may be best if I remove all 389DS-related data from both of
the consumer servers and start fresh. If they crash again, I will
send the relevant stack traces.<br>
</blockquote>
<br>
Yes, that sounds good.<br>
<br>
<blockquote cite="mid:51E64D31.2030907@arnes.si" type="cite"> <br>
<blockquote cite="mid:51E55D7A.50007@redhat.com" type="cite"> <br>
<br>
<br>
<blockquote cite="mid:51E4F4DB.7040309@arnes.si" type="cite">
<blockquote cite="mid:51E41532.4060906@redhat.com" type="cite">
<br>
<blockquote cite="mid:51E3B96C.4000706@arnes.si" type="cite">
<blockquote cite="mid:51E026FE.5030402@redhat.com"
type="cite">
<blockquote cite="mid:51E0110C.8080603@arnes.si"
type="cite">
<blockquote cite="mid:51DC1167.6040809@redhat.com"
type="cite"> <br>
<blockquote cite="mid:51DC058E.2090202@arnes.si"
type="cite"> The crash happened twice after about
a week of running without problems. The crashes
happened on two consumer servers but not at the
same time.<br>
The servers are running CentOS 6.x with the
following 389DS packages installed:<br>
389-ds-console-doc-1.2.6-1.el6.noarch<br>
389-console-1.1.7-1.el6.noarch<br>
389-adminutil-1.1.15-1.el6.x86_64<br>
389-dsgw-1.1.10-1.el6.x86_64<br>
389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64<br>
389-admin-1.1.29-1.el6.x86_64<br>
389-ds-console-1.2.6-1.el6.noarch<br>
389-admin-console-doc-1.1.8-1.el6.noarch<br>
389-ds-1.2.2-1.el6.noarch<br>
389-ds-base-1.2.11.15-14.el6_4.x86_64<br>
389-ds-base-libs-1.2.11.15-14.el6_4.x86_64<br>
389-admin-console-1.1.8-1.el6.noarch<br>
<br>
We are in the process of replacing the CentOS 5.x-based
consumer+provider setup with a CentOS 6.x-based
one. For the time being, the CentOS 6 machines are
acting as consumers for the old server. They run
for a while and then the replicated instances
crash, though not at the same time.<br>
One of the servers did not want to start after the
crash,</blockquote>
<br>
Can you provide the error messages from the errors
log?<br>
</blockquote>
I have attached error logs from the provider
(2013-06-27-provider_error) and the consumer
(2013-06-27-server_two_error) in question.<br>
<blockquote cite="mid:51DC1167.6040809@redhat.com"
type="cite"> <br>
<blockquote cite="mid:51DC058E.2090202@arnes.si"
type="cite">so I have run db2index on its
database. It's been running for four days and it
has still not finished. </blockquote>
<br>
Try exporting using db2ldif, then importing using
ldif2db.<br>
</blockquote>
The export process hangs. After an hour strace still
shows:<br>
futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL<br>
The error log for this is attached as
2013-07-10-server_two-ldif_import_hangs.<br>
</blockquote>
<br>
Are you using db2ldif or db2ldif.pl? If you are using
db2ldif, is the server running? If so, please try
shutting down the server first and then running db2ldif.<br>
<br>
If db2ldif still hangs, then please follow the
instructions at <a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://port389.org/wiki/FAQ#Debugging_Hangs">http://port389.org/wiki/FAQ#Debugging_Hangs</a>
to get a stack trace of the hung process.<br>
</blockquote>
I was using db2ldif with the server shut down. I tried it
again and it hung. The LDIF file was created but its size
was zero. The produced stack trace is attached as
server_two-db2ldif_hang-stacktrace.1373877200.txt.<br>
<br>
<blockquote cite="mid:51E026FE.5030402@redhat.com"
type="cite"> <br>
<blockquote cite="mid:51E0110C.8080603@arnes.si"
type="cite"> <br>
<blockquote cite="mid:51DC1167.6040809@redhat.com"
type="cite"> <br>
<blockquote cite="mid:51DC058E.2090202@arnes.si"
type="cite">All I get from db2index now are these
outputs:<br>
[09/Jul/2013:13:29:11 +0200] - reindex db:
Processed 65095 entries (pass 1104) -- average
rate 53686277.5/sec, recent rate 0.0/sec, hit
ratio 0%<br>
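<br>
A minimal sketch for reading such progress lines, assuming they all follow the format of the line above. The function names <code>parse_progress</code> and <code>looks_stalled</code> are hypothetical, and treating a recent rate of 0.0/sec as "stalled" is an assumption rather than documented db2index behaviour.<br>
<pre>
```python
import re

# Pattern for the numeric fields of a "reindex db: Processed ..." progress line.
PROGRESS = re.compile(
    r"Processed (\d+) entries \(pass (\d+)\) -- "
    r"average rate ([\d.]+)/sec, recent rate ([\d.]+)/sec, hit ratio (\d+)%"
)


def parse_progress(line):
    """Return the numeric fields of one progress line, or None if it does not match."""
    # Collapse runs of whitespace so wrapped or re-flowed lines still match.
    match = PROGRESS.search(" ".join(line.split()))
    if match is None:
        return None
    entries, passes, average, recent, hit = match.groups()
    return {
        "entries": int(entries),
        "pass": int(passes),
        "average_rate": float(average),
        "recent_rate": float(recent),
        "hit_ratio": int(hit),
    }


def looks_stalled(progress):
    """Assumption: a recent rate of zero means no entries were processed lately."""
    return progress is not None and progress["recent_rate"] == 0.0


sample = (
    "[09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095 entries "
    "(pass 1104) -- average rate 53686277.5/sec, recent rate 0.0/sec, hit ratio 0%"
)
info = parse_progress(sample)
print(info)
print("stalled:", looks_stalled(info))
```
</pre>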
</blockquote>
<br>
How many entries do you have in your database?<br>
</blockquote>
The number hovers around 65400. It varies by perhaps 2
user del/add operations a month and 20 attribute
changes per week, if that.<br>
<blockquote cite="mid:51DC1167.6040809@redhat.com"
type="cite"> <br>
<blockquote cite="mid:51DC058E.2090202@arnes.si"
type="cite"> <br>
The other instance did start up, but the
replication process did not work anymore. I
disabled the replication to this host and set it
up again. I chose "Initialize consumer now" and
the consumer crashed every time.</blockquote>
<br>
Can you provide a stack trace of the core when the
server crashes? This may be different from the
stack trace below.<br>
</blockquote>
The last provided stack trace was produced at the last
server crash. I will provide another stack trace when
CONSUMER_ONE crashes again. Currently it refuses to
crash at initialization time and keeps running.<br>
<blockquote cite="mid:51DC1167.6040809@redhat.com"
type="cite"> <br>
<blockquote cite="mid:51DC058E.2090202@arnes.si"
type="cite">I have enabled full error logging and
could find nothing.<br>
I have read a few threads (not all, I admit) on
this list and
<a moz-do-not-send="true"
href="http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes">http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes</a>
and tried to troubleshoot.<br>
<br>
The crash produced the attached core dump, and I
could use your help with understanding it, as well
as any help with the crash itself. If more info is needed,
I will gladly provide it.<br>
<br>
Regards, Mitja<br>
<br>
<br>
<br>
<pre wrap="">--
389 users mailing list
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:389-users@lists.fedoraproject.org">389-users@lists.fedoraproject.org</a>
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://admin.fedoraproject.org/mailman/listinfo/389-users">https://admin.fedoraproject.org/mailman/listinfo/389-users</a></pre>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>