<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 07/17/2013 01:52 AM, Mitja Mihelič

      wrote:<br>

    </div>

    <blockquote cite="mid:51E64D31.2030907@arnes.si" type="cite">


      <div class="moz-cite-prefix">On 07/16/2013 04:49 PM, Rich

        Megginson wrote:<br>

      </div>

      <blockquote cite="mid:51E55D7A.50007@redhat.com" type="cite">


        <div class="moz-cite-prefix">On 07/16/2013 01:23 AM, Mitja

          Mihelič wrote:<br>

        </div>

        <blockquote cite="mid:51E4F4DB.7040309@arnes.si" type="cite">


          <div class="moz-cite-prefix">On 07/15/2013 05:28 PM, Rich

            Megginson wrote:<br>

          </div>

          <blockquote cite="mid:51E41532.4060906@redhat.com" type="cite">


            <div class="moz-cite-prefix">On 07/15/2013 02:57 AM, Mitja

              Mihelič wrote:<br>

            </div>

            <blockquote cite="mid:51E3B96C.4000706@arnes.si" type="cite">


              <div class="moz-cite-prefix">On 07/12/2013 05:55 PM, Rich

                Megginson wrote:<br>

              </div>

              <blockquote cite="mid:51E026FE.5030402@redhat.com"

                type="cite">


                <div class="moz-cite-prefix">On 07/12/2013 08:22 AM,

                  Mitja Mihelič wrote:<br>

                </div>

                <blockquote cite="mid:51E0110C.8080603@arnes.si"

                  type="cite">


                  <div class="moz-cite-prefix">On 07/09/2013 03:34 PM,

                    Rich Megginson wrote:<br>

                  </div>

                  <blockquote cite="mid:51DC1167.6040809@redhat.com"

                    type="cite">


                    <div class="moz-cite-prefix">On 07/09/2013 06:43 AM,

                      Mitja Mihelič wrote:<br>

                    </div>

                    <blockquote cite="mid:51DC058E.2090202@arnes.si"

                      type="cite">


                      Hi!<br>

                      <br>

                      We are having problems with some of our 389-DS

                      instances. They crash after receiving an update

                      from the provider.<br>

                    </blockquote>

                    <br>

                    After looking at the stack trace, I think this is <a

                      moz-do-not-send="true"

                      class="moz-txt-link-freetext"

                      href="https://fedorahosted.org/389/ticket/47391">https://fedorahosted.org/389/ticket/47391</a><br>

                  </blockquote>

                </blockquote>

              </blockquote>

              Yes, it looks like it might be it. When CONSUMER_ONE

              crashed for the first time, the last thing replicated was

              a password change.<br>

              Do you perhaps know where I could get a 389DS version for

              CentOS 6 that has the patch? The ticket says it was pushed

              to 1.2.11, but it would seem that our 1.2.11.15-14 is still

              an unpatched one and the repositories do not have any

              newer versions.<br>

            </blockquote>

            <br>

            Is that the 389-ds-base that is included with CentOS6?<br>

          </blockquote>

          Yes, the 389-ds-base-1.2.11.15-14.el6_4.x86_64 and

          389-ds-base-libs-1.2.11.15-14.el6_4.x86_64 are from the

          official CentOS 6 updates repository.<br>

          389-ds-base-debuginfo is from <a moz-do-not-send="true"

            class="moz-txt-link-freetext"

            href="http://debuginfo.centos.org/6/">http://debuginfo.centos.org/6/</a><br>

          The rest are from epel.<br>

        </blockquote>

        <br>

        Looking at the stack trace you sent earlier - there is only 1

        thread?  You ran <br>

        <pre>gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof ns-slapd` &gt; stacktrace.`date +%s`.txt 2&gt;&amp;1

?  If so, I have no idea what's going on - I've never seen the server deadlock itself with only 1 thread . . .

</pre>

      </blockquote>

      I ran<br>

      gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread

      apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof -o 49171

      ns-slapd` &gt; stacktrace.`date +%s`.txt 2&gt;&amp;1<br>

      The "-o 49171" is to exclude the pid of the config server

      instance, so only the problematic pid was looked at.<br>
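      <br>
      As a minimal sketch of what that exclusion does (the function
      name exclude_pid is made up for illustration and is not part of
      pidof, gdb, or 389DS), the same filtering can be written in
      plain shell:<br>
      <pre>#!/bin/sh
# exclude_pid OMIT PID...
# Print every PID argument except OMIT, one per line.
# This is roughly what "pidof -o OMIT" does to its own result list.
exclude_pid() {
    omit=$1
    shift
    for pid in "$@"; do
        if [ "$pid" != "$omit" ]; then
            printf '%s\n' "$pid"
        fi
    done
}
</pre>
      For example, "exclude_pid 49171 49171 50212" prints only 50212
      (50212 is a made-up pid). The gdb command line above expects a
      single pid, so this only helps when exactly one ns-slapd pid is
      left after the exclusion.<br>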

      If you get any more information regarding this crash it would be

      very much appreciated.<br>

      <br>

      It may be best if I remove all 389DS-related data from both of

      the consumer servers and start fresh. If they crash again I will

      send the relevant stack traces.<br>

    </blockquote>

    <br>

    Yes, that sounds good.<br>

    <br>

    <blockquote cite="mid:51E64D31.2030907@arnes.si" type="cite"> <br>

      <blockquote cite="mid:51E55D7A.50007@redhat.com" type="cite"> <br>

        <br>

        <br>

        <blockquote cite="mid:51E4F4DB.7040309@arnes.si" type="cite">

          <blockquote cite="mid:51E41532.4060906@redhat.com" type="cite">

            <br>

            <blockquote cite="mid:51E3B96C.4000706@arnes.si" type="cite">

              <blockquote cite="mid:51E026FE.5030402@redhat.com"

                type="cite">

                <blockquote cite="mid:51E0110C.8080603@arnes.si"

                  type="cite">

                  <blockquote cite="mid:51DC1167.6040809@redhat.com"

                    type="cite"> <br>

                    <blockquote cite="mid:51DC058E.2090202@arnes.si"

                      type="cite"> The crash happened twice after about

                      a week of running without problems. The crashes

                      happened on two consumer servers but not at the

                      same time.<br>

                      The servers are running CentOS 6x with the

                      following 389DS packages installed:<br>

                      389-ds-console-doc-1.2.6-1.el6.noarch<br>

                      389-console-1.1.7-1.el6.noarch<br>

                      389-adminutil-1.1.15-1.el6.x86_64<br>

                      389-dsgw-1.1.10-1.el6.x86_64<br>

                      389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64<br>

                      389-admin-1.1.29-1.el6.x86_64<br>

                      389-ds-console-1.2.6-1.el6.noarch<br>

                      389-admin-console-doc-1.1.8-1.el6.noarch<br>

                      389-ds-1.2.2-1.el6.noarch<br>

                      389-ds-base-1.2.11.15-14.el6_4.x86_64<br>

                      389-ds-base-libs-1.2.11.15-14.el6_4.x86_64<br>

                      389-admin-console-1.1.8-1.el6.noarch<br>

                      <br>

                      We are in the process of replacing the CentOS 5x-based

                      consumer+provider setup with a CentOS 6x-based

                      one. For the time being, the CentOS 6 machines are

                      acting as consumers for the old server. They run

                      for a while and then the replicated instances

                      crash, though not at the same time.<br>

                      One of the servers did not want to start after the

                      crash,</blockquote>

                    <br>

                    Can you provide the error messages from the errors

                    log?<br>

                  </blockquote>

                  I have attached error logs from the provider

                  (2013-06-27-provider_error) and the consumer

                  (2013-06-27-server_two_error) in question.<br>

                  <blockquote cite="mid:51DC1167.6040809@redhat.com"

                    type="cite"> <br>

                    <blockquote cite="mid:51DC058E.2090202@arnes.si"

                      type="cite">so I have run db2index on its

                      database. It's been running for four days and it

                      has still not finished. </blockquote>

                    <br>

                    Try exporting using db2ldif, then importing using

                    ldif2db.<br>

                  </blockquote>

                  The export process hangs. After an hour strace still

                  shows:<br>

                  futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL<br>

                  The error log for this is attached as

                  2013-07-10-server_two-ldif_import_hangs.<br>

                </blockquote>

                <br>

                Are you using db2ldif or db2ldif.pl?  If you are using

                db2ldif, is the server running?  If not, please try

                first shutting down the server and use db2ldif.<br>

                <br>

                If db2ldif still hangs, then please follow the

                instructions at <a moz-do-not-send="true"

                  class="moz-txt-link-freetext"

                  href="http://port389.org/wiki/FAQ#Debugging_Hangs">http://port389.org/wiki/FAQ#Debugging_Hangs</a>

                to get a stack trace of the hung process.<br>

              </blockquote>

              I was using db2ldif with the server shut down. I tried it

              again and it hung. The LDIF file was created but its size

              was zero. The produced stack trace is attached as

              server_two-db2ldif_hang-stacktrace.1373877200.txt.<br>

              <br>

              <blockquote cite="mid:51E026FE.5030402@redhat.com"

                type="cite"> <br>

                <blockquote cite="mid:51E0110C.8080603@arnes.si"

                  type="cite"> <br>

                  <blockquote cite="mid:51DC1167.6040809@redhat.com"

                    type="cite"> <br>

                    <blockquote cite="mid:51DC058E.2090202@arnes.si"

                      type="cite">All I get from db2index now are these

                      outputs:<br>

                      [09/Jul/2013:13:29:11 +0200] - reindex db:

                      Processed 65095 entries (pass 1104) -- average

                      rate 53686277.5/sec, recent rate 0.0/sec, hit

                      ratio 0%<br>

                    </blockquote>

                    <br>

                    How many entries do you have in your database?<br>

                  </blockquote>

                  The number hovers around 65400. It varies by perhaps 2

                  user del/add operations a month and 20 attribute

                  changes per week, if that.<br>

                  <blockquote cite="mid:51DC1167.6040809@redhat.com"

                    type="cite"> <br>

                    <blockquote cite="mid:51DC058E.2090202@arnes.si"

                      type="cite"> <br>

                      The other instance did start up, but the

                      replication process did not work anymore. I

                      disabled the replication to this host and set it

                      up again. I chose "Initialize consumer now" and

                      the consumer crashed every time.</blockquote>

                    <br>

                    Can you provide a stack trace of the core when the

                    server crashes?  This may be different than the

                    stack trace below.<br>

                  </blockquote>

                  The last provided stack trace was produced at the last

                  server crash. I will provide another stack trace when

                  CONSUMER_ONE crashes again. Currently it refuses to

                  crash at initialization time and keeps running.<br>

                  <blockquote cite="mid:51DC1167.6040809@redhat.com"

                    type="cite"> <br>

                    <blockquote cite="mid:51DC058E.2090202@arnes.si"

                      type="cite">I have enabled full error logging and

                      could find nothing.<br>

                      I have read a few threads (not all, I admit) on

                      this list and


                      <a moz-do-not-send="true"

                        href="http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes">http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes</a>

                      and tried to troubleshoot.<br>

                      <br>

                      The crash produced the attached core dump and I

                      could use your help with understanding it, as well

                      as any help with the crash. If more info is needed

                      I will gladly provide it.<br>

                      <br>

                      Regards, Mitja<br>

                      <br>

                      <br>


                      <br>

                      <pre wrap="">--

389 users mailing list

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:389-users@lists.fedoraproject.org">389-users@lists.fedoraproject.org</a>

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://admin.fedoraproject.org/mailman/listinfo/389-users">https://admin.fedoraproject.org/mailman/listinfo/389-users</a></pre>

                    </blockquote>

                    <br>

                  </blockquote>

                  <br>

                </blockquote>

                <br>

              </blockquote>

              <br>

            </blockquote>

            <br>

          </blockquote>

          <br>

        </blockquote>

        <br>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>