[389-users] FW: fresh replica reports "reloading ruv failed " just after successfull initialization
Rich Megginson
rmeggins at redhat.com
Fri Jun 28 15:57:58 UTC 2013
On 06/28/2013 09:55 AM, Jovan.VUKOTIC at sungard.com wrote:
>
> Thanks Rich,
>
> I will be the one to build the code myself and test the fix you suggest.
>
> I have found this link to the tar.bz2 source code archive:
> http://directory.fedoraproject.org/wiki/Source#389_Directory_Server_1.2.11
>
> Is it the correct one?
>
Yes.
>
> Thanks,
>
> *Jovan Vukotić* • Senior Software Engineer • Ambit Treasury Management
> • SunGard • Banking • Bulevar Milutina Milankovića 136b, Belgrade,
> Serbia • tel: +381.11.6555-66-1 • jovan.vukotic at sungard.com
> <mailto:jovan.vukotic at sungard.com>
>
> *From:*Rich Megginson [mailto:rmeggins at redhat.com]
> *Sent:* Friday, June 28, 2013 4:17 PM
> *To:* Vukotic, Jovan
> *Cc:* 389-users at lists.fedoraproject.org; Mehta, Cyrus
> *Subject:* Re: [389-users] FW: fresh replica reports "reloading ruv
> failed " just after successfull initialization
>
> On 06/28/2013 03:30 AM, Jovan.VUKOTIC at sungard.com
> <mailto:Jovan.VUKOTIC at sungard.com> wrote:
>
> Rich,
>
> No, I do not build the code myself.
>
>
> ok - looks like CSW packages.
>
> I'm not sure if things are going to work correctly until we get the
> atomic op bug fixed. Unfortunately we don't have the means to build
> and test on Sparc. Is there someone who can help us build and test
> some fixes?
>
>
> At the moment, with the error log level set to 40960 (32768+8192), I
> get a few more error messages, but they are not informative to me at all:
>
> [28/Jun/2013:05:06:03 -0400] - cache_add_tentative concurrency detected
>
> [28/Jun/2013:05:06:09 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (dc=xxxxxx,dc=com); LDAP error - 68
>
> [28/Jun/2013:05:06:39 -0400] - cache_add_tentative concurrency detected
>
> [28/Jun/2013:05:06:39 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (dc=xxxxxxx,dc=com); LDAP error - 68
>
> [28/Jun/2013:05:07:00 -0400] Changelog purge skipped anchor csn
> 51c5ec28000000020000
>
> [28/Jun/2013:05:07:09 -0400] - cache_add_tentative concurrency detected
>
> [28/Jun/2013:05:07:09 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (dc=xxxxxx,dc=com); LDAP error - 68
>
> [28/Jun/2013:05:07:39 -0400] - cache_add_tentative concurrency detected
>
> [28/Jun/2013:05:07:39 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (dc=xxxxxxx,dc=com); LDAP error - 68
>
> [28/Jun/2013:05:08:09 -0400] - cache_add_tentative concurrency detected
>
> [28/Jun/2013:05:08:09 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (dc=xxxxxxx,dc=com); LDAP error - 68
>
> [28/Jun/2013:05:08:39 -0400] - cache_add_tentative concurrency detected
>
> [28/Jun/2013:05:08:39 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (dc=xxxxxx,dc=com); LDAP error - 68
>
> [28/Jun/2013:05:09:04 -0400] NSMMReplicationPlugin - changelog program
> - _cl5GetDBFile: found DB object 13f5c40 for database
> /var/opt/csw/lib/dirsrv/slapd-inst-dr02/changelogdb/686eae02-ldd2llb2-b3b3aede-af5e4e28_51c5c8ae000000020000.db4
>
> [28/Jun/2013:05:09:04 -0400] NSMMReplicationPlugin - changelog program
> - cl5GetOperationCount: found DB object 13f5c40
>
> [28/Jun/2013:05:09:09 -0400] - cache_add_tentative concurrency detected
>
> [28/Jun/2013:05:09:09 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (dc=xxxxxxx,dc=com); LDAP error - 68
>
> [28/Jun/2013:05:09:39 -0400] - cache_add_tentative concurrency detected
>
> [28/Jun/2013:05:09:39 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (dc=xxxxxxx,dc=com); LDAP error - 68
>
> [28/Jun/2013:05:10:09 -0400] - cache_add_tentative concurrency detected
>
> [28/Jun/2013:05:10:09 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (dc=xxxxxxx,dc=com); LDAP error - 68
>
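
(Illustrative sketch, not part of the original messages.) The log level 40960 mentioned above is a sum of individual flag values, 32768 + 8192, so it can be split back into its parts. The snippet below is a minimal way to do that. The function names are made up for this example, and the two labels are assumptions taken from the 389 DS documentation of nsslapd-errorlog-level (8192 for replication debugging, 32768 for cache debugging); check them against the documentation for your version.

```python
# Labels are assumptions based on the 389 DS documentation of
# nsslapd-errorlog-level; only the two flags named in the thread are listed.
KNOWN_FLAGS = {
    8192: "replication debugging",
    32768: "cache debugging",
}


def split_log_level(level):
    """Return the power-of-two flags that add up to `level`, ascending."""
    flags = []
    bit = 1
    while bit <= level:
        if level & bit:
            flags.append(bit)
        bit <<= 1
    return flags


def describe_log_level(level):
    """Pair each flag with a label, or a placeholder if it is not listed."""
    return [(flag, KNOWN_FLAGS.get(flag, "not listed here"))
            for flag in split_log_level(level)]


if __name__ == "__main__":
    for flag, label in describe_log_level(40960):
        print(flag, label)
```

Because the flags are plain powers of two, adding further levels later only requires adding their values to the configured number.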
> Thanks,
>
> *Join the online conversation with SunGard’s customers, partners and
> Industry experts and find an event near you at: **www.sungard.com/ten*
> <http://www.capitalize-on-change.com/?email=70150000000Y1Et>*. *
>
> *From:*Rich Megginson [mailto:rmeggins at redhat.com]
> *Sent:* Thursday, June 27, 2013 6:20 PM
> *To:* Vukotic, Jovan
> *Cc:* 389-users at lists.fedoraproject.org
> <mailto:389-users at lists.fedoraproject.org>; Mehta, Cyrus
> *Subject:* Re: [389-users] FW: fresh replica reports "reloading ruv
> failed " just after successfull initialization
>
> On 06/27/2013 09:14 AM, Jovan.VUKOTIC at sungard.com
> <mailto:Jovan.VUKOTIC at sungard.com> wrote:
>
> Rich,
>
> On Linux x86_64 and Solaris x86_64 the error cannot be reproduced,
> only on Solaris SPARC.
>
> On the other hand, Solaris SPARC works fine only if it is the
> first master replica in the multi-master array, that is, the one
> that initializes other replicas.
>
> Do you, perhaps, have any suggestion as to how to tune the Solaris
> SPARC platform?
>
>
> I think there is a bug in the way we handle atomic operations on
> SPARC. We don't develop or test on SPARC, so it's not surprising we
> have a bug in this area. Do you build the code yourself?
>
>
>
> I am going to enable more detailed logging in the errors file.
>
> Thanks,
> Jovan
>
> *From:*Rich Megginson [mailto:rmeggins at redhat.com]
> *Sent:* Monday, June 24, 2013 10:45 PM
> *To:* General discussion list for the 389 Directory server project.
> *Cc:* Vukotic, Jovan; Mehta, Cyrus
> *Subject:* Re: [389-users] FW: fresh replica reports "reloading ruv
> failed " just after successfull initialization
>
> On 06/24/2013 09:34 AM, Jovan.VUKOTIC at sungard.com
> <mailto:Jovan.VUKOTIC at sungard.com> wrote:
>
> Hi,
>
> I would like to link the issue I reported on Saturday with bug
> 723937, filed some two years ago.
>
> There, just as in my case, leftover dn/entry cache entries were
> reported prior to the initialization of the master replica.
>
> I repeated the replication configuration today. The multi-master
> replica that was initialized by the other replica had only one entry
> (the root object) in its userRoot database prior to the
> initialization.
>
> First, two entries were found, then 5... and then 918 (which matches
> the number of entries in the master database):
>
> [24/Jun/2013:08:16:03 -0400] - entrycache_clear_int: there are
> still 2 entries in the entry cache.
>
> [24/Jun/2013:08:16:03 -0400] - dncache_clear_int: there are still
> 2 dn's in the dn cache. :/
>
> [24/Jun/2013:08:16:03 -0400] - WARNING: Import is running with
> nsslapd-db-private-import-mem on; No other process is allowed to
> access the database
>
> [24/Jun/2013:08:16:07 -0400] - import userRoot: Workers finished;
> cleaning up...
>
> [24/Jun/2013:08:16:07 -0400] - import userRoot: Workers cleaned up.
>
> [24/Jun/2013:08:16:07 -0400] - import userRoot: Indexing complete.
> Post-processing...
>
> [24/Jun/2013:08:16:07 -0400] - import userRoot: Generating
> numSubordinates complete.
>
> [24/Jun/2013:08:16:07 -0400] - import userRoot: Flushing caches...
>
> [24/Jun/2013:08:16:07 -0400] - import userRoot: Closing files...
>
> [24/Jun/2013:08:16:07 -0400] - entrycache_clear_int: there are
> still 5 entries in the entry cache.
>
> [24/Jun/2013:08:16:07 -0400] - dncache_clear_int: there are still
> 918 dn's in the dn cache. :/
>
> [24/Jun/2013:08:16:07 -0400] - import userRoot: Import complete.
> Processed 918 entries in 4 seconds. (229.50 entries/sec)
>
> [24/Jun/2013:08:16:07 -0400] NSMMReplicationPlugin -
> multimaster_be_state_change: replica dc=xxxxxx,dc=com is coming
> online; enabling replication
>
> [24/Jun/2013:08:16:07 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone
> entry (dc=xxxxxx,dc=com); LDAP error - 68
>
> I would like to add that all replicas that could not be configured
> due to the reported errors were installed on Solaris 10 on Sparc
> processors, whereas the only replica that was initialized
> successfully was installed on Solaris 10 on i386 processors.
>
>
> Any chance you could try to reproduce this on a Linux x86_64 system?
>
>
>
>
> Thanks,
> Jovan
>
> *From:*Vukotic, Jovan
> *Sent:* Saturday, June 22, 2013 11:59 PM
> *To:* '389-users at lists.fedoraproject.org
> <mailto:389-users at lists.fedoraproject.org>'
> *Subject:* fresh replica reports "reloading ruv failed " just after
> successfull initialization.
>
> Hi,
>
> We have four 389 DS instances, version 1.2.11, that we are organizing
> in a multi-master replication topology.
>
> After I enabled all four multi-master replicas, initialized them from
> one reference replica, M1, and incremental replication started, it
> turned out that only two of them take part in replication: the
> reference M1 and M2 (replication works in both directions).
>
> I tried to fix M3 and M4 in the following way:
>
> M3 example:
>
> removed the replication agreement M1-M3 (M2-M3 did not exist, M4 was
> switched off)
>
> After several database restores of pre-replication state and
> reconfiguration of that replica, I removed 389 DS instance M3
> completely and reinstalled it again: remove-ds-admin.pl +
> setup-ds-admin.pl. I configured TLS/SSL (as before), restarted the DS
> and enabled replica from 389 Console.
>
> Then I returned to M1, recreated the agreement and initialized M3. It
> was successful again, in the sense that M3 imported all the data, but
> immediately afterwards errors that look strange to me were reported
> (see the log below).
>
> What confuses me is that LDAP error 68 means that an entry already
> exists, even though it is a new replica. And why a tombstone?
>
> Or, to make a long story short: is the only remedy to reinstall all
> four replicas?
>
> [22/Jun/2013:16:30:50 -0400] - All database threads now
> stopped // this is from a backup done before
> replication configuration
>
> [22/Jun/2013:16:43:25 -0400] NSMMReplicationPlugin -
> multimaster_be_state_change: replica xxxxxxxxxx is going offline;
> disabling replication
>
> [22/Jun/2013:16:43:25 -0400] - entrycache_clear_int: there are still
> 20 entries in the entry cache.
>
> [22/Jun/2013:16:43:25 -0400] - dncache_clear_int: there are still 20
> dn's in the dn cache. :/
>
> [22/Jun/2013:16:43:25 -0400] - WARNING: Import is running with
> nsslapd-db-private-import-mem on; No other process is allowed to
> access the database
>
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Workers finished;
> cleaning up...
>
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Workers cleaned up.
>
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Indexing complete.
> Post-processing...
>
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Generating
> numSubordinates complete.
>
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Flushing caches...
>
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Closing files...
>
> [22/Jun/2013:16:43:30 -0400] - entrycache_clear_int: there are still
> 20 entries in the entry cache.
>
> [22/Jun/2013:16:43:30 -0400] - dncache_clear_int: there are still 917
> dn's in the dn cache. :/
>
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Import complete.
> Processed 917 entries in 4 seconds. (229.25 entries/sec)
>
> [22/Jun/2013:16:43:30 -0400] NSMMReplicationPlugin -
> multimaster_be_state_change: replica xxxxxxxxxxx is coming online;
> enabling replication
>
> [22/Jun/2013:16:43:30 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (xxxxxxxxxx); LDAP error - 68
>
> [22/Jun/2013:16:43:30 -0400] NSMMReplicationPlugin -
> replica_enable_replication: reloading ruv failed
>
> [22/Jun/2013:16:43:32 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (xxxxxxxxx); LDAP error - 68
>
> [22/Jun/2013:16:44:02 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (xxxxxxxxxx); LDAP error - 68
>
> [22/Jun/2013:16:44:32 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (xxxxxxxxx); LDAP error - 68
>
> [22/Jun/2013:16:45:02 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (xxxxxxxx); LDAP error - 68
>
> [22/Jun/2013:16:45:32 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (xxxxxxxxx); LDAP error - 68
>
> [22/Jun/2013:16:46:02 -0400] NSMMReplicationPlugin -
> _replica_configure_ruv: failed to create replica ruv tombstone entry
> (xxxxxxxxx); LDAP error - 68
>
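
(Illustrative sketch, not part of the original messages.) In the log above, once the import has finished, the "failed to create replica ruv tombstone entry" message keeps coming back at what looks like a fixed interval. LDAP result code 68 is entryAlreadyExists in RFC 4511. The snippet below is one possible way to pull the timestamps of those failures out of an errors log and measure the gaps between them. The helper names are invented for this example, the sample lines are copied from the thread with each log record joined onto one line, and the parsing assumes the "[dd/Mon/yyyy:hh:mm:ss -zzzz]" timestamp layout seen here.

```python
import re
from datetime import datetime

# "[timestamp] rest-of-line", as in the errors log excerpts in this thread.
LINE_RE = re.compile(r"^\[([^\]]+)\]\s*(.*)$")
# The trailing "LDAP error - 68" part; the dash is treated as optional.
ERR_RE = re.compile(r"LDAP error\s*-?\s*(\d+)")


def parse_line(line):
    """Return (timestamp, message), or None if the line cannot be parsed."""
    # Mail archives sometimes turn "-" into an em dash; undo that first.
    line = line.replace("\u2014", "-").strip()
    match = LINE_RE.match(line)
    if match is None:
        return None
    try:
        # %b assumes English month abbreviations (C/English locale).
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None
    return stamp, match.group(2)


def error_times(lines, code=68):
    """Collect the timestamps of lines reporting the given LDAP error."""
    times = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, message = parsed
        found = ERR_RE.search(message)
        if found and int(found.group(1)) == code:
            times.append(stamp)
    return times


def gaps_in_seconds(times):
    """Seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Lines taken from the log above, one record per line.
SAMPLE = [
    "[22/Jun/2013:16:43:30 -0400] - import userRoot: Closing files...",
    "[22/Jun/2013:16:43:32 -0400] NSMMReplicationPlugin - "
    "_replica_configure_ruv: failed to create replica ruv tombstone entry "
    "(xxxxxxxxx); LDAP error - 68",
    "[22/Jun/2013:16:44:02 -0400] NSMMReplicationPlugin - "
    "_replica_configure_ruv: failed to create replica ruv tombstone entry "
    "(xxxxxxxxxx); LDAP error - 68",
    "[22/Jun/2013:16:44:32 -0400] NSMMReplicationPlugin - "
    "_replica_configure_ruv: failed to create replica ruv tombstone entry "
    "(xxxxxxxxx); LDAP error - 68",
]

if __name__ == "__main__":
    print(gaps_in_seconds(error_times(SAMPLE)))
```

For the three sample failures the gaps come out as 30 seconds each, which matches the cadence visible in the full log. That pattern is consistent with a periodic retry, though the thread itself does not say what schedules it.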
> Any help will be appreciated.
>
> Thank you.
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org <mailto:389-users at lists.fedoraproject.org>
> https://admin.fedoraproject.org/mailman/listinfo/389-users
>