[389-users] FW: fresh replica reports "reloading ruv failed " just after successfull initialization

Fri Jun 28 15:57:58 UTC 2013

On 06/28/2013 09:55 AM, Jovan.VUKOTIC at sungard.com wrote:
>
> Thanks Rich,
>
> I will be the one to build the code myself and test the fix you suggest.
>
> I have found this link to the tar.bz2 source code archive: 
> http://directory.fedoraproject.org/wiki/Source#389_Directory_Server_1.2.11
>
> Is it the correct one?
>
Yes.
>
> Thanks,
>
> *Jovan Vukotić* • Senior Software Engineer • Ambit Treasury Management 
> • SunGard • Banking • Bulevar Milutina Milankovića 136b, Belgrade, 
> Serbia • tel: +381.11.6555-66-1 • jovan.vukotic at sungard.com 
> <mailto:jovan.vukotic at sungard.com>
>
> *From:*Rich Megginson [mailto:rmeggins at redhat.com]
> *Sent:* Friday, June 28, 2013 4:17 PM
> *To:* Vukotic, Jovan
> *Cc:* 389-users at lists.fedoraproject.org; Mehta, Cyrus
> *Subject:* Re: [389-users] FW: fresh replica reports "reloading ruv 
> failed " just after successfull initialization
>
> On 06/28/2013 03:30 AM, Jovan.VUKOTIC at sungard.com wrote:
>
>     Rich,
>
>     No, I do not build the code myself.
>
>
> ok - looks like CSW packages.
>
> I'm not sure if things are going to work correctly until we get the 
> atomic op bug fixed.  Unfortunately we don't have the means to build 
> and test on Sparc.  Is there someone who can help us build and test 
> some fixes?
>
>
> At the moment, with the error log level set to 40960 (32768+8192), I get 
> a few more error messages, but they do not tell me anything:
>
> [28/Jun/2013:05:06:03 -0400] - cache_add_tentative concurrency detected
> [28/Jun/2013:05:06:09 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxx,dc=com); LDAP error - 68
> [28/Jun/2013:05:06:39 -0400] - cache_add_tentative concurrency detected
> [28/Jun/2013:05:06:39 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxxx,dc=com); LDAP error - 68
> [28/Jun/2013:05:07:00 -0400] Changelog purge skipped anchor csn 51c5ec28000000020000
> [28/Jun/2013:05:07:09 -0400] - cache_add_tentative concurrency detected
> [28/Jun/2013:05:07:09 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxx,dc=com); LDAP error - 68
> [28/Jun/2013:05:07:39 -0400] - cache_add_tentative concurrency detected
> [28/Jun/2013:05:07:39 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxxx,dc=com); LDAP error - 68
> [28/Jun/2013:05:08:09 -0400] - cache_add_tentative concurrency detected
> [28/Jun/2013:05:08:09 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxxx,dc=com); LDAP error - 68
> [28/Jun/2013:05:08:39 -0400] - cache_add_tentative concurrency detected
> [28/Jun/2013:05:08:39 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxx,dc=com); LDAP error - 68
> [28/Jun/2013:05:09:04 -0400] NSMMReplicationPlugin - changelog program - _cl5GetDBFile: found DB object 13f5c40 for database /var/opt/csw/lib/dirsrv/slapd-inst-dr02/changelogdb/686eae02-1dd211b2-b3b3aede-af5e4e28_51c5c8ae000000020000.db4
> [28/Jun/2013:05:09:04 -0400] NSMMReplicationPlugin - changelog program - cl5GetOperationCount: found DB object 13f5c40
> [28/Jun/2013:05:09:09 -0400] - cache_add_tentative concurrency detected
> [28/Jun/2013:05:09:09 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxxx,dc=com); LDAP error - 68
> [28/Jun/2013:05:09:39 -0400] - cache_add_tentative concurrency detected
> [28/Jun/2013:05:09:39 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxxx,dc=com); LDAP error - 68
> [28/Jun/2013:05:10:09 -0400] - cache_add_tentative concurrency detected
> [28/Jun/2013:05:10:09 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxxx,dc=com); LDAP error - 68
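
For reference, the combined error log level 40960 quoted above is a bitmask, and a minimal sketch like the one below shows how it breaks down. The flag descriptions are an assumption based on the nsslapd-errorlog-level documentation and list only a subset of the values, so check them against the docs for your version. `decode_errorlog_level` is a hypothetical helper name, not part of 389 DS.

```python
# Subset of nsslapd-errorlog-level flag values (assumption: descriptions
# paraphrased from the documentation; verify for your 389 DS version).
ERRORLOG_FLAGS = {
    8192: "replication debugging",
    16384: "default level (critical errors and always-logged messages)",
    32768: "entry cache debugging",
    65536: "plug-in debugging",
}


def decode_errorlog_level(level):
    """Split a combined level into known (bit, name) pairs plus leftover bits."""
    known = [(bit, name) for bit, name in sorted(ERRORLOG_FLAGS.items()) if level & bit]
    leftover = level
    for bit, _ in known:
        leftover &= ~bit  # clear every bit that was recognised
    return known, leftover


flags, unknown = decode_errorlog_level(40960)
for bit, name in flags:
    print(bit, name)
print("unrecognised bits:", unknown)
```

With 40960 this reports the 8192 and 32768 flags and no unrecognised bits, which matches the "32768+8192" given in the message.
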
>
> Thanks,
>
> Jovan Vukotić
>
> *From:*Rich Megginson [mailto:rmeggins at redhat.com]
> *Sent:* Thursday, June 27, 2013 6:20 PM
> *To:* Vukotic, Jovan
> *Cc:* 389-users at lists.fedoraproject.org 
> <mailto:389-users at lists.fedoraproject.org>; Mehta, Cyrus
> *Subject:* Re: [389-users] FW: fresh replica reports "reloading ruv 
> failed " just after successfull initialization
>
> On 06/27/2013 09:14 AM, Jovan.VUKOTIC at sungard.com wrote:
>
>     Rich,
>
>     On Linux x86_64 and Solaris x86_64 the error cannot be reproduced;
>     it occurs only on Solaris SPARC.
>
>     On the other hand, Solaris SPARC works fine only if it is the
>     first master replica in the multi-master array, that is, the one
>     that initializes the other replicas.
>
>     Do you, perhaps, have any suggestions on how to tune the Solaris
>     SPARC platform?
>
>
> I think there is a bug in the way we handle atomic operations on 
> SPARC.  We don't develop or test on SPARC, so it's not surprising we 
> have a bug in this area.  Do you build the code yourself?
>
>
>
> I am going to enable more detailed logging in the errors file.
>
> Thanks,
> Jovan
>
> *From:*Rich Megginson [mailto:rmeggins at redhat.com]
> *Sent:* Monday, June 24, 2013 10:45 PM
> *To:* General discussion list for the 389 Directory server project.
> *Cc:* Vukotic, Jovan; Mehta, Cyrus
> *Subject:* Re: [389-users] FW: fresh replica reports "reloading ruv 
> failed " just after successfull initialization
>
> On 06/24/2013 09:34 AM, Jovan.VUKOTIC at sungard.com wrote:
>
>     Hi,
>
>     I would like to link the issue I reported on Saturday with bug
>     723937, filed some two years ago.
>
>     There, just as in my case, dn/entry cache entries were reported
>     prior to the initialization of the master replica.
>
>     I repeated the replication configuration today. The multi-master
>     replica being initialized by another replica had only one entry
>     (the root object) in its userRoot database prior to the
>     initialization.
>
>     First two entries were reported, then 5, and then 918 (which
>     matches the number of entries in the master database):
>
>     [24/Jun/2013:08:16:03 -0400] - entrycache_clear_int: there are still 2 entries in the entry cache.
>     [24/Jun/2013:08:16:03 -0400] - dncache_clear_int: there are still 2 dn's in the dn cache. :/
>     [24/Jun/2013:08:16:03 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
>     [24/Jun/2013:08:16:07 -0400] - import userRoot: Workers finished; cleaning up...
>     [24/Jun/2013:08:16:07 -0400] - import userRoot: Workers cleaned up.
>     [24/Jun/2013:08:16:07 -0400] - import userRoot: Indexing complete. Post-processing...
>     [24/Jun/2013:08:16:07 -0400] - import userRoot: Generating numSubordinates complete.
>     [24/Jun/2013:08:16:07 -0400] - import userRoot: Flushing caches...
>     [24/Jun/2013:08:16:07 -0400] - import userRoot: Closing files...
>     [24/Jun/2013:08:16:07 -0400] - entrycache_clear_int: there are still 5 entries in the entry cache.
>     [24/Jun/2013:08:16:07 -0400] - dncache_clear_int: there are still 918 dn's in the dn cache. :/
>     [24/Jun/2013:08:16:07 -0400] - import userRoot: Import complete. Processed 918 entries in 4 seconds. (229.50 entries/sec)
>     [24/Jun/2013:08:16:07 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=xxxxxx,dc=com is coming online; enabling replication
>     [24/Jun/2013:08:16:07 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=xxxxxx,dc=com); LDAP error - 68
>
>     I would like to add that all replicas that could not be configured
>     due to the reported errors were installed on Solaris 10 on Sparc
>     processors, whereas the only replica that was initialized
>     successfully was installed on Solaris 10 on i386 processors.
>
>
> Any chance you could try to reproduce this on a Linux x86_64 system?
>
>
>
>
> Thanks,
> Jovan
>
> *From:*Vukotic, Jovan
> *Sent:* Saturday, June 22, 2013 11:59 PM
> *To:* '389-users at lists.fedoraproject.org 
> <mailto:389-users at lists.fedoraproject.org>'
> *Subject:* fresh replica reports "reloading ruv failed " just after 
> successfull initialization.
>
> Hi,
>
> We have four 389 DS instances, version 1.2.11, that we are organizing in 
> a multi-master replication topology.
>
> After I enabled all four multi-master replicas, initialized them from 
> one reference replica (M1), and incremental replication started, it 
> turned out that only two of them take part in replication: the 
> reference replica M1 and M2 (replication works in both directions).
>
> I tried to fix M3 and M4 in the following way.
>
> M3 example:
>
> I removed the replication agreement M1-M3 (M2-M3 did not exist; M4 was 
> switched off).
>
> After several restores of the pre-replication database state and 
> reconfigurations of that replica, I removed the 389 DS instance M3 
> completely and reinstalled it: remove-ds-admin.pl + setup-ds-admin.pl. 
> I configured TLS/SSL (as before), restarted the DS, and enabled the 
> replica from the 389 Console.
>
> Then I returned to M1, recreated the agreement, and initialized M3. 
> The initialization was successful again, in the sense that M3 imported 
> all the data, but immediately afterwards errors that look strange to 
> me were reported (log excerpt at the end of this message).
>
> What confuses me is that LDAP error 68 means that an entry already 
> exists, even though this is a new replica. And why a tombstone?
>
> Or, to make a long story short: is the only remedy to reinstall all 
> four replicas?
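
On the meaning of the number itself: LDAP result codes are defined in RFC 4511, where 68 is entryAlreadyExists. A minimal lookup sketch follows. The table lists only a handful of codes, and `describe_ldap_result` is a hypothetical helper name used for illustration.

```python
# A few LDAP result codes as defined in RFC 4511 (not an exhaustive list).
LDAP_RESULT_CODES = {
    0: "success",
    1: "operationsError",
    32: "noSuchObject",
    49: "invalidCredentials",
    50: "insufficientAccessRights",
    53: "unwillingToPerform",
    68: "entryAlreadyExists",
}


def describe_ldap_result(code):
    """Return the RFC 4511 name for a result code, or a placeholder if unlisted."""
    return LDAP_RESULT_CODES.get(code, "unlisted result code %d" % code)


print(describe_ldap_result(68))
```

As for the tombstone: as far as I know, 389 DS keeps the replica update vector (RUV) in a special tombstone entry under the replicated suffix, which would explain why the message mentions one even on a fresh replica.
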
>
> [22/Jun/2013:16:30:50 -0400] - All database threads now stopped    // this is from a backup done before replication configuration
> [22/Jun/2013:16:43:25 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica xxxxxxxxxx is going offline; disabling replication
> [22/Jun/2013:16:43:25 -0400] - entrycache_clear_int: there are still 20 entries in the entry cache.
> [22/Jun/2013:16:43:25 -0400] - dncache_clear_int: there are still 20 dn's in the dn cache. :/
> [22/Jun/2013:16:43:25 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Workers finished; cleaning up...
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Workers cleaned up.
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Indexing complete. Post-processing...
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Generating numSubordinates complete.
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Flushing caches...
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Closing files...
> [22/Jun/2013:16:43:30 -0400] - entrycache_clear_int: there are still 20 entries in the entry cache.
> [22/Jun/2013:16:43:30 -0400] - dncache_clear_int: there are still 917 dn's in the dn cache. :/
> [22/Jun/2013:16:43:30 -0400] - import userRoot: Import complete. Processed 917 entries in 4 seconds. (229.25 entries/sec)
> [22/Jun/2013:16:43:30 -0400] NSMMReplicationPlugin - multimaster_be_state_change: replica xxxxxxxxxxx is coming online; enabling replication
> [22/Jun/2013:16:43:30 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (xxxxxxxxxx); LDAP error - 68
> [22/Jun/2013:16:43:30 -0400] NSMMReplicationPlugin - replica_enable_replication: reloading ruv failed
> [22/Jun/2013:16:43:32 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (xxxxxxxxx); LDAP error - 68
> [22/Jun/2013:16:44:02 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (xxxxxxxxxx); LDAP error - 68
> [22/Jun/2013:16:44:32 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (xxxxxxxxx); LDAP error - 68
> [22/Jun/2013:16:45:02 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (xxxxxxxx); LDAP error - 68
> [22/Jun/2013:16:45:32 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (xxxxxxxxx); LDAP error - 68
> [22/Jun/2013:16:46:02 -0400] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (xxxxxxxxx); LDAP error - 68
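
The excerpt above shows the same failure repeating at a steady pace. A small sketch like the following could confirm the retry interval and the result code from an errors file. It assumes the message wording exactly as it appears in this thread, tolerates the quoting prefix and long dashes introduced by copy and paste, and `summarize_ruv_failures` is a hypothetical helper, not part of 389 DS.

```python
import re
from collections import Counter
from datetime import datetime

# "[timestamp] rest of message" (format assumed from the lines in this thread).
LINE_RE = re.compile(r"\[(?P<ts>[^\]]+)\]\s*(?P<msg>.*)")
# The RUV tombstone failure text, with the result code captured.
RUV_RE = re.compile(
    r"failed to create replica ruv tombstone entry \((?P<dn>[^)]*)\)[;:]"
    r"\s*LDAP error\s*-?\s*(?P<code>\d+)"
)


def summarize_ruv_failures(lines):
    """Count RUV tombstone failures per result code and list gaps in seconds."""
    stamps = []
    codes = Counter()
    for raw in lines:
        # Normalise pasted long dashes and drop any "> " quoting prefix.
        line = raw.replace("\u2014", "-").lstrip("> ").strip()
        m = LINE_RE.match(line)
        if m is None:
            continue
        hit = RUV_RE.search(m.group("msg"))
        if hit is None:
            continue
        stamps.append(datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"))
        codes[int(hit.group("code"))] += 1
    gaps = [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
    return codes, gaps


MSG = ("NSMMReplicationPlugin - _replica_configure_ruv: failed to create "
       "replica ruv tombstone entry (xxxxxxxxx); LDAP error - 68")
sample = [
    "> [22/Jun/2013:16:43:32 -0400] " + MSG,
    "> [22/Jun/2013:16:44:02 -0400] " + MSG,
    "> [22/Jun/2013:16:44:32 -0400] " + MSG,
    "> [22/Jun/2013:16:44:32 -0400] - import userRoot: Closing files...",
]
codes, gaps = summarize_ruv_failures(sample)
print(dict(codes), gaps)
```

For the three sample failure lines this yields three occurrences of code 68 spaced 30 seconds apart, which matches the 30-second pattern visible in the log.
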
>
> Any help will be appreciated.
>
> Thank you.
>
> Jovan Vukotić
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org  <mailto:389-users at lists.fedoraproject.org>
> https://admin.fedoraproject.org/mailman/listinfo/389-users
>
