I'm having some significant issues getting my multi-master servers synchronized after a network outage this past weekend. First I was getting:
error--> NSMMReplicationPlugin - agmt="cn=srv1-to-srv2" (srv2:389): Replica has a different generation ID than the local data.
Then, after numerous attempts to clear the changelog and reinitialize the consumer from srv1 to srv2, each failing with a "hit ratio 0%" error (increasing server memory and the corresponding database/cache settings did not help):
error--> import userRoot: Processed 48136 entries -- average rate 2292.2/sec, recent rate 2292.1/sec, hit ratio 0%
Finally I tried a local restore: an export with db2ldif (with -r) re-imported via ldif2db, and another restore from db2bak. After restoring on both servers, the "good" server (srv1) now logs:
error--> NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=<mydomain>,dc=com was reloaded and it no longer matches the data in the changelog (replica data > changelog). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
AND
error--> NSMMReplicationPlugin - csnplCommit: can't find csn 45ee0228000000010000
error--> NSMMReplicationPlugin - ruv_update_ruv: cannot commit csn 45ee0228000000010000
error--> NSMMReplicationPlugin - replica_update_ruv: unable to update RUV for replica dc=<mydomain>,dc=com, csn = 45ee0228000000010000
Both of these appear after clearing the changelog database (multiple times), and of course there is still no synchronization.
At this point I am stuck and would appreciate any help getting this resolved. First I need to resolve the "NSMMReplicationPlugin - csnplCommit: can't find csn" problem so I can try the command line again.
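[Editor's aside: the stuck CSN string itself can be decoded to see when and where it originated. The field layout below is the standard fixed-width 389/Fedora DS CSN format; a sketch, using the CSN from the errors above:]

```shell
# A CSN is four fixed-width hex fields:
#   timestamp(8) . sequence(4) . replica-id(4) . sub-sequence(4)
csn=45ee0228000000010000
ts=$(( 16#${csn:0:8} ))     # change timestamp, seconds since the epoch
seq=$(( 16#${csn:8:4} ))    # sequence number within that second
rid=$(( 16#${csn:12:4} ))   # originating replica ID
sub=$(( 16#${csn:16:4} ))   # sub-sequence number
echo "timestamp=$ts seq=$seq rid=$rid subseq=$sub"
# → timestamp=1173226024 seq=0 rid=1 subseq=0
```

Here the timestamp falls in early March 2007 and the replica ID is 1, i.e. the change was originated locally by the supplier itself.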
Thanks much!
Wendt, Trevor wrote:
Then after numerous attempts to clear out the change log and reinitialize the consumer from srv1 to srv2, and failing each time hitting a "ratio 0%" error (we increased server memory and corresponding database/cache settings to no avail):
error--> import userRoot: Processed 48136 entries -- average rate 2292.2/sec, recent rate 2292.1/sec, hit ratio 0%
Are you thinking this is an 'error'? It looks fine to me: 2300 entries/s processed. The hit ratio won't fill in until the load has been running for a few cycles, which it may never reach with a small number of entries.
The generation ID errors sound like real errors, but those should be resolvable once the correct replica re-initialization is done.
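[Editor's aside: a full online re-initialization can also be triggered from the command line rather than the console. A sketch, assuming the agreement entry is the cn=srv1-to-srv2 one from the earlier error message (substitute your own suffix and hostnames); nsds5BeginReplicaRefresh starts the refresh and is removed by the server when it completes:]

```shell
# On the supplier (srv1), start a total update of srv2 over this agreement:
ldapmodify -x -h srv1 -p 389 -D "cn=directory manager" -W <<'EOF'
dn: cn=srv1-to-srv2,cn=replica,cn="dc=<mydomain>,dc=com",cn=mapping tree,cn=config
changetype: modify
replace: nsds5BeginReplicaRefresh
nsds5BeginReplicaRefresh: start
EOF

# Watch progress; the attribute disappears when the refresh finishes:
ldapsearch -x -h srv1 -p 389 -D "cn=directory manager" -W \
  -b 'cn=srv1-to-srv2,cn=replica,cn="dc=<mydomain>,dc=com",cn=mapping tree,cn=config' \
  -s base "(objectclass=*)" nsds5BeginReplicaRefresh nsds5replicaLastInitStatus
```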
"Are you thinking this is an 'error' ? It looks fine to me. 2300 entries/s processed. The hit ratio won't fill out until the load has been going for a few cycles, which it may never get to with a small number of entries."
It gets up to 100%, then drops back to 0% and the processed count stops advancing. This only occurs when initializing the consumer from the supplier through the console. We have well over 100k entries. Example of the behavior:
- import userRoot: Processed 21552 entries -- average rate 1077.6/sec, recent rate 1077.5/sec, hit ratio 0%
- import userRoot: Processed 52769 entries -- average rate 1319.2/sec, recent rate 1319.2/sec, hit ratio 100%
- import userRoot: Processed 64526 entries -- average rate 1075.4/sec, recent rate 1074.3/sec, hit ratio 100%
- import userRoot: Processed 64526 entries -- average rate 806.6/sec, recent rate 293.9/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 638.9/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 533.3/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 457.6/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 400.8/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 356.5/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 321.0/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 292.0/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 267.7/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 247.2/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries -- average rate 183.8/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 214745138.5/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 107372569.2/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 71581712.8/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 53686284.6/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 42949027.7/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 35790856.4/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 30677876.9/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 26843142.3/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 23728744.6/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 21367675.5/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 15505064.2/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 14460952.1/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 2) -- average rate 13548589.2/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 214745138.5/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 107372569.2/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 71581712.8/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 53686284.6/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 42523789.8/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 35495064.2/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 30460303.3/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 26676414.7/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 23728744.6/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 16776963.9/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 15561241.9/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 14509806.7/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 3) -- average rate 13591464.5/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 214745138.5/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 107372569.2/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 71581712.8/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 53686284.6/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 42949027.7/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 35790856.4/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 30677876.9/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 26843142.3/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 16776963.9/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 15561241.9/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 14509806.7/sec, recent rate 0.0/sec, hit ratio 0%
- import userRoot: Processed 64526 entries (pass 4) -- average rate 13591464.5/sec, recent rate 0.0/sec, hit ratio 0%
<I killed the process at this point>
###################### "The generation ID errors sound like real errors, but those should be resolvable with the correct replica re-initialization done."
I've tried re-initializing the consumer multiple times with no success. The NSMMReplicationPlugin replica_check_for_data_reload warning and the "csn" errors are on my supplier server. When srv2 went offline, srv1 became the master, so I can't initialize from srv2 to srv1 without losing entries. This is the dilemma...
Thanks for your suggestions. Please keep them coming.
###################### https://www.redhat.com/archives/fedora-directory-users/2007-March/msg00020.html
######################
Can you show us the RUV from the server that produces the csnplCommit error?
ldapsearch -x -D "cn=directory manager" -W -s base -b "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=<mydomain>,dc=com" "(&(objectclass=*)(objectclass=nstombstone))"
And your replica configuration?
ldapsearch -x -D "cn=directory manager" -W -b "cn=config" "(objectclass=nsds5replica)"
"Can you show us the RUV from the server that produces the csnplCommit error?" All I get is "ldap_search: No such object"
Replica configuration -- in its current state:
version: 1
dn: cn=replica,cn="dc=<mydomain>,dc=com",cn=mapping tree,cn=config
objectClass: nsDS5Replica
objectClass: top
nsDS5ReplicaRoot: dc=<mydomain>,dc=com
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 1
nsds5ReplicaPurgeDelay: 604800
nsDS5ReplicaBindDN: uid=<RepUserId>,cn=config
cn: replica
nsState:: AQAAAAQh7kUAAAAAAAAAAAEAAAA=
nsDS5ReplicaName: 1848ed03-1dd211b2-808393a4-a3ae0000
nsds5ReplicaChangeCount: 41
nsds5replicareapactive: 0
I'd love to know how your RUV could be missing. I wonder if whatever problem left you with the mismatched generation ID still persists; it seemed odd that it happened after a network outage. A missing RUV entry would explain it: that's where the generation ID of the local data is stored.
Did you get replication running again? Do your masters have updates that need to be merged? If they're in sync, or only diverged by testing-related updates, I would try:
- back up both masters, for potential restore or future investigation (db2bak)
- export one master to LDIF without -r (db2ldif)
- import that LDIF into the same master (ldif2db)
- after the import completes, search for the RUV entry to make sure it is created this time (ldapsearch for the magic nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff entry)
- initialize the other master, either via online initialization or from LDIF with -r (db2ldif -r)
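[Editor's aside: those steps can be sketched as a command sequence. The instance name and file paths below are assumptions for an old-style /opt/fedora-ds layout; substitute your own, and run as the server user:]

```shell
INST=slapd-srv1                       # hypothetical instance name

# 1. Back up both masters for potential restore / later investigation.
/opt/fedora-ds/$INST/db2bak /var/tmp/backup-srv1

# 2. Export one master WITHOUT -r (omits replication metadata).
/opt/fedora-ds/$INST/db2ldif -n userRoot -a /var/tmp/userRoot-plain.ldif

# 3. Re-import that LDIF into the same master.
/opt/fedora-ds/$INST/ldif2db -n userRoot -i /var/tmp/userRoot-plain.ldif

# 4. After the import, verify the RUV tombstone entry now exists.
ldapsearch -x -D "cn=directory manager" -W -s base \
  -b "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,dc=<mydomain>,dc=com" \
  "(objectclass=nstombstone)"

# 5. Re-initialize the other master: online init from the console, or
#    export WITH -r and import the resulting LDIF on srv2.
/opt/fedora-ds/$INST/db2ldif -r -n userRoot -a /var/tmp/userRoot-repl.ldif
```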
And if you can determine how you ended up in the state where the RUV is missing, post at bugzilla.redhat.com
-- Fedora-directory-users mailing list Fedora-directory-users@redhat.com https://www.redhat.com/mailman/listinfo/fedora-directory-users