Steffen Blume wrote:
Hi,
I have tried to set up multi-master replication without success. The two LDAP servers are running fine. Then I execute the mmr.pl script (on b):

./mmr.pl --host1 a.domain.local --host2 b.domain.local --bindpw secret --host1_id 1 --host2_id 2 --repmanpw secret --base "dc=domain, dc=local" --create
--- error log on a ---
[01/Sep/2010:14:11:39 +0200] NSMMReplicationPlugin - agmt="cn="Replication to b.domain.local"" (b:389): Replica has a different generation ID than the local data.
[01/Sep/2010:14:11:42 +0200] NSMMReplicationPlugin - Beginning total update of replica "agmt="cn="Replication to b.domain.local"" (b:389)".
[01/Sep/2010:14:11:47 +0200] NSMMReplicationPlugin - Finished total update of replica "agmt="cn="Replication to b.domain.local"" (b:389)". Sent 1375 entries.
--- error log on b ---
[01/Sep/2010:14:11:39 +0200] NSMMReplicationPlugin - agmt="cn="Replication to a.domain.local"" (a:389): Replica has a different generation ID than the local data.
[01/Sep/2010:14:11:40 +0200] NSMMReplicationPlugin - repl_set_mtn_referrals: could not set referrals for replica dc=domain,dc=local: 32
[01/Sep/2010:14:11:40 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=domain,dc=local is going offline; disabling replication
[01/Sep/2010:14:11:41 +0200] - somehow, there are still 200 entries in the entry cache. :/
[01/Sep/2010:14:11:42 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[01/Sep/2010:14:11:46 +0200] - import userRoot: Workers finished; cleaning up...
[01/Sep/2010:14:11:46 +0200] - import userRoot: Workers cleaned up.
[01/Sep/2010:14:11:46 +0200] - import userRoot: Indexing complete. Post-processing...
[01/Sep/2010:14:11:46 +0200] - import userRoot: Flushing caches...
[01/Sep/2010:14:11:46 +0200] - import userRoot: Closing files...
[01/Sep/2010:14:11:46 +0200] - somehow, there are still 200 entries in the entry cache. :/
[01/Sep/2010:14:11:47 +0200] - import userRoot: Import complete. Processed 1375 entries in 5 seconds. (275.00 entries/sec)
[01/Sep/2010:14:11:47 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=domain,dc=local is coming online; enabling replication
[01/Sep/2010:14:11:47 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
LDAP error 68 is "Already Exists": the RUV tombstone entry the server is trying to create is already there. This means the RUV entry or some other MMR state information was left over from a previous configuration attempt.
Since this fails, nothing else is going to work.
[01/Sep/2010:14:11:47 +0200] NSMMReplicationPlugin - replica_enable_replication: reloading ruv failed
[01/Sep/2010:14:11:49 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
[01/Sep/2010:14:12:19 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
[01/Sep/2010:14:12:49 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
[01/Sep/2010:14:13:19 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
[01/Sep/2010:14:13:49 +0200] NSMMReplicationPlugin - _replica_configure_ruv: failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68
So what do the errors "repl_set_mtn_referrals: could not set referrals" and "_replica_configure_ruv: failed to create replica ruv tombstone entry" mean?
The messages on b stop when I restart the LDAP server, but replication is still not working.
Since MMR setup failed, no MMR is going to work.
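As a rough aid for reading those two messages, the numeric codes at the end of each line can be looked up as LDAP result codes. The sketch below assumes that the trailing 32 in the repl_set_mtn_referrals message is an LDAP result code like the 68 in the RUV message; the function name explain and the regular expression are made up for illustration and are not part of 389 DS or mmr.pl.

```python
import re

# Standard LDAP result codes (RFC 4511) for the two numbers in the log.
LDAP_RESULT_NAMES = {
    32: "noSuchObject",
    68: "entryAlreadyExists",
}

# Hypothetical pattern: matches "LDAP error - 68" or a trailing ": 32"
# at the very end of an error log line.
CODE_PATTERN = re.compile(r"(?:LDAP error - |: )(\d+)\s*$")


def explain(line):
    """Return (code, name) for a log line ending in a result code, else None."""
    match = CODE_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LDAP_RESULT_NAMES.get(code, "unknown")


if __name__ == "__main__":
    sample = [
        "[01/Sep/2010:14:11:40 +0200] NSMMReplicationPlugin - repl_set_mtn_referrals: "
        "could not set referrals for replica dc=domain,dc=local: 32",
        "[01/Sep/2010:14:11:47 +0200] NSMMReplicationPlugin - _replica_configure_ruv: "
        "failed to create replica ruv tombstone entry (dc=domain, dc=local); LDAP error - 68",
    ]
    for entry in sample:
        print(explain(entry))
```

Read this way, 32 says an entry the plugin expected was not found, and 68 says the RUV tombstone entry it tried to add was already present.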
On the first replication setup not all the data was copied. I removed the replication configuration with mmr.pl
I think this is the problem. Either mmr.pl does not cleanly remove the replication configuration, or there is a bug in the server. For example, see https://bugzilla.redhat.com/show_bug.cgi?id=624442
and set it up again, with the same error messages. When I change something (in uid=sbl,ou=people,...) on a, the error log of a shows:

--- error log on a ---
[01/Sep/2010:14:35:20 +0200] NSMMReplicationPlugin - agmt="cn="Replication to b.domain.local"" (b:389): Replica has a different generation ID than the local data.
[01/Sep/2010:14:35:24 +0200] NSMMReplicationPlugin - agmt="cn="Replication to b.domain.local"" (b:389): Replica has a different generation ID than the local data.
[01/Sep/2010:14:35:28 +0200] NSMMReplicationPlugin - agmt="cn="Replication to b.domain.local"" (b:389): Replica has a different generation ID than the local data.
...
This means the consumer was not initialized properly.
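The generation ID in that message is the replica generation stored in the RUV; in the nsds50ruv attribute it shows up as the value that starts with {replicageneration}. A minimal sketch of the comparison, assuming you have already read the nsds50ruv values from both servers: the helper names and the sample values are hypothetical and are not taken from the servers in this thread.

```python
GENERATION_PREFIX = "{replicageneration}"


def replica_generation(ruv_values):
    """Return the generation string from a list of nsds50ruv values, or None."""
    for value in ruv_values:
        if value.startswith(GENERATION_PREFIX):
            return value[len(GENERATION_PREFIX):].strip()
    return None


def same_generation(supplier_ruv, consumer_ruv):
    """True only if both RUVs carry a generation and the two are identical."""
    supplier_gen = replica_generation(supplier_ruv)
    consumer_gen = replica_generation(consumer_ruv)
    return supplier_gen is not None and supplier_gen == consumer_gen


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    ruv_on_a = ["{replicageneration} 4c7e4a52000000010000",
                "{replica 1 ldap://a.domain.local:389}"]
    ruv_on_b = ["{replicageneration} 4c7e4b10000000020000",
                "{replica 2 ldap://b.domain.local:389}"]
    print(same_generation(ruv_on_a, ruv_on_b))
```

If the two generations differ, the supplier keeps logging the message above instead of sending changes, which matches the repeated Start Session / End Session pairs with nothing in between.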
Nothing appears in the error log on b, but the access log shows:
--- access log on b ---
[01/Sep/2010:14:35:20 +0200] conn=0 op=3 SRCH base="ou=People, dc=domain, dc=local" scope=1 filter="(objectClass=*)" attrs="objectClass"
[01/Sep/2010:14:35:20 +0200] conn=0 op=7 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
[01/Sep/2010:14:35:20 +0200] conn=0 op=7 RESULT err=0 tag=120 nentries=0 etime=0
[01/Sep/2010:14:35:20 +0200] conn=0 op=8 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[01/Sep/2010:14:35:20 +0200] conn=0 op=8 RESULT err=0 tag=120 nentries=0 etime=0
[01/Sep/2010:14:35:20 +0200] conn=0 op=3 RESULT err=0 tag=101 nentries=100 etime=0 notes=U
[01/Sep/2010:14:35:20 +0200] conn=0 op=4 SRCH base="ou=People, dc=domain, dc=local" scope=1 filter="(objectClass=*)" attrs="objectClass"
[01/Sep/2010:14:35:20 +0200] conn=0 op=4 RESULT err=0 tag=101 nentries=82 etime=0
[01/Sep/2010:14:35:24 +0200] conn=0 op=10 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
[01/Sep/2010:14:35:24 +0200] conn=0 op=10 RESULT err=0 tag=120 nentries=0 etime=0
[01/Sep/2010:14:35:24 +0200] conn=0 op=11 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[01/Sep/2010:14:35:24 +0200] conn=0 op=11 RESULT err=0 tag=120 nentries=0 etime=0
[01/Sep/2010:14:35:25 +0200] conn=0 op=5 SRCH base="uid=sbl,ou=People,dc=domain,dc=local" scope=0 filter="(objectClass=*)" attrs=ALL
[01/Sep/2010:14:35:25 +0200] conn=0 op=5 RESULT err=0 tag=101 nentries=1 etime=0
[01/Sep/2010:14:35:27 +0200] conn=0 op=12 EXT oid="2.16.840.1.113730.3.5.3" name="Netscape Replication Start Session"
[01/Sep/2010:14:35:27 +0200] conn=0 op=12 RESULT err=0 tag=120 nentries=0 etime=0
[01/Sep/2010:14:35:27 +0200] conn=0 op=13 EXT oid="2.16.840.1.113730.3.5.5" name="Netscape Replication End Session"
[01/Sep/2010:14:35:27 +0200] conn=0 op=13 RESULT err=0 tag=120 nentries=0 etime=0
...
Both servers run 389 DS 1.2.4, which I compiled myself for OpenSolaris (SunOS 5.11 snv_111b).
Try 1.2.6. There have been many, many bug fixes between 1.2.4 and 1.2.6.
Regards, Steffen