<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
  <META NAME="GENERATOR" CONTENT="GtkHTML/3.24.1.1">
</HEAD>
<BODY>
The day before the date in the error (when the errors started), we had to delete two damaged suffix databases from the console, recreate them, and reinitialize them from the other supplier. The agreement that is throwing errors belongs to the userRoot database (dc=example,dc=com). The recreated databases were the suffixes o=cabu,dc=sacyl,dc=es and o=husa,dc=sacyl,dc=es.<BR>
<BR>
This is the error log from server1, which did not crash; it is the server that initialized server2, the one that crashed:<BR>
<BR>
===========================<BR>
<BR>
[20/Apr/2009:14:18:28 +0200] NSMMReplicationPlugin - Beginning total update of replica &quot;agmt=&quot;cn=CABU_ppal-GRS_back&quot; (grsgscvalp0102:636)&quot;.<BR>
[20/Apr/2009:14:18:39 +0200] NSMMReplicationPlugin - Finished total update of replica &quot;agmt=&quot;cn=CABU_ppal-GRS_back&quot; (grsgscvalp0102:636)&quot;. Sent 4108 entries.<BR>
[20/Apr/2009:14:25:33 +0200] NSMMReplicationPlugin - Beginning total update of replica &quot;agmt=&quot;cn=HUSA_ppal-GRS_back&quot; (grsgscvalp0102:636)&quot;.<BR>
[20/Apr/2009:14:25:43 +0200] NSMMReplicationPlugin - Finished total update of replica &quot;agmt=&quot;cn=HUSA_ppal-GRS_back&quot; (grsgscvalp0102:636)&quot;. Sent 2650 entries.<BR>
[21/Apr/2009:10:50:47 +0200] - slapd shutting down - signaling operation threads<BR>
<BR>
===========================<BR>
<BR>
And this is the log from server2, where the databases crashed. It shows the deletion of the agreements, the deletion and re-creation of the databases, and their initialization from server1. The messages from the 21st are from when we tried to force the updates to be sent:<BR>
<BR>
===========================<BR>
<BR>
[20/Apr/2009:14:13:20 +0200] NSMMReplicationPlugin - agmt_delete: begin<BR>
[20/Apr/2009:14:13:21 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=sacyl,dc=es is about to be deleted; disabling replication<BR>
[20/Apr/2009:14:14:16 +0200] - ldbm: Bringing o_cabu_dc_sacyl_dc_es offline...<BR>
[20/Apr/2009:14:14:16 +0200] - ldbm: removing 'o_cabu_dc_sacyl_dc_es'.<BR>
[20/Apr/2009:14:14:16 +0200] - Destructor for instance o_cabu_dc_sacyl_dc_es called<BR>
[20/Apr/2009:14:14:44 +0200] - No symmetric key found for cipher AES in backend o_cabu_dc_sacyl_dc_es, attempting to create one...<BR>
[20/Apr/2009:14:14:44 +0200] - Key for cipher AES successfully generated and stored<BR>
[20/Apr/2009:14:14:44 +0200] - No symmetric key found for cipher 3DES in backend o_cabu_dc_sacyl_dc_es, attempting to create one...<BR>
[20/Apr/2009:14:14:45 +0200] - Key for cipher 3DES successfully generated and stored<BR>
[20/Apr/2009:14:17:08 +0200] NSMMReplicationPlugin - agmt=&quot;cn=CABU_back-GRS_ppal&quot; (grsgscvalp0101:636): Replica has a different generation ID than the local data.<BR>
[20/Apr/2009:14:18:11 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica o=cabu,dc=sacyl,dc=es is going offline; disabling replication<BR>
[20/Apr/2009:14:18:13 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database<BR>
[20/Apr/2009:14:18:35 +0200] - import o_cabu_dc_sacyl_dc_es: Workers finished; cleaning up...<BR>
[20/Apr/2009:14:18:36 +0200] - import o_cabu_dc_sacyl_dc_es: Workers cleaned up.<BR>
[20/Apr/2009:14:18:36 +0200] - import o_cabu_dc_sacyl_dc_es: Indexing complete.&nbsp; Post-processing...<BR>
[20/Apr/2009:14:18:36 +0200] - import o_cabu_dc_sacyl_dc_es: Flushing caches...<BR>
[20/Apr/2009:14:18:36 +0200] - import o_cabu_dc_sacyl_dc_es: Closing files...<BR>
[20/Apr/2009:14:18:38 +0200] - import o_cabu_dc_sacyl_dc_es: Import complete.&nbsp; Processed 4108 entries in 12 seconds. (342.33 entries/sec)<BR>
[20/Apr/2009:14:18:39 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica o=cabu,dc=sacyl,dc=es is coming online; enabling replication<BR>
[20/Apr/2009:14:20:09 +0200] NSMMReplicationPlugin - replica_config_delete: Warning: The changelog for replica o=husa,dc=sacyl,dc=es is no longer valid since the replica config is being deleted.&nbsp; Removing the changelog.<BR>
[20/Apr/2009:14:20:10 +0200] NSMMReplicationPlugin - agmt_delete: begin<BR>
[20/Apr/2009:14:20:12 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=sacyl,dc=es is about to be deleted; disabling replication<BR>
[20/Apr/2009:14:20:42 +0200] - ldbm: Bringing o_husa_dc_sacyl_dc_es offline...<BR>
[20/Apr/2009:14:20:42 +0200] - ldbm: removing 'o_husa_dc_sacyl_dc_es'.<BR>
[20/Apr/2009:14:20:42 +0200] - Destructor for instance o_husa_dc_sacyl_dc_es called<BR>
[20/Apr/2009:14:21:10 +0200] - No symmetric key found for cipher AES in backend o_husa_dc_sacyl_dc_es, attempting to create one...<BR>
[20/Apr/2009:14:21:10 +0200] - Key for cipher AES successfully generated and stored<BR>
[20/Apr/2009:14:21:10 +0200] - No symmetric key found for cipher 3DES in backend o_husa_dc_sacyl_dc_es, attempting to create one...<BR>
[20/Apr/2009:14:21:10 +0200] - Key for cipher 3DES successfully generated and stored<BR>
[20/Apr/2009:14:24:23 +0200] NSMMReplicationPlugin - agmt=&quot;cn=HUSA_back-GRS_ppal&quot; (grsgscvalp0101:636): Replica has a different generation ID than the local data.<BR>
[20/Apr/2009:14:25:18 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica o=husa,dc=sacyl,dc=es is going offline; disabling replication<BR>
[20/Apr/2009:14:25:20 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database<BR>
[20/Apr/2009:14:25:39 +0200] - import o_husa_dc_sacyl_dc_es: Workers finished; cleaning up...<BR>
[20/Apr/2009:14:25:40 +0200] - import o_husa_dc_sacyl_dc_es: Workers cleaned up.<BR>
[20/Apr/2009:14:25:40 +0200] - import o_husa_dc_sacyl_dc_es: Indexing complete.&nbsp; Post-processing...<BR>
[20/Apr/2009:14:25:40 +0200] - import o_husa_dc_sacyl_dc_es: Flushing caches...<BR>
[20/Apr/2009:14:25:40 +0200] - import o_husa_dc_sacyl_dc_es: Closing files...<BR>
[20/Apr/2009:14:25:42 +0200] - import o_husa_dc_sacyl_dc_es: Import complete.&nbsp; Processed 2650 entries in 8 seconds. (331.25 entries/sec)<BR>
[20/Apr/2009:14:25:42 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica o=husa,dc=sacyl,dc=es is coming online; enabling replication<BR>
[21/Apr/2009:10:50:07 +0200] NSMMReplicationPlugin - Replication agreement for agmt=&quot;cn=GRS_back-GRS_ppal&quot; (grsgscvalp0101:636) could not be updated. For replication to take place, please enable the suffix and restart the server<BR>
[21/Apr/2009:10:50:07 +0200] NSMMReplicationPlugin - Replication agreement for agmt=&quot;cn=GRS_back-GRS_ppal&quot; (grsgscvalp0101:636) could not be updated. For replication to take place, please enable the suffix and restart the server<BR>
<BR>
<BR>
===========================<BR>
<BR>
<BR>
On Tue, 21-04-2009 at 09:21 -0600, Rich Megginson wrote:
<BLOCKQUOTE TYPE=CITE>
<PRE>
Juan Asensio S&#225;nchez wrote:
&gt; Hi
&gt;
&gt; Since yesterday I am having troubles with replication between two 
&gt; servers. The replica is in multimaster mode in both servers, and 
&gt; everything is configured OK (database, suffixes, changelog, replica, 
&gt; agreements; until yesterday everything worked OK).
&gt;
&gt; [21/Apr/2009:11:04:57 +0200] NSMMReplicationPlugin - Replication 
&gt; agreement for agmt=&quot;cn=GRS_back-GRS_ppal&quot; (grsgscvalp0101:636) could 
&gt; not be updated. For replication to take place, please enable the 
&gt; suffix and restart the server
What changed?  Everything was working, then suddenly it's not?  
Something must have changed, perhaps even something that did not seem 
related to this problem.  Do you know when things started failing?  Did 
you examine the access and error logs on the supplier and consumer from 
around the time of the failure?
&gt;
&gt; The only thing to mention are replication problems with other 
&gt; databases and replicas, but not for the replica of the agreement in 
&gt; the message. They were fixed re-initializing the consumers of those 
&gt; replicas. Any idea?
&gt;
&gt; Regards and thanks in advance.
&gt; ------------------------------------------------------------------------
&gt;
&gt; --
&gt; Fedora-directory-users mailing list
&gt; <A HREF="mailto:Fedora-directory-users@redhat.com">Fedora-directory-users@redhat.com</A>
&gt; <A HREF="https://www.redhat.com/mailman/listinfo/fedora-directory-users">https://www.redhat.com/mailman/listinfo/fedora-directory-users</A>
&gt;   


</PRE>
</BLOCKQUOTE>
</BODY>
</HTML>