Hi Rich,
One correction to step 4, the recreation of the "cn=replica" entry for the suffix.
As per the example given below, the suffix is "o=USA", not "o=SWIFT".
- Recreate the "cn=replica" entry for the suffix as below.
dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
changetype: add
objectClass: nsds5replica
objectClass: top
nsDS5ReplicaRoot: o=USA
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 10 ----> Please assign the same nsDS5ReplicaId value that the original
master had. In my case, the original master's replica ID was 10.
nsds5ReplicaPurgeDelay: 1
nsds5ReplicaTombstonePurgeInterval: -1
cn: replica
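Since the correction above comes down to the suffix appearing in two places (the DN and nsDS5ReplicaRoot), here is a minimal Python sketch that builds the entry from a single suffix value so the two cannot drift apart. The function name replica_entry_ldif is hypothetical; the attribute values and the DN layout are copied from the entry above as written, and nothing here talks to a server.

```python
def replica_entry_ldif(suffix, replica_id):
    """Build the "cn=replica" add entry for one suffix as LDIF text.

    Hypothetical helper: attribute values and DN layout are copied from
    the entry in the message above, not taken from any 389 DS tool.
    """
    lines = [
        f"dn: cn=replica,cn={suffix},cn=mapping tree,cn=config",
        "changetype: add",
        "objectClass: nsds5replica",
        "objectClass: top",
        f"nsDS5ReplicaRoot: {suffix}",
        "nsDS5ReplicaType: 3",
        "nsDS5Flags: 1",
        # Must match the replica ID the original master had.
        f"nsDS5ReplicaId: {replica_id}",
        "nsds5ReplicaPurgeDelay: 1",
        "nsds5ReplicaTombstonePurgeInterval: -1",
        "cn: replica",
    ]
    return "\n".join(lines) + "\n"


print(replica_entry_ldif("o=USA", 10))
```

The printed text can be saved to a file and fed to ldapmodify in the usual way.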
Regards,
Jyoti
From: Das, Jyoti Ranjan (STSD)
Sent: Monday, October 31, 2011 2:38 PM
To: 'Rich Megginson'; General discussion list for the 389 Directory server
project.
Subject: RE: [389-users] Data inconsitency during replication
Hi Rich,
Thanks a lot for your response. Please find the sample reproducer details below. I am not
sure how to log a bug, but I will find out and file one.
Reproducer:
Step-1:
Set up a topology in which the Master replicates to the Slave and the Slave replicates to the Consumer.
Master -> Slave-> Consumer.
Step-2:
Make sure that all three are in sync at this point. As an example, assume all are in sync
up to CSN5 (5 records were added to the Master, from CSN1 to CSN5).
Step-3:
Delete the replication agreements from Master to Slave and from Slave to Consumer.
Step-4:
Promote the Slave to Master. The promotion steps are given below.
- Delete the Supplier DN (cn=suppdn,cn=config) from the Slave
- Delete the "cn=replica" entry for the suffix "o=USA" using
ldapmodify. As a result, the changelog file is deleted as well.
Ex: dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
changetype: delete
- Modify the cn=o=USA,cn=mapping tree,cn=config entry as below
EX: dn: cn=o=USA,cn=mapping tree,cn=config
changetype: modify
replace: nsslapd-state
nsslapd-state: backend
dn: cn=o=USA,cn=mapping tree,cn=config
changetype: modify
delete: nsslapd-referral
- Recreate the "cn=replica" entry for the suffix as below.
dn: cn=replica,cn=o=SWIFT,cn=mapping tree,cn=config
changetype: add
objectClass: nsds5replica
objectClass: top
nsDS5ReplicaRoot: o=SWIFT
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 10 ----> Please assign the same nsDS5ReplicaId value that the original
master had. In my case, the original master's replica ID was 10.
nsds5ReplicaPurgeDelay: 1
nsds5ReplicaTombstonePurgeInterval: -1
cn: replica
- Restart the slapd process. The Slave has now become the Master.
Am I missing anything in the promotion operation, or is this not the right way to
do the promotion?
Step -5:
Add the replication agreement between the Slave (newly promoted Master) and the Consumer. At this
point both Slave and Consumer are in sync up to CSN5. When creating the agreement, please do
not initialize the Consumer.
Slave (newly promoted as Master) -> Consumer.
Step-6:
Add 5 more entries to the Slave that was promoted to Master above. Let's assume the
CSNs for these 5 entries run from CSN6 to CSN10.
Step-7:
Now you will see that, of these last 5 entries, only the last few get replicated, and
replication continues without halting.
Regards,
Jyoti
From: Rich Megginson [mailto:rmeggins@redhat.com]
Sent: Friday, October 28, 2011 10:54 PM
To: General discussion list for the 389 Directory server project.
Cc: Das, Jyoti Ranjan (STSD)
Subject: Re: [389-users] Data inconsitency during replication
On 10/20/2011 12:45 AM, Das, Jyoti Ranjan (STSD) wrote:
Hi,
I am new to 389 directory server. Could you please help me in the below mentioned query?
Thank you very much in advance.
Problem statement:
Data is lost during replication between supplier and consumer when the master changelog db
file has been deleted for some reason, the consumer has been imported with stale data, and
the consumer is not initialized when the new replication agreement is created. The test
scenario is given below.
Test scenario:
Steps:
Topology
Supplier -----------Replication agreement-----------------> Hub
Both replicas are in sync at this time as mentioned below.
Take this example: five entries have been added, with CSNs from CSN1 to
CSN5
Export the Hub replica with db2ldif using the "-r" option.
Add another 5 entries on the supplier. Assume their CSNs run from
CSN6 to CSN10
Delete the replication agreements
Before or after CSN6 to CSN10 have been replicated to the Hub?
Delete the master changelog db file from the changelogdb directory.
Supplier or Hub?
Add another 5 entries on the supplier. Assume their CSNs run from
CSN11 to CSN15
Import the ldif file taken in Step-2 into the Hub replica (this is an initialization of
the consumer with stale data)
Create the replication agreement between master and hub with the "do not
initialize" option.
Now we see data loss from CSN6 to CSN14. Only the entry with CSN15 is
replicated to the consumer, and replication then continues successfully for later changes.
Questions:
Is it correct in this scenario to continue replicating despite the
data loss, instead of halting replication?
From the code analysis:
File: "ldapserver/ldap/servers/plugins/replication/cl5_api.c"
If the requested CSN is not found in the changelog db file and is also not in
the purge list, the code makes the following assumption and continues with replication:
/* there is a special case which can occur just after migration - in this case,
the consumer RUV will contain the last state of the supplier before migration,
but the supplier will have an empty changelog, or the supplier changelog will
not contain any entries within the consumer min and max CSN - also, since
the purge RUV contains no CSNs, the changelog has never been purged
ASSUMPTIONS - it is assumed that the supplier had no pending changes to send
to any consumers; that is, we can assume that no changes were lost due to
either changelog purging or database reload - bug# 603061 -
richm@netscape.com */
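To make the effect of that assumption concrete, here is a toy Python model of the decision as the comment describes it. It is a sketch only, not the logic in cl5_api.c: the name plan_replay is hypothetical, CSNs are modelled as plain integers (CSN5 becomes 5), and treating the purged case as a halt is an assumption of the sketch. The changelog contents in the example are also an assumption, chosen to match the outcome reported above; the thread does not say what the recreated changelog held.

```python
def plan_replay(changelog, consumer_max, purge_ruv_has_csns):
    """Toy model of the replay decision; not the real 389 DS code.

    changelog          -- set of integer CSNs the supplier still holds
    consumer_max       -- highest CSN the consumer reports having seen
    purge_ruv_has_csns -- True if the changelog has ever been purged
    Returns the CSNs that would be sent, or None to halt replication.
    """
    newer = sorted(c for c in changelog if c > consumer_max)
    if consumer_max in changelog:
        # Normal case: the consumer's position is in the changelog, so
        # everything after it can be replayed with no gap.
        return newer
    if purge_ruv_has_csns:
        # Changes may have been purged; this sketch treats that as a halt.
        return None
    # Special case from the quoted comment: position not found and nothing
    # ever purged, so assume no changes were lost and keep going.
    return newer


# ASSUMPTION: the recreated changelog holds only CSN15.
sent = plan_replay(changelog={15}, consumer_max=5, purge_ruv_has_csns=False)
skipped = [c for c in range(6, 16) if c not in sent]
print(sent)     # [15]
print(skipped)  # [6, 7, 8, 9, 10, 11, 12, 13, 14]
```

In this model the stale import leaves the consumer at CSN5, the changelog no longer contains CSN5, and the purge RUV is empty, so the special case fires and the gap from CSN6 to CSN14 is never noticed.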
Would the correct approach in this scenario be to halt replication with a
fatal error message in the error log file?
Probably, but then this code would have to be a lot smarter to figure out that the problem
is due to stale data being imported into the consumer. Please file a bug with exact steps
to reproduce this problem.
Regards,
Jyoti
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users