Re: [389-users] Data inconsitency during replication

Monday, 31 October 2011

Hi Rich,

One correction to the step-4 item "Recreate the cn=replica entry for the suffix": as per
the example given below, the suffix is "o=USA", not "o=SWIFT".

-          Recreate the "cn=replica" entry for the suffix as below.

dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
changetype: add
objectClass: nsds5replica
objectClass: top
nsDS5ReplicaRoot: o=USA
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 10  ----> Please assign the same nsDS5ReplicaId value that the master
had. In my case, the original master's replica ID was 10.
nsds5ReplicaPurgeDelay: 1
nsds5ReplicaTombstonePurgeInterval: -1
cn: replica
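The suffix appears twice in this entry (in the DN and in nsDS5ReplicaRoot), which is how the two can drift apart when the entry is edited by hand. As a minimal sketch, assuming the attribute values shown above are the ones wanted, the entry could be rendered from a single suffix value. The function name `replica_entry_ldif` is hypothetical and used here only for illustration; it is not part of 389 Directory Server.

```python
def replica_entry_ldif(suffix: str, replica_id: int) -> str:
    """Render the cn=replica 'add' record for one suffix.

    The attribute values are copied from the example in this message;
    only the suffix and the replica ID are parameters.
    """
    lines = [
        f"dn: cn=replica,cn={suffix},cn=mapping tree,cn=config",
        "changetype: add",
        "objectClass: nsds5replica",
        "objectClass: top",
        f"nsDS5ReplicaRoot: {suffix}",
        "nsDS5ReplicaType: 3",
        "nsDS5Flags: 1",
        # Same replica ID the original master had (10 in this example).
        f"nsDS5ReplicaId: {replica_id}",
        "nsds5ReplicaPurgeDelay: 1",
        "nsds5ReplicaTombstonePurgeInterval: -1",
        "cn: replica",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the record so it can be reviewed before being passed to ldapmodify.
    print(replica_entry_ldif("o=USA", 10), end="")
```

Because the suffix is passed in once, the DN and nsDS5ReplicaRoot cannot end up naming different suffixes.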

Regards,
Jyoti

From: Das, Jyoti Ranjan (STSD)
Sent: Monday, October 31, 2011 2:38 PM
To: 'Rich Megginson'; General discussion list for the 389 Directory server
project.
Subject: RE: [389-users] Data inconsitency during replication

Hi Rich,

Thanks a lot for your response. Please find the sample reproducer details below. I am not
sure how to file a bug; I will look into it and do so.

Reproducer:

Step-1:

Set up a topology in which the Master replicates to the Slave and the Slave replicates to
the Consumer.

Master -> Slave -> Consumer.

Step-2:
Make sure that all three are in sync at this point. For example, assume all are in sync
up to CSN5 (5 records were added to the Master, with CSN1 to CSN5).

Step-3:

Delete the replication agreements from Master to Slave and from Slave to Consumer.

Step-4:

Promote the Slave to master.  Promotion steps are given below.

-          Delete Supplier DN (cn=suppdn,cn=config) from Slave

-          Delete the "cn=replica" entry for the suffix "o=USA" using
ldapmodify. As a result, the changelog file is deleted.

Ex:
dn: cn=replica,cn=o=USA,cn=mapping tree,cn=config
changetype: delete

-          Modify the "cn=o=USA,cn=mapping tree,cn=config" entry as below

Ex:
dn: cn=o=USA,cn=mapping tree,cn=config
changetype: modify
replace: nsslapd-state
nsslapd-state: backend

dn: cn=o=USA,cn=mapping tree,cn=config
changetype: modify
delete: nsslapd-referral

-          Recreate the "cn=replica" entry for the suffix as below.

dn: cn=replica,cn=o=SWIFT,cn=mapping tree,cn=config
changetype: add
objectClass: nsds5replica
objectClass: top
nsDS5ReplicaRoot: o=SWIFT
nsDS5ReplicaType: 3
nsDS5Flags: 1
nsDS5ReplicaId: 10  ----> Please assign the same nsDS5ReplicaId value that the master
had. In my case, the original master's replica ID was 10.
nsds5ReplicaPurgeDelay: 1
nsds5ReplicaTombstonePurgeInterval: -1
cn: replica

-          Restart the slapd process. The Slave is now a Master.

Am I missing anything in the promotion procedure, or is this not the right way to
promote a replica?

Step-5:

Add the replication agreement between the Slave (newly promoted Master) and the Consumer.
At this point both the Slave and the Consumer are in sync up to CSN5. When creating the
agreement, do not initialize the Consumer.

           Slave (newly promoted Master) -> Consumer.

Step-6:

Add 5 more entries to the Slave that was promoted to Master above. Let's assume the
CSNs for these 5 entries run from CSN6 to CSN10.

Step-7:

You will now see that, of the last 5 entries, only the last few get replicated, and
replication does not halt.

Regards,
Jyoti

From: Rich Megginson [mailto:rmeggins@redhat.com]
Sent: Friday, October 28, 2011 10:54 PM
To: General discussion list for the 389 Directory server project.
Cc: Das, Jyoti Ranjan (STSD)
Subject: Re: [389-users] Data inconsitency during replication

On 10/20/2011 12:45 AM, Das, Jyoti Ranjan (STSD) wrote:
Hi,

I am new to 389 Directory Server. Could you please help me with the query below?
Thank you very much in advance.

Problem statement:

Data is lost during replication between supplier and consumer when the master's changelog
db file has been deleted for some reason, the consumer has been imported with stale data,
and the consumer is not initialized when the new replication agreement is created. The
test scenario is given below.

Test scenario:
Steps:

Topology

Supplier -----------Replication agreement-----------------> Hub

Both replicas are in sync at this time as mentioned below.

Let's take this example: five entries have been added, with CSNs from CSN1 to CSN5.

Take a db2ldif export with the "-r" option from the Hub replica.

Add another 5 entries on the supplier. Let's say their CSNs run from CSN6 to CSN10.

Delete the replication agreements
Before or after CSN6 to CSN10 have been replicated to the Hub?

Delete the master changelog db file from the changelogdb directory.
Supplier or Hub?

Add another 5 entries on the supplier. Let's say their CSNs run from CSN11 to CSN15.

Import the LDIF file taken in Step-2 into the Hub replica (this initializes the consumer
with stale data).

Create the replication agreement between master and hub with the "do not
initialize" option.

Now we see data loss from CSN6 to CSN14. Only the entry with CSN15 is replicated to the
consumer, and replication then continues successfully from there.

Questions:

Is it correct in this scenario to continue replicating despite the data loss, instead of
halting replication?

...
From the code analysis:
File: "ldapserver/ldap/servers/plugins/replication/cl5_api.c"

If the requested CSN is not found in the changelog db file and is also not in the purge
list, the code makes the following assumption and continues with replication:

/* there is a special case which can occur just after migration - in this case,
  the consumer RUV will contain the last state of the supplier before migration,
  but the supplier will have an empty changelog, or the supplier changelog will
  not contain any entries within the consumer min and max CSN - also, since
  the purge RUV contains no CSNs, the changelog has never been purged
  ASSUMPTIONS - it is assumed that the supplier had no pending changes to send
  to any consumers; that is, we can assume that no changes were lost due to
  either changelog purging or database reload - bug# 603061 - richm@netscape.com */

Is it a correct approach in this scenario to halt the replication with a fatal error
message in the error log file?
Probably, but then this code would have to be a lot smarter to figure out that the problem
is due to stale data being imported into the consumer.  Please file a bug with exact steps
to reproduce this problem.
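To make the two behaviors under discussion concrete, here is a toy model in Python. It is a sketch under stated assumptions, not the logic in cl5_api.c: CSNs are reduced to plain integers, the changelog and purge list are plain sets, and the names `Decision`, `decide` and `strict` are hypothetical. The branch for a purged CSN is also an assumption of this sketch and is not taken from the thread.

```python
from enum import Enum


class Decision(Enum):
    SEND_FROM_ANCHOR = "anchor CSN found; send the changes that follow it"
    ASSUME_NOTHING_LOST = "anchor CSN missing and never purged; continue anyway"
    HALT = "anchor CSN missing; stop and report a fatal error"


def decide(changelog, purged, consumer_max_csn, strict=False):
    """Toy decision for one replication session.

    changelog        -- set of CSNs (plain ints here) still in the supplier changelog
    purged           -- set of CSNs recorded as purged (the "purge list")
    consumer_max_csn -- highest CSN the consumer reports having seen
    strict           -- hypothetical switch: halt instead of assuming nothing was lost
    """
    if consumer_max_csn in changelog:
        return Decision.SEND_FROM_ANCHOR
    if consumer_max_csn in purged:
        # Assumption of this sketch: purged changes cannot be replayed, so stop.
        return Decision.HALT
    # Special case from the quoted comment: CSN not in the changelog, nothing purged.
    return Decision.HALT if strict else Decision.ASSUME_NOTHING_LOST


if __name__ == "__main__":
    # Hypothetical state: the changelog was recreated and no longer contains CSN5,
    # while the consumer was reloaded from stale data that ends at CSN5.
    recreated_changelog = {11, 12, 13, 14, 15}
    print(decide(recreated_changelog, set(), 5).name)
    print(decide(recreated_changelog, set(), 5, strict=True).name)
```

With `strict=False` the model continues, as the quoted comment describes; with `strict=True` it halts, as the question proposes. The model does not try to tell stale consumer data apart from a genuine post-migration state, which is the harder part noted in the reply above.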

Regards,
Jyoti

--

389 users mailing list

389-users@lists.fedoraproject.org

https://admin.fedoraproject.org/mailman/listinfo/389-users
