[389-users] replication failure - clock skew

Thu Feb 9 19:52:26 UTC 2012

On 02/09/2012 02:38 PM, Rich Megginson wrote:
> On 02/09/2012 12:23 PM, Greg Kuchyt wrote:
>> Yesterday afternoon, one of my consumers randomly crashed/rebooted.
>> Upon rebooting, its replication agreement with its master failed with
>> the following error:
>>
>> Unable to acquire replica: Excessive clock skew between the supplier
>> and the consumer. Replication is aborting
>>
>> I did a little bit of Google searching and found some list traffic
>> from a few years ago. From that I derived that this replica was hosed
>> and I would need to re-initialize it. No problem. A re-initialization
>> didn't do anything, same error. Starting from scratch from a
>> completely new/fresh replica produces the same result. That's when I
>> noticed the following errors in the logs on the master.
>>
>> csngen_new_csn - Warning: too much time skew (-115319 secs). Current
>> seqnum=1
>>
>> I downloaded the readNsState.py script attached to the following
>> ticket (https://bugzilla.redhat.com/show_bug.cgi?id=233642). Running
>> this on the master produced the following output
>>
>> For replica cn=replica,cn=o\3Dpotsdam.edu,cn=mapping tree,cn=config
>> len of nsstate is 40
>> CSN generator state:
>> Replica ID : 6560
>> Sampled Time : 1328928777
>> Time in hex : 0x4f35d809
>> Time as str : Fri Feb 10 21:52:57 2012
>> Local Offset : 0
>> Remote Offset : 261
>> Seq. num : 1
>> System time : Thu Feb 9 14:00:01 2012
>> Diff in sec. : -114776
>>
>> This leads me to believe that the clock skew problem is on the master.
>>
>> I am not really sure how the clock skew happened. All of these systems
>> synchronize their clocks via a centralized time server and all the
>> times on their clocks are correct. There are 3 or 4 other replicas
>> that are still receiving incremental updates fine, but any attempt to
>> add a new replica results in a failed replication agreement due to
>> excessive clock skew.
>>
>> I am writing to get a better understanding of the situation and see if
>> there is anything to be done to resolve this. At the moment it seems
>> as if I am caught in an unfortunate situation that will require
>> re-initialization of my master from a back-up.
>>
>> Thanks for any help that can be provided.
> What is your 389-ds-base version and platform?
>

Rich,

The master and a few replicas are running on Scientific Linux 6.1
x86_64. Here, we're using the stock packages along with the modified
389-ds-base packages in your fedorapeople.org repo. So that puts
389-ds-base at 1.2.9.9-1, I believe.

Two replicas (including the one that rebooted/failed) are on Fedora 12 
x86_64 and their 389-ds-base is 1.2.5-1.

Let me know what other info you need. Thanks.
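
For reference, here is a small sketch of the arithmetic behind the
readNsState.py output quoted above. It is not the script itself, just a
cross-check of the numbers it printed. The UTC-5 offset is an assumption
inferred from the output: epoch 1328928777 only prints as "Fri Feb 10
21:52:57 2012" in a zone five hours behind UTC. The variable names are
made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the readNsState.py output quoted above.
SAMPLED_TIME = 1328928777  # "Sampled Time" (seconds since the epoch)

# Assumption: the script prints local time and the master is at UTC-5.
LOCAL_TZ = timezone(timedelta(hours=-5))

# "System time : Thu Feb 9 14:00:01 2012", interpreted in that zone.
SYSTEM_TIME = datetime(2012, 2, 9, 14, 0, 1, tzinfo=LOCAL_TZ)

# The CSN generator's sampled time, rendered in the same zone.
sampled_local = datetime.fromtimestamp(SAMPLED_TIME, tz=LOCAL_TZ)

# Negative means the system clock is behind the CSN generator's time.
diff_seconds = int(SYSTEM_TIME.timestamp()) - SAMPLED_TIME

print("Time in hex  :", hex(SAMPLED_TIME))
print("Time as str  :", sampled_local.strftime("%a %b %d %H:%M:%S %Y"))
print("Diff in sec. :", diff_seconds)
print("Diff in hours: %.1f" % (diff_seconds / 3600.0))
```

Under that assumption the sketch reproduces the hex value, the time
string, and the -114776 second difference, so the CSN generator's
sampled time on the master is roughly 32 hours ahead of the system
clock.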