Re: [389-users] replication failure - clock skew

Friday, 10 February 2012

On 02/10/2012 09:07 AM, Greg Kuchyt wrote:
...
 On 02/09/2012 03:03 PM, Rich Megginson wrote:
> On 02/09/2012 12:52 PM, Greg Kuchyt wrote:
>> On 02/09/2012 02:38 PM, Rich Megginson wrote:
>>> On 02/09/2012 12:23 PM, Greg Kuchyt wrote:
>>>> Yesterday afternoon, one of my consumers randomly crashed/rebooted.
>>>> Upon rebooting, its replication agreement with its master failed with
>>>> the following error:
>>>>
>>>> Unable to acquire replica: Excessive clock skew between the supplier
>>>> and the consumer. Replication is aborting
>>>>
>>>> I did a little bit of Google searching and found some list traffic
>>>> from a few years ago. From that I derived that this replica was hosed
>>>> and I would need to re-initialize it. No problem. A re-initialization
>>>> didn't do anything, same error. Starting from scratch from a
>>>> completely new/fresh replica produces the same result. That's when I
>>>> noticed the following errors in the logs on the master.
>>>>
>>>> csngen_new_csn - Warning: too much time skew (-115319 secs). Current
>>>> seqnum=1
>>>>
>>>> I downloaded the readNsState.py script attached to the following
>>>> ticket (https://bugzilla.redhat.com/show_bug.cgi?id=233642). Running
>>>> this on the master produced the following output
>>>>
>>>> For replica cn=replica,cn=o\3Dpotsdam.edu,cn=mapping tree,cn=config
>>>> len of nsstate is 40
>>>> CSN generator state:
>>>> Replica ID : 6560
>>>> Sampled Time : 1328928777
>>>> Time in hex : 0x4f35d809
>>>> Time as str : Fri Feb 10 21:52:57 2012
>>>> Local Offset : 0
>>>> Remote Offset : 261
>>>> Seq. num : 1
>>>> System time : Thu Feb 9 14:00:01 2012
>>>> Diff in sec. : -114776
>>>>
>>>> This leads me to believe that the clock skew problem is on the master.
>>>>
>>>> I am not really sure how the clock skew happened. All of these systems
>>>> synchronize their clocks via a centralized time server and all the
>>>> times on their clocks are correct. There are 3 or 4 other replicas
>>>> that are still receiving incremental updates fine, but any attempt to
>>>> add a new replica results in a failed replication agreement due to
>>>> excessive clock skew.
>>>>
>>>> I am writing to get a better understanding of the situation and see if
>>>> there is anything to be done to resolve this. At the moment it seems
>>>> as if I am caught in an unfortunate situation that will require
>>>> re-initialization of my master from a back-up.
>>>>
>>>> Thanks for any help that can be provided.
>>> What is your 389-ds-base version and platform?
>>>> -- 
>>>> 389 users mailing list
>>>> 389-users(a)lists.fedoraproject.org
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>
>>
>> Rich,
>>
>> The master and a few replicas are running on Scientific Linux 6.1
>> x86_64. Here, we're using the stock packages along with the modified
>> 389-ds-base packages in your fedorapeople.org repo. So that puts it
>> at 1.2.9.9-1 for 389-ds-base I believe.
>>
>> Two replicas (including the one that rebooted/failed) are on Fedora 12
>> x86_64 and their 389-ds-base is 1.2.5-1.
> There was a known problem with clock skew calculation and handling in
> 1.2.5 - please try upgrading everything to 1.2.9.9. I realize fedora 12
> is no longer supported.
>>
>> Let me know what other info you need. Thanks.
>> -- 
>> 389 users mailing list
>> 389-users(a)lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>

 Rich,
 The F12 systems were in production and were slated for replacement by 
 SL 6.1 systems. I just took the F12 systems out of the mix rather than 
 upgrade them, so everything is now SL 6.1 and 389-ds-base 1.2.9.9.

 When attempting to add a new replica I still see the following in the 
 error logs on the master.

 "Unable to acquire replica: Excessive clock skew between the supplier 
 and the consumer. Replication is aborting."

 As well, I am seeing a lot of these messages in the logs on the master.

 "csngen_new_csn - Warning: too much time skew (-123525 secs). Current 
 seqnum=1"

Are your clocks on the servers all in sync?
...
 -- 
 389 users mailing list
 389-users(a)lists.fedoraproject.org
 https://admin.fedoraproject.org/mailman/listinfo/389-users 
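
The "Diff in sec." figure in the readNsState.py output quoted above can be
reproduced by hand, which helps when checking whether the skew sits in the CSN
generator state rather than in the system clock. The following is a minimal
sketch, not the script's actual implementation. It assumes the master's local
time zone was US Eastern Standard Time (UTC-5); the thread never states the
zone, and that offset was chosen only because it makes the quoted "Time as str"
line up with the quoted epoch value. The names EST and skew_seconds are
illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the master ran in US Eastern Standard Time (UTC-5).
# The thread does not state the time zone.
EST = timezone(timedelta(hours=-5), "EST")

# Values copied from the readNsState.py output quoted in the thread.
sampled_time = 0x4F35D809                                # "Time in hex"
system_time = datetime(2012, 2, 9, 14, 0, 1, tzinfo=EST)  # "System time"


def skew_seconds(sampled: int, system: datetime) -> int:
    """Return system time minus the sampled CSN generator time, in seconds."""
    return int(system.timestamp()) - sampled


diff = skew_seconds(sampled_time, system_time)

print("Sampled Time :", sampled_time)
print("Time as str  :", datetime.fromtimestamp(sampled_time, EST).isoformat())
print("Diff in sec. :", diff)
print("Diff in hours: %.1f" % (diff / 3600))
```

Under that time zone assumption, the negative result means the sampled time
stored in the generator state is about 32 hours ahead of the master's system
clock, which is consistent with the system clocks being correct while the
stored state is not.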
