Hi
I am having problems with some replicas. Using 389 DS 1.2.5, CentOS
5.5. A few days ago, a server crashed, and when it was restarted, its clock
still showed the time of the crash (more than 1 day behind). Shortly after
the server started up, the time was synced with NTP, but when dirsrv started,
the time was still wrong. Since then, the replication agreements of the
multimaster database it hosts have been failing with: "-1 Incremental update has
failed and requires administrator actionSystem error". So I am trying
to initialize the rest of the servers from the "main" one (the server
where most of the modifications are done; we have 6 servers in
multimaster mode for this database, and other databases in hub mode).
When I try to initialize a server, I get this error on the supplier:
"Replication error acquiring replica: excessive clock skew. Error
Code: 2", although all the servers have the same time. In the consumer
log, I get this:
[16/Aug/2010:10:04:58 +0200] - csngen_adjust_time: adjustment limit
exceeded; value - 1390893, limit - 86400
[16/Aug/2010:10:04:58 +0200] - CSN generator's state:
[16/Aug/2010:10:04:58 +0200] - replica id: 5
[16/Aug/2010:10:04:58 +0200] - sampled time: 1281945898
[16/Aug/2010:10:04:58 +0200] - local offset: 0
[16/Aug/2010:10:04:58 +0200] - remote offset: 0
[16/Aug/2010:10:04:58 +0200] - sequence number: 111
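The numbers in that log entry can be checked with a little arithmetic. The sketch below is a hypothetical, simplified model of the check, not the 389 DS source. It assumes the logged "value" is how far an incoming CSN's timestamp is ahead of the consumer's own CSN clock (sampled time plus local and remote offsets), and that the "limit" is the largest forward jump the generator accepts. The helper names adjustment_needed and accepts are made up for illustration, and the real code has additional conditions this model leaves out.

```python
# Hypothetical, simplified model of the check behind the consumer message
# "csngen_adjust_time: adjustment limit exceeded". Not the 389 DS source.

MAX_TIME_ADJUST = 86400  # "limit - 86400" in the consumer log: one day in seconds


def adjustment_needed(remote_csn_time, sampled_time, local_offset=0, remote_offset=0):
    """Seconds the local CSN clock would have to jump forward to pass a remote CSN."""
    local_csn_time = sampled_time + local_offset + remote_offset
    return max(0, remote_csn_time - local_csn_time)


def accepts(remote_csn_time, sampled_time, local_offset=0, remote_offset=0):
    """True if the required jump stays within the one-day limit."""
    needed = adjustment_needed(remote_csn_time, sampled_time, local_offset, remote_offset)
    return needed <= MAX_TIME_ADJUST


# Numbers from the consumer log: sampled time 1281945898, both offsets 0,
# reported value 1390893. Under this model the incoming CSN carried:
sampled_time = 1281945898
implied_remote_time = sampled_time + 1390893

print(adjustment_needed(implied_remote_time, sampled_time))  # 1390893
print(round(1390893 / MAX_TIME_ADJUST, 1))                   # 16.1 (days)
print(accepts(implied_remote_time, sampled_time))            # False
```

If this model is right, the rejected CSN was stamped roughly 16.1 days ahead of the consumer's CSN clock. That would fit the observation that the servers' system clocks agree while the supplier still reports excessive clock skew.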
I am stuck now. I tried exporting the database from the supplier, importing
it on the consumer, and reinitializing, without success. I also tried to
disable the replica on both supplier and consumer, re-enable it, and
recreate the replication agreements, without success. I have seen this
bug, https://bugzilla.redhat.com/show_bug.cgi?id=233642, but we have
version 1.2.5, so this bug is supposed to be fixed. This is the output
of readNsState.py on the supplier (only for the database giving
problems):
nsState is BAAAADT2aEwAAAAAAQAAAAQAAAA=
Little Endian
For replica cn=replica, cn="dc=XXXXX,dc=XXXX", cn=mapping tree, cn=config
fmtstr=[H2x3IH2x]
size=20
len of nsstate is 20
CSN generator state:
Replica ID : 4
Sampled Time : 1281947188
Gen as csn : 4c68f634000400040000
Time as str : Mon Aug 16 10:26:28 2010
Local Offset : 0
Remote Offset : 1
Seq. num : 4
System time : Mon Aug 16 10:26:42 2010
Diff in sec. : 14
Day:sec diff : 0:14
And this in the consumer:
nsState is BQAAAPv1aEwAAAAAAAAAAAIAAAA=
Little Endian
For replica cn=replica, cn="dc=XXX,dc=XXXXX", cn=mapping tree, cn=config
fmtstr=[H2x3IH2x]
size=20
len of nsstate is 20
CSN generator state:
Replica ID : 5
Sampled Time : 1281947131
Gen as csn : 4c68f5fb000200050000
Time as str : Mon Aug 16 10:25:31 2010
Local Offset : 0
Remote Offset : 0
Seq. num : 2
System time : Mon Aug 16 10:26:24 2010
Diff in sec. : 53
Day:sec diff : 0:53
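To compare the two nsState values side by side, you can reproduce the decoding that readNsState.py performs with a few lines of Python. This is a minimal sketch. It assumes the layout the script reports (little endian, fmtstr H2x3IH2x, 20 bytes) and the field order shown in its output. decode_nsstate and as_csn are hypothetical helper names, not part of the script.

```python
import base64
import struct

# Layout as printed by readNsState.py above: little endian, fmtstr=[H2x3IH2x],
# size=20. Field order assumed from its output: replica id, sampled time,
# local offset, remote offset, sequence number.
NSSTATE_FMT = "<H2x3IH2x"


def decode_nsstate(b64_value):
    """Decode a base64 nsState value into its CSN generator fields."""
    raw = base64.b64decode(b64_value)
    if len(raw) != struct.calcsize(NSSTATE_FMT):
        raise ValueError("unexpected nsState length: %d" % len(raw))
    rid, sampled, local_off, remote_off, seq = struct.unpack(NSSTATE_FMT, raw)
    return {"rid": rid, "sampled_time": sampled, "local_offset": local_off,
            "remote_offset": remote_off, "seq_num": seq}


def as_csn(state):
    """Format like the "Gen as csn" line: time, seqnum, replica id, subseqnum (0)."""
    return "%08x%04x%04x%04x" % (state["sampled_time"], state["seq_num"],
                                 state["rid"], 0)


for value in ("BAAAADT2aEwAAAAAAQAAAAQAAAA=",   # supplier
              "BQAAAPv1aEwAAAAAAAAAAAIAAAA="):  # consumer
    state = decode_nsstate(value)
    print(state, as_csn(state))
```

For both values this yields the same fields and the same "Gen as csn" strings as the script output above (4c68f634000400040000 and 4c68f5fb000200050000).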
I think the low remote offset (according to the bug, this number
should increase with the changes) is due to the initialization of the
database from the exports. Any help? All replication agreements are a
disaster now :S.
The bug that caused this to happen was fixed, but unfortunately the fix
cannot repair a bad nsState that already exists. The problem is that the CSN
generator attribute (nsState) in the cn=replica entry for the suffix is
not cleaned up properly when you re-init replication. In general, it
can't be, because you could then generate CSNs that you have already
generated.
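Why the generator state can't simply be thrown away is easier to see with a toy model. The class below is a hypothetical illustration, not the 389 DS CSN generator. It keeps a last-used timestamp and a sequence number, which play the role of the persisted state that nsState holds for a real replica, and formats its output in the same shape as a CSN string.

```python
from dataclasses import dataclass


@dataclass
class ToyCsnGen:
    """Hypothetical toy CSN generator, NOT the 389 DS implementation.

    last_time and seq play the role of the persisted generator state
    (what nsState stores for a real replica).
    """
    rid: int
    last_time: int = 0
    seq: int = 0

    def next_csn(self, now):
        if now > self.last_time:
            # Clock moved forward: take the new time, restart the sequence.
            self.last_time, self.seq = now, 0
        else:
            # Clock did not advance (or went back): keep time, bump sequence.
            self.seq += 1
        # Same textual shape as a CSN: time, seqnum, replica id, subseqnum.
        return "%08x%04x%04x%04x" % (self.last_time, self.seq, self.rid, 0)


gen = ToyCsnGen(rid=4)
first = gen.next_csn(1000)
second = gen.next_csn(1000)
later = gen.next_csn(900)     # clock behind: saved state keeps CSNs increasing

reset = ToyCsnGen(rid=4)      # state thrown away, as in a naive cleanup
print(reset.next_csn(1000) == first)  # True: a CSN issued before, issued again
```

The same persistence explains why a bad saved state sticks around in this model: if last_time were far in the future, every new CSN would keep that future time until the real clock caught up.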
I think the solution here is to first unconfigure replication, then
shut down the servers, then dump the database(s) to LDIF, then remove the
nsState attribute. You will have to do this on every server. Then,
start up, reconfigure replication, reload the data, and re-init all of
the other replicas. Make sure all of your servers are in time sync
before you begin.
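You can script the "remove the nsState attribute" step, so that each server is handled the same way. This is a minimal sketch under these assumptions: the cn=replica entry is available as an LDIF file (for example the server's dse.ldif, edited only while the server is stopped and after making a backup), and only nsState values in entries whose DN starts with "cn=replica," should be removed. dse.ldif can also contain other nsState values, for instance in the unique ID generator entry, and the sketch leaves those alone. strip_replica_nsstate is a hypothetical helper name.

```python
import re

# Matches "nsState: ..." and the base64 form "nsState:: ...", any case.
NSSTATE_LINE = re.compile(r"^nsstate::?\s", re.IGNORECASE)


def unfold(ldif_text):
    """Join LDIF continuation lines (lines starting with one space)."""
    lines = []
    for line in ldif_text.splitlines():
        if line.startswith(" ") and lines:
            lines[-1] += line[1:]
        else:
            lines.append(line)
    return lines


def strip_replica_nsstate(ldif_text):
    """Drop nsState only from entries whose DN starts with cn=replica,.

    Hypothetical helper: entries are separated by blank lines; base64
    DNs ("dn::") are not decoded, so such entries are left untouched.
    """
    out, in_replica = [], False
    for line in unfold(ldif_text):
        if line == "":
            in_replica = False
        elif line.lower().startswith("dn:"):
            in_replica = line[3:].strip().lower().startswith("cn=replica,")
        elif in_replica and NSSTATE_LINE.match(line):
            continue  # skip the whole (already unfolded) nsState value
        out.append(line)
    return "\n".join(out) + "\n"
```

The result is written unfolded, which is still valid LDIF. Diff it against the original before putting it in place.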
I know this is a pain but I don't know any other way to get rid of the
bad nsState.
Regards and thanks in advance.
------------------------------------------------------------------------
--
389 users mailing list
389-users(a)lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users