Hi Ludwig,
The fixes for the tickets you mention did change the iteration
through the changelog and how it handles situations when the start
csn is not found in the changelog. They also changed the
logging, so you might now see messages which were not there or
were hidden before.
That was my understanding too.
So far I have not seen any replication problems related to these
messages; all generated csns seem to be replicated. What makes it a bit
more difficult is that most of the updates are updates of lastlogintime
and the original MOD is not logged. I still do not understand why we
have these messages so frequently; I will try to reproduce.
Or, if it is possible, could you run the servers for just an hour with
replication logging enabled?
When looking into the provided data set I did notice three replicated
ops with err=50 (insufficient access). This should not happen and
requires a separate investigation.
But I am very surprised to see them so frequently and I would like
to understand it.
First some questions: do you have changelog trimming enabled (and
how), and do you have fractional replication?
Yes to both questions.
Trimming: 14 days
Fractional replication:
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf
nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn
nsds5ReplicaStripAttrs: modifiersName modifyTimestamp
internalModifiersName internalModifyTimestamp internalCreatorsname
Changelog:
cn=changelog5,cn=config
objectClass: top
objectClass: extensibleObject
cn: changelog5
nsslapd-changelogdir: /Local/dirsrv/var/lib/dirsrv/slapd-ens/changelogdb
nsslapd-changelogmaxage: 14d
replica:
cn=replica,cn=dc\\3Did\\2Cdc\\3Dpolytechnique\\2Cdc\\3Dedu,cn=mapping
tree,cn=config
objectClass: top
objectClass: nsDS5Replica
cn: replica
nsDS5ReplicaId: 1
nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu
nsDS5Flags: 1
nsDS5ReplicaBindDN: cn=RepliX,cn=config
nsds5ReplicaPurgeDelay: 604800
nsds5ReplicaTombstonePurgeInterval: 86400
nsds5ReplicaLegacyConsumer: False
nsDS5ReplicaType: 3
nsState:: AQAAAAAAAADCrc5XAAAAAAAAAAAAAAAAAQAAAAAAAAABAAAAAAAAAA==
nsDS5ReplicaName: eeb6d304-736c11e6-9bc5a1ff-40280b8e
nsds5ReplicaChangeCount: 114948
nsds5replicareapactive: 0
Typical replication agreement:
cn=Replication from ldap-lab.<domain name> to ldap-adm.<domain
name>,cn=replica,cn=dc\\3Did\\2Cdc\\3Dpolytechnique\\2Cdc\\3Dedu,cn=mapping
tree,cn=config
objectClass: top
objectClass: nsDS5ReplicationAgreement
cn: Replication from ldap-lab.<domain name> to ldap-adm.<domain name>
description: Replication agreement from server ldap-lab.<domain name>
to server ldap-adm.<domain name>
nsDS5ReplicaHost: ldap-adm.<domain name>
nsDS5ReplicaRoot: dc=id,dc=polytechnique,dc=edu
nsDS5ReplicaPort: 636
nsDS5ReplicaTransportInfo: SSL
nsDS5ReplicaBindDN: cn=RepliX,cn=config
nsDS5ReplicaBindMethod: simple
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE entryusn memberOf
nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn
nsds5ReplicaStripAttrs: modifiersName modifyTimestamp
internalModifiersName internalModifyTimestamp internalCreatorsname
nsds5replicaBusyWaitTime: 5
nsds5ReplicaFlowControlPause: 500
nsds5ReplicaFlowControlWindow: 1000
nsds5replicaTimeout: 120
nsDS5ReplicaCredentials: {AES-...
nsds50ruv: {replicageneration} 57cd7377000000020000
nsds50ruv: {replica 2 ldap://ldap-adm.<domain name>:389}
nsruvReplicaLastModified: {replica 2 ldap://ldap-adm.<domain
name>:389} 00000000
nsds5replicareapactive: 0
nsds5replicaLastUpdateStart: 20160906115520Z
nsds5replicaLastUpdateEnd: 20160906115520Z
nsds5replicaChangesSentSinceStartup: 3:13525/670 1:3671/0 2:1/0
nsds5replicaLastUpdateStatus: 0 Replica acquired successfully:
Incremental update succeeded
nsds5replicaUpdateInProgress: FALSE
nsds5replicaLastInitStart: 19700101000000Z
nsds5replicaLastInitEnd: 19700101000000Z
Next, is it possible to get the access and error logs for a period
of an hour from all servers (you can send them off list) ? I would
like to track some of the reported csns.
Sure, I will send them to you off list in a moment.
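For anyone following the CSNs in this thread, here is a minimal Python
sketch for decoding them. It assumes the usual 20-hex-digit CSN layout
(8 digits of Unix timestamp, 4 of sequence number, 4 of replica ID, 4 of
sub-sequence number) and that CSNs are ordered field by field in that
order. The names CSN, parse_csn and csn_time are made up for
illustration; they are not part of 389 DS.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Fields of a change sequence number, in assumed comparison order."""
    timestamp: int   # seconds since the Unix epoch
    seqnum: int      # sequence number within that second
    replica_id: int  # ID of the replica that generated the change
    subseqnum: int   # sub-sequence number


def parse_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}")
    return CSN(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        replica_id=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


def csn_time(csn: CSN) -> datetime:
    """Return the CSN timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(csn.timestamp, tz=timezone.utc)


if __name__ == "__main__":
    # First CSN reported as "not found" in the error log.
    missing = parse_csn("57cdfe06000100010000")
    print(missing, csn_time(missing))

    # The pair from the first replica_generate_next_csn message:
    # tuple comparison reproduces the "opcsn <= basecsn" condition.
    opcsn = parse_csn("57ce0f4e000500020000")
    basecsn = parse_csn("57ce0f4e000500030000")
    print(opcsn <= basecsn)
```

With this layout, 57cdfe06000100010000 decodes to replica ID 1 and a
timestamp of 2016-09-05 23:21:42 UTC, one second before the
corresponding error log line (01:21:43 +0200).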
Thank you,
Regards,
Andrey
Regards,
Ludwig
On 09/06/2016 12:31 PM, Ivanov Andrey (M.) wrote:
Hi,
We have been successfully using the compiled 1.3.4 git branch of
389DS in production on CentOS 7 for about a year
(approximately 40 000 entries, about 4000 groups, hundreds of
reads and tens of writes per second).
Our current topology consists of 3 servers in a triangle (each
server is a master replicating to the 2 others, so two read-write
replication agreements on each).
Since the fixes for Ticket 48766 ("Replication changelog
can incorrectly skip over updates") and Ticket 48954
("Replication fails because anchorcsn cannot be found") I’ve
started to see the following regular warnings in the error logs:
[06/Sep/2016:01:21:43 +0200] clcache_load_buffer_bulk - changelog record with csn (57cdfe06000100010000) not found for DB_NEXT
[06/Sep/2016:01:21:43 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-lab.<domain>" (ldap-lab:636) - Can't locate CSN 57cdfe06000100010000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:02:35:25 +0200] - replica_generate_next_csn: opcsn=57ce0f4e000500020000 <= basecsn=57ce0f4e000500030000, adjusted opcsn=57ce0f4e000600020000
[06/Sep/2016:04:10:11 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce257e000400030000) not found for DB_NEXT
[06/Sep/2016:05:16:58 +0200] - replica_generate_next_csn: opcsn=57ce352b000000020000 <= basecsn=57ce352b000100010000, adjusted opcsn=57ce352b000100020000
[06/Sep/2016:06:56:04 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce4c62000100030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:07:29:00 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce541a000200030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:07:34:20 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-lab.<domain>" (ldap-lab:636) - Can't locate CSN 57ce5559000100010000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:07:34:27 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-lab.<domain>" (ldap-lab:636) - Can't locate CSN 57ce5561000000010000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:07:40:17 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce56c0000500030000) not found for DB_NEXT
[06/Sep/2016:07:40:24 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce56c5000100030000) not found for DB_NEXT
[06/Sep/2016:08:08:36 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce5d5f000f00010000) not found for DB_NEXT
[06/Sep/2016:08:12:39 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce5e54000200030000) not found for DB_NEXT
[06/Sep/2016:08:12:39 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce5e54000200030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:08:26:45 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce61a3000200030000) not found for DB_NEXT
[06/Sep/2016:08:27:40 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce61d8000200030000) not found for DB_NEXT
[06/Sep/2016:08:27:40 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce61d8000200030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:08:31:42 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce62c8000300010000) not found for DB_NEXT
[06/Sep/2016:08:34:05 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce635a000100010000) not found for DB_NEXT
[06/Sep/2016:08:44:28 +0200] clcache_load_buffer_bulk - changelog record with csn (57ce65c9000200030000) not found for DB_NEXT
[06/Sep/2016:08:52:25 +0200] agmt="cn=Replication from ldap-adm.<domain> to ldap-ens.<domain>" (ldap-ens:636) - Can't locate CSN 57ce67aa000100030000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[06/Sep/2016:08:53:04 +0200] - replica_generate_next_csn: opcsn=57ce67d1000100020000 <= basecsn=57ce67d1000200030000, adjusted opcsn=57ce67d1000200020000
These warnings are present on all three servers and for all
replication agreements. One of the servers is virtual and the other two
are physical.
The replication still seems to work fine in spite of these
warnings. The "replica_generate_next_csn" message is not new: it
has always been there with 1.3.4. The two new warnings are
"clcache_load_buffer_bulk" and "Can't locate CSN ... in the
changelog (DB rc=-30988)". There are no network problems or
anything like that, so the cause could only be the replication topology
(3-master fully-connected triangle) and/or the servers being
rather busy. Is it a bug, a warning that can be ignored, or
something else?
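To get a feel for how often each warning occurs and which replica the
missing CSNs originate from, something like the following Python sketch
could be run over the error log. It assumes each message sits on a
single line in the actual log file and that the replica ID is hex digits
13-16 of the CSN. The function name tally_missing_csns and the "clcache"
/ "agmt" labels are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns for the two new warnings quoted in this mail.
NOT_FOUND = re.compile(
    r"changelog record with csn \(([0-9a-f]{20})\) not found")
CANT_LOCATE = re.compile(
    r"Can't locate CSN ([0-9a-f]{20}) in the changelog")


def tally_missing_csns(lines: Iterable[str]) -> Counter:
    """Count warnings per (kind, replica ID of the missing CSN)."""
    counts: Counter = Counter()
    for line in lines:
        for kind, pattern in (("clcache", NOT_FOUND),
                              ("agmt", CANT_LOCATE)):
            match = pattern.search(line)
            if match:
                # Assumed layout: replica ID is hex digits 13-16.
                replica_id = int(match.group(1)[12:16], 16)
                counts[(kind, replica_id)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[06/Sep/2016:01:21:43 +0200] clcache_load_buffer_bulk - "
        "changelog record with csn (57cdfe06000100010000) not found "
        "for DB_NEXT",
        "[06/Sep/2016:06:56:04 +0200] agmt=\"cn=Replication from "
        "ldap-adm.<domain> to ldap-ens.<domain>\" (ldap-ens:636) - "
        "Can't locate CSN 57ce4c62000100030000 in the changelog "
        "(DB rc=-30988). If replication stops, the consumer may need "
        "to be reinitialized.",
    ]
    for (kind, rid), n in sorted(tally_missing_csns(sample).items()):
        print(f"{kind} replica {rid}: {n}")
```

On a real log the input would be the open file object, for example
tally_missing_csns(open(path)) with path pointing at the instance's
errors file.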
Thank you!
--
389-users mailing list
389-users@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/389-users@lists.fedoraproject.org
--
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric
Shander