Changelog, its location, ways to view, max life - 389-users - Fedora Mailing-Lists


Sergei Gerasenko

Friday, 3 November 2017

10:48 a.m.

Hello,

Some basic questions about the changelog:

1. What’s the location of the changelog where I can look up a CSN?
2. How do I see the setting for the max life of a CSN?
3. How do I view a particular CSN (i.e. its contents)?

Thanks,
Sergei


Mark Reynolds

Friday, 3 November 2017

11:16 a.m.

On 11/03/2017 11:48 AM, Sergei Gerasenko wrote:

> Hello, Some basic questions about the changelog: 1. What’s the location of the changelog where I can look up a CSN?

Typically it's something like:

/var/lib/dirsrv/slapd-YOUR_INSTANCE/changelogdb

To look at the replication changelog you need to use the CLI tool "cl-dump.pl":

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=we...

> 2. How do I see the setting for the max life of a CSN?

There is no "max life" of a CSN. Replication purging and changelog trimming use the CSNs in the RUVs to determine what can be removed. The admin guide covers both in more detail.

> 3. How do I view a particular CSN (i.e. its contents)?

csn: 59f9e547000200010000

Breaks down like this:

59f9e547 0002 0001 0000

The first 8 hex digits are the timestamp: 59f9e547 --> 1509549383 seconds since the epoch
The next 4 are the sequence number (0002)
The next 4 are the replica ID (0001)
The last 4 are the subsequence number (0000)
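As a rough illustration of the breakdown above, here is a minimal Python sketch that splits a CSN string into its four fields. The names `Csn` and `parse_csn` are made up for this example, and the sketch assumes all four fields are hexadecimal (the message above only says so explicitly for the timestamp).

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Csn(NamedTuple):
    timestamp: int   # seconds since the epoch
    seq: int         # sequence number
    replica_id: int  # replica ID
    subseq: int      # subsequence number


def parse_csn(csn: str) -> Csn:
    """Split a 20-hex-digit CSN into timestamp, sequence, replica ID, subsequence."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}")
    # int(..., 16) raises ValueError if a field is not valid hex.
    return Csn(
        timestamp=int(csn[0:8], 16),
        seq=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseq=int(csn[16:20], 16),
    )


csn = parse_csn("59f9e547000200010000")
print(csn)
# Show the timestamp field as a UTC date for easier reading.
print(datetime.fromtimestamp(csn.timestamp, tz=timezone.utc).isoformat())
```

For the CSN from the message, the timestamp field decodes to 1509549383, which is 2017-11-01 15:16:23 UTC.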

> Thanks, Sergei

_______________________________________________
389-users mailing list -- 389-users(a)lists.fedoraproject.org
To unsubscribe send an email to 389-users-leave(a)lists.fedoraproject.org


Sergei Gerasenko

11:28 a.m.

> To look at the replication changelog you need to use the cli tool "cl-dump.pl" https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=we...

Ok, thank you

>> 2. How do I see the setting for the max life of a CSN?
> There is no "max life" of a csn.

Ok, what brought this up is that about every week, one of the machines in our environment breaks the replication with messages like this:

[01/Nov/2017:17:12:52.815891904 +0000] agmt="cn=meToXXXX" - Can't locate CSN 59f9d98a000000760000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[01/Nov/2017:17:12:52.820619690 +0000] NSMMReplicationPlugin - changelog program - agmt="cn=meXXXX": CSN 59f9d98a000000760000 not found, we aren't as up to date, or we purged
[01/Nov/2017:17:12:52.828626595 +0000] NSMMReplicationPlugin - agmt="cn=meToXXXX": Data required to update replica has been purged from the changelog. The replica must be reinitialized.

So it made me think that perhaps the CSN record is removed too early? The ’76’ in the CSN is the machine having the problem. What do you think could cause problems of this kind?
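To pull the CSN out of error-log lines like the ones quoted above, something along these lines could work. This is only a sketch: `csn_from_log_line` and `split_csn` are hypothetical helpers, and the regular expression is written against the wording of these particular messages, not against any documented log format.

```python
import re
from typing import Dict, Optional

# The 20-hex-digit CSN that follows the word "CSN" in the messages above.
CSN_RE = re.compile(r"\bCSN ([0-9a-fA-F]{20})\b")


def csn_from_log_line(line: str) -> Optional[str]:
    """Return the CSN mentioned in a log line, or None if there is none."""
    match = CSN_RE.search(line)
    return match.group(1) if match else None


def split_csn(csn: str) -> Dict[str, str]:
    """Cut a CSN into its four fields, kept as the raw hex strings."""
    return {
        "timestamp": csn[0:8],
        "seq": csn[8:12],
        "replica_id": csn[12:16],
        "subseq": csn[16:20],
    }


line = (
    '[01/Nov/2017:17:12:52.815891904 +0000] agmt="cn=meToXXXX" - '
    "Can't locate CSN 59f9d98a000000760000 in the changelog (DB rc=-30988)."
)
fields = split_csn(csn_from_log_line(line))
print(fields)
# Assumption: the replica ID field is hexadecimal, like the timestamp.
print(int(fields["replica_id"], 16))
```

The replica ID field here is `0076`. If that field is hexadecimal, as this sketch assumes, it corresponds to 118 in decimal, which is worth keeping in mind when matching it against a configured replica ID.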

> There is replication purging and changelog trimming that uses csns in RUV's to determine what can be removed. The admin guide talks about these in more detail.
>> 3. How do I view a particular CSN (i.e. its contents)?
> csn: 59f9e547000200010000 Breaks down like this: 59f9e547 0002 0001 0000

Yep, found that info previously, but thank you still!


Mark Reynolds

11:37 a.m.

On 11/03/2017 12:28 PM, Sergei Gerasenko wrote:

> Ok, what brought this up is that about every week

Ahh yes, this is the default replication purge interval (7 days):

https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/8....

Look for nsDS5ReplicaPurgeDelay.

It could also be changelog trimming:

http://www.port389.org/docs/389ds/FAQ/changelog-trimming.html

So what this is telling me is that one of your replication agreements was over a week behind the other replicas (not good). Was that agreement disabled for a while, and then enabled, for some reason?
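One quick sanity check is to compare the timestamp embedded in the missing CSN with the 7-day purge interval. The sketch below does only that arithmetic. It is not how the server decides what to purge or trim, and the helper names are invented for this example.

```python
from datetime import datetime, timezone

# 7 days in seconds: the default purge interval mentioned above.
DEFAULT_PURGE_DELAY = 7 * 24 * 60 * 60


def csn_age_seconds(csn: str, now: datetime) -> int:
    """Seconds between the CSN's embedded timestamp and `now`."""
    created = int(csn[:8], 16)  # first 8 hex digits = seconds since the epoch
    return int(now.timestamp()) - created


def older_than_purge_delay(csn: str, now: datetime,
                           purge_delay: int = DEFAULT_PURGE_DELAY) -> bool:
    """True if the CSN's timestamp is further back than the purge delay."""
    return csn_age_seconds(csn, now) > purge_delay


# Time of the "Can't locate CSN" message quoted earlier in the thread.
logged_at = datetime(2017, 11, 1, 17, 12, 52, tzinfo=timezone.utc)
age = csn_age_seconds("59f9d98a000000760000", logged_at)
print(age, older_than_purge_delay("59f9d98a000000760000", logged_at))
```

For the CSN and log time from the thread, the age works out to 9994 seconds (a little under three hours), well short of seven days. What that means for the purge and trimming settings actually in effect is a separate question that this arithmetic does not answer.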



Sergei Gerasenko

11:50 a.m.

>> Ok, what brought this up is that about every week
> Ahh yes, this is the default replication purge interval (7 days) https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/8.... Look for nsDS5ReplicaPurgeDelay
> It could also be changelog trimming: http://www.port389.org/docs/389ds/FAQ/changelog-trimming.html
> So what this is telling me is that one of your replication agreements was over a week behind from the other replicas (not good). Was that agreement disabled for a while, and then enabled, for some reason?

Not that I’m aware of. I’m using the repl-monitor script to monitor our replication and everything is in line (no CSN mismatch) until all of a sudden that happens.

Since I’m not an expert on LDAP, do you mind posting the ldapsearch command to look up the value of nsDS5ReplicaPurgeDelay? I’m getting an empty value back.

The subdirs of /var/lib/dirsrv/INSTANCE are:

bak cldb db ldif

Is cldb the changelog db?


Mark Reynolds

11:59 a.m.

On 11/03/2017 12:50 PM, Sergei Gerasenko wrote:

> Since I’m not an expert on ldap, do you mind posting the ldapsearch command to look up the value of nsDS5ReplicaPurgeDelay. I’m getting an empty value back.

ldapsearch -D "cn=directory manager" -W -b cn=config objectClass=nsDS5Replica

> The subdirs of /var/lib/dirsrv/INSTANCE are: bak cldb db ldif Is cldb the changelog db?

Probably, you can name it whatever you want, the default is "changelogdbdir".


Sergei Gerasenko

12:23 p.m.

> ldapsearch -D "cn=directory manager" -W -b cn=config objectClass=nsDS5Replica

nsDS5ReplicaPurgeDelay is not listed in the output :(. It must be at the default value of one week?

Also, you mentioned that the agreement might have been disabled. What field of the nsds5replicationagreement class shows that?

Given the error in the log, and the low likelihood of the agreement being disabled for a week, what else can cause a node not to find a CSN?

Thanks!!
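For scripting this check, a small helper can read the attribute out of saved ldapsearch output and fall back to one week when it is absent. This is a sketch under assumptions: the sample entry below is invented, the parser handles only plain `name: value` lines (no base64 values, no folded lines), and treating a missing attribute as "default applies" is the assumption raised in the question above, not something the sketch verifies.

```python
from typing import Optional

# One week in seconds, the default discussed earlier in the thread.
DEFAULT_PURGE_DELAY = 604800


def attr_value(ldif_text: str, attr: str) -> Optional[str]:
    """Return the first value of `attr` in 'name: value' output, or None."""
    wanted = attr.lower()
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        if not sep or value.startswith(":"):
            continue  # not an attribute line, or a base64 value (skipped here)
        if name.strip().lower() == wanted:
            return value.strip()
    return None


def effective_purge_delay(ldif_text: str) -> int:
    """Configured purge delay in seconds, or the assumed default if unset."""
    value = attr_value(ldif_text, "nsDS5ReplicaPurgeDelay")
    return int(value) if value is not None else DEFAULT_PURGE_DELAY


# Hypothetical ldapsearch output for a replica entry (values are made up).
sample = """\
dn: cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
objectClass: nsDS5Replica
nsDS5ReplicaId: 118
"""
print(effective_purge_delay(sample))
```

Attribute names are compared case-insensitively, so `nsds5replicapurgedelay` and `nsDS5ReplicaPurgeDelay` are treated the same.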


Mark Reynolds

12:53 p.m.

On 11/03/2017 01:23 PM, Sergei Gerasenko wrote:

> Also, you mentioned that the agreement might have been disabled. What field of the nsds5replicationagreement class shows that?

nsds5ReplicaEnabled

> Given the error in the log, and the low likelihood of the agreement being disabled for a week, what else can cause a node not to find a CSN?

You have to manually disable (or re-enable) an agreement; it does not just happen.

Have you restored from a backup recently? That could contain an old database RUV, and when replication kicks in it can't find the updates it needs from the other replicas.

You need to look through all the logs to troubleshoot this further. For now I would get everyone in sync, then monitor replication and archive your logs for the next week. That way you have a full data set to investigate if something goes wrong.

What version of 389 are you on?

rpm -qa | grep 389-ds-base



Sergei Gerasenko

1:53 p.m.

>> Also, you mentioned that the agreement might have been disabled. What field of the nsds5replicationagreement class shows that?
> nsds5ReplicaEnabled

Thank you

>> Given the error in the log, and the low likelihood of the agreement being disabled for a week, what else can cause a node not to find a CSN?
> Have you restored from a backup recently?

No

> You need to look through all the logs to further troubleshoot this. For now I would get everyone in sync then monitor replication, and archive your logs for the next week. That way you have a full data set to investigate if something goes wrong.

Ok, I’ll try to plow through the logs. I might still have them.

> What version of 389 are you on? rpm -qa | grep 389-ds-base

389-ds-base-libs-1.3.5.10-21.el7_3.x86_64
389-ds-base-1.3.5.10-21.el7_3.x86_64

What does this tell you:

[25/Oct/2017:18:16:43.389794105 +0000] connection - conn=167482 fd=121 Incoming BER Element was 3 bytes, max allowable is 2097152 bytes. Change the nsslapd-maxbersize attribute in cn=config to increase.

This is confusing: it was 3 bytes, which is < 2097152, and the message was still logged.


Mark Reynolds

2:01 p.m.

On 11/03/2017 02:53 PM, Sergei Gerasenko wrote:

>> What version of 389 are you on? rpm -qa | grep 389-ds-base
> 389-ds-base-libs-1.3.5.10-21.el7_3.x86_64
> 389-ds-base-1.3.5.10-21.el7_3.x86_64

Actually you might be running into a known bug which is fixed in 1.3.6 and up. Sorry 1.3.5/el7_3 is no longer supported or maintained.

> What does this tell you:
> [25/Oct/2017:18:16:43.389794105 +0000] connection - conn=167482 fd=121 Incoming BER Element was 3 bytes, max allowable is 2097152 bytes. Change the nsslapd-maxbersize attribute in cn=config to increase.
> This is confusing, it was 3 bytes which is < 2097152 and still the log message.

This happens when you try to open an SSL connection on the non-secure port. We have a bug open to make that error message mean something useful (the message should be fixed in 1.3.7).
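One possible reading of the "3 bytes" figure, offered as an inference and not checked against the server source: a TLS record starts with a content-type byte (0x16 for a handshake) followed by a version whose first byte is 0x03. A parser expecting BER would take the first byte as the tag and the second as a short-form length, which gives a length of 3. The sketch below shows that misreading next to a normal LDAP message start. The function name and the sample byte strings are illustrative only.

```python
from typing import Tuple


def ber_tag_and_length(data: bytes) -> Tuple[int, int]:
    """Read the tag byte and a short-form length from the start of `data`."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    tag, length = data[0], data[1]
    if length & 0x80:
        raise ValueError("long-form length is not handled in this sketch")
    return tag, length


# Start of a TLS handshake record: type 0x16, version 0x03 0x01, 2-byte length.
tls_record_start = bytes([0x16, 0x03, 0x01, 0x00, 0xC8])
# Start of an LDAP message: SEQUENCE tag 0x30 with a short length.
ldap_message_start = bytes([0x30, 0x0C, 0x02, 0x01, 0x01])

print(ber_tag_and_length(tls_record_start))    # tag 0x16, "length" 3
print(ber_tag_and_length(ldap_message_start))  # tag 0x30, length 12
```

Read this way, the TLS bytes yield a tag that is not an LDAP SEQUENCE and a length of 3, which would match the number in the log line even though the real problem is the protocol mismatch.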


Sergei Gerasenko

2:11 p.m.

>> 389-ds-base-libs-1.3.5.10-21.el7_3.x86_64
>> 389-ds-base-1.3.5.10-21.el7_3.x86_64
> Actually you might be running into a known bug which is fixed in 1.3.6 and up. Sorry 1.3.5/el7_3 is no longer supported or maintained.

Interesting! Can you link me to the bug?

>> What does this tell you:
>> [25/Oct/2017:18:16:43.389794105 +0000] connection - conn=167482 fd=121 Incoming BER Element was 3 bytes, max allowable is 2097152 bytes. Change the nsslapd-maxbersize attribute in cn=config to increase.
>> This is confusing, it was 3 bytes which is < 2097152 and still the log message.
> This happens when you try to open a ssl connection on the non-secure port. We have a bug open on this to make that error message means something useful (the message should be fixed in 1.3.7)

OK, so this is benign more or less?
