Hi Everyone
I am running 389 DS 1.2.8.2 on CentOS 4.8 in a multi-master setup with 12 LDAP servers. Everything was working fine until one of the boxes (ldapw02) suddenly crashed. When it came back up, I saw the following in the error log:
[25/Nov/2013:20:26:00 -0500] - 389-Directory/1.2.8.2 B2013.028.104 starting up
[25/Nov/2013:20:26:01 -0500] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[25/Nov/2013:20:26:03 -0500] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica o=EmpData does not match the data in the changelog (replica data (5293f8a1000000040000) > changelog (5293f89b000000080000)). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[25/Nov/2013:20:26:03 -0500] - slapd started. Listening on All Interfaces port 389 for LDAP requests
[25/Nov/2013:20:30:32 -0500] NSMMReplicationPlugin - changelog program - agmt="cn=ldapw022toroon63dsaw03" (toroon63dsaw03:389): CSN 5293f761000000020000 not found, we aren't as up to date, or we purged
[25/Nov/2013:20:30:32 -0500] NSMMReplicationPlugin - agmt="cn=ldapw022toroon63dsaw03" (toroon63dsaw03:389): Data required to update replica has been purged. The replica must be reinitialized.
[25/Nov/2013:20:30:32 -0500] NSMMReplicationPlugin - changelog program - agmt="cn=ldapw022toroon63ldapw03" (toroon63ldapw03:389): CSN 5293f761000000020000 not found, we aren't as up to date, or we purged
[25/Nov/2013:20:30:32 -0500] NSMMReplicationPlugin - agmt="cn=ldapw022toroon63ldapw03" (toroon63ldapw03:389): Data required to update replica has been purged. The replica must be reinitialized.
[25/Nov/2013:20:30:32 -0500] NSMMReplicationPlugin - agmt="cn=ldapw022toroon63ldapw03" (toroon63ldapw03:389): Incremental update failed and requires administrator action
[25/Nov/2013:20:30:32 -0500] NSMMReplicationPlugin - agmt="cn=ldapw022toroon63dsaw03" (toroon63dsaw03:389): Incremental update failed and requires administrator action
[25/Nov/2013:20:35:30 -0500] NSMMReplicationPlugin - changelog program - agmt="cn=ldapw022ldapw01" (ldapw01:389): CSN 5293f7f3000000050000 not found, we aren't as up to date, or we purged
[25/Nov/2013:20:35:30 -0500] NSMMReplicationPlugin - agmt="cn=ldapw022ldapw01" (ldapw01:389): Data required to update replica has been purged. The replica must be reinitialized.
[25/Nov/2013:20:35:30 -0500] NSMMReplicationPlugin - agmt="cn=ldapw022ldapw01" (ldapw01:389): Incremental update failed and requires administrator action.
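For readers following the log: the two values in the replica_check_for_data_reload warning are change sequence numbers (CSNs). Below is a minimal sketch of decoding them. It assumes the usual 389 DS layout of 20 hex digits: 8 for a Unix timestamp, then 4 each for sequence number, replica ID, and sub-sequence number. The names CSN, parse_csn, and csn_time are made up for this illustration and are not part of any 389 DS tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    # Field order matters: tuples compare field by field, which is assumed
    # here to mirror how the server orders CSNs.
    timestamp: int   # seconds since the Unix epoch
    seqnum: int      # sequence number within that second
    rid: int         # ID of the replica that originated the change
    subseqnum: int   # sub-sequence number


def parse_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}")
    return CSN(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        rid=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


def csn_time(csn: CSN) -> datetime:
    """Return the CSN timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(csn.timestamp, tz=timezone.utc)


# The two CSNs quoted in the warning above.
replica_max = parse_csn("5293f8a1000000040000")
changelog_max = parse_csn("5293f89b000000080000")

print("replica data:", replica_max, csn_time(replica_max).isoformat())
print("changelog:   ", changelog_max, csn_time(changelog_max).isoformat())
print("database ahead of changelog by",
      replica_max.timestamp - changelog_max.timestamp, "seconds")
```

Under that assumption, the newest CSN in the database (replica ID 4) is a few seconds newer than the newest CSN in the changelog (replica ID 8), which is consistent with the server deciding to recreate the changelog after the disorderly shutdown.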
Replication to 'ldapw02' from all the other boxes seems to work, but replication from 'ldapw02' does not. The issue disappears when I initialize all the neighbors from the affected box, but I need to find the root cause, since this happens very frequently. Also, I am not able to diagnose the reason for the crash, since installing the 'debuginfo' package is out of my scope. I see a similar issue being discussed here:
http://thr3ads.net/fedora-directory-users/2007/10/176314-Re-Cant-locate-CSN-...
and in response to this discussion, the following bug was filed,
https://bugzilla.redhat.com/show_bug.cgi?id=388021
It is mentioned that this is fixed in "fedora-ds-base-1.2.0", so I would expect my 389 server 1.2.8.2 to include the fix. Do you have any idea why I am still getting this problem? Also, we recently upgraded from 389 DS 1.1.2 to 1.2.8.2, and since then I see this problem on one LDAP server or another. Is this related? Any help is appreciated.
Regards Sugantha J
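The workaround described above (re-initializing the neighbors from ldapw02) can be sketched as LDIF that sets nsds5BeginReplicaRefresh to start on each failed agreement. This is only an illustration: the agreement names and suffix are taken from the log, but the DN layout under cn=mapping tree,cn=config is an assumption that should be checked against the actual configuration, and refresh_ldif is a hypothetical helper name.

```python
# Agreement names as they appear in the error log on ldapw02.
AGREEMENTS = [
    "ldapw022toroon63dsaw03",
    "ldapw022toroon63ldapw03",
    "ldapw022ldapw01",
]
SUFFIX = "o=EmpData"


def refresh_ldif(agreement_cn: str, suffix: str = SUFFIX) -> str:
    """Build an LDIF modify record that requests a total update (re-init)."""
    # ASSUMPTION: agreements live under cn=replica,cn="<suffix>",cn=mapping tree,cn=config.
    dn = f'cn={agreement_cn},cn=replica,cn="{suffix}",cn=mapping tree,cn=config'
    return "\n".join([
        f"dn: {dn}",
        "changetype: modify",
        "replace: nsds5BeginReplicaRefresh",
        "nsds5BeginReplicaRefresh: start",
        "",  # a blank line terminates the LDIF record
    ])


# Print one record per agreement; review the output before applying it.
print("\n".join(refresh_ldif(cn) for cn in AGREEMENTS))
```

The output would be fed to ldapmodify on ldapw02 with a bind that is allowed to modify cn=config. A total update replaces the consumer's data for that suffix, so it treats the symptom and does not explain the crash.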
On 12/02/2013 05:49 AM, Sugantha J wrote:
Replication seems to work to ‘ldapw02’ from all other boxes, but replication from ‘ldapw02’ does not work. The issue disappears when I initialize all the neighbors from the affected box, but I will have to find the root cause for this, since this seems to happen very frequently.
Your boxes are crashing frequently?
Also, I am not able to diagnose the reason for the crash, since installing ‘debuginfo’ package is out of my scope.
Why?
I see a similar issue being discussed here,
http://thr3ads.net/fedora-directory-users/2007/10/176314-Re-Cant-locate-CSN-...
and in response to this discussion, the following bug was filed,
https://bugzilla.redhat.com/show_bug.cgi?id=388021
It is mentioned that this is fixed in “fedora-ds-base-1.2.0”, so I hope I should have this fix in my 389 server 1.2.8.2. Do you have any idea as to why I am still getting this problem?
No.
Also, we did a recent LDAP upgrade from 389 DS 1.1.2 to 1.2.8.2, after which I see this problem happening in one or the other LDAP server. Is this related? Any help is appreciated.
It's going to be extremely difficult to support 1.2.8. The oldest supported version (meaning someone on the dev team can actually try to install and reproduce the problem) is 1.2.11.
-- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
Hi Rich
Yes, my LDAP processes are crashing very often, and when they are started up again, replication in one direction seems to be affected. I have to manually initialize the neighbors to correct the problem. I don’t notice any shortage of disk or memory leading up to a crash. Also, since this is a production LDAP server, installing debuginfo can be very hard to negotiate. Even if you can only reproduce the problem in 1.2.11 and possibly fix it there, I can back-port the fix to 1.2.8.2.
Please let me know your thoughts.
Regards Sugantha J
From: Rich Megginson [mailto:rmeggins@redhat.com] Sent: Tuesday, December 03, 2013 12:22 AM To: General discussion list for the 389 Directory server project. Cc: Sugantha J Subject: Re: [389-users] Replication issue after improper shutdown
On 12/03/2013 04:45 AM, Sugantha J wrote:
Hi Rich
Yes, my LDAP processes are crashing very often, and when they are started up again, replication in one direction seems to be affected. I have to manually initialize the neighbors to correct the problem. I don’t notice any shortage of disk or memory leading up to a crash. Also, since this is a production LDAP server, installing debuginfo can be very hard to negotiate. Even if you can only reproduce the problem in 1.2.11 and possibly fix it there, I can back-port the fix to 1.2.8.2.
Please let me know your thoughts.
Without a core file/stack trace it will be very difficult to figure out why it is crashing. Is it possible that you can configure the production system to produce core files as in http://port389.org/wiki/FAQ#Debugging_Crashes _without_ installing debuginfo packages on the production machine? Then you can set up a non-production machine with the debuginfo packages. When you get a core file on the production machine, copy the core file to the non-production machine and generate the stack trace.
I am not aware of this problem in 1.2.11. If you can give me the exact steps to reproduce the crash, I might be able to attempt to reproduce with 1.2.11. Even then, there have been dozens of fixes for replication issues and crash issues since 1.2.8. It may be very difficult for you to back port all of them to 1.2.8.
I'm assuming you are stuck with 1.2.8 because you are stuck on EL4?
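The core-file workflow suggested above (collect the core on production, analyze it on a separate machine that has the debuginfo packages) could look roughly like the sketch below on the non-production side. The binary path /usr/sbin/ns-slapd, the core file name, and the helper names are assumptions for illustration. The analysis machine needs the same ns-slapd and library versions as production for the trace to be meaningful.

```python
import shlex
import subprocess


def gdb_backtrace_cmd(core_path: str, binary: str = "/usr/sbin/ns-slapd") -> list:
    """Build a non-interactive gdb command that dumps all thread backtraces."""
    return [
        "gdb", "-batch",                     # run the commands below, then exit
        "-ex", "set pagination off",         # do not stop at a --More-- prompt
        "-ex", "thread apply all bt full",   # full backtrace for every thread
        binary, core_path,
    ]


def write_stacktrace(core_path: str, out_path: str,
                     binary: str = "/usr/sbin/ns-slapd") -> None:
    """Run gdb on a copied core file and save the trace to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(gdb_backtrace_cmd(core_path, binary),
                       stdout=out, stderr=subprocess.STDOUT, check=True)


# Show the command that would be run for a hypothetical core file name.
print(shlex.join(gdb_backtrace_cmd("/tmp/core.12345")))
```

Calling write_stacktrace("/tmp/core.12345", "stacktrace.txt") on the machine with debuginfo installed would produce a text file that can be attached to a reply on the list or to a bug report.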