<br>Just in case anybody is interested in why the corruption occurred:<br>apparently a corrupt browsing (VLV) index caused it.<br>This error message pointed me in that direction:<br> errors:[17/Jun/2010:12:51:18 +0200] - vlv_build_idl: can't follow db cursor (err -30989)<br>
I deleted the browsing index from the particular ou and the problem was gone.<br><br><div class="gmail_quote">On Wed, Jun 16, 2010 at 9:39 PM, Rich Megginson <span dir="ltr"><<a href="mailto:rmeggins@redhat.com">rmeggins@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div><div></div><div class="h5">mark benschop wrote:<br>
> Hi Rich,<br>
> Thanks for your reply.<br>
> Please find the logging from the problems below.<br>
> serverb55 is one of two servers in a multi-master configuration<br>
> that consists of serverb55 and serverb05.<br>
><br>
> The problem I initially had was that there were 2 entries that could not<br>
> be deleted on serverb55.<br>
><br>
> Here's logging from the access file.<br>
> =======================================================================<br>
> access.20100614-092820:[15/Jun/2010:09:20:49 +0200] conn=342177 op=7<br>
> SRCH base="uid=dbeijk, ou=people, dc=directory,dc=intern" scope=0<br>
> filter="(objectClass=*)" attrs=ALL<br>
> access.20100614-092820:[15/Jun/2010:09:20:49 +0200] conn=342177 op=7<br>
> RESULT err=0 tag=101 nentries=1 etime=0<br>
> access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=8<br>
> SRCH base="uid=dbeijk, ou=people, dc=directory,dc=intern" scope=1<br>
> filter="(objectClass=*)" attrs="objectClass"<br>
> access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=8<br>
> RESULT err=0 tag=101 nentries=0 etime=0 notes=U<br>
> access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=9<br>
> DEL dn="uid=dbeijk, ou=people, dc=directory,dc=intern"<br>
> access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=9<br>
> RESULT err=1 tag=107 nentries=0 etime=0 csn=4c172a21000000370000<br>
> access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=10<br>
> SRCH base="uid=dbeijk, ou=people, dc=directory,dc=intern" scope=1<br>
> filter="(objectClass=*)" attrs="objectClass"<br>
> access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=10<br>
> RESULT err=0 tag=101 nentries=0 etime=0 notes=U<br>
> =======================================================================<br>
><br>
> LDAP error 1, I found, means 'operations error'. At first I thought<br>
> something might be wrong with the entry itself.<br>
> The errors log from serverb55, which I've added<br>
> below, seemed to point in that direction.<br>
><br>
><br>
> When I logged on to the other LDAP server, serverb05, I tried to<br>
> delete the same entry to see if this slapd had the same issue, but here<br>
> it worked.<br>
> Replicating the delete did not. The following error was logged to the<br>
> error log of serverb05:<br>
><br>
> ========================================================================<br>
> [15/Jun/2010:09:35:17 +0200] NSMMReplicationPlugin -<br>
> agmt="cn=serverb55" (serverb55:636): Consumer failed to replay change<br>
> (uniqueid a276337c-5dc511df-852cfef8-667fa4d4, CSN<br>
> 4c172d36000000050000): Operations error. Will retry later.<br>
> =======================================================================<br>
><br>
> So there seemed to be a problem with serverb55 only.<br>
> Since I assumed the database had somehow become corrupt or inconsistent, I<br>
> tried the following steps to recreate the database or have it<br>
> checked in order to get it right again.<br>
> First there are the errors from the account that could not be deleted.<br>
> I 'reinitialised the consumer' from the working serverb05 to the<br>
> problematic serverb55.<br>
> Then I restarted slapd.<br>
> I made an export of the database and imported it.<br>
> Slapd then stopped the database.<br>
><br>
> Please find the logging from /var/log/dirsrv/slapd-serverb55/errors<br>
> from the actions leading to the problem of the fatal server stop.<br>
> ======================================================================<br>
> CentOS-Directory/8.1.0<br>
> B2009.134.1334<br>
><br>
> serverb55:636<br>
> (/etc/dirsrv/slapd-serverb55)<br>
><br>
><br>
><br>
> [15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "uidNumber" required by<br>
> object class "posixAccount"<br>
> [15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "gidNumber" required by<br>
> object class "posixAccount"<br>
> [15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" -- attribute "uidNumber" not allowed<br>
> [15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "uid" required by object<br>
> class "posixAccount"<br>
> [15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "cn" required by object<br>
> class "posixAccount"<br>
> [15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "homeDirectory" required by<br>
> object class "posixAccount"<br>
> [15/Jun/2010:09:23:04 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "uidNumber" required by<br>
> object class "posixAccount"<br>
> [15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "uidNumber" required by<br>
> object class "posixAccount"<br>
> [15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "gidNumber" required by<br>
> object class "posixAccount"<br>
> [15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" required attribute "objectclass" missing<br>
> [15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" required attribute "objectclass" missing<br>
> [15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "uid" required by object<br>
> class "posixAccount"<br>
> [15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "cn" required by object<br>
> class "posixAccount"<br>
> [15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People,<br>
> dc=directory,dc=intern" missing attribute "homeDirectory" required by<br>
> object class "posixAccount"<br>
> [15/Jun/2010:09:24:56 +0200] - Entry "uid=DEL *.*, ou=People,<br>
> dc=directory,dc=intern" missing attribute "homeDirectory" required by<br>
> object class "posixAccount"<br>
> [15/Jun/2010:09:50:20 +0200] - Entry "cn=wchiman, ou=people,<br>
> dc=directory,dc=intern" -- attribute "uidNumber" not allowed<br>
> [15/Jun/2010:10:12:43 +0200] NSMMReplicationPlugin -<br>
> multimaster_be_state_change: replica dc=directory,dc=intern is going<br>
> offline; disabling replication<br>
> [15/Jun/2010:10:12:43 +0200] - attrcrypt_unwrap_key: failed to unwrap<br>
> key for cipher AES<br>
> [15/Jun/2010:10:12:43 +0200] - Failed to retrieve key for cipher AES<br>
> in attrcrypt_cipher_init<br>
> [15/Jun/2010:10:12:43 +0200] - Failed to initialize cipher AES in<br>
> attrcrypt_init<br>
> [15/Jun/2010:10:12:43 +0200] - WARNING: Import is running with<br>
> nsslapd-db-private-import-mem on; No other process is allowed to<br>
> access the database<br>
> [15/Jun/2010:10:12:48 +0200] - import userRoot: Workers finished;<br>
> cleaning up...<br>
> [15/Jun/2010:10:12:49 +0200] - import userRoot: Workers cleaned up.<br>
> [15/Jun/2010:10:12:49 +0200] - import userRoot: Indexing complete.<br>
> Post-processing...<br>
> [15/Jun/2010:10:12:49 +0200] - import userRoot: Flushing caches...<br>
> [15/Jun/2010:10:12:49 +0200] - import userRoot: Closing files...<br>
> [15/Jun/2010:10:12:49 +0200] - import userRoot: Import complete.<br>
> Processed 4849 entries in 6 seconds. (808.17 entries/sec)<br>
> [15/Jun/2010:10:12:49 +0200] - attrcrypt_unwrap_key: failed to unwrap<br>
> key for cipher AES<br>
> [15/Jun/2010:10:12:49 +0200] - Failed to retrieve key for cipher AES<br>
> in attrcrypt_cipher_init<br>
> [15/Jun/2010:10:12:49 +0200] - Failed to initialize cipher AES in<br>
> attrcrypt_init<br>
> [15/Jun/2010:10:12:49 +0200] NSMMReplicationPlugin -<br>
> multimaster_be_state_change: replica dc=directory,dc=intern is coming<br>
> online; enabling replication<br>
> [15/Jun/2010:10:12:49 +0200] NSMMReplicationPlugin -<br>
> replica_reload_ruv: Warning: new data for replica<br>
> dc=directory,dc=intern does not match the data in the changelog.<br>
> Recreating the changelog file. This could affect replication with<br>
> replica's consumers in which case the consumers should be reinitialized.<br>
> [15/Jun/2010:10:12:49 +0200] - skipping cos definition<br>
> cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found<br>
> [15/Jun/2010:10:55:43 +0200] NSMMReplicationPlugin -<br>
> multimaster_be_state_change: replica dc=directory,dc=intern is going<br>
> offline; disabling replication<br>
> [15/Jun/2010:10:55:43 +0200] - attrcrypt_unwrap_key: failed to unwrap<br>
> key for cipher AES<br>
> [15/Jun/2010:10:55:43 +0200] - Failed to retrieve key for cipher AES<br>
> in attrcrypt_cipher_init<br>
> [15/Jun/2010:10:55:43 +0200] - Failed to initialize cipher AES in<br>
> attrcrypt_init<br>
> [15/Jun/2010:10:55:43 +0200] - WARNING: Import is running with<br>
> nsslapd-db-private-import-mem on; No other process is allowed to<br>
> access the database<br>
> [15/Jun/2010:10:55:49 +0200] - import userRoot: Workers finished;<br>
> cleaning up...<br>
> [15/Jun/2010:10:55:49 +0200] - import userRoot: Workers cleaned up.<br>
> [15/Jun/2010:10:55:49 +0200] - import userRoot: Indexing complete.<br>
> Post-processing...<br>
> [15/Jun/2010:10:55:49 +0200] - import userRoot: Flushing caches...<br>
> [15/Jun/2010:10:55:49 +0200] - import userRoot: Closing files...<br>
> [15/Jun/2010:10:55:49 +0200] - import userRoot: Import complete.<br>
> Processed 4850 entries in 5 seconds. (970.00 entries/sec)<br>
> [15/Jun/2010:10:55:49 +0200] - attrcrypt_unwrap_key: failed to unwrap<br>
> key for cipher AES<br>
> [15/Jun/2010:10:55:49 +0200] - Failed to retrieve key for cipher AES<br>
> in attrcrypt_cipher_init<br>
> [15/Jun/2010:10:55:49 +0200] - Failed to initialize cipher AES in<br>
> attrcrypt_init<br>
> [15/Jun/2010:10:55:49 +0200] NSMMReplicationPlugin -<br>
> multimaster_be_state_change: replica dc=directory,dc=intern is coming<br>
> online; enabling replication<br>
> [15/Jun/2010:10:55:49 +0200] NSMMReplicationPlugin -<br>
> replica_reload_ruv: Warning: new data for replica<br>
> dc=directory,dc=intern does not match the data in the changelog.<br>
> Recreating the changelog file. This could affect replication with<br>
> replica's consumers in which case the consumers should be reinitialized.<br>
> [15/Jun/2010:10:55:49 +0200] - skipping cos definition<br>
> cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found<br>
> [15/Jun/2010:10:59:57 +0200] - slapd shutting down - signaling<br>
> operation threads<br>
> [15/Jun/2010:10:59:57 +0200] - slapd shutting down - waiting for 26<br>
> threads to terminate<br>
> [15/Jun/2010:10:59:57 +0200] - slapd shutting down - closing down<br>
> internal subsystems and plugins<br>
> [15/Jun/2010:10:59:58 +0200] - Waiting for 4 database threads to stop<br>
> [15/Jun/2010:10:59:59 +0200] - All database threads now stopped<br>
> [15/Jun/2010:10:59:59 +0200] - slapd stopped.<br>
> CentOS-Directory/8.1.0 B2009.134.1334<br>
> &lt;host&gt;:&lt;port&gt; (/etc/dirsrv/slapd-serverb55)<br>
><br>
> [15/Jun/2010:11:00:01 +0200] - Entry "cn=schema" single-valued<br>
> attribute "modifyTimestamp" has multiple values<br>
> CentOS-Directory/8.1.0 B2009.134.1334<br>
> serverb55:636 (/etc/dirsrv/slapd-serverb55)<br>
><br>
> [15/Jun/2010:11:00:01 +0200] - CentOS-Directory/8.1.0 B2009.134.1334<br>
> starting up<br>
> [15/Jun/2010:11:00:01 +0200] - I'm resizing my cache now...cache was<br>
> 20000000 and is now 8000000<br>
> [15/Jun/2010:11:00:01 +0200] - attrcrypt_unwrap_key: failed to unwrap<br>
> key for cipher AES<br>
> [15/Jun/2010:11:00:01 +0200] - Failed to retrieve key for cipher AES<br>
> in attrcrypt_cipher_init<br>
> [15/Jun/2010:11:00:01 +0200] - Failed to initialize cipher AES in<br>
> attrcrypt_init<br>
> [15/Jun/2010:11:00:01 +0200] - attrcrypt_unwrap_key: failed to unwrap<br>
> key for cipher AES<br>
> [15/Jun/2010:11:00:01 +0200] - Failed to retrieve key for cipher AES<br>
> in attrcrypt_cipher_init<br>
> [15/Jun/2010:11:00:01 +0200] - Failed to initialize cipher AES in<br>
> attrcrypt_init<br>
> [15/Jun/2010:11:00:01 +0200] - skipping cos definition<br>
> cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found<br>
> [15/Jun/2010:11:00:01 +0200] NSMMReplicationPlugin -<br>
> replica_check_for_data_reload: Warning: data for replica<br>
> dc=directory,dc=intern was reloaded and it no longer matches the data<br>
> in the changelog (replica data > changelog). Recreating the changelog<br>
> file. This could affect replication with replica's consumers in which<br>
> case the consumers should be reinitialized.<br>
> [15/Jun/2010:11:00:01 +0200] - skipping cos definition<br>
> cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found<br>
> [15/Jun/2010:11:00:01 +0200] - slapd started. Listening on All<br>
> Interfaces port 389 for LDAP requests<br>
> [15/Jun/2010:11:00:01 +0200] - Listening on All Interfaces port 636<br>
> for LDAPS requests<br>
> [15/Jun/2010:11:29:59 +0200] - slapd shutting down - signaling<br>
> operation threads<br>
> [15/Jun/2010:11:29:59 +0200] - slapd shutting down - closing down<br>
> internal subsystems and plugins<br>
> [15/Jun/2010:11:30:00 +0200] - Waiting for 4 database threads to stop<br>
> [15/Jun/2010:11:30:00 +0200] - All database threads now stopped<br>
> [15/Jun/2010:11:30:00 +0200] - slapd stopped.<br>
> CentOS-Directory/8.1.0 B2009.134.1334<br>
> &lt;host&gt;:&lt;port&gt; (/etc/dirsrv/slapd-serverb55)<br>
><br>
> [15/Jun/2010:11:30:03 +0200] - Entry "cn=schema" single-valued<br>
> attribute "modifyTimestamp" has multiple values<br>
> CentOS-Directory/8.1.0 B2009.134.1334<br>
> serverb55:636 (/etc/dirsrv/slapd-serverb55)<br>
><br>
> [15/Jun/2010:11:30:03 +0200] - CentOS-Directory/8.1.0 B2009.134.1334<br>
> starting up<br>
> [15/Jun/2010:11:30:03 +0200] - attrcrypt_unwrap_key: failed to unwrap<br>
> key for cipher AES<br>
> [15/Jun/2010:11:30:03 +0200] - Failed to retrieve key for cipher AES<br>
> in attrcrypt_cipher_init<br>
> [15/Jun/2010:11:30:03 +0200] - Failed to initialize cipher AES in<br>
> attrcrypt_init<br>
> [15/Jun/2010:11:30:03 +0200] - attrcrypt_unwrap_key: failed to unwrap<br>
> key for cipher AES<br>
> [15/Jun/2010:11:30:03 +0200] - Failed to retrieve key for cipher AES<br>
> in attrcrypt_cipher_init<br>
> [15/Jun/2010:11:30:03 +0200] - Failed to initialize cipher AES in<br>
> attrcrypt_init<br>
> [15/Jun/2010:11:30:03 +0200] - skipping cos definition<br>
> cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found<br>
> [15/Jun/2010:11:30:03 +0200] - skipping cos definition<br>
> cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found<br>
> [15/Jun/2010:11:30:03 +0200] - slapd started. Listening on All<br>
> Interfaces port 389 for LDAP requests<br>
> [15/Jun/2010:11:30:03 +0200] - Listening on All Interfaces port 636<br>
> for LDAPS requests<br>
> [15/Jun/2010:11:40:44 +0200] - Beginning export of 'userroot'<br>
> [15/Jun/2010:11:40:44 +0200] - export userRoot: Processed 139 entries<br>
> (100%).<br>
> [15/Jun/2010:11:40:44 +0200] - Export finished.<br>
> [15/Jun/2010:11:46:12 +0200] NSMMReplicationPlugin -<br>
> multimaster_be_state_change: replica dc=directory,dc=intern is going<br>
> offline; disabling replication<br>
> [15/Jun/2010:11:46:12 +0200] - attrcrypt_unwrap_key: failed to unwrap<br>
> key for cipher AES<br>
> [15/Jun/2010:11:46:12 +0200] - Failed to retrieve key for cipher AES<br>
> in attrcrypt_cipher_init<br>
> [15/Jun/2010:11:46:12 +0200] - Failed to initialize cipher AES in<br>
> attrcrypt_init<br>
> [15/Jun/2010:11:46:12 +0200] - WARNING: Import is running with<br>
> nsslapd-db-private-import-mem on; No other process is allowed to<br>
> access the database<br>
> [15/Jun/2010:11:46:14 +0200] - libdb: page 1: illegal page type or format<br>
> [15/Jun/2010:11:46:14 +0200] - libdb: PANIC: Invalid argument<br>
> [15/Jun/2010:11:46:14 +0200] - FATAL ERROR at by MCC ou=people<br>
> dc=directory dc=intern (77); server stopping as database recovery needed.<br>
> CentOS-Directory/8.1.0 B2009.134.1334<br>
> &lt;host&gt;:&lt;port&gt; (/etc/dirsrv/slapd-serverb55<br>
> ======================================================================<br>
><br>
> Finally my questions :<br>
> What could be the cause of the problem ?<br>
</div></div>Not sure - never seen this particular error before - any disk errors in<br>
/var/log/messages?<br>
<br>
Can you try doing a reinit of this server from the other server?<br>
<div class="im">> What would be the best procedure to get the serverb55 up and running<br>
> again ?<br>
</div>Doing a reinit of this server from the other server.<br>
<div class="im">><br>
> Thanks for any advice.<br>
><br>
> Regards,<br>
> Mark<br>
><br>
><br>
><br>
> =======<br>
><br>
><br>
> On Tue, Jun 15, 2010 at 7:04 PM, Rich Megginson <<a href="mailto:rmeggins@redhat.com">rmeggins@redhat.com</a><br>
</div><div><div></div><div class="h5">> <mailto:<a href="mailto:rmeggins@redhat.com">rmeggins@redhat.com</a>>> wrote:<br>
><br>
> mark benschop wrote:<br>
> > Hi All,<br>
> ><br>
>     > I'm having a problem on a CentOS Directory Server 8.1 multiple master<br>
>     > setup.<br>
>     > The database of one of the servers has been marked as corrupt and has<br>
>     > been brought offline by the Directory Server.<br>
> Can you post any relevant error messages from the error log of the<br>
> server?<br>
>     > LDAP clients querying the LDAP server, e.g. for user logins, get an<br>
>     > error message, effectively preventing users from logging in.<br>
> What error message?<br>
> ><br>
>     > I'm wondering what the best method is to recover from this situation.<br>
> > I can think of a few :<br>
> > 1) Starting the ldapserver, deleting the database, recreating it and<br>
> > restoring a backup.<br>
><br>
> > 2) Starting the ldapserver, deleting the database and reinitialising<br>
> > the server from the other master.<br>
> If you reinitialize the problem server from another server, you don't<br>
> need to delete the database, reinit will do that for you.<br>
> ><br>
>     > Can anyone give me some hints whether this will work, or would another<br>
>     > approach be better?<br>
>     ><br>
>     > Thanks for your advice,<br>
> > Mark<br>
> ><br>
> ------------------------------------------------------------------------<br>
> ><br>
> > --<br>
> > 389 users mailing list<br>
> > <a href="mailto:389-users@lists.fedoraproject.org">389-users@lists.fedoraproject.org</a><br>
</div></div>> <mailto:<a href="mailto:389-users@lists.fedoraproject.org">389-users@lists.fedoraproject.org</a>><br>
<div class="im">> > <a href="https://admin.fedoraproject.org/mailman/listinfo/389-users" target="_blank">https://admin.fedoraproject.org/mailman/listinfo/389-users</a><br>
><br>
> --<br>
> 389 users mailing list<br>
> <a href="mailto:389-users@lists.fedoraproject.org">389-users@lists.fedoraproject.org</a><br>
</div>> <mailto:<a href="mailto:389-users@lists.fedoraproject.org">389-users@lists.fedoraproject.org</a>><br>
<div><div></div><div class="h5">> <a href="https://admin.fedoraproject.org/mailman/listinfo/389-users" target="_blank">https://admin.fedoraproject.org/mailman/listinfo/389-users</a><br>
><br>
><br>
</div></div></blockquote></div><br>
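<br>For anyone who wants to check an errors log for the same symptoms, here is a minimal Python sketch. It assumes the log is plain text with one message per line. The pattern list and the names SUSPECT_PATTERNS and find_suspect_lines are made up for illustration, built only from the messages quoted in this thread; they are not part of the Directory Server tooling and are certainly not an exhaustive list of corruption indicators.<br>

```python
import re

# Hypothetical patterns for illustration only: they match messages
# quoted in this thread and are not an exhaustive list.
SUSPECT_PATTERNS = {
    "vlv": re.compile(r"vlv_build_idl: can't follow db cursor"),
    "libdb": re.compile(r"libdb: (PANIC|page \d+: illegal page type)"),
}


def find_suspect_lines(lines):
    """Return (label, line) pairs for every line matching a pattern."""
    hits = []
    for line in lines:
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
                break  # one label per line is enough
    return hits


# Sample lines copied from the log excerpts in this thread.
sample = [
    "[17/Jun/2010:12:51:18 +0200] - vlv_build_idl: can't follow db cursor (err -30989)",
    "[15/Jun/2010:11:46:14 +0200] - libdb: page 1: illegal page type or format",
    "[15/Jun/2010:11:46:14 +0200] - libdb: PANIC: Invalid argument",
    "[15/Jun/2010:11:40:44 +0200] - Export finished.",
]

for label, line in find_suspect_lines(sample):
    print(label, "|", line)
```

<br>To run it against a real log, pass an open file object instead of the sample list, for example the errors file under /var/log/dirsrv/slapd-serverb55/ mentioned above.<br>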