Hi Rich,
Thanks for your reply.
Please find the logs for the problem below.
serverb55 is one of two servers in a multi-master replication setup; the other master is serverb05.

The problem I initially had was that two entries could not be deleted on serverb55.

Here is the relevant part of the access log:
=======================================================================
access.20100614-092820:[15/Jun/2010:09:20:49 +0200] conn=342177 op=7 SRCH base="uid=dbeijk, ou=people, dc=directory,dc=intern" scope=0 filter="(objectClass=*)" attrs=ALL
access.20100614-092820:[15/Jun/2010:09:20:49 +0200] conn=342177 op=7 RESULT err=0 tag=101 nentries=1 etime=0
access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=8 SRCH base="uid=dbeijk, ou=people, dc=directory,dc=intern" scope=1 filter="(objectClass=*)" attrs="objectClass"
access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=8 RESULT err=0 tag=101 nentries=0 etime=0 notes=U
access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=9 DEL dn="uid=dbeijk, ou=people, dc=directory,dc=intern"
access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=9 RESULT err=1 tag=107 nentries=0 etime=0 csn=4c172a21000000370000
access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=10 SRCH base="uid=dbeijk, ou=people, dc=directory,dc=intern" scope=1 filter="(objectClass=*)" attrs="objectClass"
access.20100614-092820:[15/Jun/2010:09:22:08 +0200] conn=342177 op=10 RESULT err=0 tag=101 nentries=0 etime=0 notes=U
=======================================================================
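A minimal Python sketch for pulling failed operations out of an access log like the one above. It pairs each request line with its RESULT line by `conn` and `op`, and names the result code and response tag. The regular expressions assume the line layout shown above, and the name tables are only a small subset of the codes defined in RFC 4511.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Small subset of LDAP result codes (RFC 4511), for readable output only.
RESULT_NAMES = {0: "success", 1: "operationsError", 32: "noSuchObject",
                50: "insufficientAccessRights", 53: "unwillingToPerform",
                66: "notAllowedOnNonLeaf"}
# BER application tags of some LDAP responses, as printed after "tag=" (decimal).
TAG_NAMES = {97: "bindResponse", 101: "searchResultDone", 103: "modifyResponse",
             105: "addResponse", 107: "delResponse"}

# Assumed layouts: "conn=N op=N DEL dn=..." and "conn=N op=N RESULT err=N tag=N ...".
REQUEST_RE = re.compile(r"conn=(\d+) op=(\d+) (SRCH|ADD|MODRDN|MOD|DEL|BIND) (.*)")
RESULT_RE = re.compile(r"conn=(\d+) op=(\d+) RESULT err=(\d+) tag=(\d+)")


def failed_operations(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (request, result name, response name) for every RESULT with err != 0."""
    requests: Dict[Tuple[str, str], str] = {}
    failures = []
    for line in lines:
        req = REQUEST_RE.search(line)
        if req:
            # Remember the request text so its RESULT line can be matched later.
            requests[(req.group(1), req.group(2))] = f"{req.group(3)} {req.group(4)}".rstrip()
            continue
        res = RESULT_RE.search(line)
        if res and res.group(3) != "0":
            err, tag = int(res.group(3)), int(res.group(4))
            failures.append((
                requests.get((res.group(1), res.group(2)), "<request not in log>"),
                RESULT_NAMES.get(err, f"err={err}"),
                TAG_NAMES.get(tag, f"tag={tag}"),
            ))
    return failures
```

It can be fed any iterable of lines, for example an open file object for the access log.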

LDAP error 1 means 'operations error' (operationsError). At first I thought something might be wrong with the entry itself.
The messages about this entry in the serverb55 error log, which I've included below, seemed to point in that direction.
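The schema messages for uid=dbeijk in the error log below come from the server's schema check: an entry with objectClass posixAccount must carry every attribute that RFC 2307 makes mandatory for that class. A minimal Python sketch of the same check, which could be run against entries taken from an export; the sample entry in the test is hypothetical, since the real entry is not included here.

```python
from typing import Dict, List

# MUST attributes of the posixAccount object class, as defined in RFC 2307.
POSIX_ACCOUNT_MUST = ("cn", "uid", "uidNumber", "gidNumber", "homeDirectory")


def missing_posix_attrs(entry: Dict[str, List[str]]) -> List[str]:
    """Return the posixAccount MUST attributes that `entry` lacks.

    `entry` maps attribute names to lists of values. Attribute names and
    object class names are compared case-insensitively, as LDAP does.
    Entries without objectClass posixAccount return an empty list.
    """
    present = {name.lower(): values for name, values in entry.items() if values}
    classes = {value.lower() for value in present.get("objectclass", [])}
    if "posixaccount" not in classes:
        return []
    return [attr for attr in POSIX_ACCOUNT_MUST if attr.lower() not in present]
```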


When I logged on to the other LDAP server, serverb05, and tried to delete the same entry there to see whether its slapd had the same issue, the delete worked.
Replicating the delete to serverb55 did not. The following error was logged in the error log of serverb05:

========================================================================
[15/Jun/2010:09:35:17 +0200] NSMMReplicationPlugin - agmt="cn=serverb55" (serverb55:636): Consumer failed to replay change (uniqueid a276337c-5dc511df-852cfef8-667fa4d4, CSN 4c172d36000000050000): Operations error. Will retry later.
=======================================================================
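The `csn=` values in these logs can be decoded to see when, and on which master, a change was generated. A minimal Python sketch, assuming the 389 Directory Server CSN string layout of 8 hex digits of timestamp (seconds since the epoch, UTC) followed by 4 hex digits each of sequence number, replica ID and sub-sequence number:

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    tstamp: int     # seconds since the Unix epoch (UTC)
    seqnum: int     # sequence number within the same second
    rid: int        # replica ID of the master that generated the change
    subseqnum: int  # sub-sequence number


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its fields (assumed 8/4/4/4 layout)."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    return CSN(int(csn[0:8], 16), int(csn[8:12], 16),
               int(csn[12:16], 16), int(csn[16:20], 16))


def describe(csn: str) -> str:
    """Human-readable summary: UTC time, replica ID and sequence number."""
    c = parse_csn(csn)
    when = datetime.fromtimestamp(c.tstamp, tz=timezone.utc)
    return f"{csn}: {when.isoformat()} rid={c.rid} seq={c.seqnum}"
```

Under that assumption, the CSN of the failed delete on serverb55 (4c172a21000000370000) decodes to replica ID 55 at 2010-06-15 07:22:09 UTC, and the change that serverb05 could not replay (4c172d36000000050000) decodes to replica ID 5.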

So the problem seemed to be limited to serverb55.
Since I assumed the database had somehow become corrupt or inconsistent, I tried the following steps to rebuild the database, or at least have it checked, so that it would be consistent again. The log below first shows the errors for the account that could not be deleted, followed by these steps:
1) I reinitialised the consumer serverb55 from the working serverb05.
2) I restarted slapd.
3) I exported the database and imported the export again.
During that import, slapd stopped with a fatal database error.
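In the error log below, the export at 11:40 reports only 139 entries, while the imports at 10:12 and 10:55 processed 4849 and 4850 entries. Before re-importing an export, it may be worth comparing its entry count with the expected size. A minimal Python sketch that counts the records in an LDIF file; `check_export` and the expected count passed to it are hypothetical helpers, not part of the Directory Server tools.

```python
from typing import Iterable


def count_ldif_entries(lines: Iterable[str]) -> int:
    """Count LDIF records by counting lines that start with "dn:".

    Every record starts with a "dn:" line ("dn::" when base64-encoded).
    Folded continuation lines start with a space and comments with "#",
    so neither is counted.
    """
    return sum(1 for line in lines if line[:3].lower() == "dn:")


def check_export(path: str, expected: int) -> bool:
    """Hypothetical helper: True if the LDIF at `path` has at least `expected` entries."""
    with open(path, encoding="utf-8", errors="replace") as ldif:
        found = count_ldif_entries(ldif)
    print(f"{path}: {found} entries, expected about {expected}")
    return found >= expected
```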

Here is the log from /var/log/dirsrv/slapd-serverb55/errors covering the actions that led to the fatal server stop.
======================================================================
        CentOS-Directory/8.1.0 B2009.134.1334
        serverb55:636 (/etc/dirsrv/slapd-serverb55)

[15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "uidNumber" required by object class "posixAccount"
[15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "gidNumber" required by object class "posixAccount"
[15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" -- attribute "uidNumber" not allowed
[15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "uid" required by object class "posixAccount"
[15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "cn" required by object class "posixAccount"
[15/Jun/2010:09:22:58 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "homeDirectory" required by object class "posixAccount"
[15/Jun/2010:09:23:04 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "uidNumber" required by object class "posixAccount"
[15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "uidNumber" required by object class "posixAccount"
[15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "gidNumber" required by object class "posixAccount"
[15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" required attribute "objectclass" missing
[15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" required attribute "objectclass" missing
[15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "uid" required by object class "posixAccount"
[15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "cn" required by object class "posixAccount"
[15/Jun/2010:09:23:18 +0200] - Entry "uid=dbeijk, ou=People, dc=directory,dc=intern" missing attribute "homeDirectory" required by object class "posixAccount"
[15/Jun/2010:09:24:56 +0200] - Entry "uid=DEL *.*, ou=People, dc=directory,dc=intern" missing attribute "homeDirectory" required by object class "posixAccount"
[15/Jun/2010:09:50:20 +0200] - Entry "cn=wchiman, ou=people, dc=directory,dc=intern" -- attribute "uidNumber" not allowed
[15/Jun/2010:10:12:43 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is going offline; disabling replication
[15/Jun/2010:10:12:43 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:10:12:43 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:10:12:43 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:10:12:43 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[15/Jun/2010:10:12:48 +0200] - import userRoot: Workers finished; cleaning up...
[15/Jun/2010:10:12:49 +0200] - import userRoot: Workers cleaned up.
[15/Jun/2010:10:12:49 +0200] - import userRoot: Indexing complete.  Post-processing...
[15/Jun/2010:10:12:49 +0200] - import userRoot: Flushing caches...
[15/Jun/2010:10:12:49 +0200] - import userRoot: Closing files...
[15/Jun/2010:10:12:49 +0200] - import userRoot: Import complete.  Processed 4849 entries in 6 seconds. (808.17 entries/sec)
[15/Jun/2010:10:12:49 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:10:12:49 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:10:12:49 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:10:12:49 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is coming online; enabling replication
[15/Jun/2010:10:12:49 +0200] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=directory,dc=intern does not match the data in the changelog.
 Recreating the changelog file. This could affect replication with replica's  consumers in which case the consumers should be reinitialized.
[15/Jun/2010:10:12:49 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:10:55:43 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is going offline; disabling replication
[15/Jun/2010:10:55:43 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:10:55:43 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:10:55:43 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:10:55:43 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[15/Jun/2010:10:55:49 +0200] - import userRoot: Workers finished; cleaning up...
[15/Jun/2010:10:55:49 +0200] - import userRoot: Workers cleaned up.
[15/Jun/2010:10:55:49 +0200] - import userRoot: Indexing complete.  Post-processing...
[15/Jun/2010:10:55:49 +0200] - import userRoot: Flushing caches...
[15/Jun/2010:10:55:49 +0200] - import userRoot: Closing files...
[15/Jun/2010:10:55:49 +0200] - import userRoot: Import complete.  Processed 4850 entries in 5 seconds. (970.00 entries/sec)
[15/Jun/2010:10:55:49 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:10:55:49 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:10:55:49 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:10:55:49 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is coming online; enabling replication
[15/Jun/2010:10:55:49 +0200] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica dc=directory,dc=intern does not match the data in the changelog.
 Recreating the changelog file. This could affect replication with replica's  consumers in which case the consumers should be reinitialized.
[15/Jun/2010:10:55:49 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:10:59:57 +0200] - slapd shutting down - signaling operation threads
[15/Jun/2010:10:59:57 +0200] - slapd shutting down - waiting for 26 threads to terminate
[15/Jun/2010:10:59:57 +0200] - slapd shutting down - closing down internal subsystems and plugins
[15/Jun/2010:10:59:58 +0200] - Waiting for 4 database threads to stop
[15/Jun/2010:10:59:59 +0200] - All database threads now stopped
[15/Jun/2010:10:59:59 +0200] - slapd stopped.
        CentOS-Directory/8.1.0 B2009.134.1334
        <host>:<port> (/etc/dirsrv/slapd-serverb55)

[15/Jun/2010:11:00:01 +0200] - Entry "cn=schema" single-valued attribute "modifyTimestamp" has multiple values
        CentOS-Directory/8.1.0 B2009.134.1334
        serverb55:636 (/etc/dirsrv/slapd-serverb55)

[15/Jun/2010:11:00:01 +0200] - CentOS-Directory/8.1.0 B2009.134.1334 starting up
[15/Jun/2010:11:00:01 +0200] - I'm resizing my cache now...cache was 20000000 and is now 8000000
[15/Jun/2010:11:00:01 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:00:01 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:00:01 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:00:01 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:00:01 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:00:01 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:00:01 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:11:00:01 +0200] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: data for replica dc=directory,dc=intern was reloaded and it no longer matches the data in the changelog (replica data > changelog). Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
[15/Jun/2010:11:00:01 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:11:00:01 +0200] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[15/Jun/2010:11:00:01 +0200] - Listening on All Interfaces port 636 for LDAPS requests
[15/Jun/2010:11:29:59 +0200] - slapd shutting down - signaling operation threads
[15/Jun/2010:11:29:59 +0200] - slapd shutting down - closing down internal subsystems and plugins
[15/Jun/2010:11:30:00 +0200] - Waiting for 4 database threads to stop
[15/Jun/2010:11:30:00 +0200] - All database threads now stopped
[15/Jun/2010:11:30:00 +0200] - slapd stopped.
        CentOS-Directory/8.1.0 B2009.134.1334
        <host>:<port> (/etc/dirsrv/slapd-serverb55)

[15/Jun/2010:11:30:03 +0200] - Entry "cn=schema" single-valued attribute "modifyTimestamp" has multiple values
        CentOS-Directory/8.1.0 B2009.134.1334
        serverb55:636 (/etc/dirsrv/slapd-serverb55)

[15/Jun/2010:11:30:03 +0200] - CentOS-Directory/8.1.0 B2009.134.1334 starting up
[15/Jun/2010:11:30:03 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:30:03 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:30:03 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:30:03 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:30:03 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:30:03 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:30:03 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:11:30:03 +0200] - skipping cos definition cn=nsAccountInactivation_cos,dc=directory,dc=intern--no templates found
[15/Jun/2010:11:30:03 +0200] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[15/Jun/2010:11:30:03 +0200] - Listening on All Interfaces port 636 for LDAPS requests
[15/Jun/2010:11:40:44 +0200] - Beginning export of 'userroot'
[15/Jun/2010:11:40:44 +0200] - export userRoot: Processed 139 entries (100%).
[15/Jun/2010:11:40:44 +0200] - Export finished.
[15/Jun/2010:11:46:12 +0200] NSMMReplicationPlugin - multimaster_be_state_change: replica dc=directory,dc=intern is going offline; disabling replication
[15/Jun/2010:11:46:12 +0200] - attrcrypt_unwrap_key: failed to unwrap key for cipher AES
[15/Jun/2010:11:46:12 +0200] - Failed to retrieve key for cipher AES in attrcrypt_cipher_init
[15/Jun/2010:11:46:12 +0200] - Failed to initialize cipher AES in attrcrypt_init
[15/Jun/2010:11:46:12 +0200] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[15/Jun/2010:11:46:14 +0200] - libdb: page 1: illegal page type or format
[15/Jun/2010:11:46:14 +0200] - libdb: PANIC: Invalid argument
[15/Jun/2010:11:46:14 +0200] - FATAL ERROR at by MCC ou=people  dc=directory dc=intern (77); server stopping as database recovery needed.
        CentOS-Directory/8.1.0 B2009.134.1334
        <host>:<port> (/etc/dirsrv/slapd-serverb55
======================================================================
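With this much output, a short script can pull out the import and export entry counts and any fatal database errors, which makes the drop from 4849/4850 imported entries to 139 exported ones stand out. A minimal Python sketch; the regular expressions assume the message wording shown in the log above.

```python
import re
from typing import Iterable, List, Tuple

# Assumed wording, copied from the errors log above.
IMPORT_RE = re.compile(r"import (\w+): Import complete\.\s+Processed (\d+) entries")
EXPORT_RE = re.compile(r"export (\w+): Processed (\d+) entries")
FATAL_RE = re.compile(r"PANIC|FATAL ERROR")


def summarize(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, event) pairs for import/export counts and fatal errors."""
    events = []
    for line in lines:
        # The timestamp is the text between the leading "[" and the first "]".
        stamp = line[1:line.find("]")] if line.startswith("[") else ""
        if m := IMPORT_RE.search(line):
            events.append((stamp, f"import {m.group(1)}: {m.group(2)} entries"))
        elif m := EXPORT_RE.search(line):
            events.append((stamp, f"export {m.group(1)}: {m.group(2)} entries"))
        elif FATAL_RE.search(line):
            # Keep the message text after the timestamp and the " - " separator.
            events.append((stamp, line[line.find("]") + 1:].strip(" -\r\n")))
    return events
```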

Finally, my questions:
What could have caused this problem?
What is the best procedure to get serverb55 up and running again?

Thanks for any advice.

Regards,
Mark

 

=======


On Tue, Jun 15, 2010 at 7:04 PM, Rich Megginson <rmeggins@redhat.com> wrote:
mark benschop wrote:
> Hi All,
>
> I'm having a problem with a CentOS Directory Server 8.1 multi-master
> setup.
> The database of one of the servers has been marked as corrupt and has
> been brought offline by the Directory Server.
Can you post any relevant error messages from the error log of the server?
> LDAP clients querying the LDAP server, e.g. to log users in, get an
> error message, which effectively prevents users from logging in.
What error message?
>
> I'm wondering what the best method is to recover from this situation.
> I can think of a few:
> 1) Starting the LDAP server, deleting the database, recreating it and
> restoring a backup.

> 2) Starting the LDAP server, deleting the database and reinitialising
> the server from the other master.
If you reinitialize the problem server from another server, you don't
need to delete the database; reinit will do that for you.
>
> Can anyone tell me whether this will work, or whether another
> approach would be better?
>
> Thanks for your advice,
> Mark
> ------------------------------------------------------------------------
>
> --
> 389 users mailing list
> 389-users@lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
