On 08/02/2013 01:30 AM, Manel Gimeno Zaragozá wrote:
Hello,

Yesterday afternoon my LDAP server crashed. No modifications were being made at the time, only queries.

My environment is an openvz container:

# cat /etc/issue
CentOS release 6.4 (Final)
Kernel \r on an \m

# uname -a
Linux newldap.test.es 2.6.32-042stab053.5 #1 SMP Tue Mar 27 11:42:17 MSD 2012 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa | grep 389
389-console-1.1.7-1.el6.noarch
389-admin-1.1.29-1.el6.x86_64
389-admin-console-1.1.8-1.el6.noarch
389-ds-base-1.2.11.15-14.el6_4.x86_64
389-ds-console-1.2.6-1.el6.noarch
389-ds-base-libs-1.2.11.15-14.el6_4.x86_64
389-ds-base-devel-1.2.11.15-14.el6_4.x86_64
389-adminutil-1.1.15-1.el6.x86_64


Please find below the log from when it crashed:

errors:10:[01/Aug/2013:14:09:44 +0200] configure_pr_socket - Unable to move socket file descriptor 62 above 64: OS error 24 (Too many open files)
errors:11:[01/Aug/2013:14:09:44 +0200] configure_pr_socket - Unable to move socket file descriptor 63 above 64: OS error 24 (Too many open files)
errors:12:[01/Aug/2013:14:09:44 +0200] - PR_Accept() failed, Netscape Portable Runtime error -5971 (Process open FD table is full.)
errors:13:[01/Aug/2013:14:09:44 +0200] - PR_Accept() failed, Netscape Portable Runtime error -5971 (Process open FD table is full.)
errors:14:[01/Aug/2013:14:09:44 +0200] - PR_Accept() failed, Netscape Portable Runtime error -5971 (Process open FD table is full.)
...
[01/Aug/2013:14:10:01 +0200] - PR_Accept() failed, Netscape Portable Runtime error -5971 (Process open FD table is full.)
[01/Aug/2013:14:10:01 +0200] - PR_Accept() failed, Netscape Portable Runtime error -5971 (Process open FD table is full.)
[01/Aug/2013:14:10:01 +0200] - libdb: PANIC: fatal region error detected; run recovery
[01/Aug/2013:14:10:01 +0200] - Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)
[01/Aug/2013:14:10:01 +0200] - PR_Accept() failed, Netscape Portable Runtime error -5971 (Process open FD table is full.)
[01/Aug/2013:14:10:01 +0200] - PR_Accept() failed, Netscape Portable Runtime error -5971 (Process open FD table is full.)
...
[01/Aug/2013:14:12:27 +0200] - libdb: PANIC: fatal region error detected; run recovery
[01/Aug/2013:14:12:27 +0200] - Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)
[01/Aug/2013:14:12:28 +0200] - libdb: PANIC: fatal region error detected; run recovery
[01/Aug/2013:14:12:28 +0200] - Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)
[01/Aug/2013:14:12:28 +0200] - libdb: PANIC: fatal region error detected; run recovery
[01/Aug/2013:14:12:28 +0200] - Serious Error---Failed to checkpoint database, err=-30974 (DB_RUNRECOVERY: Fatal error, run database recovery)

When I noticed that the LDAP server had crashed, I tried to restart the process and the following errors showed up:

[01/Aug/2013:14:20:15 +0200] - 389-Directory/1.2.11.15 B2013.105.2259 starting up
[01/Aug/2013:14:20:15 +0200] - WARNING: userRoot: entry cache size 10485760B is less than db size 14729216B; We recommend to increase the entry cache size nsslapd-cachememsize.
[01/Aug/2013:14:20:15 +0200] - libdb: PANIC: fatal region error detected; run recovery
[01/Aug/2013:14:20:15 +0200] - Opening database environment (/var/lib/dirsrv/slapd-ldap_kolab/db) failed. err=-30974: DB_RUNRECOVERY: Fatal error, run database recovery
[01/Aug/2013:14:20:15 +0200] - start: Failed to init database, err=-30974 DB_RUNRECOVERY: Fatal error, run database recovery
[01/Aug/2013:14:20:15 +0200] - Failed to start database plugin ldbm database
[01/Aug/2013:14:20:15 +0200] - WARNING: ldbm instance userRoot already exists
[01/Aug/2013:14:20:15 +0200] - ldbm_config_read_instance_entries: failed to add instance entry cn=userRoot,cn=ldbm database,cn=plugins,cn=config
[01/Aug/2013:14:20:15 +0200] - ldbm_config_load_dse_info: failed to read instance entries
[01/Aug/2013:14:20:15 +0200] - start: Loading database configuration failed
[01/Aug/2013:14:20:15 +0200] - Failed to start database plugin ldbm database
[01/Aug/2013:14:20:15 +0200] - Error: Failed to resolve plugin dependencies
[01/Aug/2013:14:20:15 +0200] - Error: preoperation plugin 7-bit check is not started
[01/Aug/2013:14:20:15 +0200] - Error: preoperation plugin Account Usability Plugin is not started
[01/Aug/2013:14:20:15 +0200] - Error: accesscontrol plugin ACL Plugin is not started
[01/Aug/2013:14:20:15 +0200] - Error: preoperation plugin ACL preoperation is not started
[01/Aug/2013:14:20:15 +0200] - Error: preoperation plugin Auto Membership Plugin is not started
[01/Aug/2013:14:20:15 +0200] - Error: object plugin Class of Service is not started
[01/Aug/2013:14:20:15 +0200] - Error: preoperation plugin deref is not started
[01/Aug/2013:14:20:15 +0200] - Error: preoperation plugin HTTP Client is not started
[01/Aug/2013:14:20:15 +0200] - Error: database plugin ldbm database is not started
[01/Aug/2013:14:20:15 +0200] - Error: object plugin Legacy Replication Plugin is not started
[01/Aug/2013:14:20:15 +0200] - Error: preoperation plugin Linked Attributes is not started
[01/Aug/2013:14:20:15 +0200] - Error: preoperation plugin Managed Entries is not started
[01/Aug/2013:14:20:15 +0200] - Error: object plugin Multimaster Replication Plugin is not started
[01/Aug/2013:14:20:15 +0200] - Error: object plugin Roles Plugin is not started
[01/Aug/2013:14:20:15 +0200] - Error: object plugin Views is not started

Can anyone help me?

It looks like the server did not recover gracefully from running out of file descriptors.  You should increase the number of file descriptors available to it:
http://port389.org/wiki/Performance_Tuning#Linux
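
Before raising the limit, it may help to confirm how close the server actually is to it. The following is a minimal sketch, not part of 389 DS: it assumes a Linux /proc filesystem, and the function name fd_usage is made up for illustration. It compares the number of descriptors a process has open with its soft "Max open files" limit.

```shell
#!/bin/sh
# Sketch: report open file descriptors versus the soft limit for one process.
# Assumes Linux /proc. fd_usage is a hypothetical helper name, not a 389 DS tool.
fd_usage() {
    pid="${1:-$$}"   # default to the current shell if no PID is given
    # Each entry under /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # In /proc/<pid>/limits, the "Max open files" row has the soft limit in column 4.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "pid=$pid open=$open soft_limit=$soft"
}

# Example: inspect the current shell.
fd_usage "$$"
```

To inspect the directory server instead, something like `fd_usage "$(pidof ns-slapd)"` run as root should work, assuming a single ns-slapd instance is running. If the open count sits near the soft limit under normal load, that would be consistent with the "Too many open files" errors in the log, and the settings described on the page linked above are the place to raise it.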


Thanks for your help

Manel


--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users