Hi all,
after dirsrv crashed and I tried to restart it, I got the following errors and dirsrv doesn't start at all:
[13/Dec/2018:20:17:28 +0100] - 389-Directory/1.3.3.5 B2018.298.1116 starting up
[13/Dec/2018:20:17:28 +0100] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[13/Dec/2018:20:17:29 +0100] - libdb: BDB3017 unable to allocate space from the buffer cache
[13/Dec/2018:20:17:29 +0100] - libdb: BDB1521 Recovery function for LSN 6120 6259890 failed
[13/Dec/2018:20:17:29 +0100] - libdb: BDB0061 PANIC: Cannot allocate memory
[13/Dec/2018:20:17:29 +0100] - libdb: BDB1546 unable to join the environment
[13/Dec/2018:20:17:29 +0100] - Database Recovery Process FAILED. The database is not recoverable. err=-30973: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
[13/Dec/2018:20:17:29 +0100] - Please make sure there is enough disk space for dbcache (400000 bytes) and db region files
[13/Dec/2018:20:17:29 +0100] - start: Failed to init database, err=-30973 BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
[13/Dec/2018:20:17:29 +0100] - Failed to start database plugin ldbm database
[13/Dec/2018:20:17:29 +0100] - WARNING: cache too small, increasing to 500K bytes
[13/Dec/2018:20:17:29 +0100] - WARNING: ldbm instance www_local already exists
[13/Dec/2018:20:17:29 +0100] - ldbm_config_read_instance_entries: failed to add instance entry cn=www_local,cn=ldbm database,cn=plugins,cn=config
[13/Dec/2018:20:17:29 +0100] - ldbm_config_load_dse_info: failed to read instance entries
[13/Dec/2018:20:17:29 +0100] - start: Loading database configuration failed
[13/Dec/2018:20:17:29 +0100] - Failed to start database plugin ldbm database
[13/Dec/2018:20:17:29 +0100] - Error: Failed to resolve plugin dependencies
[13/Dec/2018:20:17:29 +0100] - Error: betxnpreoperation plugin 7-bit check is not started
[13/Dec/2018:20:17:29 +0100] - Error: object plugin Account Policy Plugin is not started
[13/Dec/2018:20:17:29 +0100] - Error: preoperation plugin Account Usability Plugin is not started
[13/Dec/2018:20:17:29 +0100] - Error: accesscontrol plugin ACL Plugin is not started
[13/Dec/2018:20:17:29 +0100] - Error: preoperation plugin ACL preoperation is not started
[13/Dec/2018:20:17:29 +0100] - Error: betxnpreoperation plugin attribute uniqueness is not started
[13/Dec/2018:20:17:29 +0100] - Error: betxnpreoperation plugin Auto Membership Plugin is not started
[13/Dec/2018:20:17:29 +0100] - Error: object plugin Class of Service is not started
[13/Dec/2018:20:17:29 +0100] - Error: preoperation plugin deref is not started
[13/Dec/2018:20:17:29 +0100] - Error: preoperation plugin HTTP Client is not started
[13/Dec/2018:20:17:29 +0100] - Error: database plugin ldbm database is not started
[13/Dec/2018:20:17:29 +0100] - Error: object plugin Legacy Replication Plugin is not started
[13/Dec/2018:20:17:29 +0100] - Error: betxnpreoperation plugin Linked Attributes is not started
[13/Dec/2018:20:17:29 +0100] - Error: betxnpreoperation plugin Managed Entries is not started
[13/Dec/2018:20:17:29 +0100] - Error: object plugin Multimaster Replication Plugin is not started
[13/Dec/2018:20:17:29 +0100] - Error: betxnpostoperation plugin referential integrity postoperation is not started
[13/Dec/2018:20:17:29 +0100] - Error: object plugin Roles Plugin is not started
[13/Dec/2018:20:17:29 +0100] - Error: object plugin Views is not started
[13/Dec/2018:20:17:29 +0100] - Error: extendedop plugin whoami is not started
Any idea what to do?
There is plenty of disk space and 2 GB of RAM.
Thanks and best regards Jan
On 12/13/2018 12:30 PM, Jan Kowalsky wrote:
after dirsrv crashed and I tried to restart it, I got the following errors and dirsrv doesn't start at all:
[13/Dec/2018:20:17:28 +0100] - 389-Directory/1.3.3.5 B2018.298.1116 starting up
[13/Dec/2018:20:17:28 +0100] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[13/Dec/2018:20:17:29 +0100] - libdb: BDB3017 unable to allocate space from the buffer cache
^^^^^^^^^^^^ This looks to be where the train goes off the rails. Everything below is just the smoke and flames that result.
Actually I am wondering: why did the process even continue running after seeing a fatal error? I think that's a bug - it should have just exited at that point.
[13/Dec/2018:20:17:29 +0100] - libdb: BDB1521 Recovery function for LSN 6120 6259890 failed
[13/Dec/2018:20:17:29 +0100] - libdb: BDB0061 PANIC: Cannot allocate memory
[13/Dec/2018:20:17:29 +0100] - libdb: BDB1546 unable to join the environment
[13/Dec/2018:20:17:29 +0100] - Database Recovery Process FAILED. The database is not recoverable. err=-30973: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
[13/Dec/2018:20:17:29 +0100] - Please make sure there is enough disk space for dbcache (400000 bytes) and db region files
Any idea what to do?
First thing to do is to determine if this is a case of a system that worked in the past, and now doesn't. If so, ask what you changed that might have broken it (e.g. a config change). If this is a new deployment that never worked, then I'd recommend running the ns-slapd process under strace to see what syscalls it is making, and then figuring out which one fails in a way that might correspond to the "out of memory" condition in userspace.
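Something along these lines should do it (a rough sketch - the binary path and instance name are just placeholders for whatever your install uses):

  # run the server in the foreground under strace, writing the trace to a file
  strace -f -o /tmp/ns-slapd.trace /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-INSTANCE -d 0

  # then look for syscalls that returned an error
  grep -- '= -1 ' /tmp/ns-slapd.trace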
Also try turning up the logging verbosity to the max. From memory the cache sizing code might print out its selected sizes. There may be other useful debug output you get. You don't need to look at anything in the resulting log after that fatal memory allocation error I cited above.
There is plenty of disk space and 2 GB of RAM.
Hmm...2G ram is very small fwiw, although obviously bigger than the machines we originally ran the DS on in the late 90's. There's always the possibility that something in the cache auto-sizing is just wrong for very small memory machines. I think it does some grokking of the physical memory size then tries to "auto-size" the caches accordingly. There may even be some issue where the physical memory size it gets is from the VM host, not the VM (so it would be horribly wrong).
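If you want to rule that out, a quick sanity check from inside the guest is something like:

  free -m
  grep MemTotal /proc/meminfo

If those numbers look right but the cache sizes the server computes (which should show up in the log once the verbosity is turned up) are huge, the auto-sizing is the likely suspect.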
Hi David,
thanks for the answer,
On 13.12.2018 at 20:53, David Boreham wrote:
On 12/13/2018 12:30 PM, Jan Kowalsky wrote:
after dirsrv crashed and I tried to restart it, I got the following errors and dirsrv doesn't start at all:
[13/Dec/2018:20:17:28 +0100] - 389-Directory/1.3.3.5 B2018.298.1116 starting up
[13/Dec/2018:20:17:28 +0100] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[13/Dec/2018:20:17:29 +0100] - libdb: BDB3017 unable to allocate space from the buffer cache
^^^^^^^^^^^^ This looks to be where the train goes off the rails. Everything below is just the smoke and flames that result.
Actually I am wondering: why did the process even continue running after seeing a fatal error? I think that's a bug - it should have just exited at that point.
[13/Dec/2018:20:17:29 +0100] - libdb: BDB1521 Recovery function for LSN 6120 6259890 failed
[13/Dec/2018:20:17:29 +0100] - libdb: BDB0061 PANIC: Cannot allocate memory
[13/Dec/2018:20:17:29 +0100] - libdb: BDB1546 unable to join the environment
[13/Dec/2018:20:17:29 +0100] - Database Recovery Process FAILED. The database is not recoverable. err=-30973: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
[13/Dec/2018:20:17:29 +0100] - Please make sure there is enough disk space for dbcache (400000 bytes) and db region files
Any idea what to do?
First thing to do is to determine if this is a case of a system that worked in the past, and now doesn't.
Yes. It ran for months - and then didn't restart today.
If so, ask what you changed that might have broken it (e.g. a config change). If this is a new deployment that never worked, then I'd recommend running the ns-slapd process under strace to see what syscalls it is making, and then figuring out which one fails in a way that might correspond to the "out of memory" condition in userspace.
Well, we just added a new database at runtime, which worked fine - 389ds was still running. After changing a replica I wanted to restart, and that resulted in the error.
Also try turning up the logging verbosity to the max. From memory the
How can I achieve this? In dse.ldif I have:
nsslapd-errorlog-level: 32768
cache sizing code might print out its selected sizes. There may be other useful debug output you get. You don't need to look at anything in the resulting log after that fatal memory allocation error I cited above.
There is plenty of disk space and 2 GB of RAM.
Hmm...2G ram is very small fwiw, although obviously bigger than the machines we originally ran the DS on in the late 90's.
I increased it to 3.5 GB (more than that I don't have at the moment on the virtualisation host), but it's still the same.
There's always the possibility that something in the cache auto-sizing is just wrong for very small memory machines. I think it does some grokking of the physical memory size then tries to "auto-size" the caches accordingly. There may even be some issue where the physical memory size it gets is from the VM host, not the VM (so it would be horribly wrong).
I don't assume so - since it worked all the time... What I could imagine is that the changelogdb files were smaller at the last reboot, so a memory limit didn't take effect.
I already have this in /etc/dirsrv.systemd:
[Service]
# uncomment this line to raise the file descriptor limit
LimitNOFILE=10240
LimitCORE=infinity
Kind regards Jan
On 12/13/2018 1:37 PM, Jan Kowalsky wrote:
Well, we just added a new database at runtime, which worked fine - 389ds was still running. After changing a replica I wanted to restart, and that resulted in the error.
Also try turning up the logging verbosity to the max. From memory the
How can I achieve this? In dse.ldif I have:
nsslapd-errorlog-level: 32768
The details are here: https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/8.2/h... but I'd try 65535. That will get you everything useful, I think.
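Since the server won't start you can't change it over LDAP, but with it stopped you can edit the attribute directly in dse.ldif (the path below is just a guess at your layout):

  # in /etc/dirsrv/slapd-INSTANCE/dse.ldif, under the "dn: cn=config" entry:
  nsslapd-errorlog-level: 65535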
I don't assume so - since it worked all the time... What I could imagine is that the changelogdb files were smaller at the last reboot, so a memory limit didn't take effect.
It could be something like: the VM host changed (guest may have been migrated live) such that the physical memory is much larger. This combined with the situation I mentioned earlier, where the cache size is computed from the host physical memory not the guest, might explain the symptoms. I'd definitely look at a) cache auto size (should be printed in the log somewhere, and you can just disable it by configuring fixed size caches that are appropriate in size) and b) strace the process to see why it is failing -- for example you may see an sbrk() call for a zillion bytes or an mmap() call for a huge region, that fails. I think strace might have an option to log only failing syscalls.
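For (a), the fixed dbcache size lives on the ldbm config entry, so with the server stopped you could put something like this in dse.ldif (the number and the autosize attribute are just a sketch - check what your version actually supports and pick a size that fits your data and RAM):

  dn: cn=config,cn=ldbm database,cn=plugins,cn=config
  nsslapd-dbcachesize: 104857600
  # if your version has this attribute, 0 should turn the auto-sizing off
  nsslapd-cache-autosize: 0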
On 13.12.2018 at 21:44, David Boreham wrote:
It could be something like: the VM host changed (guest may have been migrated live) such that the physical memory is much larger. This combined with the situation I mentioned earlier, where the cache size is computed from the host physical memory not the guest, might explain the symptoms. I'd definitely look at a) cache auto size (should be printed
None of this happened. The host machine hasn't changed for months. No migration. Everything is as before.
in the log somewhere, and you can just disable it by configuring fixed
We are on version 1.3.3.5-4 - I think this autoconfig feature doesn't exist there yet. Am I right? We are on Debian Jessie since it's part of a Kolab environment.
size caches that are appropriate in size) and b) strace the process to see why it is failing -- for example you may see an sbrk() call for a zillion bytes or an mmap() call for a huge region, that fails. I think strace might have an option to log only failing syscalls.
Before struggling with this, I tried upgrading 389-ds in a snapshot: after upgrading to 1.3.5.17-2, dirsrv starts again. Migration of the databases and config worked.
Hm, I'll look deeper into it tomorrow.
Thanks a lot for now!
Regards Jan
On 12/13/2018 2:44 PM, Jan Kowalsky wrote:
Before struggling with this, I tried upgrading 389-ds in a snapshot: after upgrading to 1.3.5.17-2, dirsrv starts again. Migration of the databases and config worked.
I'll make a bet that this is unrelated (sometimes it works, sometimes it doesn't), but I guess cross fingers and hope it keeps working!