Hi David,
thanks for your answer.
On 13.12.2018 at 20:53, David Boreham wrote:
On 12/13/2018 12:30 PM, Jan Kowalsky wrote:
After dirsrv crashed and I tried to restart it, I got the following errors, and dirsrv doesn't start at all:
[13/Dec/2018:20:17:28 +0100] - 389-Directory/1.3.3.5 B2018.298.1116 starting up
[13/Dec/2018:20:17:28 +0100] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
[13/Dec/2018:20:17:29 +0100] - libdb: BDB3017 unable to allocate space from the buffer cache
^^^^^^^^^^^^ This looks to be where the train goes off the rails. Everything below is just smoke and flames that results.
Actually I am wondering: why did the process even continue running after seeing a fatal error? I think that's a bug. It should have just exited at that point.
[13/Dec/2018:20:17:29 +0100] - libdb: BDB1521 Recovery function for LSN 6120 6259890 failed
[13/Dec/2018:20:17:29 +0100] - libdb: BDB0061 PANIC: Cannot allocate memory
[13/Dec/2018:20:17:29 +0100] - libdb: BDB1546 unable to join the environment
[13/Dec/2018:20:17:29 +0100] - Database Recovery Process FAILED. The database is not recoverable. err=-30973: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
[13/Dec/2018:20:17:29 +0100] - Please make sure there is enough disk space for dbcache (400000 bytes) and db region files
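Since only the first fatal libdb line matters and everything after it is fallout, a small script can pull that line out of a long error log. This is a minimal sketch: the list of "fatal" markers is an assumption taken from the excerpt in this thread, not from any official list, and the function name is made up for illustration.

```python
import re

# Matches the error-log layout seen in this thread:
# "[13/Dec/2018:20:17:28 +0100] - message"
ENTRY = re.compile(r"^\[(?P<ts>[^\]]+)\] - (?P<msg>.*)$")

# Assumption: these substrings indicate a fatal condition. They come
# from the excerpt in this thread only.
FATAL_MARKERS = ("unable to allocate", "PANIC", "DB_RUNRECOVERY")


def first_fatal(lines):
    """Return (timestamp, message) of the first fatal-looking entry, or None."""
    for line in lines:
        match = ENTRY.match(line.strip())
        if match and any(m in match.group("msg") for m in FATAL_MARKERS):
            return match.group("ts"), match.group("msg")
    return None
```

Fed the log above, this would report the BDB3017 entry, which is the one worth investigating.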
Any idea what to do?
First thing to do is to determine if this is a case of a system that worked in the past, and now doesn't.
Yes. It ran for months, and failed to restart today.
If so, ask what you changed that might have broken it (e.g. config change). If this is a new deployment that never worked, then I'd recommend running the ns-slapd process under strace to see what syscalls it is making, then figure out which one fails that might correspond to the "out of memory" condition in userspace.
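To make the strace suggestion concrete: once a trace has been written to a file (for example with strace's `-f` and `-o` options; the server command line is deliberately left out here), the failing calls can be filtered out mechanically. The sketch below assumes the usual strace line layout, `call(args) = -1 ERRNO (description)`, and the function name is hypothetical.

```python
import re

# Assumed strace layout for a failed call, optionally prefixed by a pid:
#   1234 mmap(NULL, 4096, ...) = -1 ENOMEM (Cannot allocate memory)
FAILED = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"      # optional pid prefix from -f
    r"(?P<call>\w+)\(.*\)\s+=\s+-1\s+"    # syscall name, args, return -1
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<desc>[^)]*)\)"
)


def failed_syscalls(lines, errno_name=None):
    """Yield (syscall, errno, description) for each failed call.

    If errno_name is given (e.g. "ENOMEM"), only that error is reported.
    """
    for line in lines:
        match = FAILED.match(line.strip())
        if not match:
            continue
        if errno_name and match.group("errno") != errno_name:
            continue
        yield match.group("call"), match.group("errno"), match.group("desc")
```

Filtering for `ENOMEM` would be the obvious first pass, given the "Cannot allocate memory" message in the log.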
Well, we just added a new database at runtime, which worked fine: 389ds was still running. After changing a replica I wanted to restart, and that resulted in the error.
Also try turning up the logging verbosity to the max. From memory the
How can I achieve this? In dse.ldif I have:
nsslapd-errorlog-level: 32768
cache sizing code might print out its selected sizes. There may be other useful debug output you get. You don't need to look at anything in the resulting log after that fatal memory allocation error I cited above.
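On the verbosity question: nsslapd-errorlog-level is a bitmask, so several debug areas are enabled by adding their values together. The sketch below decodes and combines such values. The value-to-name table is an assumption based on my recollection of the Directory Server documentation and covers only a few levels; please check it against the reference for your version. With the server down, the attribute would have to be edited in dse.ldif while the server is stopped.

```python
# Assumption: a partial table of error-log levels as I recall them from
# the Directory Server documentation. Verify against your version.
LEVELS = {
    8192: "replication",
    16384: "default",
    32768: "database cache debugging",
    65536: "plug-in debugging",
}


def decode(level):
    """Return the names of the known bits set in an error-log level."""
    return [name for bit, name in sorted(LEVELS.items()) if level & bit]


def combine(*names):
    """Return the numeric level that enables all of the named areas."""
    by_name = {name: bit for bit, name in LEVELS.items()}
    value = 0
    for name in names:
        value |= by_name[name]
    return value
```

Under that assumed table, the configured value 32768 already corresponds to database cache debugging, so the cache sizes may be in the existing log.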
There is plenty of disk space and 2 GB RAM
Hmm...2G ram is very small fwiw, although obviously bigger than the machines we originally ran the DS on in the late 90's.
I increased it to 3.5 GB (I don't have more at the moment on the virtualisation host), but it's still the same.
There's always the possibility that something in the cache auto-sizing is just wrong for very small memory machines. I think it does some grokking of the physical memory size then tries to "auto-size" the caches accordingly. There may even be some issue where the physical memory size it gets is from the VM host, not the VM (so it would be horribly wrong).
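One way to test the "wrong physical memory size" theory is to look at what a process inside the VM is actually told. The sketch below reads the total from /proc/meminfo and from sysconf on Linux. It is not the server's auto-sizing code, only a check of the numbers such code could plausibly see; the function names are made up for illustration.

```python
import os


def parse_meminfo_total(text):
    """Return MemTotal from /proc/meminfo-style text in bytes, or None."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            # Line layout: "MemTotal:        2048000 kB"
            return int(line.split()[1]) * 1024
    return None


def sysconf_physical_memory():
    """Physical memory in bytes as reported by sysconf (Linux/Unix)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        print("MemTotal from /proc/meminfo:", parse_meminfo_total(handle.read()))
    print("Physical memory via sysconf:", sysconf_physical_memory())
```

If either figure is far from the 2 GB or 3.5 GB assigned to the guest, that would support the theory; if both match, the auto-sizing input is probably fine.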
I don't think so, since it worked all the time... What I could imagine is that the changelogdb files were smaller at the last reboot, so no memory limit took effect.
I already have in /etc/dirsrv.systed:
[Service]
# uncomment this line to raise the file descriptor limit
LimitNOFILE=10240
LimitCORE=infinity

Kind regards,
Jan
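To double-check which resource limits the systemd fragment above really sets, it can be parsed mechanically. This is a minimal sketch using Python's configparser; it only lists `Limit*` and `Memory*` directives in the `[Service]` section, and the function name is hypothetical.

```python
import configparser


def service_limits(fragment):
    """Return the Limit*/Memory* directives from a [Service] section."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep directive names case-sensitive
    parser.read_string(fragment)
    if not parser.has_section("Service"):
        return {}
    return {
        key: value
        for key, value in parser["Service"].items()
        if key.startswith(("Limit", "Memory"))
    }
```

For the fragment quoted here, the result contains only LimitNOFILE and LimitCORE, so no memory-related limit such as LimitAS or LimitDATA is configured in this file.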