Hi all,
Suddenly, one of our LDAP servers crashed and won't restart.
When restarting dirsrv we find in logs:
libdb: BDB2034 unable to allocate memory for mutex; resize mutex region mmap in opening database environment failed trying to allocate 500000 bytes. (OS err 12 - Cannot allocate memory)
We get the same error if we run dbverify.
We are running 389-ds version 1.3.5.17 on Debian stretch:
389-ds 1.3.5.17-2
RAM doesn't seem to be the problem: only 200 MB of 4 GB is used.
The server is part of a replicated cluster. Other servers (running the same software version, on more or less the same virtualisation hardware) are not affected.
We have seen similar errors a few times in the past, but restarting the service was always possible.
Any ideas?
Thanks and kind regards, Jan
Hi,
Tuning the number of BDB mutexes is not possible; BDB uses a default value based on the number of hash buckets. This error is quite rare and I have no explanation for why it happened in your deployment.
Could you share the DB tuning entry (cn=config,cn=ldbm database,cn=plugins,cn=config)? Also, looking at the access/error logs, can you identify some operations that contributed to this error?
Best regards, thierry
On 10/7/20 9:39 AM, Jan Kowalsky wrote:
libdb: BDB2034 unable to allocate memory for mutex; resize mutex region mmap in opening database environment failed trying to allocate 500000 bytes. (OS err 12 - Cannot allocate memory)
One observation: this is a mmap() call failure, not an ordinary "OOM" situation.
Some googling suggests that this error shows up across multiple BDB-based products, is not specific to DS, and has not been properly diagnosed anywhere (lots of "try it again and see if it goes away").
Since you can reproduce the problem reliably, the best idea I have is to run the server under strace to see what system calls it makes before it goes off the rails. That might shed some light. Perhaps it is miscalculating the region size, for example, and asking for a mapped segment bigger than the kernel is configured to allow.
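A rough sketch of such a capture (the ns-slapd path, the instance name "slapd-ldap1", and the output file are assumptions, and the server invocation is left commented; the grep filter at the end is demonstrated on a fabricated two-line sample so its effect is visible):

```shell
# Hypothetical invocation on the affected host (commented out here;
# paths and the instance name "slapd-ldap1" are assumptions):
#   strace -f -e trace=memory -o slapd.trace \
#       /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-ldap1 -d 0
# Then search the capture for the failing mapping. Shown below on a
# fabricated two-line sample so the filter itself can be seen working:
cat > slapd.trace <<'EOF'
mmap(NULL, 500000, PROT_READ|PROT_WRITE, MAP_SHARED, 8, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0000000000
EOF
grep 'ENOMEM' slapd.trace
```

The size argument of the failing mmap() should then match the byte count reported in the error message.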
There is a bug filed with Oracle on this: https://support.oracle.com/knowledge/More%20Applications%20and%20Technologie... but it seems to be a bug report requiring a paid support contract to access.
On 7 Oct 2020, at 17:39, Jan Kowalsky jankow@datenkollektiv.net wrote:
libdb: BDB2034 unable to allocate memory for mutex; resize mutex region mmap in opening database environment failed trying to allocate 500000 bytes. (OS err 12 - Cannot allocate memory)
Given this is mmap() and not malloc(), is it possible you are hitting something like vm.max_map_count? I'm not sure what size of memory chunk it is allocating, but you could increase this parameter to see if that makes space for your mmap() calls to succeed.
The other things to check are ulimits and cgroups, if you have any of those limits set on your system.
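Quick read-only checks for those knobs might look like this (a sketch; the sysctl write is commented out and its value is only an example, not a recommendation):

```shell
# Kernel cap on the number of memory mappings per process:
cat /proc/sys/vm/max_map_count
# Per-process limits of the current shell (ulimit is a shell builtin):
ulimit -v    # virtual memory, "unlimited" if unset
ulimit -l    # locked memory, relevant for shared regions
# To raise the map count temporarily (needs root; example value only):
#   sysctl -w vm.max_map_count=262144
```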
Hope that helps,
— Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server SUSE Labs, Australia
Hey,
thanks so much for your answers.
I started with strace, but there are no actionable messages. I do get a schema error, but it is not the cause (it has to be fixed anyway...):
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT CHLD], NULL, 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f851e3c69d0) = 27590
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x449930, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f851da16060}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f851da16060}, 8) = 0
wait4(-1, [09/Oct/2020:10:27:10.365741323 +0200] attr_syntax_create - Error: the EQUALITY matching rule [caseIgnoreIA5Match] is not compatible with the syntax [1.3.6.1.4.1.1466.115.121.1.15] for the attribute [dknFasPickupRule]
[09/Oct/2020:10:27:10.420693888 +0200] attr_syntax_create - Error: the SUBSTR matching rule [caseIgnoreIA5SubstringsMatch] is not compatible with the syntax [1.3.6.1.4.1.1466.115.121.1.15] for the attribute [dknFasPickupRule]
0x7ffffeb57b60, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
wait4(-1, [09/Oct/2020:10:27:11.606290855 +0200] libdb: BDB2034 unable to allocate memory for mutex; resize mutex region
[09/Oct/2020:10:27:12.331303940 +0200] mmap in opening database environment failed trying to allocate 500000 bytes. (OS err 12 - Cannot allocate memory)
[09/Oct/2020:10:27:12.339630631 +0200] verify DB - dbverify: Failed to init database
[{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 27590
Given this is mmap and not malloc, is it possible you are hitting something like vm.max_map_count? I'm not sure what memory chunk size it's allocating but you could increase this parameter to see if that makes space for your mmap calls to function.
The other things to check are ulimits and cgroups if you have any of those limits set in your system,
What I also did: checked vm.max_map_count (increased it to vm.max_map_count = 524288) and ulimit (unlimited).
Without success.
Could you share the DB tuning entry (cn=config,cn=ldbm database,cn=plugins,cn=config). Also looking at the access/error logs can you identify some operations that contributed to this error ?
My DB tuning entries:
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: extensibleObject
cn: config
nsslapd-lookthroughlimit: 5000
nsslapd-mode: 600
nsslapd-idlistscanlimit: 4000
nsslapd-directory: /var/lib/dirsrv/slapd-ldap1/db
nsslapd-dbcachesize: 500000
nsslapd-db-logdirectory: /var/lib/dirsrv/slapd-ldap1/db
nsslapd-db-durable-transaction: on
nsslapd-db-checkpoint-interval: 60
nsslapd-db-compactdb-interval: 2592000
nsslapd-db-transaction-batch-val: 0
nsslapd-db-transaction-batch-min-wait: 50
nsslapd-db-transaction-batch-max-wait: 50
nsslapd-db-logbuf-size: 0
nsslapd-db-locks: 10000
nsslapd-db-private-import-mem: on
nsslapd-import-cache-autosize: -1
nsslapd-import-cachesize: 0
nsslapd-idl-switch: new
nsslapd-search-bypass-filter-test: on
nsslapd-search-use-vlv-index: on
nsslapd-exclude-from-export: entrydn entryid dncomp parentid numSubordinates tombstonenumsubordinates entryusn
nsslapd-serial-lock: on
nsslapd-subtree-rename-switch: on
nsslapd-pagedlookthroughlimit: 0
nsslapd-pagedidlistscanlimit: 0
nsslapd-rangelookthroughlimit: 5000
nsslapd-backend-opt-level: 1
nsslapd-db-deadlock-policy: 9
numSubordinates: 1
It doesn't matter what value I use for nsslapd-dbcachesize; the error message always reports exactly that size: "failed trying to allocate ... bytes".
Since we have replication and another LDAP server that is up to date, I just reverted the server to an earlier snapshot where dirsrv started without problems. I already did this once, a year or two ago. But of course it would be nice to know what the actual problem is.
Thanks and regards, Jan
On 10/9/20 11:10 AM, Jan Kowalsky wrote:
I started with strace - but there are no actionable messages: I get a schema error - but this is not causal (it has to be fixed anyway...):
[09/Oct/2020:10:27:10.365741323 +0200] attr_syntax_create - Error: the EQUALITY matching rule [caseIgnoreIA5Match] is not compatible with the syntax [1.3.6.1.4.1.1466.115.121.1.15] for the attribute [dknFasPickupRule]
[09/Oct/2020:10:27:10.420693888 +0200] attr_syntax_create - Error: the SUBSTR matching rule [caseIgnoreIA5SubstringsMatch] is not compatible with the syntax [1.3.6.1.4.1.1466.115.121.1.15] for the attribute [dknFasPickupRule]
This schema error will prevent the startup but does not explain the DB error. You may fix the schema either by defining dknFasPickupRule with syntax 1.3.6.1.4.1.1466.115.121.1.26, or by switching the matching rules to EQUALITY caseIgnoreMatch / SUBSTR caseIgnoreSubstringsMatch.
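As a sketch of the second option (keeping the Directory String syntax but switching to matching rules that are compatible with it), the corrected schema line might look roughly like this. The OID and the overall definition are placeholders reconstructed from the error message, not the real dknFasPickupRule definition:

```ldif
# Fragment for a user schema file (e.g. 99user.ldif);
# OID 1.3.6.1.4.1.99999.1.1 is a placeholder.
attributeTypes: ( 1.3.6.1.4.1.99999.1.1 NAME 'dknFasPickupRule'
  EQUALITY caseIgnoreMatch
  SUBSTR caseIgnoreSubstringsMatch
  SYNTAX 1.3.6.1.4.1.1466.115.121.1.15 )
```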
Any other errors in the error logs?
Could you share the DB tuning entry (cn=config,cn=ldbm database,cn=plugins,cn=config). Also looking at the access/error logs can you identify some operations that contributed to this error ?
My DB tuning entries:
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: extensibleObject
cn: config
nsslapd-lookthroughlimit: 5000
nsslapd-mode: 600
nsslapd-idlistscanlimit: 4000
nsslapd-directory: /var/lib/dirsrv/slapd-ldap1/db
Any AVCs when ns-slapd accesses /var/lib/dirsrv/slapd-ldap1/db?
On 10/9/2020 3:10 AM, Jan Kowalsky wrote:
I started with strace - but there are no actionable messages: I get a schema error - but this is not causal (it has to be fixed anyway...):
Try adding the -f flag to strace. Sometimes the target process forks and you only get output from the parent.
There should have been at least one call to mmap() in the strace output.