Hi, I am working on a large Directory Server topology that very quickly reaches the limit of available locks in BDB (cf. https://bugzilla.redhat.com/show_bug.cgi?id=1831812).
- Can the planned switch to LMDB in 389-ds-base-1.4.next help in such cases? (Especially after reading "The database structure is multi-versioned so readers run with no locks" on http://www.lmdb.tech/doc/index.html)
- Is the switch to LMDB planned within a matter of quarters or of years? I know nobody likes to put dates on roadmaps; it's just that a rough estimate would help me decide how much effort I should put into tuning BDB right now.
Thank you
On 6/23/2020 9:34 AM, Emmanuel Kasprzyk wrote:
I am working on a large Directory Server topology that very quickly reaches the limit of available locks in BDB (cf. https://bugzilla.redhat.com/show_bug.cgi?id=1831812)
- Can the planned switch in 389-ds-base-1.4.next to LMDB help for such
cases ? ( Especially after reading "The database structure is multi-versioned so readers run with no locks" on http://www.lmdb.tech/doc/index.html )
Probably better to fix the bug in DS that causes it to run a long-running transaction with repeatable-read isolation?
On 6/23/20 11:42 AM, David Boreham wrote:
On 6/23/2020 9:34 AM, Emmanuel Kasprzyk wrote:
I am working on a large Directory Server topology that very quickly reaches the limit of available locks in BDB (cf. https://bugzilla.redhat.com/show_bug.cgi?id=1831812)
- Can the planned switch in 389-ds-base-1.4.next to LMDB help for
such cases ?
Yes, we should not see this particular issue with LMDB
( Especially after reading "The database structure is multi-versioned so readers run with no locks" on http://www.lmdb.tech/doc/index.html )
Probably better to fix the bug in DS that causes it to run a long-running transaction with repeatable-read isolation?
In 389 what we are seeing is that our backend txn plugins are doing unindexed searches, but I would not call it a bug. It's really a configuration/indexing issue. But yes, there are long running operations/txns in regards to many plugins doing a lot of things while the database is being updated in the same nested operation. Now when these internal searches are properly indexed the db lock issue completely goes away. The bug that Emmanuel is referring to is still under investigation. We have yet to identify the internal unindexed search in this customer's case, but a patch/fix was created that would log the required information we need to track it down and fix the config.
389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject....
On 6/23/2020 10:07 AM, Mark Reynolds wrote:
In 389 what we are seeing is that our backend txn plugins are doing unindexed searches, but I would not call it a bug.
The unindexed search is fine per se (although probably not a great idea if you want the op the plugin hooked to complete quickly).
What's not fine is that all the DB reads under that search should be done in the same transaction with strong isolation.
It's really a configuration/indexing issue. But yes, there are long running operations/txns in regards to many plugins doing a lot of things while the database is being updated in the same nested operation. Now when these internal searches are properly indexed the db lock issue completely goes away.
If missing an index were to result in poor performance, agreed -- it's a configuration issue. The server process exiting seems quite an extreme consequence.
Wondering if this is the result of an old fix for a deadlock problem (bringing the internal op under the main transaction to cure the deadlock)?
How is a regular (non-internal) unindexed search run? Surely that doesn't burn through one lock per page touched?
On 6/23/20 12:22 PM, David Boreham wrote:
On 6/23/2020 10:07 AM, Mark Reynolds wrote:
In 389 what we are seeing is that our backend txn plugins are doing unindexed searches, but I would not call it a bug.
The unindexed search is fine per se (although probably not a great idea if you want the op the plugin hooked to complete quickly).
What's not fine is that all the DB reads under that search should be done in the same transaction with strong isolation.
First I'm not that intimately familiar with this issue, Thierry and Ludwig did most of that investigation. But this happens during a modify operation that triggers some BE txn plugins that do searches and updates under the same parent transaction. So under these conditions is when it just starts consuming a ton of db locks.
Unindexed searches by themselves do not cause this issue, it's when we are updating the database under the same txn. So the mod takes a lock on a db page, then we call the be postop plugins, which in turn starts doing these expensive searches and updates - that is when the db lock issue pops up. I seem to recall from previous similar cases that this "mod update" involved a very large static group, and the RI or memberOf plugin doing its work. Maybe Thierry recalls some of the past cases?
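To make the lock arithmetic concrete, here is a rough sketch of the behaviour described above. It is a toy model in Python, not the BDB or 389-ds API: the function name, the page counts, and the lock limit are all made up for illustration. It assumes, as suggested earlier in the thread, that reads done inside the parent transaction keep their page locks until the transaction ends, while a standalone search does not accumulate them.

```python
# Toy model only: hypothetical names and numbers, not the BDB or 389-ds API.


def peak_locks(pages_touched: int, max_locks: int, in_txn: bool) -> int:
    """Return the peak number of page locks held while reading pages.

    in_txn=True  : every read lock is kept until the transaction ends.
    in_txn=False : each read lock is dropped before the next page is read.
    Raises RuntimeError when the (assumed) lock limit is reached.
    """
    held = set()
    peak = 0
    for page in range(pages_touched):
        if len(held) >= max_locks:
            raise RuntimeError(f"out of locks after {page} pages")
        held.add(page)
        peak = max(peak, len(held))
        if not in_txn:
            held.discard(page)
    return peak


if __name__ == "__main__":
    limit = 10_000  # illustrative lock limit, not a recommended setting

    # Standalone unindexed search: many pages, but locks never pile up.
    print("standalone scan:", peak_locks(50_000, limit, in_txn=False))

    # Indexed internal search under the modify txn: only a few pages touched.
    print("indexed, in txn:", peak_locks(20, limit, in_txn=True))

    # Unindexed internal search under the modify txn: locks accumulate.
    try:
        peak_locks(50_000, limit, in_txn=True)
    except RuntimeError as err:
        print("unindexed, in txn:", err)
```

In this model the indexed case stays far below the limit simply because it touches few pages, which matches the observation that the problem goes away once the internal searches are indexed.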
It's really a configuration/indexing issue. But yes, there are long running operations/txns in regards to many plugins doing a lot of things while the database is being updated in the same nested operation. Now when these internal searches are properly indexed the db lock issue completely goes away.
If missing an index were to result in poor performance, agreed -- it's a configuration issue. The server process exiting seems quite an extreme consequence.
It's not exactly crashing, but the db can get corrupted and it needs to be reinitialized. That sounds like a libdb bug to me :-) Running out of db locks should not corrupt the database.
Wondering if this is the result of an old fix for a deadlock problem (bringing the internal op under the main transaction to cure the deadlock)?
Maybe :-) Haven't looked at that code in quite a few years...
How is a regular (non-internal) unindexed search run? Surely that doesn't burn through one lock per page touched?
No it doesn't. See my comment above, standalone unindexed searches do not trigger this issue.
Mark
On 23.06.20 19:19, Mark Reynolds wrote:
On 6/23/20 12:22 PM, David Boreham wrote:
On 6/23/2020 10:07 AM, Mark Reynolds wrote:
In 389 what we are seeing is that our backend txn plugins are doing unindexed searches, but I would not call it a bug.
The unindexed search is fine per se (although probably not a great idea if you want the op the plugin hooked to complete quickly).
What's not fine is that all the DB reads under that search should be done in the same transaction with strong isolation.
First I'm not that intimately familiar with this issue, Thierry and Ludwig did most of that investigation. But this happens during a modify operation that triggers some BE txn plugins that do searches and updates under the same parent transaction. So under these conditions is when it just starts consuming a ton of db locks.
If a transactional operation (e.g. modify) triggers a search by a plugin, it already holds a couple of page locks as write locks. If the search tried to access these pages without using the txn of the parent, it would have to wait, and the whole operation would self-deadlock. So all db accesses inside a txn need to use this txn, either directly or as a parent txn.
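The self-deadlock argument can be sketched with a small toy lock manager. This is only an illustration in Python under simplified assumptions (one write owner per page, no real blocking, no read locks tracked); `Txn`, `LockManager`, and `can_read` are hypothetical names and do not correspond to the BDB or 389-ds API.

```python
# Toy model only: hypothetical names, not the BDB or 389-ds API.
from typing import Dict, Optional


class Txn:
    """A transaction that may be nested under a parent transaction."""

    def __init__(self, name: str, parent: Optional["Txn"] = None) -> None:
        self.name = name
        self.parent = parent

    def family(self) -> set:
        """Return this transaction plus all of its ancestors."""
        members, txn = set(), self
        while txn is not None:
            members.add(txn)
            txn = txn.parent
        return members


class LockManager:
    """Tracks which transaction holds the write lock on each page."""

    def __init__(self) -> None:
        self.write_owner: Dict[int, Txn] = {}

    def write_lock(self, txn: Txn, page: int) -> None:
        # Simplification: no conflict check between writers.
        self.write_owner[page] = txn

    def can_read(self, txn: Optional[Txn], page: int) -> bool:
        """True if the read proceeds, False if it would block on a write lock."""
        owner = self.write_owner.get(page)
        if owner is None:
            return True
        # A reader passes only if the writer is itself or one of its ancestors.
        return txn is not None and owner in txn.family()


locks = LockManager()
modify = Txn("modify")
locks.write_lock(modify, page=7)

# Plugin search run without the parent txn: it would wait for "modify",
# while "modify" is waiting for the plugin to return -> self-deadlock.
print(locks.can_read(None, page=7))

# Plugin search run in a child of the modify txn: no conflict.
print(locks.can_read(Txn("plugin-search", parent=modify), page=7))
```

The first call reports that the read would block and the second that it would proceed, which is why the plugin's internal operations have to run under the modify transaction.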