September 2013 - 389-devel - Fedora Mailing-Lists

Re: [389-devel] RFC: New Design: Fine Grained ID List Size

by Howard Chu

Rich Megginson wrote: > On 09/12/2013 07:08 PM, David Boreham wrote: >> > On 9/11/2013 11:41 AM, Howard Chu wrote: >>> >> >>> >> Just out of curiosity, why is keeping a count per key a problem? If >>> >> you're using BDB duplicate key support, can't you just use >>> >> cursor->c_count() to get this? I.e., BDB already maintains key counts >>> >> internally, why not leverage that? >> > afaik you need to pass the DB_RECNUM flag at DB creation time to get >> > record counting behavior, and it imposes a performance and concurrency >> > penalty on writes. Also afaik 389DS does not set that flag except on >> > VLV indexes (which need it, and coincidentally were the original >> > reason for the feature being added to BDB). > I'm using bdb 4.7 on RHEL 6. > Looking at the code, it appears the dbc->count method for btree is > __bamc_count() in bt_cursor.c. I'm not sure, but it looks as though > this function has to iterate each page counting the duplicates on each > page, which makes it a non-starter. Unless I'm mistaken, it doesn't > look as though it keeps a counter on each update, then simply returns > the counter. I don't see any code which would make the behavior > different depending on if DB_RECNUM is used when the database is created. Ah totally forgot about that, it's been a couple years since I looked inside that code. LMDB updates record counts on every write op so returning the count is zero-cost. (Ironically we don't use this fact to optimize filter evaluation order in OpenLDAP. Probably should...) Also due to the fact that writing a leaf page requires every page up to the root to be updated (copy-on-write design), updating the counts also comes "for free" since the root page had to be updated anyway. (Or put another way, LMDB writes are already slow by design; updating the counters doesn't make them any slower.) -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

10 years, 7 months

2
1
0 / 0

Re: [389-devel] 389-devel Digest, Vol 99, Issue 6

by Howard Chu

389-devel-request(a)lists.fedoraproject.org wrote: > Send 389-devel mailing list submissions to > 389-devel(a)lists.fedoraproject.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://admin.fedoraproject.org/mailman/listinfo/389-devel > or, via email, send a message with subject or body 'help' to > 389-devel-request(a)lists.fedoraproject.org > > You can reach the person managing the list at > 389-devel-owner(a)lists.fedoraproject.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of 389-devel digest..." > > > Today's Topics: > > 1. Re: RFC: New Design: Fine Grained ID List Size (Rich Megginson) > 2. Re: RFC: New Design: Fine Grained ID List Size (Ludwig Krispenz) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 10 Sep 2013 08:29:14 -0600 > From: Rich Megginson <rmeggins(a)redhat.com> > To: Ludwig Krispenz <lkrispen(a)redhat.com> > Cc: "389 Directory server developer discussion." > <389-devel(a)lists.fedoraproject.org> > Subject: Re: [389-devel] RFC: New Design: Fine Grained ID List Size > Message-ID: <522F2CBA.1040205(a)redhat.com> > Content-Type: text/plain; charset=UTF-8; format=flowed > > On 09/10/2013 01:47 AM, Ludwig Krispenz wrote: >> >> On 09/09/2013 07:19 PM, Rich Megginson wrote: >>> On 09/09/2013 02:27 AM, Ludwig Krispenz wrote: >>>> >>>> On 09/07/2013 05:02 AM, David Boreham wrote: >>>>> On 9/6/2013 8:49 PM, Nathan Kinder wrote: >>>>>> This is a good idea, and it is something that we discussed briefly >>>>>> off-list. The only downside is that we need to change the index >>>>>> format to keep a count of ids for each key. Implementing this >>>>>> isn't a big problem, but it does mean that the existing indexes >>>>>> need to be updated to populate the count based off of the contents >>>>>> (as you mention above). >>>>> >>>>> I don't think you need to do this (I certainly wasn't advocating >>>>> doing so). The "statistics" state is much the same as that proposed >>>>> in Rich's design. In fact you could probably just use that same >>>>> information. My idea is more about where and how you use the >>>>> information. All you need is something associated with each index >>>>> that says "not much point looking here if you're after something >>>>> specific, move along, look somewhere else instead". This is much >>>>> the same information as "don't use a high scan limit here". >>>>> >>>>>> >>>>>> In the short term, we are looking for a way to be able to improve >>>>>> performance for specific search filters that are not possible to >>>>>> modify on the client side (for whatever reason) while leaving the >>>>>> index file format exactly as it is. I still feel that there is >>>>>> potentially great value in keeping a count of ids per key so we >>>>>> can optimize things on the server side automatically without the >>>>>> need for complex index configuration on the administrator's part. >>>>>> I think we should consider this for an additional future enhancement. >>>>> >>>>> I'm saying the same thing. Keeping a cardinality count per key is >>>>> way more than I'm proposing, and I'm not sure how useful that would >>>>> be anyway, unless you want to do OLAP in the DS ;) >>>> we have the cardinality of the key in old-idl and this makes some >>>> searches where parts of the filter are allids fast. >>>> >>>> I'm late in the discussion, but I think Rich's proposal is very >>>> promising to address all the problems related to allids in new-idl. >>>> >>>> We could then eventually rework filter ordering based on these >>>> configurations. Right now we only have a filter ordering based on >>>> index type and try to postpone "<=" or similar filter as they are >>>> known to be costly, but this could be more elaborate. >>>> >>>> An alternative would be to have some kind of index lookup caching. >>>> In the example in ticket 47474 the filter is >>>> (&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOf >>>> Names)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))" >>>> and probably only the "c3sUserID=xxxxx" part will change, if we >>>> cache the result for the (&(|(objectClass=... part, even if it is >>>> expensive, it would be done only once. >>> >>> Thanks everyone for the comments. I have added Noriko's suggestion: >>> http://port389.org/wiki/Design/Fine_Grained_ID_List_Size >>> >>> David, Ludwig: Does the current design address your concerns, and/or >>> provide the necessary first step for further refinements? >> yes, the topic of filter reordering or caching could be looked at >> independently. >> >> Just one concern abou the syntax: >> >> nsIndexIDListScanLimit: >> maxsize[:indextype][:flag[,flag...]][:value[,value...]] >> >> since everything is optional, how do you decide if in >> nsIndexIDListScanLimit: 6:eq:AND "AND" is a value or a flag ? >> and as it defines limits for specific keys, could the attributname >> reflect this, eg nsIndexKeyIDListScanLimit or nsIndexKeyScanLimit or >> ... ? > > Thanks, yes, it is ambiguous. > I think it may have to use keyword=value, so something like this: > > nsIndexIDListScanLimit: limit=NNN [type=eq[,sub]] [flags=ADD[,OR]] > [values=val[,val...]] > > That should be easy to parse for both humans and machines. > For values, will have to figure out a way to have escapes (e.g. if a > value contains a comma or an escape character). Was thinking of using > LDAP escapes (e.g. \, or \032) > >>> >>>>> >>>>> >>>>> -- >>>>> 389-devel mailing list >>>>> 389-devel(a)lists.fedoraproject.org >>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel >>>> >>>> -- >>>> 389-devel mailing list >>>> 389-devel(a)lists.fedoraproject.org >>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel >>> >> > > > > ------------------------------ > > Message: 2 > Date: Tue, 10 Sep 2013 16:35:07 +0200 > From: Ludwig Krispenz <lkrispen(a)redhat.com> > To: Rich Megginson <rmeggins(a)redhat.com> > Cc: "389 Directory server developer discussion." > <389-devel(a)lists.fedoraproject.org> > Subject: Re: [389-devel] RFC: New Design: Fine Grained ID List Size > Message-ID: <522F2E1B.9010600(a)redhat.com> > Content-Type: text/plain; charset=UTF-8; format=flowed > > > On 09/10/2013 04:29 PM, Rich Megginson wrote: >> On 09/10/2013 01:47 AM, Ludwig Krispenz wrote: >>> >>> On 09/09/2013 07:19 PM, Rich Megginson wrote: >>>> On 09/09/2013 02:27 AM, Ludwig Krispenz wrote: >>>>> >>>>> On 09/07/2013 05:02 AM, David Boreham wrote: >>>>>> On 9/6/2013 8:49 PM, Nathan Kinder wrote: >>>>>>> This is a good idea, and it is something that we discussed >>>>>>> briefly off-list. The only downside is that we need to change >>>>>>> the index format to keep a count of ids for each key. >>>>>>> Implementing this isn't a big problem, but it does mean that the >>>>>>> existing indexes need to be updated to populate the count based >>>>>>> off of the contents (as you mention above). >>>>>> >>>>>> I don't think you need to do this (I certainly wasn't advocating >>>>>> doing so). The "statistics" state is much the same as that >>>>>> proposed in Rich's design. In fact you could probably just use >>>>>> that same information. My idea is more about where and how you use >>>>>> the information. All you need is something associated with each >>>>>> index that says "not much point looking here if you're after >>>>>> something specific, move along, look somewhere else instead". This >>>>>> is much the same information as "don't use a high scan limit here". >>>>>> >>>>>>> >>>>>>> In the short term, we are looking for a way to be able to improve >>>>>>> performance for specific search filters that are not possible to >>>>>>> modify on the client side (for whatever reason) while leaving the >>>>>>> index file format exactly as it is. I still feel that there is >>>>>>> potentially great value in keeping a count of ids per key so we >>>>>>> can optimize things on the server side automatically without the >>>>>>> need for complex index configuration on the administrator's part. >>>>>>> I think we should consider this for an additional future >>>>>>> enhancement. >>>>>> >>>>>> I'm saying the same thing. Keeping a cardinality count per key is >>>>>> way more than I'm proposing, and I'm not sure how useful that >>>>>> would be anyway, unless you want to do OLAP in the DS ;) >>>>> we have the cardinality of the key in old-idl and this makes some >>>>> searches where parts of the filter are allids fast. Just out of curiosity, why is keeping a count per key a problem? If you're using BDB duplicate key support, can't you just use cursor->c_count() to get this? I.e., BDB already maintains key counts internally, why not leverage that? -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

10 years, 7 months

4
5
0 / 0

please review: Ticket 47512 - backend txn plugin fixup tasks should be done in a txn

by Mark Reynolds

https://fedorahosted.org/389/ticket/47512 https://fedorahosted.org/389/attachment/ticket/47512/0001-Ticket-47512-ba... -- Mark Reynolds 389 Development Team Red Hat, Inc mreynolds(a)redhat.com

10 years, 7 months

1
0
0 / 0

Please Review: (ticket 449) Possible to set invalid macros in Macro ACIs

by Nathan Kinder

https://fedorahosted.org/389/ticket/449 https://fedorahosted.org/389/attachment/ticket/449/0001-Ticket-449-Allow-...

10 years, 7 months

1
0
0 / 0

RFC: New Design: Fine Grained ID List Size

by Rich Megginson

Please review and comment: http://port389.org/wiki/Design/Fine_Grained_ID_List_Size

10 years, 7 months

6
13
0 / 0

Please review: [389 Project] #460: support multiple subtrees and filters

by Noriko Hosoi

https://fedorahosted.org/389/ticket/460 https://fedorahosted.org/389/attachment/ticket/460/0001-Ticket-460-suppor... Description: 1. support multiple subtrees new config parameter in windwows sync agreement: winSyncSubtreePair: <DS Subtree>:<AD Subtree> Example: winSyncSubtreePair: ou=OU1,dc=DSexample,dc=com:ou=OU1,DC=ADexample,DC=com winSyncSubtreePair: ou=OU2,dc=DSexample,dc=com:ou=OU2,DC=ADexample,DC=com winSyncSubtreePair: ou=OU3,dc=DSexample,dc=com:ou=OU3,DC=ADexample,DC=com . Attribute type "winSyncSubtreePair" is added to the objectclass "nsDSWindowsReplicationAgreement". . If "winSyncSubtreePair" is not set, there is not behavioral difference: the AD subtree "nsds7WindowsReplicaSubtree" and the DS subtree "nsds7DirectoryReplicaSubtree" are used for the sync target checks. . When "winSyncSubtreePair" is set, the above 2 config parameters are ignored. To determine if an entry is the target of the synchronization, the DN is examined whether the DN is a descendent of any of the subtrees or not. If it is, the subtree of the counter part is retrieved. Moving an entry from one subtree to another is synchronized. Members of a group is synchronized as long as the member entry is in any of the defined subtrees. 2. support filters new config parameters in windwows sync agreement: nsds7WindowsFilter: <additional filter on AD> nsds7DirectoryFilter: <additional filter on DS> Example: nsds7WindowsFilter: (|(cn=*user*)(cn=*group*)) nsds7DirectoryFilter: (|(uid=*user*)(cn=*group*)) . The filters are set to the windows_userfilter and directory_ userfilter in the private area in the windows agreement. And when each server is searched the filters are added to the internal filter. For instance, filters shown in the above Example allow synchronizing the entries which CN contains "user" or "group". 3. Misc . Added slapi_sdn_set_ndn_byref, slapi_sdn_set_ndn_passin, and slapi_sdn_common_ancestor to dn.c (see also slapi-plugin.h). . Fixed memory leaks. . Fixed some of the mixed indentations.

10 years, 7 months

1
0
0 / 0

Please review ticket 47489: Under specific values of nsDS5ReplicaName, replication may get broken or updates missing

by thierry bordaz

https://fedorahosted.org/389/attachment/ticket/47489/0001-Ticket-47489-Un... https://fedorahosted.org/389/ticket/47489

10 years, 7 months

1
0
0 / 0

Please review: #314: ChainOnUpdate behaviour for root user

by Ludwig Krispenz

https://fedorahosted.org/389/ticket/314 https://fedorahosted.org/389/attachment/ticket/314/0001-Ticket-314-ChaiOn...

10 years, 7 months

1
0
0 / 0

Please review: #47502: updates to ruv entry are written to retro changelog

by Ludwig Krispenz

https://fedorahosted.org/389/ticket/47502 https://fedorahosted.org/389/attachment/ticket/47502/0001-Ticket-47502-up...

10 years, 7 months

1
0
0 / 0

Please review: fix regression in fix for #569

by Ludwig Krispenz

https://fedorahosted.org/389/attachment/ticket/569/0001-Fix-regression-fr...

10 years, 7 months

1
0
0 / 0

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

389-devel September 2013