[389-devel] 389-devel Digest, Vol 99, Issue 6

Thu Sep 12 12:57:46 UTC 2013

On 09/11/2013 07:41 PM, Howard Chu wrote:
> 389-devel-request at lists.fedoraproject.org wrote:
>> Send 389-devel mailing list submissions to
>>     389-devel at lists.fedoraproject.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>     https://admin.fedoraproject.org/mailman/listinfo/389-devel
>> or, via email, send a message with subject or body 'help' to
>>     389-devel-request at lists.fedoraproject.org
>>
>> You can reach the person managing the list at
>>     389-devel-owner at lists.fedoraproject.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of 389-devel digest..."
>>
>>
>> Today's Topics:
>>
>>     1. Re: RFC: New Design: Fine Grained ID List Size (Rich Megginson)
>>     2. Re: RFC: New Design: Fine Grained ID List Size (Ludwig Krispenz)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 10 Sep 2013 08:29:14 -0600
>> From: Rich Megginson <rmeggins at redhat.com>
>> To: Ludwig Krispenz <lkrispen at redhat.com>
>> Cc: "389 Directory server developer discussion."
>>     <389-devel at lists.fedoraproject.org>
>> Subject: Re: [389-devel] RFC: New Design: Fine Grained ID List Size
>> Message-ID: <522F2CBA.1040205 at redhat.com>
>> Content-Type: text/plain; charset=UTF-8; format=flowed
>>
>> On 09/10/2013 01:47 AM, Ludwig Krispenz wrote:
>>>
>>> On 09/09/2013 07:19 PM, Rich Megginson wrote:
>>>> On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:
>>>>>
>>>>> On 09/07/2013 05:02 AM, David Boreham wrote:
>>>>>> On 9/6/2013 8:49 PM, Nathan Kinder wrote:
>>>>>>> This is a good idea, and it is something that we discussed briefly
>>>>>>> off-list.  The only downside is that we need to change the index
>>>>>>> format to keep a count of ids for each key. Implementing this
>>>>>>> isn't a big problem, but it does mean that the existing indexes
>>>>>>> need to be updated to populate the count based off of the contents
>>>>>>> (as you mention above).
>>>>>>
>>>>>> I don't think you need to do this (I certainly wasn't advocating
>>>>>> doing so). The "statistics" state is much the same as that proposed
>>>>>> in Rich's design. In fact you could probably just use that same
>>>>>> information. My idea is more about where and how you use the
>>>>>> information. All you need is something associated with each index
>>>>>> that says "not much point looking here if you're after something
>>>>>> specific, move along, look somewhere else instead". This is much
>>>>>> the same information as "don't use a high scan limit here".
>>>>>>
>>>>>>>
>>>>>>> In the short term, we are looking for a way to be able to improve
>>>>>>> performance for specific search filters that are not possible to
>>>>>>> modify on the client side (for whatever reason) while leaving the
>>>>>>> index file format exactly as it is.  I still feel that there is
>>>>>>> potentially great value in keeping a count of ids per key so we
>>>>>>> can optimize things on the server side automatically without the
>>>>>>> need for complex index configuration on the administrator's part.
>>>>>>> I think we should consider this for an additional future 
>>>>>>> enhancement.
>>>>>>
>>>>>> I'm saying the same thing. Keeping a cardinality count per key is
>>>>>> way more than I'm proposing, and I'm not sure how useful that would
>>>>>> be anyway, unless you want to do OLAP in the DS ;)
>>>>> we have the cardinality of the key in old-idl and this makes some
>>>>> searches where parts of the filter are allids fast.
>>>>>
>>>>> I'm late in the discussion, but I think Rich's proposal is very
>>>>> promising to address all the problems related to allids in new-idl.
>>>>>
>>>>> We could then eventually rework filter ordering based on these
>>>>> configurations. Right now we only have a filter ordering based on
>>>>> index type and try to postpone "<=" or similar filter as they are
>>>>> known to be costly, but this could be more elaborate.
>>>>>
>>>>> An alternative would be to have some kind of index lookup caching.
>>>>> In the example in ticket 47474 the filter is
>>>>> (&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOf 
>>>>>
>>>>> Names)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))" 
>>>>>
>>>>> and probably only the "c3sUserID=xxxxx" part will change, if we
>>>>> cache the result for the (&(|(objectClass=... part, even if it is
>>>>> expensive, it would be done only once.
>>>>
>>>> Thanks everyone for the comments.  I have added Noriko's suggestion:
>>>> http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
>>>>
>>>> David, Ludwig: Does the current design address your concerns, and/or
>>>> provide the necessary first step for further refinements?
>>> yes, the topic of filter reordering or caching could be looked at
>>> independently.
>>>
>>> Just one concern abou the syntax:
>>>
>>> nsIndexIDListScanLimit:
>>> maxsize[:indextype][:flag[,flag...]][:value[,value...]]
>>>
>>> since everything is optional, how do you decide if in
>>> nsIndexIDListScanLimit: 6:eq:AND "AND" is a value or a flag ?
>>> and as it defines limits for specific keys, could the attributname
>>> reflect this, eg nsIndexKeyIDListScanLimit or nsIndexKeyScanLimit or
>>> ... ?
>>
>> Thanks, yes, it is ambiguous.
>> I think it may have to use keyword=value, so something like this:
>>
>> nsIndexIDListScanLimit: limit=NNN [type=eq[,sub]] [flags=ADD[,OR]]
>> [values=val[,val...]]
>>
>> That should be easy to parse for both humans and machines.
>> For values, will have to figure out a way to have escapes (e.g. if a
>> value contains a comma or an escape character).   Was thinking of using
>> LDAP escapes (e.g. \, or \032)
>>
>>>>
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> 389-devel mailing list
>>>>>> 389-devel at lists.fedoraproject.org
>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>>>
>>>>> -- 
>>>>> 389-devel mailing list
>>>>> 389-devel at lists.fedoraproject.org
>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>>
>>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 10 Sep 2013 16:35:07 +0200
>> From: Ludwig Krispenz <lkrispen at redhat.com>
>> To: Rich Megginson <rmeggins at redhat.com>
>> Cc: "389 Directory server developer discussion."
>>     <389-devel at lists.fedoraproject.org>
>> Subject: Re: [389-devel] RFC: New Design: Fine Grained ID List Size
>> Message-ID: <522F2E1B.9010600 at redhat.com>
>> Content-Type: text/plain; charset=UTF-8; format=flowed
>>
>>
>> On 09/10/2013 04:29 PM, Rich Megginson wrote:
>>> On 09/10/2013 01:47 AM, Ludwig Krispenz wrote:
>>>>
>>>> On 09/09/2013 07:19 PM, Rich Megginson wrote:
>>>>> On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:
>>>>>>
>>>>>> On 09/07/2013 05:02 AM, David Boreham wrote:
>>>>>>> On 9/6/2013 8:49 PM, Nathan Kinder wrote:
>>>>>>>> This is a good idea, and it is something that we discussed
>>>>>>>> briefly off-list.  The only downside is that we need to change
>>>>>>>> the index format to keep a count of ids for each key.
>>>>>>>> Implementing this isn't a big problem, but it does mean that the
>>>>>>>> existing indexes need to be updated to populate the count based
>>>>>>>> off of the contents (as you mention above).
>>>>>>>
>>>>>>> I don't think you need to do this (I certainly wasn't advocating
>>>>>>> doing so). The "statistics" state is much the same as that
>>>>>>> proposed in Rich's design. In fact you could probably just use
>>>>>>> that same information. My idea is more about where and how you use
>>>>>>> the information. All you need is something associated with each
>>>>>>> index that says "not much point looking here if you're after
>>>>>>> something specific, move along, look somewhere else instead". This
>>>>>>> is much the same information as "don't use a high scan limit here".
>>>>>>>
>>>>>>>>
>>>>>>>> In the short term, we are looking for a way to be able to improve
>>>>>>>> performance for specific search filters that are not possible to
>>>>>>>> modify on the client side (for whatever reason) while leaving the
>>>>>>>> index file format exactly as it is.  I still feel that there is
>>>>>>>> potentially great value in keeping a count of ids per key so we
>>>>>>>> can optimize things on the server side automatically without the
>>>>>>>> need for complex index configuration on the administrator's part.
>>>>>>>> I think we should consider this for an additional future
>>>>>>>> enhancement.
>>>>>>>
>>>>>>> I'm saying the same thing. Keeping a cardinality count per key is
>>>>>>> way more than I'm proposing, and I'm not sure how useful that
>>>>>>> would be anyway, unless you want to do OLAP in the DS ;)
>>>>>> we have the cardinality of the key in old-idl and this makes some
>>>>>> searches where parts of the filter are allids fast.
>
> Just out of curiosity, why is keeping a count per key a problem? If 
> you're using BDB duplicate key support, can't you just use  to get 
> this? I.e., BDB already maintains key counts internally, why not 
> leverage that?
well, cursor->c_count() is a function which counts the duplicates and to 
do this will have to access all the pages containing duplicates, so I am 
not sure we will gain much