[389-devel] RFC: New Design: Fine Grained ID List Size

Tue Sep 10 14:35:07 UTC 2013

On 09/10/2013 04:29 PM, Rich Megginson wrote:
> On 09/10/2013 01:47 AM, Ludwig Krispenz wrote:
>>
>> On 09/09/2013 07:19 PM, Rich Megginson wrote:
>>> On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:
>>>>
>>>> On 09/07/2013 05:02 AM, David Boreham wrote:
>>>>> On 9/6/2013 8:49 PM, Nathan Kinder wrote:
>>>>>> This is a good idea, and it is something that we discussed 
>>>>>> briefly off-list.  The only downside is that we need to change 
>>>>>> the index format to keep a count of ids for each key.  
>>>>>> Implementing this isn't a big problem, but it does mean that the 
>>>>>> existing indexes need to be updated to populate the count based 
>>>>>> off of the contents (as you mention above).
>>>>>
>>>>> I don't think you need to do this (I certainly wasn't advocating 
>>>>> doing so). The "statistics" state is much the same as that 
>>>>> proposed in Rich's design. In fact you could probably just use 
>>>>> that same information. My idea is more about where and how you use 
>>>>> the information. All you need is something associated with each 
>>>>> index that says "not much point looking here if you're after 
>>>>> something specific, move along, look somewhere else instead". This 
>>>>> is much the same information as "don't use a high scan limit here".
>>>>>
>>>>>>
>>>>>> In the short term, we are looking for a way to be able to improve 
>>>>>> performance for specific search filters that are not possible to 
>>>>>> modify on the client side (for whatever reason) while leaving the 
>>>>>> index file format exactly as it is.  I still feel that there is 
>>>>>> potentially great value in keeping a count of ids per key so we 
>>>>>> can optimize things on the server side automatically without the 
>>>>>> need for complex index configuration on the administrator's part. 
>>>>>> I think we should consider this for an additional future 
>>>>>> enhancement.
>>>>>
>>>>> I'm saying the same thing. Keeping a cardinality count per key is 
>>>>> way more than I'm proposing, and I'm not sure how useful that 
>>>>> would be anyway, unless you want to do OLAP in the DS ;)
>>>> we have the cardinality of the key in old-idl and this makes some 
>>>> searches where parts of the filter are allids fast.
>>>>
>>>> I'm late in the discussion, but I think Rich's proposal is very 
>>>> promising to address all the problems related to allids in new-idl.
>>>>
>>>> We could then eventually rework filter ordering based on these 
>>>> configurations. Right now we only have a filter ordering based on 
>>>> index type and try to postpone "<=" or similar filter as they are 
>>>> known to be costly, but this could be more elaborate.
>>>>
>>>> An alternative would be to have some kind of index lookup caching. 
>>>> In the example in ticket 47474 the filter is 
>>>> (&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOf
>>>> Names)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))" 
>>>> and probably only the "c3sUserID=xxxxx" part will change, if we 
>>>> cache the result for the (&(|(objectClass=... part, even if it is 
>>>> expensive, it would be done only once.
>>>
>>> Thanks everyone for the comments.  I have added Noriko's suggestion:
>>> http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
>>>
>>> David, Ludwig: Does the current design address your concerns, and/or 
>>> provide the necessary first step for further refinements?
>> yes, the topic of filter reordering or caching could be looked at 
>> independently.
>>
>> Just one concern abou the syntax:
>>
>> nsIndexIDListScanLimit: 
>> maxsize[:indextype][:flag[,flag...]][:value[,value...]]
>>
>> since everything is optional, how do you decide if in 
>> nsIndexIDListScanLimit: 6:eq:AND "AND" is a value or a flag ?
>> and as it defines limits for specific keys, could the attributname 
>> reflect this, eg nsIndexKeyIDListScanLimit or nsIndexKeyScanLimit or 
>> ... ?
>
> Thanks, yes, it is ambiguous.
> I think it may have to use keyword=value, so something like this:
>
> nsIndexIDListScanLimit: limit=NNN [type=eq[,sub]] [flags=ADD[,OR]] 
> [values=val[,val...]]
>
> That should be easy to parse for both humans and machines.
> For values, will have to figure out a way to have escapes (e.g. if a 
> value contains a comma or an escape character).   Was thinking of 
> using LDAP escapes (e.g. \, or \032)
they should be treated as in filters and normalized, in the config it 
should be the string representation according to the attributetype
>
>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> 389-devel mailing list
>>>>> 389-devel at lists.fedoraproject.org
>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>>
>>>> -- 
>>>> 389-devel mailing list
>>>> 389-devel at lists.fedoraproject.org
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>
>>
>