[389-devel] RFC: New Design: Fine Grained ID List Size

Ludwig Krispenz lkrispen at redhat.com
Thu Sep 12 18:24:32 UTC 2013


On 09/12/2013 04:40 PM, Rich Megginson wrote:
> On 09/12/2013 07:39 AM, thierry bordaz wrote:
>> On 09/10/2013 04:35 PM, Ludwig Krispenz wrote:
>>>
>>> On 09/10/2013 04:29 PM, Rich Megginson wrote:
>>>> On 09/10/2013 01:47 AM, Ludwig Krispenz wrote:
>>>>>
>>>>> On 09/09/2013 07:19 PM, Rich Megginson wrote:
>>>>>> On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:
>>>>>>>
>>>>>>> On 09/07/2013 05:02 AM, David Boreham wrote:
>>>>>>>> On 9/6/2013 8:49 PM, Nathan Kinder wrote:
>>>>>>>>> This is a good idea, and it is something that we discussed 
>>>>>>>>> briefly off-list. The only downside is that we need to change 
>>>>>>>>> the index format to keep a count of ids for each key. 
>>>>>>>>> Implementing this isn't a big problem, but it does mean that 
>>>>>>>>> the existing indexes need to be updated to populate the count 
>>>>>>>>> based off of the contents (as you mention above).
>>>>>>>>
>>>>>>>> I don't think you need to do this (I certainly wasn't 
>>>>>>>> advocating doing so). The "statistics" state is much the same 
>>>>>>>> as that proposed in Rich's design. In fact you could probably 
>>>>>>>> just use that same information. My idea is more about where and 
>>>>>>>> how you use the information. All you need is something 
>>>>>>>> associated with each index that says "not much point looking 
>>>>>>>> here if you're after something specific, move along, look 
>>>>>>>> somewhere else instead". This is much the same information as 
>>>>>>>> "don't use a high scan limit here".
>>>>>>>>
>>>>>>>>>
>>>>>>>>> In the short term, we are looking for a way to be able to 
>>>>>>>>> improve performance for specific search filters that are not 
>>>>>>>>> possible to modify on the client side (for whatever reason) 
>>>>>>>>> while leaving the index file format exactly as it is.  I still 
>>>>>>>>> feel that there is potentially great value in keeping a count 
>>>>>>>>> of ids per key so we can optimize things on the server side 
>>>>>>>>> automatically without the need for complex index configuration 
>>>>>>>>> on the administrator's part. I think we should consider this 
>>>>>>>>> for an additional future enhancement.
>>>>>>>>
>>>>>>>> I'm saying the same thing. Keeping a cardinality count per key 
>>>>>>>> is way more than I'm proposing, and I'm not sure how useful 
>>>>>>>> that would be anyway, unless you want to do OLAP in the DS ;)
>>>>>>> we have the cardinality of the key in old-idl and this makes 
>>>>>>> some searches where parts of the filter are allids fast.
>>>>>>>
>>>>>>> I'm late in the discussion, but I think Rich's proposal is very 
>>>>>>> promising to address all the problems related to allids in new-idl.
>>>>>>>
>>>>>>> We could then eventually rework filter ordering based on these 
>>>>>>> configurations. Right now we only have a filter ordering based 
>>>>>>> on index type and try to postpone "<=" or similar filter as they 
>>>>>>> are known to be costly, but this could be more elaborate.
>>>>>>>
>>>>>>> An alternative would be to have some kind of index lookup 
>>>>>>> caching. In the example in ticket 47474 the filter is 
>>>>>>> (&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOf
>>>>>>> Names)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))" 
>>>>>>> and probably only the "c3sUserID=xxxxx" part will change, if we 
>>>>>>> cache the result for the (&(|(objectClass=... part, even if it 
>>>>>>> is expensive, it would be done only once.
>>>>>>
>>>>>> Thanks everyone for the comments.  I have added Noriko's suggestion:
>>>>>> http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
>>>>>>
>>>>>> David, Ludwig: Does the current design address your concerns, 
>>>>>> and/or provide the necessary first step for further refinements?
>>>>> yes, the topic of filter reordering or caching could be looked at 
>>>>> independently.
>>>>>
>>>>> Just one concern abou the syntax:
>>>>>
>>>>> nsIndexIDListScanLimit: 
>>>>> maxsize[:indextype][:flag[,flag...]][:value[,value...]]
>>>>>
>>>>> since everything is optional, how do you decide if in 
>>>>> nsIndexIDListScanLimit: 6:eq:AND "AND" is a value or a flag ?
>>>>> and as it defines limits for specific keys, could the attributname 
>>>>> reflect this, eg nsIndexKeyIDListScanLimit or nsIndexKeyScanLimit 
>>>>> or ... ?
>>>>
>>>> Thanks, yes, it is ambiguous.
>>>> I think it may have to use keyword=value, so something like this:
>>>>
>>>> nsIndexIDListScanLimit: limit=NNN [type=eq[,sub]] [flags=ADD[,OR]] 
>>>> [values=val[,val...]]
>>>>
>>>> That should be easy to parse for both humans and machines.
>>>> For values, will have to figure out a way to have escapes (e.g. if 
>>>> a value contains a comma or an escape character). Was thinking of 
>>>> using LDAP escapes (e.g. \, or \032)
>>> they should be treated as in filters and normalized, in the config 
>>> it should be the string representation according to the attributetype
>>
>> Hi,
>>
>>     I was wondering if this configuration attribute at the index
>>     level, could not also be implemented at the bind-base level.
>>
>
> It could be - it would be more difficult to do - you would have to 
> have the nsIndexIDListScanLimit attribute specified in the user entry, 
> and it would have to specify the attribute type e.g.
>
> dn: uid=admin,....
> nsIndexIDListScanLimit: limit=xxxx attr=objectclass type=eq 
> value=inetOrgPerson
>
> Or perhaps a new attribute - nsIndexIDListScanLimit should be not 
> operational for use in nsIndex, but should be operational for use in a 
> user entry.
Or it could be handled as a policy, like password policy, have a default 
one and the possibility to assign a specific one at the bind
>
>>     If an application use to bind with a given entry, it could use
>>     its own limitations put for example into operational attribute in
>>     the bound entry itself.
>>
> Yes, and we already do this for other limits.
>>
>>     So that two applications, using the same filter component could
>>     have their specific idlist size.
>>     Anyway if it makes sense it could be added later.
>>
>
> Yes, thanks.
>
>> best regards
>> thierry
>>>>
>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> 389-devel mailing list
>>>>>>>> 389-devel at lists.fedoraproject.org
>>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>>>>>
>>>>>>> -- 
>>>>>>> 389-devel mailing list
>>>>>>> 389-devel at lists.fedoraproject.org
>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>>>>
>>>>>
>>>>
>>>
>>> -- 
>>> 389-devel mailing list
>>> 389-devel at lists.fedoraproject.org
>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.fedoraproject.org/pipermail/389-devel/attachments/20130912/62068a1b/attachment-0001.html>


More information about the 389-devel mailing list