[389-devel] RFC: New Design: Fine Grained ID List Size
Rich Megginson
rmeggins at redhat.com
Thu Sep 12 14:40:26 UTC 2013
On 09/12/2013 07:39 AM, thierry bordaz wrote:
> On 09/10/2013 04:35 PM, Ludwig Krispenz wrote:
>>
>> On 09/10/2013 04:29 PM, Rich Megginson wrote:
>>> On 09/10/2013 01:47 AM, Ludwig Krispenz wrote:
>>>>
>>>> On 09/09/2013 07:19 PM, Rich Megginson wrote:
>>>>> On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:
>>>>>>
>>>>>> On 09/07/2013 05:02 AM, David Boreham wrote:
>>>>>>> On 9/6/2013 8:49 PM, Nathan Kinder wrote:
>>>>>>>> This is a good idea, and it is something that we discussed
>>>>>>>> briefly off-list. The only downside is that we need to change
>>>>>>>> the index format to keep a count of ids for each key.
>>>>>>>> Implementing this isn't a big problem, but it does mean that
>>>>>>>> the existing indexes need to be updated to populate the count
>>>>>>>> based off of the contents (as you mention above).
>>>>>>>
>>>>>>> I don't think you need to do this (I certainly wasn't advocating
>>>>>>> doing so). The "statistics" state is much the same as that
>>>>>>> proposed in Rich's design. In fact you could probably just use
>>>>>>> that same information. My idea is more about where and how you
>>>>>>> use the information. All you need is something associated with
>>>>>>> each index that says "not much point looking here if you're
>>>>>>> after something specific, move along, look somewhere else
>>>>>>> instead". This is much the same information as "don't use a high
>>>>>>> scan limit here".
>>>>>>>
>>>>>>>>
>>>>>>>> In the short term, we are looking for a way to be able to
>>>>>>>> improve performance for specific search filters that are not
>>>>>>>> possible to modify on the client side (for whatever reason)
>>>>>>>> while leaving the index file format exactly as it is. I still
>>>>>>>> feel that there is potentially great value in keeping a count
>>>>>>>> of ids per key so we can optimize things on the server side
>>>>>>>> automatically without the need for complex index configuration
>>>>>>>> on the administrator's part. I think we should consider this
>>>>>>>> for an additional future enhancement.
>>>>>>>
>>>>>>> I'm saying the same thing. Keeping a cardinality count per key
>>>>>>> is way more than I'm proposing, and I'm not sure how useful that
>>>>>>> would be anyway, unless you want to do OLAP in the DS ;)
>>>>>> we have the cardinality of the key in old-idl and this makes some
>>>>>> searches where parts of the filter are allids fast.
>>>>>>
>>>>>> I'm late in the discussion, but I think Rich's proposal is very
>>>>>> promising to address all the problems related to allids in new-idl.
>>>>>>
>>>>>> We could then eventually rework filter ordering based on these
>>>>>> configurations. Right now we only have a filter ordering based on
>>>>>> index type and try to postpone "<=" or similar filter as they are
>>>>>> known to be costly, but this could be more elaborate.
>>>>>>
>>>>>> An alternative would be to have some kind of index lookup
>>>>>> caching. In the example in ticket 47474 the filter is
>>>>>> (&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOf
>>>>>> Names)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))"
>>>>>> and probably only the "c3sUserID=xxxxx" part will change, if we
>>>>>> cache the result for the (&(|(objectClass=... part, even if it is
>>>>>> expensive, it would be done only once.
>>>>>
>>>>> Thanks everyone for the comments. I have added Noriko's suggestion:
>>>>> http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
>>>>>
>>>>> David, Ludwig: Does the current design address your concerns,
>>>>> and/or provide the necessary first step for further refinements?
>>>> yes, the topic of filter reordering or caching could be looked at
>>>> independently.
>>>>
>>>> Just one concern abou the syntax:
>>>>
>>>> nsIndexIDListScanLimit:
>>>> maxsize[:indextype][:flag[,flag...]][:value[,value...]]
>>>>
>>>> since everything is optional, how do you decide if in
>>>> nsIndexIDListScanLimit: 6:eq:AND "AND" is a value or a flag ?
>>>> and as it defines limits for specific keys, could the attributname
>>>> reflect this, eg nsIndexKeyIDListScanLimit or nsIndexKeyScanLimit
>>>> or ... ?
>>>
>>> Thanks, yes, it is ambiguous.
>>> I think it may have to use keyword=value, so something like this:
>>>
>>> nsIndexIDListScanLimit: limit=NNN [type=eq[,sub]] [flags=ADD[,OR]]
>>> [values=val[,val...]]
>>>
>>> That should be easy to parse for both humans and machines.
>>> For values, will have to figure out a way to have escapes (e.g. if a
>>> value contains a comma or an escape character). Was thinking of
>>> using LDAP escapes (e.g. \, or \032)
>> they should be treated as in filters and normalized, in the config it
>> should be the string representation according to the attributetype
>
> Hi,
>
> I was wondering if this configuration attribute at the index
> level, could not also be implemented at the bind-base level.
>
It could be - it would be more difficult to do - you would have to have
the nsIndexIDListScanLimit attribute specified in the user entry, and it
would have to specify the attribute type e.g.
dn: uid=admin,....
nsIndexIDListScanLimit: limit=xxxx attr=objectclass type=eq
value=inetOrgPerson
Or perhaps a new attribute - nsIndexIDListScanLimit should be not
operational for use in nsIndex, but should be operational for use in a
user entry.
> If an application use to bind with a given entry, it could use its
> own limitations put for example into operational attribute in the
> bound entry itself.
>
Yes, and we already do this for other limits.
>
> So that two applications, using the same filter component could
> have their specific idlist size.
> Anyway if it makes sense it could be added later.
>
Yes, thanks.
> best regards
> thierry
>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> 389-devel mailing list
>>>>>>> 389-devel at lists.fedoraproject.org
>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>>>>
>>>>>> --
>>>>>> 389-devel mailing list
>>>>>> 389-devel at lists.fedoraproject.org
>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>>>
>>>>
>>>
>>
>> --
>> 389-devel mailing list
>> 389-devel at lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.fedoraproject.org/pipermail/389-devel/attachments/20130912/eb3e983c/attachment.html>
More information about the 389-devel
mailing list