[389-devel] RFC: New Design: Fine Grained ID List Size
Rich Megginson
rmeggins at redhat.com
Tue Sep 10 14:29:14 UTC 2013
On 09/10/2013 01:47 AM, Ludwig Krispenz wrote:
>
> On 09/09/2013 07:19 PM, Rich Megginson wrote:
>> On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:
>>>
>>> On 09/07/2013 05:02 AM, David Boreham wrote:
>>>> On 9/6/2013 8:49 PM, Nathan Kinder wrote:
>>>>> This is a good idea, and it is something that we discussed briefly
>>>>> off-list. The only downside is that we need to change the index
>>>>> format to keep a count of ids for each key. Implementing this
>>>>> isn't a big problem, but it does mean that the existing indexes
>>>>> need to be updated to populate the count based off of the contents
>>>>> (as you mention above).
>>>>
>>>> I don't think you need to do this (I certainly wasn't advocating
>>>> doing so). The "statistics" state is much the same as that proposed
>>>> in Rich's design. In fact you could probably just use that same
>>>> information. My idea is more about where and how you use the
>>>> information. All you need is something associated with each index
>>>> that says "not much point looking here if you're after something
>>>> specific, move along, look somewhere else instead". This is much
>>>> the same information as "don't use a high scan limit here".
>>>>
>>>>>
>>>>> In the short term, we are looking for a way to be able to improve
>>>>> performance for specific search filters that are not possible to
>>>>> modify on the client side (for whatever reason) while leaving the
>>>>> index file format exactly as it is. I still feel that there is
>>>>> potentially great value in keeping a count of ids per key so we
>>>>> can optimize things on the server side automatically without the
>>>>> need for complex index configuration on the administrator's part.
>>>>> I think we should consider this for an additional future enhancement.
>>>>
>>>> I'm saying the same thing. Keeping a cardinality count per key is
>>>> way more than I'm proposing, and I'm not sure how useful that would
>>>> be anyway, unless you want to do OLAP in the DS ;)
>>> we have the cardinality of the key in old-idl and this makes some
>>> searches where parts of the filter are allids fast.
>>>
>>> I'm late in the discussion, but I think Rich's proposal is very
>>> promising to address all the problems related to allids in new-idl.
>>>
>>> We could then eventually rework filter ordering based on these
>>> configurations. Right now we only have a filter ordering based on
>>> index type and try to postpone "<=" or similar filter as they are
>>> known to be costly, but this could be more elaborate.
>>>
>>> An alternative would be to have some kind of index lookup caching.
>>> In the example in ticket 47474 the filter is
>>> (&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOf
>>> Names)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))"
>>> and probably only the "c3sUserID=xxxxx" part will change, if we
>>> cache the result for the (&(|(objectClass=... part, even if it is
>>> expensive, it would be done only once.
>>
>> Thanks everyone for the comments. I have added Noriko's suggestion:
>> http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
>>
>> David, Ludwig: Does the current design address your concerns, and/or
>> provide the necessary first step for further refinements?
> yes, the topic of filter reordering or caching could be looked at
> independently.
>
> Just one concern abou the syntax:
>
> nsIndexIDListScanLimit:
> maxsize[:indextype][:flag[,flag...]][:value[,value...]]
>
> since everything is optional, how do you decide if in
> nsIndexIDListScanLimit: 6:eq:AND "AND" is a value or a flag ?
> and as it defines limits for specific keys, could the attributname
> reflect this, eg nsIndexKeyIDListScanLimit or nsIndexKeyScanLimit or
> ... ?
Thanks, yes, it is ambiguous.
I think it may have to use keyword=value, so something like this:
nsIndexIDListScanLimit: limit=NNN [type=eq[,sub]] [flags=ADD[,OR]]
[values=val[,val...]]
That should be easy to parse for both humans and machines.
For values, will have to figure out a way to have escapes (e.g. if a
value contains a comma or an escape character). Was thinking of using
LDAP escapes (e.g. \, or \032)
>>
>>>>
>>>>
>>>> --
>>>> 389-devel mailing list
>>>> 389-devel at lists.fedoraproject.org
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>
>>> --
>>> 389-devel mailing list
>>> 389-devel at lists.fedoraproject.org
>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>
>
More information about the 389-devel
mailing list