[389-devel] RFC: New Design: Fine Grained ID List Size

Rich Megginson rmeggins at redhat.com
Mon Sep 9 17:19:06 UTC 2013


On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:
>
> On 09/07/2013 05:02 AM, David Boreham wrote:
>> On 9/6/2013 8:49 PM, Nathan Kinder wrote:
>>> This is a good idea, and it is something that we discussed briefly 
>>> off-list.  The only downside is that we need to change the index 
>>> format to keep a count of ids for each key.  Implementing this isn't 
>>> a big problem, but it does mean that the existing indexes need to be 
>>> updated to populate the count based on their contents (as you
>>> mention above).
>>
>> I don't think you need to do this (I certainly wasn't advocating 
>> doing so). The "statistics" state is much the same as that proposed 
>> in Rich's design. In fact you could probably just use that same 
>> information. My idea is more about where and how you use the 
>> information. All you need is something associated with each index 
>> that says "not much point looking here if you're after something 
>> specific, move along, look somewhere else instead". This is much the 
>> same information as "don't use a high scan limit here".
>>
>>>
>>> In the short term, we are looking for a way to improve performance 
>>> for specific search filters that cannot be modified on the client 
>>> side (for whatever reason), while leaving the
>>> index file format exactly as it is.  I still feel that there is 
>>> potentially great value in keeping a count of ids per key so we can 
>>> optimize things on the server side automatically without the need 
>>> for complex index configuration on the administrator's part. I think 
>>> we should consider this as a future enhancement.
>>
>> I'm saying the same thing. Keeping a cardinality count per key is way 
>> more than I'm proposing, and I'm not sure how useful that would be 
>> anyway, unless you want to do OLAP in the DS ;)
> We already have the cardinality of the key in old-idl, and that makes 
> some searches fast where parts of the filter are allids.
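
Just to make the old-idl point concrete: it amounts to keeping a count next 
to each key, so the filter code can tell "this component won't narrow 
anything" without scanning the list.  Very rough Python sketch (the names 
are made up, not the actual C code):

    ALLIDS = object()  # sentinel: "this key matches (nearly) every entry"

    class IndexKeyRecord:
        def __init__(self, ids, count):
            self.ids = ids      # concrete ID list, or ALLIDS
            self.count = count  # cardinality kept alongside the key

    def candidate_ids(index, key):
        rec = index.get(key)
        if rec is None:
            return set()        # key not present: no matches
        if rec.ids is ALLIDS:
            # the ID list itself was not kept, but rec.count still tells the
            # optimizer how (un)selective this component is
            return ALLIDS
        return set(rec.ids)
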
>
> I'm late to the discussion, but I think Rich's proposal is very 
> promising for addressing all of the problems related to allids in new-idl.
>
> We could then eventually rework filter ordering based on these 
> configurations. Right now we only order filters based on index type 
> and try to postpone "<=" and similar filters, as they are known to be 
> costly, but this could be more elaborate.
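
With per-index (or per-key) counts available, that ordering could become 
cost based rather than purely type based.  A hedged sketch of what the 
optimizer might do (Python, all names hypothetical):

    from collections import namedtuple
    import math

    IndexStats = namedtuple("IndexStats", "total avg_per_key key_count")
    Component = namedtuple("Component", "attr op value")

    def estimated_cost(c, index_stats):
        stats = index_stats.get(c.attr)
        if stats is None:
            return math.inf              # unindexed attribute: effectively allids
        if c.op == "eq":
            # per-key count if we have one, otherwise an average per key
            return stats.key_count.get(c.value, stats.avg_per_key)
        if c.op in ("<=", ">="):
            return stats.total / 2       # range filters are known to be costly
        return stats.total

    def order_and_components(components, index_stats):
        # evaluate the cheapest (most selective) component first so the
        # candidate list shrinks as early as possible
        return sorted(components, key=lambda c: estimated_cost(c, index_stats))
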
>
> An alternative would be some kind of index lookup caching. In the 
> example in ticket 47474 the filter is 
> (&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)
> (objectClass=groupOfNames)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458)) 
> and probably only the "c3sUserID=xxxxx" part will change. If we cache 
> the result for the (&(|(objectClass=... part, then even though it is 
> expensive, it would be computed only once.
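
For illustration, such a cache could be keyed by the normalized sub-filter 
string, so the expensive objectClass OR is evaluated once and only the 
c3sUserID lookup runs per search.  A rough Python sketch (evaluate_filter() 
is a toy stand-in for the backend lookup, and cache invalidation on writes - 
the hard part - is ignored here):

    from functools import lru_cache

    # Toy stand-in for the real backend index lookup: filter string -> entry IDs.
    FAKE_INDEX = {
        "(|(objectClass=inetOrgPerson)(objectClass=organizationalPerson))": {1, 2, 3, 4},
        "(c3sUserID=EndUser0000078458)": {3},
    }

    def evaluate_filter(f):
        return FAKE_INDEX.get(f, set())

    @lru_cache(maxsize=128)
    def cached_subfilter_ids(normalized_subfilter):
        # the expensive OR over objectClass values, computed once per
        # distinct sub-filter string
        return frozenset(evaluate_filter(normalized_subfilter))

    def search(stable_subfilter, changing_filter):
        return cached_subfilter_ids(stable_subfilter) & evaluate_filter(changing_filter)
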

Thanks everyone for the comments.  I have added Noriko's suggestion:
http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
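
In a nutshell, the idea is to let the ID list scan limit be resolved per 
index, per match type, and even per value, instead of relying only on the 
single global limit.  A rough Python sketch of just the lookup logic (the 
real attribute names and syntax are on the wiki page; the names below are 
illustrative only):

    DEFAULT_SCAN_LIMIT = 4000  # the existing global nsslapd-idlistscanlimit default

    # Fine-grained overrides: (attribute, match type, specific values or None, limit);
    # the first, most specific match wins.
    FINE_GRAINED_LIMITS = [
        ("objectClass", "eq", {"inetOrgPerson"}, 100000),
        ("objectClass", "eq", None, 5000),   # any other eq lookup on objectClass
    ]

    def scan_limit(attribute, match_type, value):
        for attr, mtype, values, limit in FINE_GRAINED_LIMITS:
            if attr == attribute and mtype == match_type:
                if values is None or value in values:
                    return limit
        return DEFAULT_SCAN_LIMIT

So scan_limit("objectClass", "eq", "inetOrgPerson") would return the large 
limit, while anything not covered by an override falls back to the default.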

David, Ludwig: Does the current design address your concerns, and/or 
provide the necessary first step for further refinements?
