[389-devel] RFC: New Design: Fine Grained ID List Size

Sat Sep 7 02:49:57 UTC 2013

On 09/06/2013 05:30 PM, David Boreham wrote:
> On 9/6/2013 3:05 PM, Rich Megginson wrote:
>> Please review and comment:
>>
>> http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
>
> This looks interesting. I suppose this is similar to a SQL database's 
> concept of index statistics, and also query hints supplied by the 
> client. Perhaps more of a "server index hint".
>
> This may already been discussed, but in reading through the design 
> doc, I was wondering about having the query planner (such as there is 
> one in the DS) take note of the index hints prior to executing any 
> lookups. This is similar to a SQL database's behavior executing such a 
> query, when it sees index statistics that indicate low cardinality. To 
> expand:
>
> I seem to recall that there is already code to avoid looking up a low 
> cardinality index, if and only if the intersection predicates are 
> ordered suitably, by checking the id list size between index lookups. 
> Thus, if there is a filter for uid=foo & objectclass=bar (apologies 
> for not using the wacky LDAP string filter syntax...), then if only 
> one hit is seen from the uid=foo lookup, the objectclass=bar lookup is 
> skipped. If that's still the case, then the example "bad" search would 
> become good if the client were to re-order the predicates. Of course 
> often the client can not be modified, so:
>
> My thought is to add that functionality to the server -- the client 
> can then submit filters without regard to the internal workings of the 
> server. The server checks the predicates against Rich's new index 
> hints, and can therefore make the correct ordering itself. The benefit 
> would be that no additional index lookup would be done, vs one that 
> meets the id limit pertaining to the index, and the administrator only 
> has to know that the index has low cardinality. A further "refinement" 
> would be to make a tool that populates the hint data based on analysis 
> of the index content, a la SQL "UPDATE STATISTICS".
Thanks for your feedback David!

This is a good idea, and it is something that we discussed briefly 
off-list.  The only downside is that we need to change the index format 
to keep a count of ids for each key.  Implementing this isn't a big 
problem, but it does mean that the existing indexes need to be updated 
to populate the count based off of the contents (as you mention above).

In the short term, we are looking for a way to be able to improve 
performance for specific search filters that are not possible to modify 
on the client side (for whatever reason) while leaving the index file 
format exactly as it is.  I still feel that there is potentially great 
value in keeping a count of ids per key so we can optimize things on the 
server side automatically without the need for complex index 
configuration on the administrator's part.  I think we should consider 
this for an additional future enhancement.

Thanks,
-NGK
>
> Hopefully this makes sense. Apologies if it has already been considered.
>
>
> -- 
> 389-devel mailing list
> 389-devel at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>
>