[389-devel] RFC: New Design: Fine Grained ID List Size

Rich Megginson rmeggins at redhat.com
Fri Sep 13 21:41:16 UTC 2013


On 09/13/2013 02:39 PM, David Boreham wrote:
> On 9/13/2013 2:18 PM, Rich Megginson wrote:
>> On 09/12/2013 07:08 PM, David Boreham wrote:
>>> On 9/11/2013 11:41 AM, Howard Chu wrote:
>>>>
>>>> Just out of curiosity, why is keeping a count per key a problem? If 
>>>> you're using BDB duplicate key support, can't you just use 
>>>> cursor->c_count() to get this? I.e., BDB already maintains key 
>>>> counts internally, why not leverage that?
>>>>
>>>
>>> afaik you need to pass the DB_RECNUM flag at DB creation time to get 
>>> record counting behavior, and it imposes a performance and 
>>> concurrency penalty on writes. Also afaik 389DS does not set that 
>>> flag except on VLV indexes (which need it, and coincidentally were 
>>> the original reason for the feature being added to BDB).
>>
>> I'm using bdb 4.7 on RHEL 6.
>> Looking at the code, it appears the dbc->count method for btree is 
>> __bamc_count() in bt_cursor.c.  I'm not sure, but it looks as though 
>> this function has to iterate each page counting the duplicates on 
>> each page, which makes it a non-starter. Unless I'm mistaken, it 
>> doesn't look as though it keeps a counter on each update, then simply 
>> returns the counter.  I don't see any code which would make the 
>> behavior different depending on if DB_RECNUM is used when the 
>> database is created.
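
To make the distinction concrete, here is a toy model in Python of the two strategies: counting duplicates by walking them on every call versus keeping a counter that is bumped on each update. This is only a sketch and not BDB code. The class names ScanCountIndex and CounterIndex are made up for illustration, and nothing here reflects how __bamc_count() is actually written.

```python
from collections import defaultdict


class ScanCountIndex:
    """Toy index: count() walks every duplicate, so it costs O(n) per call."""

    def __init__(self):
        self._dups = defaultdict(list)

    def add(self, key, value):
        self._dups[key].append(value)

    def count(self, key):
        n = 0
        # Visit each duplicate stored under the key, one at a time.
        for _ in self._dups.get(key, ()):
            n += 1
        return n


class CounterIndex(ScanCountIndex):
    """Toy index: a per-key counter is maintained on every add()."""

    def __init__(self):
        super().__init__()
        self._counts = defaultdict(int)

    def add(self, key, value):
        super().add(key, value)
        self._counts[key] += 1  # extra work on the write path

    def count(self, key):
        # O(1) lookup, no iteration over the duplicates.
        return self._counts.get(key, 0)
```

Both classes return the same counts; the difference is that the first pays on every read and the second pays a little on every write.
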
>
> The DB_RECNUM count feature is not accessed via dbc->count() but 
> through the dbc->c_get() call, passing DB_GET_RECNO, positioning at 
> the last key. You do also need to use nested btrees for it to count 
> the dups, afaik (but we're doing that in the DS indexes already I 
> believe).

I wrote a small script, bdbtest.py, which uses the Python bsddb interface:
https://github.com/richm/scripts/blob/master/bdbtest.py

This creates an env, opens a db with 
bsddb.db.DB_DUPSORT|bsddb.db.DB_RECNUM, adds several non-dup and dup 
records, then opens a cursor and iterates over them.  This is the output:

open dbenv in /var/tmp/dbtest
open db /var/tmp/dbtest/dbtest.db4
no txn records
     key=key0 val=data0
     extra=('', '\x01\x00\x00\x00')
<snip>
     key=key9 val=data9
     extra=('', '\n\x00\x00\x00')
     key=multikey val=multidata0
     extra=('', '\x0b\x00\x00\x00')
<snip>
     key=multikey val=multidata9
     extra=('', '\x0b\x00\x00\x00')

The extra is the str() output of cur.get(bsddb.db.DB_GET_RECNO)

So for all of the dup records, the recno is the same: '\x0b' == 11?
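
For reference, this is roughly how the raw bytes in the output above can be turned into integers. It is a minimal sketch that assumes the value is a 32-bit unsigned record number in little-endian byte order, as it would be on an x86 host. The helper name decode_recno is made up here and is not part of bsddb.

```python
import struct


def decode_recno(raw):
    """Decode 4 raw bytes as a little-endian unsigned 32-bit integer.

    Assumption: the record number is 32 bits wide and was written in the
    byte order of a little-endian (x86) machine.
    """
    (recno,) = struct.unpack("<I", raw)
    return recno


print(decode_recno(b"\x01\x00\x00\x00"))  # value shown for key0
print(decode_recno(b"\n\x00\x00\x00"))    # value shown for key9
print(decode_recno(b"\x0b\x00\x00\x00"))  # value shown for every multikey record
```

This prints 1, 10 and 11, which matches the reading above: each distinct key gets its own number, and all of the multikey duplicates share 11.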

I'm probably missing something, but how do I use this to get the number 
of duplicates?


