[389-devel] RFC: New Design: Fine Grained ID List Size
Rich Megginson
rmeggins at redhat.com
Fri Sep 13 21:41:16 UTC 2013
On 09/13/2013 02:39 PM, David Boreham wrote:
> On 9/13/2013 2:18 PM, Rich Megginson wrote:
>> On 09/12/2013 07:08 PM, David Boreham wrote:
>>> On 9/11/2013 11:41 AM, Howard Chu wrote:
>>>>
>>>> Just out of curiosity, why is keeping a count per key a problem? If
>>>> you're using BDB duplicate key support, can't you just use
>>>> cursor->c_count() to get this? I.e., BDB already maintains key
>>>> counts internally, why not leverage that?
>>>>
>>>
>>> afaik you need to pass the DB_RECNUM flag at DB creation time to get
>>> record counting behavior, and it imposes a performance and
>>> concurrency penalty on writes. Also afaik 389DS does not set that
>>> flag except on VLV indexes (which need it, and coincidentally were
>>> the original reason for the feature being added to BDB).
>>
>> I'm using bdb 4.7 on RHEL 6.
>> Looking at the code, it appears the dbc->count method for btree is
>> __bamc_count() in bt_cursor.c. I'm not sure, but it looks as though
>> this function has to iterate each page counting the duplicates on
>> each page, which makes it a non-starter. Unless I'm mistaken, it
>> doesn't look as though it keeps a counter on each update, then simply
>> returns the counter. I don't see any code which would make the
>> behavior different depending on if DB_RECNUM is used when the
>> database is created.
>
> The DB_RECNUM count feature is not accessed via dbc->count() but
> through the dbc->c_get() call, passing DB_GET_RECNO, positioning at
> the last key. You do also need to use nested btrees for it to count
> the dups, afaik (but we're doing that in the DS indexes already I
> believe).
I wrote a small bdbtest.py script which uses the python bdb interface.
https://github.com/richm/scripts/blob/master/bdbtest.py
This creates an env, opens a db with
bsddb.db.DB_DUPSORT|bsddb.db.DB_RECNUM, adds several non-dup and dup
records, opens a cursor and iterates them. This is the output:
open dbenv in /var/tmp/dbtest
open db /var/tmp/dbtest/dbtest.db4
no txn records
key=key0 val=data0
extra=('', '\x01\x00\x00\x00')
<snip>
key=key9 val=data9
extra=('', '\n\x00\x00\x00')
key=multikey val=multidata0
extra=('', '\x0b\x00\x00\x00')
<snip>
key=multikey val=multidata9
extra=('', '\x0b\x00\x00\x00')
The extra is the str() output of cur.get(bsddb.db.DB_GET_RECNO)
So for all of the dup records, the recno is the same: '\x0b' == 11?
I'm probably missing something, but how do I use this to get the number
of duplicates?
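
For reference, here is a minimal sketch of how I am reading the extra values above. It assumes the second element of the tuple is a raw 32-bit record number in little-endian byte order (the case on x86; I have not checked other platforms). decode_recno is a hypothetical helper of mine, not part of bsddb, and the byte strings are copied from the output above.

```python
import struct


def decode_recno(raw):
    """Decode a 4-byte record number, assuming little-endian byte order."""
    (recno,) = struct.unpack("<I", raw)
    return recno


# Byte strings copied from the 'extra' lines in the output above.
samples = {
    "key0/data0": b"\x01\x00\x00\x00",
    "key9/data9": b"\n\x00\x00\x00",
    "multikey/multidata0": b"\x0b\x00\x00\x00",
    "multikey/multidata9": b"\x0b\x00\x00\x00",
}

decoded = {label: decode_recno(raw) for label, raw in samples.items()}

for label, recno in decoded.items():
    print(label, recno)
```

Read this way, key0 is record 1, key9 is record 10, and every multikey duplicate is record 11, so the record number seems to identify the key rather than the individual duplicate.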