2017-08-04 16:03 GMT+03:00 Ludwig Krispenz <lkrispen@redhat.com>:

On 08/04/2017 02:08 PM, Ilias Stamatis wrote:

Okay, now that I have read and understood dbscan's code, I have a few more questions.

2017-08-03 10:10 GMT+03:00 Ludwig Krispenz <lkrispen@redhat.com>:

Hi, now that I know the context, here are some more comments.

If the purpose is to create a useful ldif file, which could eventually be used for import, then formatting an entry correctly is not enough. Order of entries matters: parents need to come before children. We already handle this in db2ldif or replication total update.
That said, whenever you write an entry you have always already seen the parent, so you could stack the dn with the parentid and create the dn without using the entryrdn index.
You do not even need to keep track of all the entry rdns/dns: only the ones with children will be needed later, and the presence of "numsubordinates" identifies a parent.
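
As a rough illustration of the approach described above (this is not the db2ldif code), the sketch below keeps the dn of every entry that has children, keyed by entry id, and builds a child's dn by appending the stored parent dn to the child's rdn. The names make_dn, parent_dns, MAX_ID and MAX_DN are made up for this sketch, and so is the convention that a parentid of 0 means "no parent" (a suffix entry).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ID 1024 /* hypothetical upper bound on entry ids for this sketch */
#define MAX_DN 256  /* hypothetical maximum dn length for this sketch */

/* dns of entries that have children, indexed by entry id ("" = not stored). */
static char parent_dns[MAX_ID][MAX_DN];

/*
 * Build the dn of an entry from its rdn and the stored dn of its parent.
 * parentid == 0 is this sketch's marker for a suffix entry, whose rdn is
 * already its full dn. has_children stands for "numsubordinates is present".
 * Returns 0 on success, -1 if the parent's dn has not been stored yet.
 */
int make_dn(unsigned id, unsigned parentid, const char *rdn,
            int has_children, char *out, size_t outlen)
{
    assert(id < MAX_ID && parentid < MAX_ID);

    if (parentid == 0) {
        snprintf(out, outlen, "%s", rdn);
    } else if (strlen(parent_dns[parentid]) > 0) {
        snprintf(out, outlen, "%s,%s", rdn, parent_dns[parentid]);
    } else {
        return -1; /* parent not seen yet */
    }

    /* Only entries with children are needed later, so only those are kept. */
    if (has_children) {
        snprintf(parent_dns[id], MAX_DN, "%s", out);
    }
    return 0;
}
```

Calling make_dn(1, 0, "dc=example,dc=com", 1, ...) and then make_dn(2, 1, "ou=people", 1, ...) yields "ou=people,dc=example,dc=com" for the second entry. The -1 return is the case where the parent has not been written yet.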

Is it guaranteed that parents are going to appear before children in id2entry.db?

No. That's what I said before: it is possible that parentid > entryid. It happens if an entry is moved by modrdn to another subtree.
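
One possible way to cope with parentid > entryid, sketched here only as an illustration and not as a description of what db2ldif does, is to defer any entry whose parent has not been written yet and sweep the entries again until nothing is left. The struct entry layout, the parents_first name, the MAX_ENTRIES limit and the "parentid 0 means no parent" convention are all assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 64 /* hypothetical limit, keeps the sketch simple */

struct entry {
    unsigned id;
    unsigned parentid; /* 0 = no parent (suffix entry) in this sketch */
};

/* Has the entry with the given id already been placed in order[]? */
static bool parent_emitted(const struct entry *e, const size_t *order,
                           size_t count, unsigned parentid)
{
    if (parentid == 0) {
        return true;
    }
    for (size_t i = 0; i < count; i++) {
        if (e[order[i]].id == parentid) {
            return true;
        }
    }
    return false;
}

/*
 * Fill order[] with indexes into e[] such that every parent precedes its
 * children. Entries whose parent is not out yet are deferred to a later
 * sweep. Returns the number of entries placed, which is less than n when
 * some entry's parent never shows up.
 */
size_t parents_first(const struct entry *e, size_t n, size_t *order)
{
    bool done[MAX_ENTRIES] = { false };
    size_t count = 0;
    bool progress = true;

    assert(n <= MAX_ENTRIES);
    while (count < n && progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            if (!done[i] && parent_emitted(e, order, count, e[i].parentid)) {
                order[count++] = i;
                done[i] = true;
                progress = true;
            }
        }
    }
    return count;
}
```

With entries in id order where entry 2 has been moved under entry 5, entry 2 and its child are skipped in the first sweep and picked up in the second. The repeated sweeps are quadratic in the worst case, so this is only meant to show the idea.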

Ooh, you're right. I got confused, sorry.

I'm also having a hard time finding where this functionality is implemented in db2ldif. :/

If I tried to do it "from scratch", I think we would go back to this (because we need to grab something that is located after where the cursor is currently pointing):

On 08/02/2017 09:12 PM, Mark Reynolds wrote:

I have not looked closely into it - so it might not be necessary to use entryrdn. I thought it might be more efficient to use it. If you just use id2entry, you have to keep scanning it over and over, and starting over every time you need to read the next entry. Maybe not though, maybe you can just "search" it and not have to scan it sequentially when trying to find parents and entries. I'll leave that up to you to find out ;-)

BDB has this method: https://docs.oracle.com/cd/E17275_01/html/api_reference/C/dbget.html
It allows you to retrieve a key / data pair directly, without needing to iterate with cursor->c_get(cursor, &key, &data, DB_NEXT).

The thing is that I don't know how it is implemented. Does it scan the DB sequentially, or is it faster than that (I hope and guess it's the latter)?
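
For what it's worth, on a Btree database a keyed DB->get descends the tree by key instead of walking the records, so, assuming id2entry is a Btree keyed by entry id, the lookup should not be sequential. The sketch below is not BDB code: it is a plain C stand-in that counts comparisons for a keyed search (bsearch over records sorted by id) versus a scan from the start. The struct rec layout and the find_keyed, find_scan and ncmp names are made up for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct rec {
    unsigned id;
    unsigned parentid;
};

/* Number of key comparisons performed since it was last reset. */
static unsigned long ncmp;

/* bsearch comparator: first argument is the key, second an array element. */
static int cmp_id(const void *key, const void *elem)
{
    unsigned k = *(const unsigned *)key;
    unsigned id = ((const struct rec *)elem)->id;

    ncmp++;
    return (k > id) - (k < id);
}

/* Keyed lookup over records sorted by id, logarithmic in n. */
const struct rec *find_keyed(const struct rec *recs, size_t n, unsigned id)
{
    assert(recs != NULL || n == 0);
    return bsearch(&id, recs, n, sizeof recs[0], cmp_id);
}

/* Sequential lookup, comparable to stepping a cursor from the first record. */
const struct rec *find_scan(const struct rec *recs, size_t n, unsigned id)
{
    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        ncmp++;
        if (recs[i].id == id) {
            return &recs[i];
        }
    }
    return NULL;
}
```

With 1000 records, finding the last id takes 1000 comparisons with find_scan and roughly ten with find_keyed.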

If it's not that efficient, maybe it does make sense to use entryrdn instead after all?