Hi everybody,

Following Ludwig's and Mark's suggestions on how to perform a database dump in LDIF format from dbscan, I have come up with a strategy. I'm talking about ticket #47567: https://pagure.io/389-ds-base/issue/47567

I'd like to discuss this strategy and get some feedback.

The general idea is:

- We are cursing though id2entry.db printing entries in order.
- Parents should be printed before children.
- Hence if for some entry parenitd > entryid, we have to print its parent first (out of order) and track that we did so.
- In order to do the above, we don't need to move the db cursor. We can just go fetch something from a random place in the db and the cursor will remain in its place, so we can continue from where we left off after we're done with printing a parent.
- We also need to construct and print the DN of each entry using its RDN and the DN of its father.
- Let's use something like a hash map to pair entryids with dns (only for entries that has children), e.g. map[1] = "dc=example,dc=com", map[4] = "ou=People,dc=example,dc=com", etc.

I'll present the algorithm that I came up with in python-like pseudo-code.

First, the following function constructs the entry's DN and updates the hash map if needed. We can know whether an entry is a parent or not, by the presence of the numSubordinates attribute.

# assumes that parentid already exists in the map
function display_entry(e, map):
    if not e.parentid:
        e.dn = e.rdn
    else:
        e.dn = e.rdn + map[e.parentid]
    if isparent(e):
        map[e.entryid] = e.dn
    print_to_ldif_format(e)

Then, the main loop:

map = new(hashmap)
printed_in_advance = []

for e in entries:
    if e.entryid in printed_in_advance:
        continue # skip this, we have already displayed it

    if e.parentid < e.entryid:
        display_entry(e, map)

    else:
        # we need to display parents before children

        list_of_parents = []

        p = e
        while p.parentid > p.entryid:
            p = get_from_db(key = p.parentid)
            list_of_parents.append(p) # see note below (*)

        for p in reversed(list_of_parents):
            display_entry(p, map)
            printed_in_advance.append(p.entryid)


* here we store the whole entry in the list (aka its data) and not just
its id, in order to avoid fetching it from the database again

As a map, we can use a B+ tree implementation from libsds.

I would like to know how the above idea sounds to you before going ahead and implementing it.

Thank you!
Ilias