[389-users] sub-tree synchronization/watching: persistent search questions

Fri Jun 7 15:57:42 UTC 2013

On 7.6.2013 16:51, Rich Megginson wrote:
> On 06/07/2013 08:44 AM, Petr Spacek wrote:
>> On 7.6.2013 16:11, Rich Megginson wrote:
>>> On 06/07/2013 05:42 AM, Petr Spacek wrote:
>>>> I would like to get opinions from 389 gurus on the following problem.
>>>>
>>>> I have an application (DNS server), which needs to read the content of one
>>>> whole sub-tree (cn=dns, dc=test) and keep it synchronized.
>>>>
>>>> The work flow is:
>>>> 1) Application (DNS server) starts
>>>> 2) Application reads all existing data out from the sub-tree
>>>> 3) Application does /something/ with the existing data and starts replying
>>>> to application clients
>>>> 4) Sub-tree has to be kept in sync with LDAP server, i.e. updates from LDAP
>>>> server should be incrementally applied to the 'state' inside the application
>>>>
>>>>
>>>> The problem with persistent search is that it doesn't offer any reliable
>>>> 'signal' that step (2) ended. The search just runs indefinitely and I
>>>> can't find any signal that all existing entries have already been read
>>>> and that the application will now get only Entry Change Notifications.
>>>>
>>>>
>>>> Basically, I'm looking for something like LDAP syncRepl in refreshAndPersist
>>>> mode with no cookie (RFC 4533 section 1.3.2 and section 3.4).
>>>>
>>>>
>>>> I know that the Entry Change Notification from persistent search contains
>>>> a bit field which denotes whether the entry was added/modified/deleted/
>>>> nothing (i.e. not modified, just read). Unfortunately, this bit field
>>>> can't be used for *reliable* detection that all existing entries were read.
>>>>
>>>>
>>>> Could this 'hack' work reliably?
>>>> 1) Start persistent search (in separate application thread), but suspend
>>>> result processing.
>>>> 2) In another application thread, do the normal sub-tree search on the same
>>>> sub-tree. Normal search will be started *after* the persistent search.
>>>> 3) Process all results from normal search first
>>>> 4) Do /something application specific/
>>>> 5) Start processing updates from persistent search
>>>>
>>>> In my application I can cope with duplicates, when 'normal' search returned
>>>> entry cn=xyz and the persistent search returned the same entry cn=xyz again.
>>>
>>> Could you use entryUSN?  For example - keep searching until the entryUSN in
>>> the entry is the same as the global entryUSN, then fallback to persistent
>>> search?
>>
>> Could you elaborate on it a bit more, please? I'm not sure if I understood.
>> What exactly does 'global' entryUSN mean?
>> Do you mean the 'lastUSN' value on a particular server?
> Yes.
>> Can it work on a server where modifications are scarce? (Note that I do
>> sub-tree search on subset of the whole database.)
> Not sure what you mean.  What difference does it make if modifications are
> scarce?  By modifications do you mean adds/mods/modrdn/delete - that is, any
> update?

I need to operate on one sub-tree in the database, not the whole database.
For this reason I think I can't depend on the fact that a sub-tree search
will encounter entryUSN == lastUSN.

This will never happen if 'my' sub-tree wasn't the last part of the database
to be modified, right? (That is why I spoke about 'scarce' updates, and yes,
update = any modification in the given sub-tree.)

Did I misunderstand something?
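
A small sketch of this concern, with made-up numbers. It assumes that
lastUSN counts updates for the whole database rather than for one sub-tree;
the DNs outside cn=dns and all USN values are hypothetical.

```python
# Hypothetical entryUSN values; lastUSN is modeled as the highest USN
# assigned anywhere in the database (an assumption, not server behavior).
entries = {
    "cn=example,cn=dns,dc=test": 40,
    "cn=other,cn=dns,dc=test": 41,
    "uid=someone,cn=users,dc=test": 57,  # most recent write, outside cn=dns
}
last_usn = max(entries.values())

# A sub-tree search only ever returns entries below cn=dns,dc=test.
subtree = {dn: usn for dn, usn in entries.items()
           if dn.endswith(",cn=dns,dc=test")}

# The proposed stop condition: some returned entry has entryUSN == lastUSN.
reached = any(usn == last_usn for usn in subtree.values())
print(max(subtree.values()), last_usn, reached)
```

In this model the stop condition is never met, because the entry holding the
highest USN is outside the searched sub-tree.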

>> I considered normal search followed by persistent search with entryUSN
>> filter, but IMHO it will not work with entry deletion.
>>
>> For example:
>> 1) Start normal search and request entryUSN attribute (among others)
>> 2) Process all results from search and compute max(entryUSN)
>> 3) Start persistent search with filter (entryUSN > computedMaxValue)
>>
>> I can see the race condition if an entry is deleted between steps (2) and (3).
>>
>> That is exactly what I tried to solve with 'parallel' searches, i.e.
>> effectively avoid any time gap between steps (2) and (3).
>
> I'm not sure what difference it makes if the update is a deletion or not, but
> yes, there is a race condition.
>
>>
>>
>> Of course, I could read entryUSN during normal and persistent search and
>> then skip all results from persistent search with entryUSN <
>> computedMaxValue. Is that what you meant?
>
> Yes.

Anyway, do you think that the approach with 'normal & persistent searches in
parallel' is enough to avoid the race condition? I.e., does it prevent me from
missing any update? (Let's suppose that duplicate-detection is solved :-))
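
The 'parallel searches' idea can be sketched without any LDAP library. This
is only a minimal model under stated assumptions: sync_subtree, apply_change
and the (change_type, dn, attrs) tuples are hypothetical names, the queue
stands in for the persistent search that was started first and whose results
are buffered, and initial_search stands in for the normal sub-tree search.
modrdn is left out.

```python
import queue

def apply_change(state, change_type, dn, attrs):
    """Apply one buffered notification idempotently."""
    if change_type == "delete":
        state.pop(dn, None)   # deleting an unknown entry is a no-op
    else:
        state[dn] = attrs     # add/modify overwrite, so duplicates are harmless

def sync_subtree(initial_search, change_queue):
    """Load existing entries, then replay the changes buffered meanwhile."""
    state = {}
    # Process every result of the normal search first.
    for dn, attrs in initial_search():
        state[dn] = attrs
    # Only now drain the buffered notifications, in arrival order.
    while True:
        try:
            change_type, dn, attrs = change_queue.get_nowait()
        except queue.Empty:
            break
        apply_change(state, change_type, dn, attrs)
    return state

# Simulation: the "persistent search" queue exists before the normal search.
changes = queue.Queue()

def initial_search():
    yield "cn=a,cn=dns,dc=test", {"v": 1}
    # Updates arrive while the normal search is still running.
    changes.put(("modify", "cn=a,cn=dns,dc=test", {"v": 2}))
    changes.put(("delete", "cn=b,cn=dns,dc=test", None))
    yield "cn=b,cn=dns,dc=test", {"v": 1}  # stale copy, deleted meanwhile

state = sync_subtree(initial_search, changes)
print(state)
```

Because the queue starts filling before the normal search begins and every
change is applied idempotently in arrival order, the final state in this model
matches the directory even when an entry changes mid-search. Whether a real
server guarantees the same ordering is not something the sketch can show.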

>>>> I can see another option:
>>>> To implement 389 plugin which will provide (very partial) support for RFC
>>>> 4533. The idea is to implement only state-less pieces (no cookies) and
>>>> return some error when client attempts to use a cookie.
>>>
>>> This would also likely use entryUSN for the cookie, internally.
>> Yes, that was also my idea, but I don't want to implement the 'stateful
>> part' of the RFC in all its complexity. Now I'm interested only in
>> detection that all existing entries were read :-)
>
> Sure, but it would be nice to implement the whole syncrepl protocol if you're
> going to have to implement it partially anyway.
I definitely agree, but unfortunately, I'm tasked with something different and
this syncRepl episode is only a small piece of the whole story :-)

>>>> Could somebody judge how difficult it would be? From my (naive) point of
>>>> view, the state-less parts of RFC 4533 are only 'persistent search
>>>> encapsulated in other LDAP controls'.

-- 
Petr^2 Spacek