[389-users] sub-tree synchronization/watching: persistent search questions

Fri Jun 7 16:12:48 UTC 2013

On 06/07/2013 09:57 AM, Petr Spacek wrote:
> On 7.6.2013 16:51, Rich Megginson wrote:
>> On 06/07/2013 08:44 AM, Petr Spacek wrote:
>>> On 7.6.2013 16:11, Rich Megginson wrote:
>>>> On 06/07/2013 05:42 AM, Petr Spacek wrote:
>>>>> I would like to get opinions from 389 gurus on the following problem.
>>>>>
>>>>> I have an application (a DNS server) that needs to read the content of
>>>>> one whole sub-tree (cn=dns, dc=test) and keep it synchronized.
>>>>>
>>>>> The work flow is:
>>>>> 1) Application (DNS server) starts
>>>>> 2) Application reads all existing data out from the sub-tree
>>>>> 3) Application does /something/ with the existing data and starts 
>>>>> replying
>>>>> to application clients
>>>>> 4) Sub-tree has to be kept in sync with LDAP server, i.e. updates 
>>>>> from LDAP
>>>>> server should be incrementally applied to the 'state' inside the 
>>>>> application
>>>>>
>>>>>
>>>>> The problem with persistent search is that it doesn't offer any
>>>>> reliable 'signal' that step (2) has ended. The search just runs
>>>>> indefinitely, and I can't find any indication that all existing
>>>>> entries have already been read and that from now on the application
>>>>> will get only Entry Change Notifications.
>>>>>
>>>>>
>>>>> Basically, I'm looking for something like LDAP syncRepl in 
>>>>> refreshAndPersist
>>>>> mode with no cookie (RFC 4533 section 1.3.2 and section 3.4).
>>>>>
>>>>>
>>>>> I know that the Entry Change Notification from persistent search
>>>>> contains a bit field that denotes whether the entry was
>>>>> added/modified/deleted/nothing (i.e. not modified, just read).
>>>>> Unfortunately, this bit field can't be used for *reliable* detection
>>>>> that all existing entries were read.
>>>>>
>>>>>
>>>>> Could this 'hack' work reliably?
>>>>> 1) Start persistent search (in separate application thread), but 
>>>>> suspend
>>>>> result processing.
>>>>> 2) In another application thread, do the normal sub-tree search on 
>>>>> the same
>>>>> sub-tree. Normal search will be started *after* the persistent 
>>>>> search.
>>>>> 3) Process all results from normal search first
>>>>> 4) Do /something application specific/
>>>>> 5) Start processing updates from persistent search
>>>>>
>>>>> In my application I can cope with duplicates, when 'normal' search 
>>>>> returned
>>>>> entry cn=xyz and the persistent search returned the same entry 
>>>>> cn=xyz again.
>>>>
>>>> Could you use entryUSN?  For example - keep searching until the
>>>> entryUSN in the entry is the same as the global entryUSN, then fall
>>>> back to persistent search?
>>>
>>> Could you elaborate a bit more, please? I'm not sure if I understood.
>>> What exactly does 'global' entryUSN mean?
>>> Do you mean the 'lastUSN' value on a particular server?
>> Yes.
>>> Can it work on a server where modifications are scarce? (Note that I do a
>>> sub-tree search on a subset of the whole database.)
>> Not sure what you mean.  What difference does it make if 
>> modifications are
>> scarce?  By modifications do you mean adds/mods/modrdn/delete - that 
>> is, any
>> update?
>
> I need to operate on one sub-tree in the database, not the whole 
> database. For this reason I think I can't depend on the fact that the 
> sub-tree search will encounter entryUSN == lastUSN.
>
> This will never happen unless 'my' sub-tree was the last part of the 
> database to be modified, right? (That is why I spoke about 'scarce' 
> updates, and yes, update = any modification in the given sub-tree.)
>
> Did I misunderstand something?

No, I see what you mean.
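
To make the point above concrete, here is a small Python sketch with made-up data. The DNs, the USN values and the helper name `max_subtree_usn` are illustrative assumptions, not taken from any real server. It only shows that the highest entryUSN found inside one sub-tree need not reach the server-wide lastUSN when the most recent update happened elsewhere.

```python
# Hypothetical snapshot of a database: DN -> entryUSN.
# All values are invented for illustration.
entries = {
    "cn=www,cn=dns,dc=test": 10,
    "cn=mail,cn=dns,dc=test": 17,
    "uid=alice,cn=users,dc=test": 42,   # most recent update, outside cn=dns
}


def max_subtree_usn(entries, base):
    """Return the highest entryUSN among entries at or below `base`.

    Simplification: DNs are assumed to be already normalized, so a plain
    suffix comparison is enough for this example.
    """
    suffix = "," + base
    usns = [usn for dn, usn in entries.items()
            if dn == base or dn.endswith(suffix)]
    return max(usns) if usns else None


# Stands in for the server's lastUSN (assumption: one global counter).
last_usn = max(entries.values())
subtree_max = max_subtree_usn(entries, "cn=dns,dc=test")

# The sub-tree search never sees an entry whose entryUSN equals lastUSN,
# so "search until entryUSN == lastUSN" would not terminate here.
caught_up = (subtree_max == last_usn)
```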

>
>>> I considered normal search followed by persistent search with entryUSN
>>> filter, but IMHO it will not work with entry deletion.
>>>
>>> For example:
>>> 1) Start normal search and request entryUSN attribute (among others)
>>> 2) Process all results from search and compute max(entryUSN)
>>> 3) Start persistent search with filter (entryUSN > computedMaxValue)
>>>
>>> I can see the race condition if an entry is deleted between steps 
>>> (2) and (3).
>>>
>>> That is exactly what I tried to solve with 'parallel' searches, i.e.
>>> effectively avoid any time gap between steps (2) and (3).
>>
>> I'm not sure what difference it makes if the update is a deletion or 
>> not, but
>> yes, there is a race condition.
>>
>>>
>>>
>>> Of course, I could read entryUSN during normal and persistent search 
>>> and
>>> then skip all results from persistent search with entryUSN <
>>> computedMaxValue. Is that what you meant?
>>
>> Yes.
>
> Anyway, do you think that the approach with 'normal & persistent 
> searches in parallel' is enough to avoid the race condition? I.e. Does 
> it prevent me from missing any update? (Let's suppose that 
> duplicate-detection is solved :-))

I think so - or at least, I don't see any other way to do this, short of 
the full syncrepl.
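
The 'normal and persistent searches in parallel' idea, with duplicate detection by entryUSN, could look roughly like the Python sketch below. It is only a simulation of the merge step under stated assumptions: no LDAP library is involved, the `Change` record, the `build_state` and `apply_change` helpers and the sample data are all hypothetical, and it assumes that every result from either search carries the entry's entryUSN and that buffered persistent-search results are replayed in the order they arrived.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Change:
    """One buffered result from the persistent search (hypothetical shape)."""
    dn: str
    usn: int                      # entryUSN carried by the returned entry
    kind: str                     # "add", "modify" or "delete"
    attrs: Optional[dict] = None


# Application state: DN -> (entryUSN, attributes)
State = Dict[str, Tuple[int, Optional[dict]]]


def apply_change(state: State, ch: Change) -> bool:
    """Apply one change; return False if it was skipped as a duplicate."""
    if ch.kind == "delete":
        # Deletes are applied unconditionally. If the entry was re-added
        # later, the re-add follows in the buffer and restores it.
        state.pop(ch.dn, None)
        return True
    known = state.get(ch.dn)
    if known is not None and known[0] >= ch.usn:
        return False              # already seen via the normal search
    state[ch.dn] = (ch.usn, ch.attrs)
    return True


def build_state(initial: Iterable[Tuple[str, int, dict]],
                buffered: Iterable[Change]) -> State:
    """Load the normal search results first, then replay the buffer."""
    state: State = {}
    for dn, usn, attrs in initial:
        state[dn] = (usn, attrs)
    for ch in buffered:
        apply_change(state, ch)
    return state


# Results of the normal sub-tree search (started after the persistent one).
initial = [
    ("cn=a,cn=dns,dc=test", 3, {"v": "old-a"}),
    ("cn=b,cn=dns,dc=test", 5, {"v": "old-b"}),
]
# Results buffered by the persistent search while the above was running.
buffered = [
    Change("cn=a,cn=dns,dc=test", 3, "modify", {"v": "old-a"}),  # duplicate
    Change("cn=b,cn=dns,dc=test", 6, "modify", {"v": "new-b"}),
    Change("cn=c,cn=dns,dc=test", 7, "add", {"v": "new-c"}),
    Change("cn=a,cn=dns,dc=test", 8, "delete"),
]
state = build_state(initial, buffered)
```

The comparison is done per entry rather than against a single max(entryUSN) from the normal search. The normal search is not an atomic snapshot, so an entry read early could be modified while entries with higher USNs are still being read, and a global threshold would drop that update.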

>
>>>>> I can see another option:
>>>>> Implement a 389 plugin which provides (very partial) support for RFC
>>>>> 4533. The idea is to implement only the state-less pieces (no cookies)
>>>>> and return an error when a client attempts to use a cookie.
>>>>
>>>> This would also likely use entryUSN for the cookie, internally.
>>> Yes, that was also my idea, but I don't want to implement the
>>> 'stateful part' of the RFC in all its complexity. Now I'm interested
>>> only in detecting that all existing entries were read :-)
>>
>> Sure, but it would be nice to implement the whole syncrepl protocol 
>> if you're
>> going to have to implement it partially anyway.
> I definitely agree, but unfortunately, I'm tasked with something 
> different and this syncRepl episode is only the small piece of the 
> whole story :-)

Sure, but this might be enough motivation for the core 389 team to pick 
up and finish syncrepl based on what you started.

>
>>>>> Could somebody judge how difficult it would be? From my (naive)
>>>>> point of view, the state-less parts of RFC 4533 are only 'persistent
>>>>> search encapsulated in other LDAP controls'.
>