[389-devel] fractional replication monitoring proposal

Thu Oct 17 09:06:10 UTC 2013

On 10/17/2013 10:56 AM, thierry bordaz wrote:
> On 10/17/2013 10:49 AM, Ludwig Krispenz wrote:
>>
>> On 10/17/2013 10:15 AM, thierry bordaz wrote:
>>> On 10/16/2013 05:41 PM, Ludwig Krispenz wrote:
>>>>
>>>> On 10/16/2013 05:28 PM, Mark Reynolds wrote:
>>>>>
>>>>> On 10/16/2013 11:05 AM, Ludwig Krispenz wrote:
>>>>>>
>>>>>> On 10/15/2013 10:41 PM, Mark Reynolds wrote:
>>>>>>> https://fedorahosted.org/389/ticket/47368
>>>>>>>
>>>>>>> So we run into issues when trying to figure out whether replicas
>>>>>>> are in sync (if those replicas use fractional replication and
>>>>>>> "strip mods"). What happens is that an update is made on master
>>>>>>> A, but due to fractional replication no update is made to any of
>>>>>>> the replicas. So if you look at the ruv in the tombstone entry on
>>>>>>> each server, they appear to be out of sync. Using the ruv in the
>>>>>>> db tombstone is therefore no longer accurate when using
>>>>>>> fractional replication.
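A small simulation makes the problem concrete. It is a sketch, not 389 DS code: CSNs are modelled as increasing integers, a ruv as a dict from replica ID to max CSN, and the fractional policy as a hypothetical set of excluded attributes (`EXCLUDED`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    csn: int          # simplified: CSNs modelled as increasing integers
    rid: int          # replica ID of the originating master
    attrs: frozenset  # attributes modified by the update


EXCLUDED = frozenset({"memberOf"})  # hypothetical fractional exclude list


def is_replicated(update):
    """An update is sent only if some modified attribute survives filtering."""
    return bool(update.attrs - EXCLUDED)


def apply(ruv, update):
    """Advance the per-replica-ID max CSN, as a tombstone ruv would."""
    ruv[update.rid] = max(ruv.get(update.rid, 0), update.csn)


updates = [
    Update(1, 1, frozenset({"cn"})),
    Update(2, 1, frozenset({"mail"})),
    Update(3, 1, frozenset({"memberOf"})),  # filtered out by fractional repl
]

ruv_a, ruv_b = {}, {}
for u in updates:
    apply(ruv_a, u)          # every update is applied on master A
    if is_replicated(u):
        apply(ruv_b, u)      # only surviving updates ever reach B

print(ruv_a, ruv_b)  # {1: 3} {1: 2}
```

B holds every update it is supposed to have, yet comparing the tombstone ruvs reports it one CSN behind.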
>>>>>>>
>>>>>>> I'm proposing a new ruv to be stored in the backend replica 
>>>>>>> entry: e.g. cn=replica,cn="dc=example,dc=com",cn=mapping 
>>>>>>> tree,cn=config. I'm calling this the "replicated ruv".  So 
>>>>>>> whenever we actually send an update to a replica, this ruv will 
>>>>>>> get updated.
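The proposal could be modelled like this: the replicated ruv advances only for updates that are actually sent after fractional filtering. This is a hypothetical Python model (class and method names are invented), not the server's data structure, which in the proposal would live in the cn=replica entry.

```python
class ReplicatedRuv:
    """Hypothetical model of the proposed "replicated ruv" kept on a supplier."""

    def __init__(self, excluded):
        self.excluded = frozenset(excluded)  # fractional exclude list (assumed)
        self.maxcsn = {}                     # replica ID -> max CSN actually sent

    def on_send(self, csn, rid, attrs):
        """Called for each change the supplier is about to send.

        Returns True if something survives filtering and the ruv advanced.
        """
        if not set(attrs) - self.excluded:
            return False                     # fully filtered: nothing sent
        self.maxcsn[rid] = max(self.maxcsn.get(rid, 0), csn)
        return True


rr = ReplicatedRuv(excluded={"memberOf"})
rr.on_send(1, 1, {"cn"})
rr.on_send(2, 1, {"memberOf"})   # skipped, replicated ruv stays at 1
print(rr.maxcsn)                 # {1: 1}
```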
>>>>>> I don't see how this will help. You have additional info on what
>>>>>> has been replicated (which is available on the consumer as well)
>>>>>> and you have a max csn, but you don't know if there are
>>>>>> outstanding fractional changes still to be sent.
>>>>> Well, on master A you will know which operations get replicated
>>>>> (this updates the new ruv before sending any changes), and you can
>>>>> compare this ruv against the other master B's ruv (in its
>>>>> replication agreement). Maybe I am missing your point?
>>>> My point is that the question is: what is NOT yet replicated?
>>>> Without fractional replication you have the state of the ruv on all
>>>> servers, and if ruv(A) > ruv(B) you know there are updates missing
>>>> on B. With fractional, if ruv(A) > ruv(B) this might be OK or not.
>>>> If you keep an additional ruv on A when sending updates to B, you
>>>> can only record what was sent or attempted to be sent, but not what
>>>> still has to be sent.
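For standard replication the ruv(A) > ruv(B) check is a per-replica-ID comparison of max CSNs. A sketch, assuming the 20-hex-digit CSN layout used by 389 DS (timestamp, sequence number, replica ID, sub-sequence number, compared in that order) and modelling each ruv as a dict from replica ID to max CSN string:

```python
def parse_csn(csn):
    """Split a CSN string into (time, seqnum, rid, subseqnum).

    Assumes the %08x%04x%04x%04x layout; tuple order gives comparison order.
    """
    if len(csn) != 20:
        raise ValueError(f"unexpected CSN length: {csn!r}")
    return (int(csn[0:8], 16), int(csn[8:12], 16),
            int(csn[12:16], 16), int(csn[16:20], 16))


def rids_behind(ruv_a, ruv_b):
    """Replica IDs for which A has a newer max CSN than B."""
    behind = []
    for rid, max_a in ruv_a.items():
        max_b = ruv_b.get(rid)
        if max_b is None or parse_csn(max_a) > parse_csn(max_b):
            behind.append(rid)
    return sorted(behind)


ruv_a = {1: "5260058d000000010000", 2: "52600590000000020000"}
ruv_b = {1: "5260058a000000010000", 2: "52600590000000020000"}
print(rids_behind(ruv_a, ruv_b))  # [1]
```

Without fractional replication a non-empty result means B is missing updates; with fractional replication it may only mean that the newer changes for replica 1 were all filtered out, which is exactly the ambiguity described here.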
>>>
>>> I agree with you, Ludwig, but unless I missed something, wouldn't it
>>> be enough to know whether replica B is behind or in sync?
>>>
>>> For example, we have updates U1, U2, U3 and U4. U3 should be skipped
>>> by fractional replication.
>>>
>>> The replica RUV (tombstone) on master_A contains U4 and the master_B
>>> replica RUV contains U1.
>>> Let's assume that the initial value of the "replicated ruv" on
>>> master_A is U1.
>>> Starting a replication session, master_A should send U2 and update
>>> the "replicated ruv" to U2.
>>> If the update is successfully applied on master_B, master_B's replica
>>> ruv is U2 and monitoring the two ruvs should show they are in sync.
>> They are not, since U4 is not yet replicated. On master_A you see the
>> "normal" ruv as U4 and the "replicated" ruv as U2, but you don't know
>> how many changes lie between U2 and U4 and whether any of them should
>> be replicated. The replicated ruv is more or less a local copy of the
>> remote ruv.
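Answering Ludwig's question (how many changes lie between U2 and U4, and whether any must still be replicated) requires looking at the changelog itself, not only at two ruvs. A sketch, assuming integer CSNs, a changelog modelled as an ordered list of (csn, attributes) pairs, and a hypothetical exclude list:

```python
def pending_relevant(changelog, replicated_csn, excluded):
    """CSNs newer than replicated_csn that would survive fractional filtering."""
    return [csn for csn, attrs in changelog
            if csn > replicated_csn and set(attrs) - set(excluded)]


# U1..U4 from Thierry's example, with U3 touching only an excluded attribute.
changelog = [(1, {"cn"}), (2, {"mail"}), (3, {"memberOf"}), (4, {"sn"})]
print(pending_relevant(changelog, replicated_csn=2, excluded={"memberOf"}))  # [4]
```

An empty list would mean the replicated ruv really does reflect everything the consumer should have; the price is a changelog scan for every check.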
>
> Yes, I agree they are not; this is a transient status. Transient
> because the RA will continue going through the changelog until it hits
> U4. At that point it will write U4 into the "replicated RUV", and until
> master_B applies U4 both servers will appear out of sync.
> My understanding is that this "replicated RUV" only says whether they
> are in sync or not, but does not address how far one server is out of
> sync from the other (how many updates are missing). When you say it is
> more or less a copy, that is exactly what it is. If it is a copy => in
> sync; if it is different => out of sync.
Maybe we need to define what "in sync" means. For me, in sync means both
servers have the same set of updates applied.

Forget fractional for a moment: if we have standard replication and
master A is at U4 and master B is at U2, we say they are not in sync,
don't we? You could keep a replicated ruv for those as well, but this
wouldn't change things.
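This definition can be written down directly, with one adaptation for fractional replication: B is in sync with A when B has applied exactly the updates from A that B's fractional policy lets through. A sketch using update IDs and a hypothetical `passes_filter` predicate; it only considers updates originating on A:

```python
def in_sync(applied_on_a, applied_on_b, passes_filter=lambda u: True):
    """True if B holds exactly the updates from A that B should receive."""
    expected_on_b = {u for u in applied_on_a if passes_filter(u)}
    return expected_on_b == set(applied_on_b)


a = {"U1", "U2", "U3", "U4"}
# Standard replication: A at U4, B at U2 -> not in sync.
print(in_sync(a, {"U1", "U2"}))                              # False
# Fractional: U3 is filtered for B, so B is in sync once it has U1, U2, U4.
print(in_sync(a, {"U1", "U2", "U4"}, lambda u: u != "U3"))    # True
```

A monitor normally only sees ruvs, which summarise these sets as a max CSN per replica ID; that summary is what fractional replication breaks.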
>
>>> If the update is not applied, master_B's replica ruv stays at U1 and
>>> the two ruvs will show out of sync.
>>>
>>> In the first case, we have a transient status of 'in sync' because
>>> the replica agreement will evaluate U3, then U4, then send U4 and
>>> store it into the "replicated ruv". At this point master_A and
>>> master_B will appear out of sync until master_B applies U4.
>>> If U4 were also skipped by fractional replication, master_B's ruv and
>>> master_A's replicated ruv would both show U2, and that is correct:
>>> both servers are in sync.
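Thierry's walk-through can be replayed step by step. This is a single-supplier model with string update IDs, an assumed exclude list, and a consumer that applies every update successfully; `run_session` is a hypothetical helper, not server code. The replicated ruv is written before the update is sent, as in Mark's plan:

```python
def run_session(changelog, start, excluded):
    """Replay one replication session after `start`.

    Returns (event, replicated_ruv, consumer_ruv) snapshots.
    """
    ids = [update for update, _ in changelog]
    replicated = consumer = start
    snapshots = []
    for update, attrs in changelog[ids.index(start) + 1:]:
        if not set(attrs) - set(excluded):
            snapshots.append((f"skip {update}", replicated, consumer))
            continue
        replicated = update          # recorded before sending
        snapshots.append((f"send {update}", replicated, consumer))
        consumer = update            # consumer applies it successfully
        snapshots.append((f"applied {update}", replicated, consumer))
    return snapshots


changelog = [("U1", {"cn"}), ("U2", {"mail"}), ("U3", {"memberOf"}), ("U4", {"sn"})]
for event, rep, con in run_session(changelog, "U1", {"memberOf"}):
    print(f"{event:<11} replicated={rep} consumer={con}",
          "in sync" if rep == con else "out of sync")
```

The `applied U2` row reports "in sync" even though U4 is still pending on master_A, which is the transient state being debated. If U4 also touched only `memberOf`, the session would end on a skip with both values at U2, which is correct.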
>>>
>>> Mark, instead of storing the replicated ruv in the replica, would it
>>> not be possible to store it in the replica agreement (one replicated
>>> ruv per RA)? That could solve the problem of different fractional
>>> replication policies.
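Keeping one replicated ruv per agreement matters as soon as a supplier has agreements with different fractional policies. A sketch with hypothetical agreement names and policies, integer CSNs, and a single supplier replica ID:

```python
def replicated_per_agreement(changes, policies):
    """Max CSN actually sent over each agreement.

    changes:  ordered list of (csn, modified attributes)
    policies: agreement name -> excluded attributes
    """
    result = {ra: 0 for ra in policies}
    for csn, attrs in changes:
        for ra, excluded in policies.items():
            if set(attrs) - set(excluded):
                result[ra] = max(result[ra], csn)
    return result


changes = [(1, {"cn"}), (2, {"memberOf"})]
policies = {"to_B_full": set(), "to_C_fractional": {"memberOf"}}
per_ra = replicated_per_agreement(changes, policies)
print(per_ra)  # {'to_B_full': 2, 'to_C_fractional': 1}
```

A single replica-level value would have to hold 2, so a fully up-to-date fractional consumer C (at CSN 1) would be reported as behind; the per-agreement values compare correctly against each consumer. This is the mixed-agreement case Mark lists as a problem with the replica-level approach.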
>>>
>>>>> Do you mean changes that have not been read from the changelog
>>>>> yet? My plan was to update the new ruv in perform_operation(),
>>>>> right after all the "stripping" has been done and there is
>>>>> something to replicate. We need to have a ruv for replicated
>>>>> operations.
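A model of updating the ruv at that point: after stripping, and only if something is left to send. The Python function below is named after perform_operation() for orientation only; its parameters are invented and do not reflect the real C function. It also assumes, hypothetically, that "strip mods" drops an update whose remaining mods touch only housekeeping attributes such as modifyTimestamp; attribute names are compared lowercased:

```python
def strip_mods(mods, excluded, strip_attrs):
    """Mods that would be sent, or [] if nothing worth sending remains.

    mods is a list of (operation, attribute) pairs; excluded and strip_attrs
    are lowercase attribute-name sets (assumed policy inputs).
    """
    kept = [(op, attr) for op, attr in mods if attr.lower() not in excluded]
    if all(attr.lower() in strip_attrs for _, attr in kept):
        return []                    # empty, or only housekeeping attrs left
    return kept


def perform_operation(csn, rid, mods, excluded, strip_attrs, replicated_ruv, send):
    kept = strip_mods(mods, excluded, strip_attrs)
    if not kept:
        return False                 # nothing to replicate: ruv untouched
    # Advance the replicated ruv only once we know an update will be sent.
    replicated_ruv[rid] = max(replicated_ruv.get(rid, 0), csn)
    send(csn, kept)
    return True


ruv, sent = {}, []
policy = dict(excluded={"memberof"}, strip_attrs={"modifytimestamp", "modifiersname"})
perform_operation(5, 1, [("replace", "memberOf"), ("replace", "modifyTimestamp")],
                  replicated_ruv=ruv, send=lambda c, m: sent.append(c), **policy)
perform_operation(6, 1, [("replace", "mail"), ("replace", "modifyTimestamp")],
                  replicated_ruv=ruv, send=lambda c, m: sent.append(c), **policy)
print(ruv, sent)  # {1: 6} [6]
```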
>>>>>
>>>>> I guess there are other scenarios I didn't think of, like if
>>>>> replication is in a backoff state while valid changes are coming
>>>>> in. Maybe we could do a test "stripping" earlier in the
>>>>> replication process (when writing to the changelog?), and then
>>>>> update the new ruv there instead of waiting until we try to send
>>>>> the changes.
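Doing the test stripping at changelog write time would also address Ludwig's concern: the supplier would then know, per agreement, the newest change that still has to go out, even while the agreement is in backoff and nothing is being sent. A sketch with hypothetical names, integer CSNs and per-agreement exclude lists:

```python
class ChangelogWriter:
    """Filter at write time; track newest relevant and newest sent CSN per RA."""

    def __init__(self, policies):
        self.policies = {ra: frozenset(ex) for ra, ex in policies.items()}
        self.max_relevant = {ra: 0 for ra in policies}
        self.max_sent = {ra: 0 for ra in policies}

    def write(self, csn, attrs):
        for ra, excluded in self.policies.items():
            if set(attrs) - excluded:
                self.max_relevant[ra] = max(self.max_relevant[ra], csn)

    def mark_sent(self, ra, csn):
        self.max_sent[ra] = max(self.max_sent[ra], csn)

    def has_pending(self, ra):
        """True if a change that must be replicated has not been sent yet."""
        return self.max_relevant[ra] > self.max_sent[ra]


cl = ChangelogWriter({"to_B": {"memberOf"}})
cl.write(1, {"cn"})
cl.mark_sent("to_B", 1)
cl.write(2, {"memberOf"})         # filtered: nothing pending
print(cl.has_pending("to_B"))     # False
cl.write(3, {"sn"})               # arrives while the agreement is in backoff
print(cl.has_pending("to_B"))     # True
```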
>>>>>>> Since we cannot compare this "replicated ruv" to the replica's
>>>>>>> tombstone ruv, we can instead compare the "replicated ruv" to
>>>>>>> the ruv in the replica's repl agreement (unless it is a dedicated
>>>>>>> consumer; here we might still be able to look at the db
>>>>>>> tombstone ruv to determine the status).
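One reading of this comparison, as a monitoring sketch: for a master, compare the supplier's replicated ruv with the ruv recorded in the peer's agreement; for a dedicated consumer, fall back to its db tombstone ruv. The dict keys are hypothetical and the ruvs use integer CSNs per replica ID:

```python
def lagging_rids(reference, observed):
    """Replica IDs whose max CSN in `observed` is older than in `reference`."""
    return sorted(rid for rid, csn in reference.items()
                  if observed.get(rid, 0) < csn)


def sync_status(replicated_ruv, peer):
    """peer: dict with 'dedicated_consumer' plus the ruv that applies to it."""
    if peer["dedicated_consumer"]:
        observed = peer["tombstone_ruv"]
    else:
        observed = peer["agreement_ruv"]
    lag = lagging_rids(replicated_ruv, observed)
    return "in sync" if not lag else f"behind for replica IDs {lag}"


master_b = {"dedicated_consumer": False, "agreement_ruv": {1: 7}}
consumer = {"dedicated_consumer": True, "tombstone_ruv": {1: 5}}
print(sync_status({1: 7}, master_b))   # in sync
print(sync_status({1: 7}, consumer))   # behind for replica IDs [1]
```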
>>>>>>>
>>>>>>> Problems with this approach:
>>>>>>>
>>>>>>> -  All the servers need to have the same replication
>>>>>>> configuration (the same fractional replication policy and
>>>>>>> attribute stripping) to give accurate results.
>>>>>>>
>>>>>>> -  If one replica has an agreement that does NOT filter the
>>>>>>> updates, but also has agreements that do filter updates, then we
>>>>>>> cannot correctly determine its synchronization state with the
>>>>>>> fractional replicas.
>>>>>>>
>>>>>>> -  Performance hit from updating another ruv (in cn=config)?
>>>>>>>
>>>>>>>
>>>>>>> Fractional replication simply breaks our monitoring process.
>>>>>>> I'm not sure that, without updating the repl protocol, we can
>>>>>>> cover all deployment scenarios (mixed fractional repl agmts,
>>>>>>> etc.). However, I "think" this approach would work for most
>>>>>>> deployments (compared to none at the moment). For IPA, since
>>>>>>> they don't use consumers, this approach would work. And finally,
>>>>>>> all of this would have to be handled by an updated version of
>>>>>>> repl-monitor.pl.
>>>>>>>
>>>>>>> This is just my preliminary idea on how to handle this.  
>>>>>>> Feedback is welcome!!
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>> Mark
>>>>>>>
>>>>>>> -- 
>>>>>>> Mark Reynolds
>>>>>>> 389 Development Team
>>>>>>> Red Hat, Inc
>>>>>>> mreynolds at redhat.com
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> 389-devel mailing list
>>>>>>> 389-devel at lists.fedoraproject.org
>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>>>>
>>>
>>
>
