[389-devel] fractional replication monitoring proposal
Mark Reynolds
mareynol at redhat.com
Thu Oct 17 16:01:38 UTC 2013
Thanks everyone for your feedback!
Ok I have written an initial fix, and here is how it works and what I am
seeing...
[1] An update comes in and we update the local RUV.
[2] We check this update against the fractional/stripped attributes in
each agmt.
[3] If this update does replicate to at least one agmt, we write a new
attribute to the local RUV (currently called "nsds50replruv" - we can
improve the name later). If it doesn't replicate to any replicas, then
we don't update the new RUV attribute. This all happens at the same
time in write_changelog_and_ruv(), so there is no delay or copying of
useless RUV info, and we write to the local RUV instead of a new RUV in
cn=config (which I had originally proposed).
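As a rough illustration of the decision in [1]-[3], here is a minimal
Python sketch (the server itself is C, and every name and data shape
below is illustrative only, not the actual 389-ds code):

```python
# Sketch of the decision made in write_changelog_and_ruv(): the local RUV
# always advances, while the proposed "replicated" RUV (nsds50replruv)
# only advances if at least one agreement would actually send the change.
# Names and data shapes are hypothetical, not the real C implementation.

def update_ruvs(mod_attrs, agmt_stripped_attrs, ruv, new_csn):
    """mod_attrs: set of attribute names touched by the update.
    agmt_stripped_attrs: per-agreement sets of fractionally stripped attrs.
    ruv: dict holding the max CSN for 'nsds50ruv' and 'nsds50replruv'."""
    ruv['nsds50ruv'] = new_csn  # [1] local RUV is always updated
    # [2]/[3] the update replicates if any agmt keeps >= 1 of its attrs
    if any(mod_attrs - stripped for stripped in agmt_stripped_attrs):
        ruv['nsds50replruv'] = new_csn
    return ruv
```

With this shape, an update that only touches stripped attributes leaves
nsds50replruv behind nsds50ruv, which is the gap visible in step [4].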
[4] Here we made an update that is stripped by fractional replication:
Master A:
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600339000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 5260030d000000010000
...
Master B:
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL -p 22222
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 5260030d000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 5260030d000000010000
...
[5] If we look at the "fractional" RUV (nsds50replruv) on Master A, it
does correctly line up with the RUV on Master B (nsds50ruv).
[6] Then we make an update that does replicate, and now all the RUVs
line up.
Master A:
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
Master B:
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL -p 22222
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
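The comparison a monitoring tool would make with these values can be
sketched as follows (a hypothetical helper, assuming the per-replica-id
max CSNs have already been parsed out of the RUV attribute values):

```python
# Hypothetical monitoring check: a supplier and a consumer are considered
# in sync when the supplier's "replicated" RUV (nsds50replruv) matches the
# consumer's normal RUV (nsds50ruv) for every replica ID.

def in_sync(supplier_repl_ruv, consumer_ruv):
    """Both args map replica id -> max CSN string."""
    return all(consumer_ruv.get(rid) == max_csn
               for rid, max_csn in supplier_repl_ruv.items())

# Max CSNs taken from step [4]'s output: comparing Master A's
# nsds50replruv against Master B's nsds50ruv reports in sync, while the
# naive nsds50ruv-to-nsds50ruv comparison would report out of sync.
a_repl = {1: '5260030d000000010000'}   # Master A nsds50replruv
a_local = {1: '52600339000000010000'}  # Master A nsds50ruv
b_local = {1: '5260030d000000010000'}  # Master B nsds50ruv
```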
There are still the same problems with this fix as I mentioned before,
except we're no longer updating the dse config. I am, however, concerned
about the performance hit of checking whether each mod gets "replicated".
As for the "sync" question, this fix doesn't change how that behaves, or
how repl-monitor already works: a replica is either behind (by a certain
amount of time) or in sync. I'm not trying to improve the current repl
status model.
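For the "behind by a certain amount of time" status, the lag can be read
straight from the CSNs, since the first 8 hex digits of a CSN are the
Unix timestamp (the remaining digits are the sequence number, replica
id, and sub-sequence number). A small sketch:

```python
# Sketch: how far "behind" one CSN is versus another, in seconds.
# A 389-ds CSN is 20 hex chars: 8 timestamp + 4 seqnum + 4 rid + 4 subseq.

def csn_lag_seconds(newer_csn, older_csn):
    return int(newer_csn[:8], 16) - int(older_csn[:8], 16)

# Master A's nsds50ruv vs nsds50replruv max CSNs from step [4]:
lag = csn_lag_seconds('52600339000000010000', '5260030d000000010000')
```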
Anyway, I just wanted to see if I could get this working. Comments welcome.
Thanks again,
Mark
On 10/17/2013 05:44 AM, thierry bordaz wrote:
> On 10/17/2013 11:06 AM, Ludwig Krispenz wrote:
>>
>> On 10/17/2013 10:56 AM, thierry bordaz wrote:
>>> On 10/17/2013 10:49 AM, Ludwig Krispenz wrote:
>>>>
>>>> On 10/17/2013 10:15 AM, thierry bordaz wrote:
>>>>> On 10/16/2013 05:41 PM, Ludwig Krispenz wrote:
>>>>>>
>>>>>> On 10/16/2013 05:28 PM, Mark Reynolds wrote:
>>>>>>>
>>>>>>> On 10/16/2013 11:05 AM, Ludwig Krispenz wrote:
>>>>>>>>
>>>>>>>> On 10/15/2013 10:41 PM, Mark Reynolds wrote:
>>>>>>>>> https://fedorahosted.org/389/ticket/47368
>>>>>>>>>
>>>>>>>>> So we run into issues when trying to figure out if replicas
>>>>>>>>> are in sync (if those replicas use fractional replication and
>>>>>>>>> "strip mods"). What happens is that an update is made on
>>>>>>>>> master A, but due to fractional replication there is no update
>>>>>>>>> made to any replicas. So if you look at the ruv in the
>>>>>>>>> tombstone entry on each server, it would appear they are out
>>>>>>>>> of sync. So using the ruv in the db tombstone is no longer
>>>>>>>>> accurate when using fractional replication.
>>>>>>>>>
>>>>>>>>> I'm proposing a new ruv to be stored in the backend replica
>>>>>>>>> entry: e.g. cn=replica,cn="dc=example,dc=com",cn=mapping
>>>>>>>>> tree,cn=config. I'm calling this the "replicated ruv". So
>>>>>>>>> whenever we actually send an update to a replica, this ruv
>>>>>>>>> will get updated.
>>>>>>>> I don't see how this will help. You have additional info on
>>>>>>>> what has been replicated (which is available on the consumer as
>>>>>>>> well) and you have a max csn, but you don't know if there are
>>>>>>>> outstanding fractional changes to be sent.
>>>>>>> Well, you will know on master A what operations get
>>>>>>> replicated (this updates the new ruv before sending any changes),
>>>>>>> and you can use this ruv to compare against the other master B's
>>>>>>> ruv (in its replication agreement). Maybe I am missing your point?
>>>>>> My point is that the question is: what is NOT yet replicated?
>>>>>> Without fractional replication you have states of the ruv on all
>>>>>> servers, and if ruv(A) > ruv(B) you know there are updates
>>>>>> missing on B. With fractional, if ruv(A) > ruv(B) this might be
>>>>>> ok or not. If you keep an additional ruv on A when sending
>>>>>> updates to B, you can only record what was sent or attempted to
>>>>>> send, but not what still has to be sent.
>>>>>
>>>>> I agree with you Ludwig, but unless I missed something, would it
>>>>> not be enough to know that replica B is late or in sync?
>>>>>
>>>>> For example, we have updates U1, U2, U3 and U4. U3 should be
>>>>> skipped by fractional replication.
>>>>>
>>>>> The replica RUV (tombstone) on master_A contains U4 and master_B's
>>>>> replica RUV contains U1.
>>>>> Let's assume that the initial value of the "replicated ruv" on
>>>>> master_A is U1.
>>>>> Starting a replication session, master_A should send U2 and update
>>>>> the "replicated ruv" to U2.
>>>>> If the update is successfully applied on master_B, master_B's
>>>>> replica ruv is U2 and monitoring the two ruvs should show they are
>>>>> in sync.
>>>> They are not, since U4 is not yet replicated. On master_A you see
>>>> the "normal" ruv as U4 and the "replicated" ruv as U2, but you
>>>> don't know how many changes are between U2 and U4, nor if any of
>>>> them should be replicated; the replicated ruv is more or less a
>>>> local copy of the remote ruv.
>>>
>>> Yes, I agree they are not, but this is a transient status. Transient
>>> because the RA will continue going through the changelog until it
>>> hits U4. At this point it will write U4 into the "replicated RUV",
>>> and until master_B applies U4 both servers will appear out of sync.
>>> My understanding is that this "replicated RUV" only says whether we
>>> are in sync or not, but does not address how far a server is out of
>>> sync from the other (how many updates are missing). When you say it
>>> is more or less a copy, that is exactly what it is. If it is a copy
>>> => in sync; if it differs => out of sync.
>> Maybe we need to define what "in sync" means. For me, in sync means
>> both servers have the same set of updates applied.
>>
>> Forget fractional for a moment: if we have standard replication and
>> master A is at U4 and master B is at U2, we say they are not in sync
>> - or not? You could keep a replicated ruv for those as well, but this
>> wouldn't change things.
>
> I agree we need to agree on what "in sync" means :-)
>
> I would prefer to speak of 'fractional ruv' (in place of 'replicated
> ruv') for the new ruv proposed by Mark.
> 'replica ruv' being for the traditional ruv (tombstone) used in
> standard replication.
>
> With 'replica ruv', we are in sync when the 'replica ruv' on both
> sides has the same value.
> With 'fractional ruv', we are in sync when the 'fractional ruv' on the
> supplier and the consumer's 'replica ruv' have the same value.
>
> In fractional replication, we have updates U1, U2, U3 and U4. Let's
> say U3 and U4 are skipped by fractional replication.
> Let master_A's 'replica ruv' be U4 and master_B's 'replica ruv' be U2,
> with no new updates.
> From a standard replication point of view they are out of sync, but
> for fractional they are in sync.
>
> For fractional, how do we know that both masters are in sync? With
> Mark's solution, the 'fractional ruv' shows U2.
>
> Now a new update arrives, U5, that is not skipped by fractional.
> master_A's 'replica ruv' is U5 and master_B's 'replica ruv' is U2.
> Until the replication agreement starts a new replication session, the
> 'fractional ruv' shows U2.
> The servers are shown 'in sync' because the RA has not yet started.
> From my understanding, the solution proposed by Mark has a drawback
> where, for a transient period (the time for the RA to start its job,
> evaluate and send U5, and store it into the 'fractional ruv'), the
> servers will appear 'in sync' although they are not. It could be an
> issue with scheduled replication, but should only be a transient wrong
> status under normal conditions.
>
>>>
>>>>> If the update is not applied, master_B's replica ruv stays at U1
>>>>> and the two ruvs will show out of sync.
>>>>>
>>>>> In the first case, we have a transient status of 'in sync' because
>>>>> the replication agreement will evaluate U3, then U4, then send U4
>>>>> and store it into the "replicated ruv". At this point master_A and
>>>>> master_B will appear out of sync until master_B applies U4.
>>>>> If U4 were to be skipped by fractional, master_B's ruv and
>>>>> master_A's replicated ruv would both show U2, and that is correct:
>>>>> both servers are in sync.
>>>>>
>>>>> Mark, instead of storing the replicated ruv in the replica, would
>>>>> it not be possible to store it in the replication agreement (one
>>>>> replicated ruv per RA)? That way it could solve the problem of
>>>>> different fractional replication policies.
>>>>>
>>>>>>> Do you mean changes that have not been read from the changelog
>>>>>>> yet? My plan was to update the new ruv in perform_operation() -
>>>>>>> right after all the "stripping" has been done and there is
>>>>>>> something to replicate. We need to have a ruv for replicated
>>>>>>> operations.
>>>>>>>
>>>>>>> I guess there are other scenarios I didn't think of, like if
>>>>>>> replication is in a backoff state and valid changes are coming
>>>>>>> in. Maybe we could test "stripping" earlier in the replication
>>>>>>> process (when writing to the changelog?), and then update the
>>>>>>> new ruv there instead of waiting until we try to send the
>>>>>>> changes.
>>>>>>>>> Since we cannot compare this "replicated ruv" to the replica's
>>>>>>>>> tombstone ruv, we can instead compare the "replicated ruv" to
>>>>>>>>> the ruv in the replica's repl agreement (unless it is a
>>>>>>>>> dedicated consumer - here we might still be able to look at
>>>>>>>>> the db tombstone ruv to determine the status).
>>>>>>>>>
>>>>>>>>> Problems with this approach:
>>>>>>>>>
>>>>>>>>> - All the servers need to have the same replication
>>>>>>>>> configuration (the same fractional replication policy and
>>>>>>>>> attribute stripping) to give accurate results.
>>>>>>>>>
>>>>>>>>> - If one replica has an agreement that does NOT filter the
>>>>>>>>> updates, but also has agreements that do filter updates, then
>>>>>>>>> we cannot correctly determine its synchronization state with
>>>>>>>>> the fractional replicas.
>>>>>>>>>
>>>>>>>>> - Performance hit from updating another ruv (in cn=config)?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Fractional replication simply breaks our monitoring process.
>>>>>>>>> I'm not sure we can cover all deployment scenarios (mixed
>>>>>>>>> fractional repl agmts, etc.) without updating the repl
>>>>>>>>> protocol. However, I "think" this approach would work for
>>>>>>>>> most deployments (compared to none at the moment). For IPA,
>>>>>>>>> since they don't use consumers, this approach would work for
>>>>>>>>> them. And finally, all of this would have to be handled by an
>>>>>>>>> updated version of repl-monitor.pl.
>>>>>>>>>
>>>>>>>>> This is just my preliminary idea on how to handle this.
>>>>>>>>> Feedback is welcome!!
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mark Reynolds
>>>>>>>>> 389 Development Team
>>>>>>>>> Red Hat, Inc
>>>>>>>>> mreynolds at redhat.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> 389-devel mailing list
>>>>>>>>> 389-devel at lists.fedoraproject.org
>>>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Mark Reynolds
>>>>>>> 389 Development Team
>>>>>>> Red Hat, Inc
>>>>>>> mreynolds at redhat.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
Mark Reynolds
389 Development Team
Red Hat, Inc
mreynolds at redhat.com