Thanks everyone for your feedback!
Ok I have written an initial fix, and here is how it works and what I am
seeing...
[1] An update comes in and we update the local RUV.
[2] We check this update against the fractional/stripped attrs in each
agmt.
[3] If this update does replicate to at least one agmt, we write a new
attribute to the local RUV (currently called "nsds50replruv" - we can
improve the name later). If it doesn't replicate to any replicas, then
we don't update the new RUV attribute. This all happens at the same
time in write_changelog_and_ruv(), so there is no delay or copying of
useless RUV info, and we write to the local RUV instead of a new RUV in
cn=config (which I had originally proposed).
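To make step [3] concrete, here is a rough Python sketch of the decision
(my own model, not the actual C code in write_changelog_and_ruv(); every
function and field name below is hypothetical):

```python
# Hypothetical sketch: an update advances the normal RUV unconditionally,
# but advances the "replicated" RUV (nsds50replruv) only if at least one
# agreement would actually send it after fractional filtering/stripping.

def mod_survives_stripping(mod_attrs, stripped_attrs, excluded_attrs):
    """True if any modified attribute is neither stripped nor excluded."""
    blocked = {a.lower() for a in stripped_attrs} | {a.lower() for a in excluded_attrs}
    return any(a.lower() not in blocked for a in mod_attrs)

def update_ruvs(local_ruv, repl_ruv, csn, mod_attrs, agreements):
    local_ruv["maxcsn"] = csn                # normal RUV always advances
    for agmt in agreements:
        if mod_survives_stripping(mod_attrs,
                                  agmt["stripped_attrs"],
                                  agmt["excluded_attrs"]):
            repl_ruv["maxcsn"] = csn         # nsds50replruv advances too
            break                            # one surviving agmt is enough
    return local_ruv, repl_ruv
```

For example, with an agreement that strips modifiersName, a mod touching
only modifiersName would leave the replicated RUV alone, while a mod on
cn would advance both RUVs.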
[4] Here we made an update that is stripped by fractional replication:
Master A:
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600339000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 5260030d000000010000
...
Master B
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL -p 22222
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 5260030d000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 5260030d000000010000
...
[5] If we look at the "fractional" ruv (nsds50replruv) on Master A, it
correctly lines up with the ruv on Master B (nsds50ruv).
[6] Then we make an update that does replicate, and now all the ruvs
line up.
Master A
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
Master B
ldapsearch -h localhost -D cn=dm -w password -b "dc=example,dc=com"
-xLLL -p 22222
'(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
nsds50ruv nsds50replruv
dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
nsds50ruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
nsds50replruv: {replica 1 ldap://localhost.localdomain:389}
52583d80000000010000 52600790000000010000
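As a side note, the max CSNs shown in these RUVs can be decoded by hand:
a CSN is 20 hex digits - timestamp (8), sequence number (4), replica id
(4), and sub-sequence number (4). A small helper of my own (not shipped
tooling) to decode the Master A values from step [4]:

```python
# Decode a 389-ds CSN: timestamp(8 hex) seq(4) replica-id(4) subseq(4).
from datetime import datetime, timezone

def parse_csn(csn):
    assert len(csn) == 20, "a CSN is exactly 20 hex digits"
    ts, seq, rid, sub = csn[:8], csn[8:12], csn[12:16], csn[16:20]
    return {
        "time": datetime.fromtimestamp(int(ts, 16), tz=timezone.utc),
        "seqnum": int(seq, 16),
        "replica_id": int(rid, 16),   # matches the "{replica N ...}" id
        "subseqnum": int(sub, 16),
    }

# The gap between Master A's two RUVs in step [4] is just the timestamp delta:
a_ruv  = parse_csn("52600339000000010000")   # nsds50ruv max CSN
a_frac = parse_csn("5260030d000000010000")   # nsds50replruv max CSN
print((a_ruv["time"] - a_frac["time"]).seconds)   # 44 (seconds of lag)
```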
There are still the same problems with this fix as I mentioned before,
except we're no longer updating the dse config. Now, I am concerned
about the performance hit of checking to see if a mod gets "replicated".
As for the "sync" question, this fix does not change how that behaves,
or how repl-monitor already works. A replica is either behind (by a
certain amount of time) or in sync. I'm not trying to improve the
current repl status model.
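For what it's worth, the comparison an updated repl-monitor would need
could look something like this (assumed logic sketched in Python, not
the actual repl-monitor.pl; the function and dict shapes are my own):

```python
# Per replica id, compare the supplier's nsds50replruv max CSN against the
# consumer's nsds50ruv max CSN. Fixed-width lowercase hex CSNs compare
# correctly as plain strings, so no parsing is needed for ordering.

def repl_status(supplier_repl_ruv, consumer_ruv):
    """Return 'in sync' or 'behind by Ns' for each replica id."""
    status = {}
    for rid, supplier_csn in supplier_repl_ruv.items():
        consumer_csn = consumer_ruv.get(rid, "0" * 20)
        if consumer_csn >= supplier_csn:
            status[rid] = "in sync"
        else:
            # lag = difference of the 8-hex-digit timestamp fields
            lag = int(supplier_csn[:8], 16) - int(consumer_csn[:8], 16)
            status[rid] = f"behind by {lag}s"
    return status
```

Using the step [4] values above, replica 1 on Master B would report
"in sync" once its nsds50ruv reaches the supplier's nsds50replruv value.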
Anyway, I just wanted to see if I could get this working. Comments welcome.
Thanks again,
Mark
On 10/17/2013 05:44 AM, thierry bordaz wrote:
On 10/17/2013 11:06 AM, Ludwig Krispenz wrote:
>
> On 10/17/2013 10:56 AM, thierry bordaz wrote:
>> On 10/17/2013 10:49 AM, Ludwig Krispenz wrote:
>>>
>>> On 10/17/2013 10:15 AM, thierry bordaz wrote:
>>>> On 10/16/2013 05:41 PM, Ludwig Krispenz wrote:
>>>>>
>>>>> On 10/16/2013 05:28 PM, Mark Reynolds wrote:
>>>>>>
>>>>>> On 10/16/2013 11:05 AM, Ludwig Krispenz wrote:
>>>>>>>
>>>>>>> On 10/15/2013 10:41 PM, Mark Reynolds wrote:
>>>>>>>>
>>>>>>>> https://fedorahosted.org/389/ticket/47368
>>>>>>>>
>>>>>>>> So we run into issues when trying to figure out if replicas
>>>>>>>> are in synch (if those replicas use fractional replication and
>>>>>>>> "strip mods"). What happens is that an update is made on
>>>>>>>> master A, but due to fractional replication there is no update
>>>>>>>> made to any replicas. So if you look at the ruv in the
>>>>>>>> tombstone entry on each server, it would appear they are out
>>>>>>>> of synch. So using the ruv in the db tombstone is no longer
>>>>>>>> accurate when using fractional replication.
>>>>>>>>
>>>>>>>> I'm proposing a new ruv to be stored in the backend replica
>>>>>>>> entry: e.g. cn=replica,cn="dc=example,dc=com",cn=mapping
>>>>>>>> tree,cn=config. I'm calling this the "replicated ruv". So
>>>>>>>> whenever we actually send an update to a replica, this ruv
>>>>>>>> will get updated.
>>>>>>> I don't see how this will help, you have additional info on
>>>>>>> what has been replicated (which is available on the consumer as
>>>>>>> well) and you have a max csn, but you don't know if there are
>>>>>>> outstanding fractional changes to be sent.
>>>>>> Well you will know on master A what operations get
>>>>>> replicated (this updates the new ruv before sending any changes),
>>>>>> and you can use this ruv to compare against the other master B's
>>>>>> ruv (in its replication agreement). Maybe I am missing your point?
>>>>> My point is that the question is: what is NOT yet replicated.
>>>>> Without fractional replication you have states of the ruv on all
>>>>> servers, and if ruv(A) > ruv(B) you know there are updates
>>>>> missing on B. With fractional, if ruv(A) > ruv(B) this might be
>>>>> ok or not. If you keep an additional ruv on A when sending
>>>>> updates to B, you can only record what was sent or attempted to
>>>>> send, but not what still has to be sent.
>>>>
>>>> I agree with you Ludwig, but unless I missed something, would it
>>>> not be enough to know that replica B is late or in sync?
>>>>
>>>> For example, we have updates U1, U2, U3 and U4. U3 should be
>>>> skipped by fractional replication.
>>>>
>>>> The replica RUV (tombstone) on master_A contains U4 and master_B's
>>>> replica RUV contains U1.
>>>> Let's assume that the initial value of the "replicated ruv" on
>>>> master_A is U1.
>>>> Starting a replication session, master_A should send U2 and update
>>>> the "replicated ruv" to U2.
>>>> If the update is successfully applied on master_B, master_B's
>>>> replica ruv is U2, and monitoring the two ruvs should show they
>>>> are in sync.
>>> They are not, since U4 is not yet replicated. On master_A you see
>>> the "normal" ruv as U4 and the "replicated" ruv as U2, but you
>>> don't know how many changes are between U2 and U4, nor if any of
>>> them should be replicated; the replicated ruv is more or less a
>>> local copy of the remote ruv.
>>
>> Yes, I agree they are not; this is a transient status. Transient
>> because the RA will continue going through the changelog until it
>> hits U4. At this point it will write U4 into the "replicated RUV",
>> and until master_B applies U4 both servers will appear out of sync.
>> My understanding is that this "replicated RUV" only says whether it
>> is in sync or not, but does not address how far a server is out of
>> sync from the other (how many updates are missing). When you say it
>> is more or less a copy, that is exactly what it is. If it is a copy
>> => in sync; if it differs => out of sync.
> Maybe we need to define what "in sync" means. For me, in sync means
> both servers have the same set of updates applied.
>
> Forget fractional for a moment: if we have standard replication and
> master A is at U4 and master B is at U2, we say they are not in sync
> - or not? You could keep a replicated ruv for those as well, but this
> wouldn't change things.
I agree we need to agree on what "in sync" means :-)
I would prefer to speak of a 'fractional ruv' (in place of 'replicated
ruv') for the new ruv proposed by Mark, 'replica ruv' being the
traditional ruv (tombstone) used in standard replication.
With the 'replica ruv', we are in sync when the 'replica ruv' on both
sides has the same value.
With the 'fractional ruv', we are in sync when the 'fractional ruv' on
the supplier and the 'replica ruv' on the consumer have the same value.
In fractional replication, we have updates U1, U2, U3 and U4. Let's say
U3 and U4 are skipped by fractional.
Let master_A's 'replica ruv' be U4 and master_B's 'replica ruv' be U2,
with no new updates.
From a standard replication point of view they are out of sync, but for
fractional they are in sync.
For fractional, how do we know that both masters are in sync? With
Mark's solution, the 'fractional ruv' shows U2.
Now a new update U5 arrives that is not skipped by fractional.
master_A's 'replica ruv' is U5 and master_B's 'replica ruv' is U2.
Until the replication agreement starts a new replication session, the
'fractional ruv' shows U2.
The servers are shown 'in sync', because the RA has not yet started.
From my understanding, the solution proposed by Mark has a drawback
where, for a transient period (the time for the RA to start its job,
evaluate and send U5, and store it into the 'fractional ruv'), the
servers will appear 'in sync' although they are not. It could be an
issue with scheduled replication, but should be a transient wrong
status under normal conditions.
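Thierry's U1-U5 walkthrough can be turned into a toy simulation (my own
model, with updates as plain integers, assuming the RA-time update
variant discussed above) that shows the transient false 'in sync' window:

```python
# Toy model of the monitoring check: the supplier's fractional ruv is
# compared to the consumer's replica ruv; equal means "in sync".

def monitor(frac_ruv_a, replica_ruv_b):
    return "in sync" if frac_ruv_a == replica_ruv_b else "out of sync"

skipped = {3, 4}                   # U3 and U4 are stripped by fractional
frac_ruv_a, replica_ruv_b = 2, 2   # state after U1..U4 have been processed

# U5 arrives on master_A and is NOT skipped, but the RA has not run yet:
pending = [5]
print(monitor(frac_ruv_a, replica_ruv_b))   # "in sync" -- false positive

# The RA wakes up, evaluates U5, sends it, and advances the fractional ruv:
for u in pending:
    if u not in skipped:
        frac_ruv_a = u
print(monitor(frac_ruv_a, replica_ruv_b))   # "out of sync" until B applies U5

replica_ruv_b = 5                  # master_B applies U5
print(monitor(frac_ruv_a, replica_ruv_b))   # "in sync" again
```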
>>
>>>> If the update is not applied, master_B's replica ruv stays at U1
>>>> and the two ruvs will show out of sync.
>>>>
>>>> In the first case, we have a transient status of 'in sync' because
>>>> the replication agreement will evaluate U3, then U4, then send U4
>>>> and store it into the "replicated ruv". At this point master_A and
>>>> master_B will appear out of sync until master_B applies U4.
>>>> If U4 were to be skipped by fractional, we would have master_B's
>>>> ruv and master_A's replicated ruv both showing U2, and that is
>>>> correct: both servers are in sync.
>>>>
>>>> Mark, instead of storing the replicated ruv in the replica, would
>>>> it not be possible to store it in the replication agreement (one
>>>> replicated ruv per RA)? That way it could solve the problem of
>>>> different fractional replication policies.
>>>>
>>>>>> Do you mean changes that have not been read from the changelog
>>>>>> yet? My plan was to update the new ruv in perform_operation() -
>>>>>> right after all the "stripping" has been done and there is
>>>>>> something to replicate. We need to have a ruv for replicated
>>>>>> operations.
>>>>>>
>>>>>> I guess there are other scenarios I didn't think of, like if
>>>>>> replication is in a backoff state and valid changes are coming
>>>>>> in. Maybe we could do the test "stripping" earlier in the
>>>>>> replication process (when writing to the changelog?), and then
>>>>>> update the new ruv there instead of waiting until we try to
>>>>>> send the changes.
>>>>>>>> Since we can not compare this "replicated ruv" to the
>>>>>>>> replica's tombstone ruv, we can instead compare the
>>>>>>>> "replicated ruv" to the ruv in the replica's repl agreement
>>>>>>>> (unless it is a dedicated consumer - here we might be able to
>>>>>>>> still look at the db tombstone ruv to determine the status).
>>>>>>>>
>>>>>>>> Problems with this approach:
>>>>>>>>
>>>>>>>> - All the servers need to have the same replication
>>>>>>>> configuration (the same fractional replication policy and
>>>>>>>> attribute stripping) to give accurate results.
>>>>>>>>
>>>>>>>> - If one replica has an agreement that does NOT filter the
>>>>>>>> updates, but has agreements that do filter updates, then we
>>>>>>>> can not correctly determine its synchronization state with
>>>>>>>> the fractional replicas.
>>>>>>>>
>>>>>>>> - Performance hit from updating another ruv (in cn=config)?
>>>>>>>>
>>>>>>>>
>>>>>>>> Fractional replication simply breaks our monitoring process.
>>>>>>>> I'm not sure, not without updating the repl protocol, that we
>>>>>>>> can cover all deployment scenarios (mixed fractional repl
>>>>>>>> agmts, etc.). However, I "think" this approach would work for
>>>>>>>> most deployments (compared to none at the moment). For IPA,
>>>>>>>> since they don't use consumers, this approach would work for
>>>>>>>> them. And finally, all of this would have to be handled by an
>>>>>>>> updated version of repl-monitor.pl.
>>>>>>>>
>>>>>>>> This is just my preliminary idea on how to handle this.
>>>>>>>> Feedback is welcome!!
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> --
>>>>>>>> Mark Reynolds
>>>>>>>> 389 Development Team
>>>>>>>> Red Hat, Inc
>>>>>>>> mreynolds(a)redhat.com
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> 389-devel mailing list
>>>>>>>> 389-devel(a)lists.fedoraproject.org
>>>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-devel
--
Mark Reynolds
389 Development Team
Red Hat, Inc
mreynolds(a)redhat.com