On Thu, Mar 06, 2014 at 03:56:25PM -0500, Nir Soffer wrote:
field        bits
-----------------
host_id        12
generation     32
send_msg        8
send_seq       12
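
For illustration only, here is a minimal sketch of how the four proposed fields could share one 64-bit word. The widths come from the table above; the bit order and the names FIELDS, pack_fields and unpack_fields are assumptions made for this example and are not part of sanlock.

```python
# Field widths from the proposed table; the ordering (host_id in the
# most significant bits) is an assumption for this sketch.
FIELDS = (
    ("host_id", 12),
    ("generation", 32),
    ("send_msg", 8),
    ("send_seq", 12),
)


def pack_fields(values):
    """Pack a dict of field values into a single 64-bit integer."""
    word = 0
    for name, bits in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def unpack_fields(word):
    """Reverse pack_fields: split a 64-bit integer back into its fields."""
    values = {}
    # The last field packed sits in the lowest bits, so peel fields off
    # in reverse order.
    for name, bits in reversed(FIELDS):
        values[name] = word & ((1 << bits) - 1)
        word >>= bits
    return values
```

The widths add up to exactly 64 bits, so the layout has no spare room, and send_seq wraps after 4096 messages.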
Even if I didn't need to be consistent with the way this works elsewhere,
creating ad hoc field sizes like this would be unmanageable.
Acknowledge is just another message, so we send in the same field.
RESET = 0x01
ACK = 0xFF
The sender detects the ack by getting an ACK message in the lease of the
receiver with the sender host_id and generation and the same send_seq as
the sent message.
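
A rough sketch of the matching rule described above, assuming the receiver's lease exposes the four proposed fields. LeaseFields and is_ack_for are hypothetical names for this example, not sanlock structures or functions.

```python
from dataclasses import dataclass

# Message values from the proposal above.
RESET = 0x01
ACK = 0xFF


@dataclass(frozen=True)
class LeaseFields:
    """Hypothetical view of the message fields read from a host's lease."""
    host_id: int      # host the message is addressed to
    generation: int   # generation of that host
    send_msg: int     # RESET, ACK, ...
    send_seq: int     # sequence number of the message


def is_ack_for(receiver_lease, sender_host_id, sender_generation, sent_seq):
    """Return True if the receiver's lease acknowledges the sender's message."""
    return (
        receiver_lease.send_msg == ACK
        and receiver_lease.host_id == sender_host_id
        and receiver_lease.generation == sender_generation
        and receiver_lease.send_seq == sent_seq
    )
```

Note that the ack occupies the receiver's own send fields, so while it is being held there the receiver cannot send any other message.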
It doesn't work because we can easily have unregulated overlapping of
sending/receiving/acking messages, all trying to use the same fields at
once. The result is chaos. Perhaps in your very specific usage this
wouldn't happen, but this is at least a minimally generalized capability.
> Receiving acknowledgement
> -------------------------
>
> sanlock will not keep any state about the host messages it has sent or try
> to match acknowledgements. But sanlock does keep track of other hosts'
> delta lease state, and that could include recv_from_host_id/recv_seq. We
> can add an api for the caller to query the recv_from_host_id/recv_seq for
> a given host_id.
This means that the client has to remember the sent message sequence, so
implementing a simple fence agent script will be impossible. You will have
to create another process running from the start of fencing, remembering the
sent message sequence, and polling the sanlock daemon for the result.
I thought the program that sent the message (and got send_seq) would
itself want to watch for the ack (recv_seq). If it got the ack, it would
then proceed to monitor the host status.
If they are different programs, there are ways of passing a number between
them. If it's truly difficult, then perhaps we could query both send_seq
and recv_seq from sanlock.
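
As a sketch of what the caller's side might look like, the program that sent the message could poll until the target's recv_from_host_id/recv_seq match what it sent. The query api does not exist yet, so query_recv below is a hypothetical callable standing in for it, and wait_for_ack, the timeout and the interval are likewise assumptions for this example.

```python
import time


def wait_for_ack(query_recv, target_host_id, sender_host_id, send_seq,
                 timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll until the target host has acked send_seq, or the timeout expires.

    query_recv(target_host_id) is a hypothetical stand-in for the proposed
    query; it must return a (recv_from_host_id, recv_seq) tuple.
    """
    deadline = time.monotonic() + timeout
    while True:
        recv_from_host_id, recv_seq = query_recv(target_host_id)
        if recv_from_host_id == sender_host_id and recv_seq == send_seq:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The only state the caller carries is send_seq, which is the number that would have to be passed along if the sender and the watcher were different programs.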
If sanlock does remember sent messages and check for acks, it will be
easier to use it from other tools.
I think it's too unrelated to sanlock's main job. I'm really aiming for
as minimal and primitive and unintrusive as possible. One reason I don't
like the idea of acks is because a system that needs acks probably wants a
level of sophistication which sanlock simply can't provide (and shouldn't,
because it's not the purpose of sanlock). So I want to add the absolute
minimum that we need to implement your function.
> 1. It will not work with a fast reset option using /proc/sysrq-trigger
> because there will not be enough time for the acknowledgement to be
> written before the host is reset. (With another independent message
> area, we could write an acknowledgement immediately, but borrowing the
> lockspace lease means we do not have this option.)
You can do a fast reset after the write to the storage has finished,
assuming that the write is not asynchronous.
I'm not sure how I'd use sysrq-trigger yet anyway -- I don't like the idea
of encoding such a specific feature directly into sanlock. So, I'd need
to figure that out, and maybe whatever is doing sysrq-trigger could add
some delay to give the next renewal (with ack) a chance to complete.