Version 2: I didn't check carefully before and missed the fact that
both wdmd and sanlock fork. This new version fixes that and worked
for me on an f20 VM.
We might want to consider running wdmd and sanlock non-forking
(passing the -D parameter) under systemd so that the output goes
to the journal.
Antoni S. Puimedon (1):
systemd: prevent sigkill from being sent
init.d/sanlock-tmpfile.conf | 1 +
init.d/sanlock.service | 4 ++--
init.d/wdmd | 3 +++
init.d/wdmd-tmpfile.conf | 1 +
init.d/wdmd.service | 5 +++--
5 files changed, 10 insertions(+), 4 deletions(-)
create mode 100644 init.d/sanlock-tmpfile.conf
create mode 100644 init.d/wdmd-tmpfile.conf
I'm now working on the sanlock fencing feature for oVirt.
I have some questions and suggestions regarding the patch.
> A host can send a predefined msg_num to another host.
> The host messages are sent from one host to another via
> a lockspace that both hosts are using. If no lockspace
> name is specified, the sanlock daemon will search for a
> common lockspace to use. (N.B. hosts do not necessarily
> use the same host_id in all lockspaces, so not specifying
> the lockspace could result in targeting the wrong host.)
I think that making the lockspace a required parameter makes
more sense and will avoid fatal errors.
> The lockspace used to transmit the message may or may not
> have any other relation to the message itself.
> A host can send one message to a one other host at a time.
Can we increase this number, to simplify the (unlikely) case where
more than one host needs to be fenced?
> The message is placed in the sending host's delta lease,
> and remains there for two renewals. When the receiving
> host renews its own delta lease, it checks the delta leases
> of all other hosts, and sees itself addressed in the sending
> host's lease. It then processes the message from the
> sending host.
Why did you choose to keep the message for two renewals?
We would like this value to be configurable, to make it
easier to solve issues in the field.
I think we have to handle the (unlikely) case where a host
loses its lease without seeing the WD_RESET message, then
acquires the lease again (not sure if this is currently
possible in vdsm). In this case the fencing host may wrongly
assume that the host was fenced.
What if we leave the WD_RESET message in place until the fencing
host sends a WD_UNRESET message?
This way I can send a WD_RESET message, wait for some renewals,
and ensure that the host has either lost its lease or will *not*
get a new lease until I decide to allow it to get one.
We may have a case where we cannot access a host; we fence it,
ensuring that it cannot access the storage, but the host never
sees the fence request, and keeping it in "fenced" mode is
required until we can reboot the host using power management.
> If a message is currently active in a lockspace, the
> sending host_message call will return -EBUSY. After two
> renewals (around 40 seconds), another message may be sent.
> An optional host generation can be included, in which
> case the receiving host_id will accept the message only
> if its current generation matches.
> The single msg_num defined here is WD_RESET (1), which
> means that the host receiving the message should use
> its watchdog device to reset itself as soon as possible.
> The WD_RESET message has no effect on any lockspaces
> or resources that may exist. Existing lockspaces and
> resources continue to operate as usual until the reset.
> (A watchdog reset due to "standard" lockspace failure
> could in fact occur before the watchdog reset caused
> by the host message.)
> Because host messages may not be received if the
> destination host fails, or loses storage access,
> there are no guaranteed times associated with the
> delivery, processing or effect of a host message.
> Guaranteed times for another host being dead should
> continue to be based on either acquiring a resource,
> or sanlock_get_hosts().
What would be the best way to detect that the host was fenced?
> TODO: will be adding another msg_num to cause the
> destination to use /proc/sysrq-trigger to reboot itself.
> (After setting up the watchdog to reset the machine in
> case the sysrq mechanism fails.) The sysrq reboot is
> immediate, whereas the watchdog takes a minute to reset.
Maybe use WD_FENCE, and let sanlock use the best available
method for fencing?
Can we customize the actions taken by sanlock when receiving
the WD_RESET message? For example, running a script after
the message was received?
What would be the best way to detect if a host supports
the new fencing feature - check its sanlock version?
On Sun, Mar 02, 2014 at 04:33:24AM -0500, Saggi Mizrahi wrote:
> I think the problem is that we are even trying to "fence" the host.
> What we really want is for sanlock to try to release a HostID lease
> IIRC if sanlock can't do it (kill all the related processes etc)
> it fences the host.
I don't yet understand what the desired outcome in each situation is, i.e.
when do we want the host to be reset, when do we want to migrate vms, when
do we want to suspend or kill vms so we can release the leases, etc. The
mechanisms should mostly exist to do what we want, but how and when to
apply them is unclear to me.
> If we make a unique ID (call it instance) for every time we acquire
> a hostID lease (but use the same when we renew) we could have the
> message address the lockspace and an instance. This means that if
> the host was able to release everything the message would no longer
> apply to him. Couple this with the host ID we can see that the
> hostID\instanceID pair has changed and clear the message.
The instanceID sounds a lot like the existing "host id generation", which
is incremented each time that a host_id is acquired. The host messages
are already addressed to a specific host_id/host_generation, so if a host
loses its host_id lease, returns, and reacquires it (with a new generation
number), the previous message will no longer apply.