Hi Dave,
Sorry for the late response, but we now know better what we would
like to do, and will hopefully waste less of your time.
----- Original Message -----
From: "David Teigland" <teigland(a)redhat.com>
To: "Nir Soffer" <nsoffer(a)redhat.com>
Cc: sanlock-devel(a)lists.fedorahosted.org, "Ayal Baron"
<abaron(a)redhat.com>, "Federico Simoncelli"
<fsimonce(a)redhat.com>, "Allon Mureinik" <amureini(a)redhat.com>
Sent: Wednesday, January 29, 2014 7:16:09 PM
Subject: Re: [PATCH] sanlock: host_message
On Wed, Jan 29, 2014 at 07:49:29AM -0500, Nir Soffer wrote:
> > The host messages are sent from one host to another via
> > a lockspace that both hosts are using. If no lockspace
> > name is specified, the sanlock daemon will search for a
> > common lockspace to use. (N.B. hosts do not necessarily
> > use the same host_id in all lockspaces, so not specifying
> > the lockspace could result in targeting the wrong host.)
>
> I think that making the lockspace a required parameter makes
> more sense and will avoid fatal errors.
You should specify the lockspace if you know it, then this won't matter.
Can you describe a situation where guessing the lockspace is useful?
> > The lockspace used to transmit the message may or may not
> > have any other relation to the message itself.
> >
> > A host can send one message to one other host at a time.
>
> Can we increase this number, to simplify the (unlikely) case where
> more than one host needs to be fenced?
This was the one big question I had about the design. If it's necessary
to address more than one host simultaneously I can do that, but I'll need
to go back and come up with a more complex design. The existing design is
simple (and completely compatible with the existing format) because it
uses three unused fields in the delta lease area. So, perhaps think a
little more about how important this would be and let me know.
We would like to be able to fence more than one host at a time, but keeping
a backward-compatible format is more important.
Fencing several hosts at once helps when a network issue makes many hosts
inaccessible and those hosts run highly available VMs that should be
started as soon as possible on another host.
> > The message is placed in the sending host's delta lease,
> > and remains there for two renewals. When the receiving
> > host renews its own delta lease, it checks the delta leases
> > of all other hosts, and sees itself addressed in the sending
> > host's lease. It then processes the message from the
> > sending host.
>
> Why did you choose to keep the message for two renewals?
Because the targeted host would generally observe it in that time.
> We would like to have this value configurable, to make
> it easier to solve issues in the field.
How configurable would you need:
1. a daemon config option (set it when the daemon starts)
I think this will be good enough.
2. a duration-based api option (set it when you call the function
in terms of seconds to remain active.)
3. an on/off api option (one function call to set the message,
and a second function call to turn it off.)
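As a rough illustration of option 1, here is a toy model in plain Python
of a message that stays in the sender's delta lease for a configurable
number of renewals. This is not sanlock code: the class names, the field
names and the message_renewals parameter are all made up for the example,
and the on-disk layout is not modeled at all.

```python
# Toy model only, not sanlock code. All names here are hypothetical.

class DeltaLease:
    """One host's delta lease, reduced to the fields a message would use."""

    def __init__(self, host_id):
        self.host_id = host_id
        self.msg_target = 0   # host_id being addressed, 0 means no message
        self.msg_text = None


class Host:
    def __init__(self, host_id, lockspace, message_renewals=2):
        # message_renewals stands in for the daemon config option.
        self.lease = DeltaLease(host_id)
        self.lockspace = lockspace
        self.message_renewals = message_renewals
        self.received = []
        self._pending = (0, None)
        self._renewals_left = 0
        lockspace[host_id] = self.lease

    def send_message(self, target_id, text):
        # The message is written out on the next renewals, not immediately.
        self._pending = (target_id, text)
        self._renewals_left = self.message_renewals

    def renew(self):
        # Write our own lease: carry the message while renewals remain.
        if self._renewals_left > 0:
            self.lease.msg_target, self.lease.msg_text = self._pending
            self._renewals_left -= 1
        else:
            self.lease.msg_target, self.lease.msg_text = 0, None
        # Then check every other host's lease for a message addressed to us.
        # A receiver that renews several times while the message is present
        # will see it more than once; this toy model does not deduplicate.
        for other_id, lease in self.lockspace.items():
            if other_id == self.lease.host_id:
                continue
            if lease.msg_target == self.lease.host_id:
                self.received.append((other_id, lease.msg_text))
```

With message_renewals=2, the message is present in the sender's lease after
its first and second renewals and is cleared by the third, so a larger value
simply gives a slow target more time to notice it.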
> I think we have to handle the (unlikely) case, where a host
> lost its lease without seeing the WD_RESET message, then
> acquire the lease again (not sure if this is possible in vdsm
> currently).
I don't understand this. Maybe it's helpful to think about the
differences among:
- power fencing
  . toggle the power of victim host on a switch
  . assume that worked
  . let programs use locks/resources that the victim had been using
- storage fencing
  . cut off storage access of victim host by turning off a switch port,
    (or removing its SCSI persistent reservation)
  . assume that worked
  . let programs use locks/resources that the victim had been using
- sanlock/wdmd/watchdog lease protection
  . wait for a fixed timeout from the victim host's last storage renewal
  . assume that the victim's watchdog has reset due to no lease renewal
  . let programs use locks/resources that the victim had been using
- sanlock/wdmd/watchdog lease protection + WD_RESET host_message
  . send victim the WD_RESET host_message, which would cause the victim
    to force its own watchdog to expire in a minute
  . assume nothing about the receipt or effect of WD_RESET
  . wait for a fixed timeout from the victim host's last storage renewal
  . assume that the victim's watchdog has reset due to no lease renewal
  . let programs use locks/resources that the victim had been using
We do not want to assume that the victim's watchdog has reset the machine.
What we plan to do is to wait until the host is up and query the state
of the VMs before we start these VMs on another host.
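Roughly, the flow we have in mind looks like the sketch below. It is only
an outline in Python, not vdsm code: host_is_up and query_running_vms are
hypothetical callables supplied by the caller, and the timeout and polling
values are arbitrary.

```python
import time


def vms_safe_to_restart(host_is_up, query_running_vms, ha_vms,
                        timeout=300.0, poll_interval=5.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Return the HA VMs that may be started on another host.

    Returns None if the host did not come back within the timeout, in
    which case nothing is assumed about the VMs.
    """
    deadline = clock() + timeout
    # Wait for the host to come back instead of assuming it was reset.
    while not host_is_up():
        if clock() >= deadline:
            return None
        sleep(poll_interval)
    # The host is up: ask it which VMs are still running there.
    running = set(query_running_vms())
    # Only VMs that are no longer running on the host may be restarted.
    return [vm for vm in ha_vms if vm not in running]
```

The point of the None result is that a host which never comes back gives us
no information, so the caller would have to fall back to some other check.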
Notice that:
- the end goal/result is the same in all cases
- there are assumptions made in all cases
Some assumptions are more likely to hold than others. When you talk to a
power management device and it tells you that the machine is powered off,
there is very little chance that this is not true.
Nir