On Sun, Mar 02, 2014 at 05:32:07AM -0500, Nir Soffer wrote:
> ----- Original Message -----
> > From: "David Teigland" <teigland(a)redhat.com>
> > To: "Nir Soffer" <nsoffer(a)redhat.com>
> > Cc: "Allon Mureinik" <amureini(a)redhat.com>, "Ayal Baron" <abaron(a)redhat.com>,
> >     sanlock-devel(a)lists.fedorahosted.org, fsimonce(a)redhat.com, smizrahi(a)redhat.com
> > Sent: Wednesday, February 26, 2014 11:58:25 PM
> > Subject: Re: [PATCH] sanlock: host_message
> >
> > I think the following example illustrates the problem with the current
> > plan to use the vdsm lockspace for fencing:
> >
> > hostA and hostB vdsm using a common lockspace
> Do you mean the host id lockspace?
Sorry for being unclear here. I'm not entirely sure what lockspace(s)
vdsm is using, so I was being vague. I guess we'll need to sort out
exactly what lockspaces we're talking about.
> > hostB sends hostA a "reset yourself" message via the vdsm lockspace
> >
> > hostA storage fails for vdsm lockspace around the same time
> What do you mean by fails for vdsm lockspace?
I meant that sanlock fails to renew its host_id lease in this vaguely
defined lockspace. We describe this condition with several interchangeable
terms:
- storage is lost / storage access is lost
- the storage is the storage on which the sanlock leases exist
- the effect is that sanlock can no longer renew its host_id lease
- we may say that the lockspace fails at this point, or that the
  lockspace enters recovery
That's all referring to the same situation.
> > Create a new lockspace dedicated to the fencing behavior.
> > (This is what the fence_sanlock daemon/agent do.)
> Do you mean a host_id-like lockspace (1 MB for 2000 hosts)?
Yes, a sanlock_write_lockspace() / sanlock_add_lockspace() that is
dedicated to the fencing behavior and not used for anything else.
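As a rough size check of the "1 MB for 2000 hosts" figure, assuming 512-byte
sectors and one delta-lease sector per host_id (the values here are
illustrative, not taken from the sanlock source):

```python
SECTOR_SIZE = 512   # assumed sector size in bytes
MAX_HOSTS = 2000    # one delta-lease sector per host_id

lockspace_bytes = SECTOR_SIZE * MAX_HOSTS
print(lockspace_bytes)  # 1024000 bytes, i.e. roughly 1 MB
```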
> > This new lockspace would *not* be killed or gracefully cleaned up
> > if storage is lost. This way, either the lockspace message would cause
> > the host to be reset, or if lockspace storage is lost (and no messages
> > are possible), the failure to renew the lockspace lease would cause the
> > host to be reset. The same result is guaranteed in either case.
> I don't think we want to fence a host just because it lost access to some
> storage, or even to all storage.
>
> If we can access the host through the network, we would like to migrate
> the vms on it to another host instead of killing the vms.
OK, I'll need to learn a little more about what behavior you want in each
circumstance.