I think the following example illustrates the problem with the current
plan to use the vdsm lockspace for fencing:
- hostA and hostB vdsm are using a common lockspace
- hostB sends hostA a "reset yourself" message via the vdsm lockspace
- hostA's storage for the vdsm lockspace fails around the same time
- hostA's sanlock gracefully shuts down vdsm and removes the lockspace
- hostA has no storage access and never sees the message from hostB
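The race above can be sketched as a tiny simulation. All of the names here are invented for illustration; this is not the sanlock or vdsm API, just a model of the decision the daemon makes on hostA:

```python
# Hypothetical model of hostA's vdsm lockspace after hostB sends a
# "reset yourself" message.  Names and return strings are made up.

def vdsm_lockspace_host(storage_ok, message_pending):
    """Outcome on hostA, given storage state and a pending message."""
    if not storage_ok:
        # The graceful path: shut down vdsm and remove the lockspace,
        # precisely to AVOID a watchdog reset.  Any pending message on
        # the (now unreachable) storage is never seen.
        return "lockspace removed, no reset"
    if message_pending:
        return "message seen, host resets"
    return "running"

# If the storage failure wins the race, hostB's message is lost and
# hostA is never reset, even though hostB asked for it.
print(vdsm_lockspace_host(storage_ok=False, message_pending=True))
```

The point of the model is only that the "no storage" branch is checked first: graceful cleanup and message delivery are mutually exclusive on the same lockspace.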
The "problem" here is the graceful cleanup and removal of the vdsm
lockspace. This graceful cleanup is done precisely to *avoid*
having the watchdog reset the host. In effect what we want are two
different lockspaces with two opposite behaviors:
1. the vdsm lockspace that wants to *avoid* a watchdog reset at all costs
2. a "fencing" lockspace that wants to *cause* a watchdog reset at all costs
Trying to tweak sanlock so it does these two opposite things at once is
not going to work out well, I suspect. In each case sanlock tries very
hard either to *avoid* the reset or to *cause* it.
Here's a solution that I think would work well at the sanlock level, but
it requires a new domain format for vdsm:
Create a new lockspace dedicated to the fencing behavior.
(This is what the fence_sanlock daemon/agent do.)
This new lockspace would *not* be killed or gracefully cleaned up
if storage is lost. That way, either the lockspace message causes
the host to be reset, or, if lockspace storage is lost (and no messages
can be delivered), the failure to renew the lockspace lease causes the
host to be reset. The same result is guaranteed in either case.
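Extending the sketch above, a dedicated fencing lockspace removes the graceful-cleanup branch entirely. Again, the names are hypothetical and this is not the fence_sanlock implementation, just a model of the guarantee:

```python
# Hypothetical model of a dedicated fencing lockspace: there is no
# graceful cleanup, so every failure path ends in a watchdog reset.

def fencing_lockspace_host(storage_ok, message_pending):
    """Outcome on the target host with a fencing-only lockspace."""
    if storage_ok and message_pending:
        # The "reset yourself" message is delivered and acted on.
        return "reset (message)"
    if not storage_ok:
        # No cleanup path: the lease cannot be renewed, so the
        # watchdog fires when the lease expires.
        return "reset (lease expiry)"
    return "running"

# Whether the message arrives first or storage fails first, the host
# being fenced is reset either way.
for storage_ok in (True, False):
    print(fencing_lockspace_host(storage_ok, message_pending=True))
```

Both branches reach a reset, which is the "same result is guaranteed in either case" property the paragraph above describes.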
Dave