> I think the following example illustrates the problem with the current
> plan to use the vdsm lockspace for fencing:
I thought I was being somewhat unfair to the existing plan for
vdsm-sanlock-fencing, so I worked out a description of it for
myself that I think would be a reasonable solution.
I think the plan was to solve the limited problem of fencing a
host with sanlock when network communication is lost, but the
storage remains functional (with the intent to solve the loss of
both later). The problem I described in the last mail covered
the loss of both.
If we assume that storage remains functional, then we can
assume that a "reset yourself" message is received, and can
verify this by watching the host status in the functional
lockspace (it will eventually become "dead" after the
necessary host_id lease timeout.)
However, we still need to be aware of the case when both
network and storage are lost, and revert to a reasonable
state, even if that case is not solved entirely.
I think the possible outcomes would be:
1. The vdsm lockspace is cleanly removed due to the loss of
lockspace storage. The "reset yourself" message that was
sent through the vdsm lockspace may or may not have been
received (depending on how quickly the lockspace was cleared.)
1.a) if the message was received, the host will reset itself,
even though the lockspace was removed.
1.b) if the message was not received, then the host will not
reset itself, and will remain running with no lockspace.
2. The vdsm lockspace cannot be cleanly removed. sanlock
will reset the host, either because of the "reset yourself"
message, or because the lockspace lease expires when it
cannot be cleared. The result is the same regardless.
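To make the case split above concrete, here is a minimal sketch in Python. It is only a model of the reasoning, not sanlock or vdsm code: the names Outcome, classify_outcome and host_resets are made up for illustration, and the two boolean inputs stand for facts that nothing in the real system reports directly.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the outcomes listed above."""
    RESET_BY_MESSAGE = "1.a"       # lockspace removed, message was received
    RUNNING_NO_LOCKSPACE = "1.b"   # lockspace removed, message was missed
    RESET_BY_SANLOCK = "2"         # lockspace could not be cleanly removed


def classify_outcome(lockspace_removed_cleanly: bool,
                     message_received: bool) -> Outcome:
    """Map the two unknowns onto outcomes 1.a, 1.b and 2."""
    if not lockspace_removed_cleanly:
        # Case 2: the host is reset either by the message or by lease
        # expiry, so message_received does not change the result.
        return Outcome.RESET_BY_SANLOCK
    if message_received:
        return Outcome.RESET_BY_MESSAGE
    return Outcome.RUNNING_NO_LOCKSPACE


def host_resets(outcome: Outcome) -> bool:
    """Only 1.b leaves the host running."""
    return outcome is not Outcome.RUNNING_NO_LOCKSPACE
```

Enumerating all four input combinations shows that exactly one of them, a clean removal with the message missed, leaves the host running.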
1.b is the condition that we do not intend to solve immediately,
and the question that interests me is whether this state could
be reliably detected. I think there's a fair chance it could be.
The condition should be implied by the fact that the host has
cleanly released its host_id lease in all lockspaces. This would
not be true for the other conditions. One doubt I have is how this
would be distinguished from a fresh, initial state of the host.
Perhaps rhev/vdsm could distinguish this, but I don't think sanlock
could.
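A rough sketch of that detection idea, again in Python and again only a model: classify_host_state and its inputs are hypothetical, with released_cleanly standing for the per-lockspace view of whether the host_id lease was cleanly released, and joined_before standing for whatever extra knowledge rhev/vdsm might have (None when only sanlock's view is available).

```python
from typing import Mapping, Optional


def classify_host_state(released_cleanly: Mapping[str, bool],
                        joined_before: Optional[bool]) -> str:
    """Guess whether a host could be in state 1.b.

    released_cleanly: lockspace name -> True if the host_id lease in
        that lockspace was cleanly released (assumed input).
    joined_before: True/False if a manager knows whether the host had
        joined lockspaces earlier, None if that is unknown.
    """
    if not all(released_cleanly.values()):
        # Some lease is still held or was not cleanly released, so the
        # host is in one of the other conditions (it is or will be reset).
        return "not-1b"
    if joined_before is None:
        # With only the lease state to go on, 1.b and a fresh host
        # look the same.
        return "ambiguous"
    return "possible-1b" if joined_before else "fresh"
```

The "ambiguous" branch is the doubt described above: the lease state alone cannot separate 1.b from an initial state, so the extra input would have to come from the management layer.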
Dave