On Sun, Mar 02, 2014 at 05:38:23AM -0500, Nir Soffer wrote:
> 1. The vdsm lockspace is cleanly removed due to the loss of
> lockspace storage. The "reset yourself" message that was
> sent through the vdsm lockspace may or may not have been
> received (depending on how quickly the lockspace was cleared.)
>
> 1.a) if the message was received, the host will reset itself,
> even though the lockspace was removed.
>
> 1.b) if the message was not received, then the host will not
> reset itself, and will remain running with no lockspace.
>
> 2. The vdsm lockspace cannot be cleanly removed. sanlock
What do you mean by cannot be cleanly removed?

Roughly, here's how vdsm uses sanlock:

1. vdsm joins a lockspace (= acquires a host_id lease)
2. vdsm acquires a resource lease
3. vdsm specifies a "killpath" program or script that sanlock
   can call to gracefully shut it down if its lease cannot be renewed.

(Things operate normally here for some time...)
(Then storage is lost.)

4. sanlock fails to renew the host_id lease
5. sanlock runs a "killpath" program or script against vdsm
6. vdsm tries to shut down gracefully
7. if vdsm can shut down cleanly, it releases its resource lease
8. sanlock sees no more resource leases are held in the lockspace
   and can clear the lockspace, which disables the watchdog
9. if all of that happens within a given time, then the host
   will not be reset by the watchdog

If vdsm is not able to shut down cleanly, it will not release its
resource lease, so sanlock will not be able to remove the lockspace, and
eventually the local watchdog will fire and reset the host.
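
To make the two outcomes concrete, here is a rough sketch of the decision
in steps 4-9 as a toy model. It does not call sanlock or vdsm at all; the
names (RenewalFailure, outcome_after_storage_loss) and the idea of a single
watchdog deadline in seconds are assumptions made up for illustration, not
sanlock's actual timing rules.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    HOST_KEEPS_RUNNING = "lockspace cleared, watchdog disabled"
    HOST_RESET = "watchdog fired"


@dataclass
class RenewalFailure:
    """Hypothetical description of one storage-loss event."""
    clean_shutdown: bool     # did vdsm exit cleanly after killpath ran?
    shutdown_seconds: float  # time from failed renewal to lease release
    watchdog_seconds: float  # assumed deadline before the watchdog fires


def outcome_after_storage_loss(event: RenewalFailure) -> Outcome:
    # Steps 4-6: renewal fails, sanlock runs killpath, vdsm tries to stop.
    # Step 7: the resource lease is released only on a clean shutdown.
    resource_lease_held = not event.clean_shutdown
    if resource_lease_held:
        # The lockspace cannot be removed, so the watchdog stays armed.
        return Outcome.HOST_RESET
    # Steps 8-9: the lockspace is cleared, but only in time if the whole
    # sequence finished before the (assumed) watchdog deadline.
    if event.shutdown_seconds < event.watchdog_seconds:
        return Outcome.HOST_KEEPS_RUNNING
    return Outcome.HOST_RESET


if __name__ == "__main__":
    # 60 seconds is an arbitrary example value, not a sanlock default.
    for event in (
        RenewalFailure(True, 10.0, 60.0),
        RenewalFailure(False, 10.0, 60.0),
        RenewalFailure(True, 90.0, 60.0),
    ):
        print(event, "->", outcome_after_storage_loss(event).name)
```

The point of the model is only that a reset is avoided when both conditions
hold: the resource lease is released, and that happens soon enough.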