On Tue, Aug 05, 2014 at 11:20:52AM -0500, Russell Jones wrote:
> Are there any features/code within Sanlock that would cause it to
> stop petting the watchdog if it can't renew/reach a lockspace?

sanlock tries its best to ensure that the watchdog will trigger
if lockspace access is lost, so long as processes are running
that are using it. Once all pids have exited (or been suspended
if that is configured), sanlock tries its best to prevent the
watchdog from firing.
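
As a rough illustration of that policy, here is a simplified model in Python.
This is not sanlock's actual code (sanlock is written in C and works through
wdmd); the function name and its two parameters are hypothetical, chosen only
to mirror the description above.

```python
def should_pet_watchdog(lockspace_renewed: bool, pids_using_lockspace: int) -> bool:
    """Simplified model of the policy described above (not sanlock's real logic).

    Both parameters are hypothetical stand-ins for state sanlock tracks.
    """
    if lockspace_renewed:
        # Lockspace access is fine, so keep the watchdog from firing.
        return True
    # Access is lost: let the watchdog fire only while pids still hold leases.
    return pids_using_lockspace == 0
```

So the watchdog is left to expire only when access is lost and something is
still using the lockspace.
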
There are about 50 seconds for all the pids to exit (or suspend themselves
and release their leases). There are a couple of simple explanations for
why one or more pids may not be able to do this within 50 seconds:
- If the pids are configured to suspend themselves or shut down cleanly,
this can take more than the allowed time. Without this "graceful"
shutdown period, sanlock would immediately use SIGTERM/SIGKILL on them,
which is more likely to complete in time.
- If the pids were using the lost storage, they can get stuck doing i/o.
This could either block a clean shutdown, or make the pid unkillable if
it is stuck in an uninterruptible sleep in the i/o path.
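
To make the timing concrete, here is a minimal sketch of the general
"graceful first, then force" pattern under a fixed deadline. It is only an
illustration, not how sanlock implements it: the function name, the 40 second
grace value and the return strings are all assumptions, and only the 50 second
total comes from the description above.

```python
import signal
import subprocess
import time


def stop_within_deadline(proc, deadline_s=50.0, grace_s=40.0):
    """Try to stop proc before deadline_s elapses.

    Returns "graceful" if the process exited after SIGTERM within grace_s,
    "killed" if it exited after SIGKILL, or "stuck" if it was still alive
    when the deadline passed (e.g. blocked in uninterruptible i/o).
    grace_s is a hypothetical value, not a sanlock setting.
    """
    start = time.monotonic()

    # Graceful phase: ask the process to shut down cleanly.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_s)
        return "graceful"
    except subprocess.TimeoutExpired:
        pass

    # Forced phase: SIGKILL, then wait for whatever time is left.
    proc.kill()
    remaining = max(0.0, deadline_s - (time.monotonic() - start))
    try:
        proc.wait(timeout=remaining)
        return "killed"
    except subprocess.TimeoutExpired:
        # SIGKILL cannot be delivered while the process is in an
        # uninterruptible sleep, so the deadline can still be missed.
        return "stuck"
```

The longer the graceful phase, the less time is left for SIGKILL to take
effect, and a pid stuck in i/o on the lost storage ends up in the "stuck" case
no matter how the time is split.
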