On Tue, Jul 23, 2013 at 07:28:54AM +0000, Qixiaozhen wrote:
The sanlock daemon is monitored by the watchdog by default in the code.
With the watchdog monitor on, if the connection (iSCSI session, FC,
etc.) between a node and the SAN is lost, a node that has been added
to the lockspace is reset.
In my experiment, my server was connected to several SANs, and a
cluster that contained several virtual machines was running on these
SANs. If one of these SANs was disconnected, the server would reboot. In
this situation, my virtual machines would be crashed, and so was the
When a storage device is disconnected, sanlock will first try to shut down
the processes using leases from the lost device. The "shut down" is done
in multiple steps:
- first it will try "graceful shutdown" by running a configured killpath
against the processes. This killpath program can do something like
suspend the process and release its leases, or cause them to
- if killpath is not configured, sanlock will send the processes SIGTERM.
- 40 seconds after killpath or SIGTERM, if the processes have not exited,
sanlock will send SIGKILL to the processes.
- 10 seconds after sending SIGKILL, if the processes have not exited,
then the watchdog will reset the host.
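
To make the timing of the steps above concrete, here is a rough sketch of the escalation as a small Python function. This is only an illustration of the sequence described in this mail, not sanlock's actual code: the function name, its arguments and the returned strings are made up for the example, and only the 40 and 10 second delays come from the description above.

```python
# Illustrative model of the escalation described above (not sanlock source).
KILL_DELAY_S = 40   # SIGKILL is sent 40 s after killpath/SIGTERM
RESET_DELAY_S = 10  # watchdog resets the host 10 s after SIGKILL


def escalation_step(elapsed_s, killpath_configured, processes_exited):
    """Return the step that applies elapsed_s seconds after shutdown began.

    All names here are hypothetical; elapsed_s is measured from the moment
    killpath is run (or SIGTERM is sent).
    """
    if processes_exited:
        # Processes are gone, so there is nothing left to escalate.
        return "none"
    if elapsed_s < KILL_DELAY_S:
        # Graceful phase: killpath if configured, otherwise SIGTERM.
        return "killpath" if killpath_configured else "SIGTERM"
    if elapsed_s < KILL_DELAY_S + RESET_DELAY_S:
        return "SIGKILL"
    # Processes survived SIGKILL: the last line of defense.
    return "watchdog reset"
```

For example, escalation_step(45, False, False) falls in the SIGKILL window, and a process that exits at any point before the 50 second mark avoids the reset.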
So, if you use all the capabilities of sanlock, you should be able to
handle the loss of storage without a reboot; at least most of the time.
The watchdog reboot exists as the last line of defense if your
applications cannot gracefully suspend/shutdown, and if they do not
respond to SIGKILL. It should not be a common occurrence.
I don't want the reset of server when it was disconnected with
I ran the command "sanlock daemon -w 0" to solve this problem. I also
modified the code in src/sanlock_internal.h, setting the default value of
DEFAULT_USE_WATCHDOG to 0.
You can set daemon options in /etc/sysconfig/sanlock.
I am not familiar with the source code of sanlock. So what is the effect
of running the command "sanlock daemon -w 0"?
If you do not use the watchdog, it's possible for multiple hosts to hold
the same lease at the same time, which can lead to corrupting the resource
that the lease protects.
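
As a toy illustration of that failure case (a simplified model, not sanlock's real logic): assume host A holds a lease, loses its storage, and its processes never exit on their own, while host B later sees A's lease expire and acquires it. The function and host names below are invented for the example.

```python
def holders_after_storage_loss(use_watchdog):
    """Return the set of hosts that act as lease holders in the toy scenario.

    Hypothetical model: host A held the lease and lost storage; host B
    acquires the lease once A's lease has expired.
    """
    holders = {"A"}
    if use_watchdog:
        # The watchdog resets A, so A stops using the resource.
        holders.discard("A")
    # B sees the expired lease and acquires it.
    holders.add("B")
    return holders
```

With the watchdog in use, only B ends up holding the lease. Without it, both A and B believe they hold it, which is the case that can corrupt the protected resource.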