Hi, all
The sanlock daemon is monitored by the watchdog by default in the code.
With the watchdog monitor on, if the connection (iSCSI session, FC, etc.) between a node and a SAN is lost, the node that has been added to the lockspace is reset.
In my experiment, my server was connected to several SANs, and a cluster containing several virtual machines was running on these SANs. If one of the SANs was disconnected, the server would reboot. In that case my virtual machines would crash, and so would the cluster.
I don't want the server to be reset when it is disconnected from a SAN.
I ran the command "sanlock daemon -w 0" to solve this problem. I also modified the code in src/sanlock_internal.h to set the default value of DEFAULT_USE_WATCHDOG to 0.
I am not familiar with the sanlock source code. What is the effect of running "sanlock daemon -w 0"?
Thank you.
Qi
On Tue, Jul 23, 2013 at 07:28:54AM +0000, Qixiaozhen wrote:
> The sanlock daemon is monitored by the watchdog by default in the code.
> With the watchdog monitor on, if the connection (iSCSI session, FC, etc.) between a node and a SAN is lost, the node that has been added to the lockspace is reset.
> In my experiment, my server was connected to several SANs, and a cluster containing several virtual machines was running on these SANs. If one of the SANs was disconnected, the server would reboot. In that case my virtual machines would crash, and so would the cluster.
When a storage device is disconnected, sanlock will first try to shut down the processes using leases from the lost device. The "shut down" is done in multiple steps:
- first it will try a "graceful shutdown" by running a configured killpath against the processes. The killpath program can do something like suspend the processes and release their leases, or cause them to exit gracefully/cleanly.
- if killpath is not configured, sanlock will send the processes SIGTERM.
- 40 seconds after killpath or SIGTERM, if the processes have not exited, sanlock will send them SIGKILL.
- 10 seconds after sending SIGKILL, if the processes have still not exited, the watchdog will reset the host.
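The steps above can be sketched as a small timeline model. This is only an illustration of the order and delays described in this message, not sanlock code; the function and constant names are made up for the example.

```python
# Toy model of the shutdown escalation described above.
# Names are hypothetical; only the 40 s and 10 s delays come from the text.
KILL_GRACE_SECONDS = 40   # killpath or SIGTERM -> SIGKILL
RESET_GRACE_SECONDS = 10  # SIGKILL -> watchdog reset


def escalation_step(seconds_since_start: float,
                    killpath_configured: bool,
                    processes_exited: bool) -> str:
    """Return the step the model is in, counting from the first shutdown attempt."""
    if processes_exited:
        # The processes are gone, so no further escalation is needed.
        return "done, no reset"
    if seconds_since_start < KILL_GRACE_SECONDS:
        # Graceful phase: killpath if configured, otherwise SIGTERM.
        return "killpath" if killpath_configured else "SIGTERM"
    if seconds_since_start < KILL_GRACE_SECONDS + RESET_GRACE_SECONDS:
        return "SIGKILL"
    # Processes survived SIGKILL for 10 seconds: last line of defense.
    return "watchdog reset"
```

For example, `escalation_step(45, False, False)` falls in the SIGKILL window, while the same call with `processes_exited=True` never reaches the reset.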
So, if you use all the capabilities of sanlock, you should be able to handle the loss of storage without a reboot, at least most of the time. The watchdog reboot exists as the last line of defense if your applications cannot gracefully suspend/shut down, and if they do not respond to SIGKILL. It should not be a common occurrence.
> I don't want the server to be reset when it is disconnected from a SAN.
> I ran the command "sanlock daemon -w 0" to solve this problem. I also modified the code in src/sanlock_internal.h to set the default value of DEFAULT_USE_WATCHDOG to 0.
You could set daemon options using /etc/sysconfig/sanlock instead of changing the source.
> I am not familiar with the sanlock source code. What is the effect of running "sanlock daemon -w 0"?
If you do not use the watchdog, it's possible for multiple hosts to hold the same lease at the same time, which can lead to corrupting the resource that the lease protects.
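To make that risk concrete, here is a toy model of two hosts and one lease. It is a sketch under loud assumptions: host A loses storage at t=0 and its processes never exit, host B takes the lease once it stops seeing renewals, and both timing values are hypothetical placeholders rather than sanlock's real timeouts.

```python
def holders_at(t: float,
               watchdog_enabled: bool,
               reset_after: float = 60,          # hypothetical: when the watchdog resets A
               lease_expires_after: float = 80   # hypothetical: when B may take the lease
               ) -> set:
    """Hosts acting as lease holder t seconds after host A loses its storage."""
    holders = set()
    # Without a watchdog, A keeps running and its processes still assume
    # they hold the lease, even though A can no longer renew it.
    a_still_running = not (watchdog_enabled and t >= reset_after)
    if a_still_running:
        holders.add("A")
    # B sees no renewals from A and eventually acquires the lease.
    if t >= lease_expires_after:
        holders.add("B")
    return holders
```

In this model `holders_at(90, watchdog_enabled=True)` contains only B, while `holders_at(90, watchdog_enabled=False)` contains both A and B, which is the double-holder case that can corrupt the protected resource.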
Dave
sanlock-devel@lists.fedorahosted.org