On Fri, Jun 07, 2013 at 10:27:43PM +0800, tsiren tsi wrote:
> In diskio.c, if scsi commands were used, read and write errors would
> be more detailed. When a read/write timeout occurs, the scsi command
> could distinguish the actual reason: io busy, not ready, hardware
> error, and so on. If the reason were io busy, we could enlarge the
> timeout for robustness.
> What do you think about this?

The only way I know of using scsi commands from userland is with sg.
sg is not very practical for i/o, and would be a big code change.
sanlock is used on multipath lvm LVs, which makes sg difficult.
Also, sanlock can be used on both devices and NFS files, and it is
nice to use the same code for both.

A more reasonable suggestion would be to keep the existing i/o paths
and use /dev/sg to get extra scsi information. However, if you think
about how sanlock works, this extra information would not really help.
It is not the host with i/o problems that needs the extra information;
it is the *other* hosts that are monitoring it. All of the other hosts
would need to enlarge the timeout, and the new timeout would need to
be consistent among all of them.

My suggestion is to monitor the io delays (and their causes) in your
environment. If you find io delays caused by system/storage load that
trigger timeouts (or come close to timing out), then enlarge the io
timeout you pass to sanlock_add_lockspace_timeout().
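
As a rough illustration of that kind of monitoring, here is a minimal
sketch in Python. It is not part of sanlock: the helper names time_read()
and summarize(), the 512-byte read size, and the 50% "close to timing out"
threshold are all assumptions made for the example. It also reads through
the page cache, so to see real storage delays you would want to sample a
device or file that is not cached (or use direct i/o).

```python
import os
import time


def time_read(path, offset=0, length=512):
    """Return how many seconds one read of `length` bytes at `offset` takes.

    Hypothetical helper for illustration; it does not use direct i/o,
    so cached data will make delays look smaller than they are.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.monotonic()
        os.pread(fd, length, offset)
        return time.monotonic() - start
    finally:
        os.close(fd)


def summarize(delays, io_timeout, warn_fraction=0.5):
    """Compare measured delays (seconds) against an io timeout (seconds).

    `warn_fraction` is an assumed threshold for "close to timing out".
    """
    return {
        "samples": len(delays),
        "worst": max(delays),
        "near_timeout": sum(1 for d in delays if d >= warn_fraction * io_timeout),
        "timed_out": sum(1 for d in delays if d >= io_timeout),
    }
```

You could call time_read() periodically against the lockspace device or
file, collect the results in a list, and pass them to summarize() with
the io timeout you currently use. If "near_timeout" or "timed_out" is
regularly non-zero under normal load, that suggests a larger io timeout
is worth trying.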