Hi, all
I have been reading the sanlock source code for a few days, and I have a question: why are SCSI commands not used in 'diskio.c'? SCSI commands can report many more error conditions than plain I/O can.
Could someone help me?
diskio.c
/* write aligned io buffer */
int write_iobuf(int fd, uint64_t offset, char *iobuf, int iobuf_len,
		struct task *task, int ioto)
{
	if (task && task->use_aio == 1)
		return do_write_aio_linux(fd, offset, iobuf, iobuf_len, task, ioto);
	else if (task && task->use_aio == 2)
		return do_write_aio_posix(fd, offset, iobuf, iobuf_len, task, ioto);
	else
		return do_write(fd, offset, iobuf, iobuf_len, task);
}
On Fri, Jun 07, 2013 at 12:11:46AM +0800, tsiren tsi wrote:
> Hi, all
> I have been reading the sanlock source code for a few days, and I have a question: why are SCSI commands not used in 'diskio.c'? SCSI commands can report many more error conditions than plain I/O can.
> Could someone help me?
Hi,
How would you use a scsi command here?
If we did get more exception information, what would you use it for?
Dave
If SCSI commands were used in diskio.c, read and write timeout errors could be reported in more detail. When a read/write timeout occurs, the SCSI sense data could distinguish the actual cause: I/O busy, not ready, hardware error and so on. If the cause was I/O busy, we could enlarge the timeout for robustness.
What do you think about this?
I appreciate your reply. Thanks.
http://en.wikipedia.org/wiki/SCSI_Request_Sense_Command
Sense Key  Name
    Description

0h  No Sense
    Indicates there is no specific Sense Key information to be reported for the disc drive. This would be the case for a successful command or when the ILI bit is one.
1h  Recovered Error
    Indicates the last command completed successfully with some recovery action performed by the disc drive. When multiple recovered errors occur, the last error that occurred is reported by the additional sense bytes. Note: For some Mode settings, the last command may have terminated before completing.
2h  Not Ready
    Indicates the logical unit addressed cannot be accessed. Operator intervention may be required to correct this condition.
3h  Medium Error
    Indicates the command terminated with a non-recovered error condition, probably caused by a flaw in the medium or an error in the recorded data.
4h  Hardware Error
    Indicates the disc drive detected a nonrecoverable hardware failure while performing the command or during a self test. This includes SCSI interface parity error, controller failure or device failure.
5h  Illegal Request
    Indicates an illegal parameter in the command descriptor block or in the additional parameters supplied as data for some commands (Format Unit, Mode Select, and so forth). If the disc drive detects an invalid parameter in the Command Descriptor Block, it shall terminate the command without altering the medium. If the disc drive detects an invalid parameter in the additional parameters supplied as data, the disc drive may have already altered the medium. This sense key may also indicate that an invalid IDENTIFY message was received. This could also indicate an attempt to write past the last logical block.
6h  Unit Attention
    Indicates the disc drive may have been reset.
7h  Data Protect
    Indicates that a command that reads or writes the medium was attempted on a block that is protected from this operation. The read or write operation is not performed.
9h  Firmware Error
    Vendor specific sense key.
Bh  Aborted Command
    Indicates the disc drive aborted the command. The initiator may be able to recover by trying the command again.
Ch  Equal
    Indicates a SEARCH DATA command has satisfied an equal comparison.
Dh  Volume Overflow
    Indicates a buffered peripheral device has reached the end of medium partition and data remains in the buffer that has not been written to the medium.
Eh  Miscompare
    Indicates that the source data did not match the data read from the medium.
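To make the proposal concrete, here is a minimal sketch of how a sense key could be pulled out of a raw sense buffer and mapped to the names in the table above. It is not sanlock code. The function names extract_sense_key(), sense_key_name() and sense_key_may_be_transient() are hypothetical, and the "transient" policy is only an assumption about which keys might justify waiting longer. The byte positions follow the SCSI fixed (response code 70h/71h) and descriptor (72h/73h) sense formats.

```c
#include <stddef.h>

/* Return the 4-bit sense key from a raw sense buffer, or -1 if the
 * buffer is too short or the response code is not recognised. */
int extract_sense_key(const unsigned char *sense, size_t len)
{
	unsigned char code;

	if (!sense || len < 3)
		return -1;

	code = sense[0] & 0x7f;          /* bit 7 is the VALID bit in fixed format */

	if (code == 0x70 || code == 0x71)
		return sense[2] & 0x0f;  /* fixed format: key in byte 2 */
	if (code == 0x72 || code == 0x73)
		return sense[1] & 0x0f;  /* descriptor format: key in byte 1 */
	return -1;
}

/* Map a sense key to the name used in the table above.
 * Keys the table does not list are reported as "Unknown". */
const char *sense_key_name(int key)
{
	static const char *names[16] = {
		[0x0] = "No Sense",
		[0x1] = "Recovered Error",
		[0x2] = "Not Ready",
		[0x3] = "Medium Error",
		[0x4] = "Hardware Error",
		[0x5] = "Illegal Request",
		[0x6] = "Unit Attention",
		[0x7] = "Data Protect",
		[0x9] = "Firmware Error",
		[0xb] = "Aborted Command",
		[0xc] = "Equal",
		[0xd] = "Volume Overflow",
		[0xe] = "Miscompare",
	};

	if (key < 0 || key > 15 || !names[key])
		return "Unknown";
	return names[key];
}

/* Hypothetical policy (an assumption, not from sanlock): treat
 * Not Ready, Unit Attention and Aborted Command as conditions
 * that might clear up if the caller retries or waits longer. */
int sense_key_may_be_transient(int key)
{
	return key == 0x2 || key == 0x6 || key == 0xb;
}
```

A caller would pass the sense bytes returned with a failed command and then decide, for example, whether a longer timeout is worth trying.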
On Fri, Jun 07, 2013 at 10:27:43PM +0800, tsiren tsi wrote:
> If SCSI commands were used in diskio.c, read and write timeout errors could be reported in more detail. When a read/write timeout occurs, the SCSI sense data could distinguish the actual cause: I/O busy, not ready, hardware error and so on. If the cause was I/O busy, we could enlarge the timeout for robustness.
> What do you think about this?
The only way I know of using scsi commands from userland is with sg. sg is not very practical for i/o, and would be a big code change. sanlock is used on multipath lvm LVs, which makes sg difficult. Also, sanlock can be used on both devices and NFS files, and it is nice to use the same code for both.
A more reasonable suggestion would be to keep the existing i/o paths and use /dev/sg to get extra scsi information. However, if you think about how sanlock works, this extra information would not really help. This is because it is not the host with i/o problems that needs extra information, it is the *other* hosts that are monitoring it. All the other hosts would need to enlarge the timeout (and they would also need this new timeout to be consistent among everyone.)
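For readers unfamiliar with sg, the following is a rough sketch of what "use /dev/sg to get extra scsi information" could look like: it sends a TEST UNIT READY command through the Linux SG_IO ioctl and hands back any sense bytes the device returned. This is an illustration only, not something sanlock does. The function name tur_sense() is hypothetical, and the sketch assumes a single sg node per device, which, as noted above, does not hold for multipath LVs or NFS files.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

/* Send TEST UNIT READY to an sg device node (e.g. a /dev/sg* path).
 * Returns -1 if the device cannot be opened or the ioctl fails,
 * otherwise the number of sense bytes written into 'sense'
 * (0 means the device returned no sense data). */
int tur_sense(const char *sg_path, unsigned char *sense, int sense_len,
	      unsigned int timeout_ms)
{
	unsigned char cdb[6] = { 0 };   /* opcode 00h = TEST UNIT READY */
	sg_io_hdr_t hdr;
	int fd, rv;

	fd = open(sg_path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -1;

	if (sense_len < 0)
		sense_len = 0;
	if (sense_len > 255)
		sense_len = 255;        /* mx_sb_len is an unsigned char */

	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';                 /* required by the sg v3 interface */
	hdr.dxfer_direction = SG_DXFER_NONE;    /* no data phase for this command */
	hdr.cmdp = cdb;
	hdr.cmd_len = sizeof(cdb);
	hdr.sbp = sense;
	hdr.mx_sb_len = (unsigned char)sense_len;
	hdr.timeout = timeout_ms;               /* SG_IO timeout is in milliseconds */

	rv = ioctl(fd, SG_IO, &hdr);
	close(fd);

	if (rv < 0)
		return -1;
	return hdr.sb_len_wr;
}
```

Even with this information in hand, the point above still applies: only the host with the I/O problem would see it, while the timeout decision is made by the other hosts.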
My suggestion is to monitor the io delays (and their causes) in your environment. If you find there are io delays, caused by system/storage load, and they trigger timeouts (or come close to timing out), then enlarge the io timeout used with sanlock_add_lockspace_timeout.
Dave
Thanks for your reply.
I am still reading the sanlock source code, and I see that the effect of I/O delays must be taken into consideration.
I will enlarge the timeout passed to sanlock_add_lockspace_timeout to allow for I/O delays.
Thank you very much. : )
sanlock-devel@lists.fedorahosted.org