problems with mptscsih / FC5
Nigel Wade
nmw at ion.le.ac.uk
Wed Nov 29 09:48:40 UTC 2006
Grant Ozolins wrote:
> Hi all,
>
> We encountered an unusual condition with the LSI SCSI drivers last night
> - we got an "attempted task abort", followed by about 10 minutes of no
> messages (I wasn't logged in at the time but believe the system was
> unresponsive), followed by several minutes more of similar messages,
> repeated:
>
> Nov 27 22:08:48 hostname kernel: mptscsih: ioc0: attempting task abort!
> (sc=ffff810078cb49c0)
> Nov 27 22:08:48 hostname kernel: sd 0:0:0:0:
> Nov 27 22:08:48 hostname kernel: command: Read(10): 28 00 01 fb
> b7 4d 00 00 08 00
> Nov 27 22:08:48 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
> Task Terminated
> Nov 27 22:08:48 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
> (sc=ffff810078cb49c0)
> Nov 27 22:08:58 hostname kernel: mptscsih: ioc0: attempting task abort!
> (sc=ffff810078cb49c0)
> Nov 27 22:08:58 hostname kernel: sd 0:0:0:0:
> Nov 27 22:08:58 hostname kernel: command: Test Unit Ready: 00 00
> 00 00 00 00
>
> and then...
> Nov 27 22:19:38 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
> Task Terminated
> Nov 27 22:19:40 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
> (sc=ffff810078cb49c0)
> Nov 27 22:19:41 hostname kernel: mptscsih: ioc0: attempting task abort!
> (sc=ffff81005b759e00)
> Nov 27 22:19:42 hostname kernel: sd 0:0:0:0:
> Nov 27 22:19:42 hostname kernel: command: Read(10): 28 00 01 fb
> b7 fd 00 00 08 00
> Nov 27 22:19:42 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
> Task Terminated
> Nov 27 22:19:42 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
> (sc=ffff81005b759e00)
> Nov 27 22:19:42 hostname kernel: mptscsih: ioc0: attempting task abort!
> (sc=ffff81005b759e00)
> Nov 27 22:19:42 hostname kernel: sd 0:0:0:0:
> Nov 27 22:19:42 hostname kernel: command: Test Unit Ready: 00 00
> 00 00 00 00
> Nov 27 22:19:43 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
> Task Terminated
> Nov 27 22:19:43 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
> (sc=ffff81005b759e00)
> Nov 27 22:19:43 hostname kernel: mptscsih: ioc0: attempting task abort!
> (sc=ffff810091491e00)
> Nov 27 22:19:43 hostname kernel: sd 0:0:0:0:
> Nov 27 22:19:43 hostname kernel: command: Read(10): 28 00 01 fb
> b8 75 00 00 08 00
> Nov 27 22:19:43 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI
> Task Terminated
> Nov 27 22:19:43 hostname kernel: mptscsih: ioc0: task abort: SUCCESS
> (sc=ffff810091491e00)
> (... etc)
>
> This is on a dual Opteron 1.8 GHz Tyan system, running a fairly minimal
> FC5 install.
>
> The problem resolved itself after a little while, but the web server
> became unresponsive during this time - obviously it looks like some kind
> of loop in the LSI kernel module - has anyone seen anything like this
> before? It looks like something to report to the linux-kernel or
> linux-scsi mailing lists, but as I'm using FC5 I thought I'd ask here
> first.
>
> Thanks in advance,
>
It looks to me like whatever the LSI device driver was trying to talk to is
failing. The apparent sequence is that a SCSI command (a Read(10), by the looks
of it) failed to complete, the command was aborted, and the device driver then
issued a Test Unit Ready (TUR). Either the TUR took about 10 minutes, or the
device driver timed out after about 10 minutes. Another read command then failed
in a similar way, resulting in the same sequence of events. At some point,
presumably, whatever was trying to read the device gave up and everything went
back to normal.
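
To put a number on the "about 10 minutes", here is a minimal Python sketch that
measures the longest silence in a subset of the quoted log lines (wrapped lines
rejoined by hand). The helper names `timestamps` and `largest_gap` are made up
for illustration, and the parsing assumes the usual "Mon DD HH:MM:SS" syslog
prefix and an English locale.

```python
from datetime import datetime

# A subset of the log lines from the quoted message, with wrapped lines rejoined.
LOG = """\
Nov 27 22:08:48 hostname kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810078cb49c0)
Nov 27 22:08:48 hostname kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810078cb49c0)
Nov 27 22:08:58 hostname kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810078cb49c0)
Nov 27 22:19:38 hostname kernel: mptbase: ioc0: IOCStatus(0x0048): SCSI Task Terminated
Nov 27 22:19:40 hostname kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810078cb49c0)
"""


def timestamps(text):
    """Parse the leading 'Mon DD HH:MM:SS' stamp (15 characters) of each line."""
    return [
        datetime.strptime(line[:15], "%b %d %H:%M:%S")
        for line in text.splitlines()
        if line.strip()
    ]


def largest_gap(text):
    """Return (seconds, index) of the longest silence between consecutive lines."""
    ts = timestamps(text)
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    longest = max(gaps)
    return longest, gaps.index(longest)


seconds, idx = largest_gap(LOG)
print(f"longest gap: {seconds:.0f} s, after line {idx + 1}")
```

On these lines the longest silence is 640 seconds (10 min 40 s), between the
abort attempt on the Test Unit Ready at 22:08:58 and the next message at
22:19:38.
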

Does that device normally work OK? It might be going faulty, or there might be a
cabling or termination issue. I doubt it's a device driver fault; it looks to me
like a hardware read error or timeout, followed by retries and/or additional
failed attempts.
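
One way to check whether the failing reads are related is to decode the command
bytes printed after "command: Read(10):" in the log. The sketch below is only an
illustration: `decode_read10` is a made-up helper name, and it relies on the
standard READ(10) layout (opcode 0x28, a 4-byte big-endian logical block address
in bytes 2-5, a 2-byte transfer length in bytes 7-8).

```python
import struct

# CDB bytes copied from the "command: Read(10):" lines in the quoted log.
CDBS = [
    "28 00 01 fb b7 4d 00 00 08 00",
    "28 00 01 fb b7 fd 00 00 08 00",
    "28 00 01 fb b8 75 00 00 08 00",
]


def decode_read10(cdb_hex):
    """Decode a 10-byte READ(10) CDB into (lba, block_count)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    # Byte 0 opcode, 1 flags, 2-5 LBA (big-endian), 6 group, 7-8 length, 9 control.
    _op, _flags, lba, _group, count, _control = struct.unpack(">BBIBHB", cdb)
    return lba, count


decoded = [decode_read10(c) for c in CDBS]
for lba, count in decoded:
    print(f"LBA {lba}, {count} blocks")

lbas = [lba for lba, _ in decoded]
print(f"span between first and last failing LBA: {max(lbas) - min(lbas)} blocks")
```

For the three reads shown, this gives LBAs 33273677, 33273853 and 33273973, each
for 8 blocks, so they all fall within about 300 blocks of each other. That would
be consistent with a problem in one area of the disk, but it could equally be a
single process reading sequentially while the whole device was misbehaving, so
it is a hint rather than proof.
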
If daytime TV made shows about computers, this one would be "When SCSI devices
go bad."
--
Nigel Wade, System Administrator, Space Plasma Physics Group,
University of Leicester, Leicester, LE1 7RH, UK
E-mail : nmw at ion.le.ac.uk
Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555