F14 smartctl - unable to complete smart test

J. Randall Owens jrowens.fedora at ghiapet.net
Mon Feb 28 02:30:05 UTC 2011


On 02/27/2011 11:10 AM, Michał Piotrowski wrote:
> Hi,
> 
> This could be a hardware problem - hard to say. For some reason the
> SMART test is interrupted on one of the disks:
> 
> # 1  Extended offline    Interrupted (host reset)      90%     12489         -
> # 2  Extended offline    Interrupted (host reset)      90%     12484         -
> 
> 
> I see this in dmesg
> 
> [ 4328.800100] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [ 4328.800129] ata3.00: failed command: WRITE DMA EXT
> [ 4328.800153] ata3.00: cmd 35/00:08:7e:dc:9f/00:00:2c:00:00/e0 tag 0 dma 4096 out
> [ 4328.800157]          res 40/00:00:02:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> [ 4328.800190] ata3.00: status: { DRDY }
> [ 4333.849048] ata3: link is slow to respond, please be patient (ready=0)
> [ 4338.847048] ata3: device not ready (errno=-16), forcing hardreset
> [ 4338.847063] ata3: soft resetting link
> [ 4339.837375] ata3.00: configured for UDMA/133
> [ 4339.837407] ata3: EH complete
> 
> I'm using 2.6.37.2 with a config based on an old rawhide 2.6.37. I have
> not noticed other problems with this disk. What might be causing these
> interruptions?

I've been having similar problems lately.  It started on my laptop; I
assumed a hardware problem and replaced the HDD.  Then the server
started doing it too.  Its uptime was around two months at the time and
it was still running a 2.6.35.9 kernel, while my laptop problems started
with 2.6.35.11, so I put it down to coincidence.  Now that you bring
this up, I'm not so sure.

Here's what I saw happening on the laptop:
[ 1199.706084] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
[ 1199.706094] ata1.00: failed command: WRITE FPDMA QUEUED
[ 1199.706101] ata1.00: cmd 61/08:00:67:48:3f/00:00:16:00:00/40 tag 0 ncq 4096 out
[ 1199.706106] ata1.00: status: { DRDY }
(repeat the above 3 lines many times)
[ 1199.706533] ata1: hard resetting link
[ 1209.754149] ata1: softreset failed (device not ready)
[ 1209.754155] ata1: hard resetting link
[ 1219.802039] ata1: softreset failed (device not ready)
[ 1219.802046] ata1: hard resetting link
[ 1230.360039] ata1: link is slow to respond, please be patient (ready=0)
[ 1239.438047] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1239.444280] ata1.00: configured for UDMA/133
[ 1239.444286] ata1.00: device reported invalid CHS sector 0
(repeat above 1 line many times)
[ 1239.444431] ata1: EH complete
[ 1318.752164] ata1.00: exception Emask 0x0 SAct 0x70040b0 SErr 0x0 action 0x6 frozen
[ 1318.752171] ata1.00: failed command: WRITE FPDMA QUEUED
[ 1318.752178] ata1.00: cmd 61/48:20:e7:1b:ac/00:00:22:00:00/40 tag 4 ncq 36864 out
[ 1318.752183] ata1.00: status: { DRDY }
(repeat above 3 lines many many times, lather, rinse, repeat)

In the meantime, the system almost completely freezes up, and the disk
activity light stays on.
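
For anyone comparing logs: as far as I understand libata's output, the
SAct value on the exception line is a bitmask of the NCQ tags that were
still outstanding when the port froze.  Here is a minimal Python sketch
for decoding it; the function name is made up for illustration, and the
two values are the ones from my laptop log above.

```python
def active_ncq_tags(sact):
    """Return the NCQ tag numbers whose bits are set in an SAct value.

    Assumption: bit N of SAct corresponds to queued command tag N.
    """
    return [bit for bit in range(32) if sact & (1 << bit)]


# SAct values copied from the two exception lines in the laptop log.
first = active_ncq_tags(0x7fffffff)
second = active_ncq_tags(0x70040b0)

print(len(first))   # number of commands outstanding at the first freeze
print(second)       # tags outstanding at the second freeze
```

If that reading is right, the first freeze hit with 31 commands in
flight, and the second with tags 4, 5, 7, 14, 24, 25 and 26 outstanding,
which at least agrees with the "tag 4" shown on the cmd line.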

On the server:
[6968144.832829] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[6968144.832829] ata1.00: failed command: READ MULTIPLE
[6968144.832829] ata1.00: cmd c4/00:20:fb:a5:df/00:00:00:00:00/ef tag 0 pio 16384 in
[6968144.832829] ata1.00: status: { DRDY ERR }
[6968144.832829] ata1.00: error: { UNC }
[6968144.852104] ata1.00: configured for PIO0
[6968144.852125] ata1: EH complete
(repeat above 7 lines several times)

And again, the system almost completely freezes up, except that it still
routes traffic in the meantime.  It's easily reproducible by starting up
MPD, which triggers it quickly when it accesses the music in my main
$HOME directory. (I don't use MPD on the laptop, so MPD itself can't be
the common cause.)
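
To compare how often each command fails on each machine, the "failed
command" lines can be tallied with a short script.  This is only a rough
sketch: it assumes the log lines look like the dmesg excerpts above, the
function name is invented here, and the sample mixes a few lines from
both of my logs purely to show the output shape.

```python
import re
from collections import Counter

# Matches e.g. "ata1.00: failed command: WRITE FPDMA QUEUED"
FAILED_RE = re.compile(r"\b(ata\d+(?:\.\d+)?): failed command: (.+?)\s*$")


def count_failed_commands(lines):
    """Count (device, command) pairs across 'failed command' log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Lines copied from the logs above; the last one is deliberately a non-match.
sample = [
    "[ 1199.706094] ata1.00: failed command: WRITE FPDMA QUEUED",
    "[ 1318.752171] ata1.00: failed command: WRITE FPDMA QUEUED",
    "[6968144.832829] ata1.00: failed command: READ MULTIPLE",
    "[ 1239.444431] ata1: EH complete",
]

for (device, command), n in sorted(count_failed_commands(sample).items()):
    print(device, command, n)
```

In practice you would feed it the saved output of dmesg from one machine
at a time, e.g. count_failed_commands(open("dmesg.txt")), where
dmesg.txt is whatever file you saved the log to.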

Could this possibly be a bug in something besides the kernel?  That
might explain why the server started getting it despite not having a new
kernel.  And I'd like to know before I go out buying more HDDs.

-- 
J. Randall Owens | http://www.ghiapet.net/
