On Sat, Jul 2, 2016 at 1:55 PM, Chris Murphy lists@colorremedies.com wrote:
On Fri, Jul 1, 2016 at 4:19 PM, Richard Shaw hobbes1069@gmail.com wrote:
Thanks for the detailed reply so forgive me for not quoting :)
I'm not ready to believe the drive is bad. I think what happened is that, while playing around with cockpit, I accidentally installed tuned without realizing it. It's interesting in concept, but its aggressive power management did not play nice with the hard drive. I have since removed tuned and am going to monitor things for a while.
It's actually a fair point that without a discrete error message from the drive, it isn't necessarily the drive. And that's the problem with the "hard resetting link" message: it obscures the actual problem. It could be the drive, connectors, cable, or controller, including something going into a power save mode and not waking up (in time). All we know is that a write command was sent, there was no response, and the kernel command timer expired and started doing link resets. That clears the whole command queue (possibly 31 tags), and the ensuing noop back to ext4 basically made it go WTF rather than just requeue (?) So you end up with a bunch of scary messages...
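If you want to see what that command timer is set to, here's a rough sketch. It assumes the usual sysfs location for SCSI/SATA disks, /sys/block/<dev>/device/timeout (value in seconds); the function name and the sysfs_root parameter are just made up for illustration.

```python
import os


def read_command_timeout(device, sysfs_root="/sys"):
    """Return the kernel command timeout, in seconds, for a block device.

    Assumes the device exposes /sys/block/<device>/device/timeout, which
    SCSI/SATA disks normally do. `device` is a bare name such as "sda".
    """
    path = os.path.join(sysfs_root, "block", device, "device", "timeout")
    with open(path) as handle:
        return int(handle.read().strip())


# Example (needs a real disk name on the machine in question):
# print(read_command_timeout("sda"))
```

This only reads the value. It doesn't tell you whether the drive or something else on the link was the one that went quiet.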
smartctl -x /dev/sdX might reveal something. Depending on what reporting features the drive has, it might record command errors. In any case, the attributes list would show any suspicious counts, in particular reallocated sectors. If that's 0 or not pretty high (dozens), then the drive still has reserve sectors and the write error is bogus. It just happened to be a write command that the drive didn't respond to, rather than a failure to write.
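If you'd rather not eyeball the table, something like the sketch below could pull the raw reallocated count out. Assumptions: it uses smartctl -A (attributes only) instead of -x, expects the classic ten-column attribute table with RAW_VALUE in the last column, and looks for the attribute name Reallocated_Sector_Ct, which some drives may report under a different name. The function names are mine, not anything from smartmontools.

```python
import subprocess


def parse_raw_counts(smartctl_text, names=("Reallocated_Sector_Ct",)):
    """Extract raw values for the named attributes from `smartctl -A` text.

    Assumes the classic table layout: ID#, ATTRIBUTE_NAME, FLAG, VALUE,
    WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
    """
    counts = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip headers and blanks.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in names and raw.isdigit():
            counts[name] = int(raw)
    return counts


def read_raw_counts(device):
    """Run `smartctl -A <device>` (usually needs root) and parse the output."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_raw_counts(result.stdout)


# Example (substitute the real device node):
# print(read_raw_counts("/dev/sdX"))
```

An empty result just means the attribute wasn't found in that layout, not that the count is zero.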