I'm getting SMART errors, but I'm not sure how much credence to give them. It seems to be the same two e-mails over and over.
I recently had to take the PC to the shop because it wouldn't power on. I thought that the power supply had failed, but the guy did something to the BIOS. I don't have the hardware details at the moment, but could the error be more of a configuration problem than an actual hard drive problem?
One of the e-mails:
Date: Sun, 29 Apr 2007 07:51:00 +0100
From: root <root@localhost.localdomain>
To: root@localhost.localdomain
Subject: SMART error (CurrentPendingSector) detected on host: localhost.localdomain
This email was generated by the smartd daemon running on:
host name: localhost.localdomain
DNS domain: localdomain
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/hdb, 2 Currently unreadable (pending) sectors
For details see host's SYSLOG (default: /var/log/messages).
You can also use the smartctl utility for further investigation. No additional email messages about this problem will be sent.
Thanks,
Thufir
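The smartd mail suggests following up with smartctl. As a rough sketch of reading the raw counts behind these e-mails, the Python below parses the attribute table printed by `smartctl -A /dev/hdb`. It assumes the usual smartmontools table layout, with RAW_VALUE as the tenth column; the function name `parse_smart_attributes` and the sample numbers are illustrative only, not output from this drive.

```python
import re

# Illustrative `smartctl -A` rows (hypothetical values, not from /dev/hdb).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for each row of the attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows begin with a numeric attribute ID and have at least ten
        # columns; the header and other report lines are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # RAW_VALUE may carry trailing text on some drives; keep the leading integer.
        match = re.match(r"\d+", fields[9])
        if match:
            attrs[fields[1]] = int(match.group())
    return attrs


if __name__ == "__main__":
    # On a live system the text would come from something like:
    #   subprocess.run(["smartctl", "-A", "/dev/hdb"],
    #                  capture_output=True, text=True).stdout
    counts = parse_smart_attributes(SAMPLE)
    print("pending:", counts["Current_Pending_Sector"])
    print("reallocated:", counts["Reallocated_Sector_Ct"])
```

Attribute 197 (Current_Pending_Sector) is the count this smartd warning reports; 198 (Offline_Uncorrectable) is the other message Steve mentions below.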
Around 11:58am on Sunday, April 29, 2007 (UK time), Thufir scrawled:
> I'm getting SMART errors, but I'm not sure how much credence to give them. It seems to be the same two e-mails over and over.
> <snip>
> Device: /dev/hdb, 2 Currently unreadable (pending) sectors
I have been getting this message, along with an "offline uncorrectable sectors" message, from SMART on one of my machines for a few years now (and I'm not exaggerating). It's never had any other problems, and I've learned to live with it and stop worrying.
Of course it isn't a mission critical machine and it is backed up regularly :-)
Steve
Thufir writes:
> I'm getting SMART errors, but I'm not sure how much credence to give them. It seems to be the same two e-mails over and over.
It's all a matter of how much you value the data and your time.
If there's nothing on this disk that you particularly care about, and if its crashing and burning won't inconvenience you much beyond the time it takes to replace the disk and reinstall everything, then you can ignore this and just ride it out until the hard drive fails completely. It may take weeks, months, or years before it gives out. You never know.
But if you have some valuable data on this drive, and unexpected downtime is going to be a pain in the neck for you, then you should begin making organized plans now to copy your data off this drive and onto a replacement disk while you still have time.
On Sun, Apr 29, 2007 at 10:58:48 +0000, Thufir <hawat.thufir@gmail.com> wrote:
> I'm getting SMART errors, but I'm not sure how much credence to give them. It seems to be the same two e-mails over and over.
> The following warning/error was logged by the smartd daemon:
> Device: /dev/hdb, 2 Currently unreadable (pending) sectors
It might be that the drive firmware has a bug whereby the pending sector count doesn't always get cleared when the sectors are reallocated. I have a Maxtor drive that has that problem.
It might also be that you have never written to the bad sectors, so the drive can't reallocate them. If they really can't be read, a long scan should show up a bad sector; you can then find the file it is contained in (so you know what you are losing) and rewrite that sector (actually you want to rewrite the whole 8-sector block, to keep the OS from trying to read the surrounding sectors). If the sector is permanently bad, the drive should reallocate it. Sometimes just an isolated write was bad and the sector doesn't need to be remapped.
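The "8 sector block" is one 4 KiB filesystem block made of 512-byte sectors. A minimal sketch of the arithmetic, assuming 512-byte sectors, a 4096-byte filesystem block, a bad LBA taken from the self-test log (`smartctl -t long /dev/hdb`, then `smartctl -l selftest /dev/hdb`), and the partition's starting sector as listed by `fdisk -lu`; the helper names are hypothetical. Note that the block is aligned to the partition start, not to absolute sector 0.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed; typical for drives of this era)


def fs_block_for_lba(bad_lba, partition_start, fs_block_size=4096):
    """Filesystem block number containing absolute sector `bad_lba`."""
    if bad_lba < partition_start:
        raise ValueError("bad LBA lies before the start of the partition")
    return (bad_lba - partition_start) * SECTOR_SIZE // fs_block_size


def sectors_of_fs_block(fs_block, partition_start, fs_block_size=4096):
    """Return (first_sector, count): the absolute sectors one fs block spans."""
    per_block = fs_block_size // SECTOR_SIZE  # 8 sectors for a 4 KiB block
    return partition_start + fs_block * per_block, per_block


if __name__ == "__main__":
    # Hypothetical numbers: partition starting at sector 63, bad LBA 1000000.
    block = fs_block_for_lba(1_000_000, 63)
    first, count = sectors_of_fs_block(block, 63)
    print(block, first, count)  # 124992 999999 8
```

On ext2/ext3, the block number can then be handed to `debugfs` (`icheck` maps a block to an inode, `ncheck` maps an inode to a path) to find which file would be affected before anything is rewritten.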
"BW" == Bruno Wolff bruno@wolff.to writes:
BW> It might be that the drive firmware has a bug whereby the pending
BW> sector count doesn't always get cleared when the sectors are
BW> reallocated. I have a Maxtor drive that has that problem.
I've seen this on Western Digital and Seagate drives as well.
BW> It might also be that you have never written to the bad sectors, so
BW> the drive can't reallocate them.
I have used the "wipe drive" functionality in each drive's diagnostic utilities and used dd to zero the entire drive as well. The drives still report problems.
Unfortunately, at this point I usually just throw them out. That is terribly wasteful, but I can't trust them with important data, and I can't get the manufacturer to replace them unless their own diagnostic tools actually report a failure.
- J<
Jason L Tibbitts III wrote:
[snip]
> I have used the "wipe drive" functionality in each drive's diagnostic utilities and used dd to zero the entire drive as well. The drives still report problems.
> Unfortunately, at this point I usually just throw them out. That is terribly wasteful, but I can't trust them with important data, and I can't get the manufacturer to replace them unless their own diagnostic tools actually report a failure.
This is a consequence of a trade-off. Any time something gets more complex, it has a greater likelihood of failure. When I was a technical lead, I used to insist that my engineers not put in any code to accomplish anything not listed in the requirements and design documents. Every extra line of code is another place for a defect to hide. Adding more code to the drives' firmware in an attempt to make them last longer raises the likelihood that the eventual failure is due to a firmware defect rather than to the hardware itself. There is an optimum point beyond which additional firmware complexity decreases the eventual lifetime of the product.
This is true for all products, not just disc drives.
That's one reason I don't like SELinux and LVM. Although the intent is to make the machine more usable, they also add more potential points of failure. Also, the more complex a product is, the more complex its failure modes can be, making them more difficult to diagnose.
Mike
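Mike's trade-off can be put in numbers under a simplifying assumption that is not in his post: treat the drive as independent parts in series, all of which must work, with R(t) the probability that a part survives to time t. Here R_hw is the hardware's reliability without the extra firmware, R'_hw its reliability with it, and R_fw the probability that the added code itself has not caused a failure by time t.

```latex
\begin{aligned}
R_{\mathrm{sys}}(t) &= \prod_{i=1}^{n} R_i(t), \qquad 0 \le R_i(t) \le 1 \\
\text{extra firmware pays off} &\iff R'_{\mathrm{hw}}(t)\,R_{\mathrm{fw}}(t) > R_{\mathrm{hw}}(t)
\iff \frac{R'_{\mathrm{hw}}(t)}{R_{\mathrm{hw}}(t)} > \frac{1}{R_{\mathrm{fw}}(t)}
\end{aligned}
```

Because every R is at most 1, each addition has to buy a proportionally larger gain in hardware life as its own failure term grows, which is one way to read the "optimum point" above.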
Thufir:
> I'm getting SMART errors, but I'm not sure how much credence to give them. It seems to be the same two e-mails over and over.
> The following warning/error was logged by the smartd daemon:
> Device: /dev/hdb, 2 Currently unreadable (pending) sectors
Bruno Wolff III:
> It might be that the drive firmware has a bug whereby the pending sector count doesn't always get cleared when the sectors are reallocated. I have a Maxtor drive that has that problem.
> It might also be that you have never written to the bad sectors, so the drive can't reallocate them. If they really can't be read, a long scan should show up a bad sector; you can then find the file it is contained in (so you know what you are losing) and rewrite that sector (actually you want to rewrite the whole 8-sector block, to keep the OS from trying to read the surrounding sectors). If the sector is permanently bad, the drive should reallocate it. Sometimes just an isolated write was bad and the sector doesn't need to be remapped.
I had a system using LVM fail to boot, and when I assessed it by booting from another drive, I got error reports like the above. Wherever the errors were, it was some place that LVM really did not like. I used dd to overwrite the entire drive, to try to force a write to wherever the bad spot was and make the drive fix what it could, and the errors cleared up.
Nothing else I had tried cleared up the errors. The system was reinstalled without LVM, just to see if the drive would keep working, and it has. It's many months later, and there are no error reports, including when I make deliberate checks.
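Tim and Jason used dd across the whole drive. When only the sectors from the self-test log are suspect, the same forcing write can be aimed at a single block. A sketch in Python, assuming 512-byte sectors and a Linux-style device path such as /dev/hdb opened as root; `zero_sectors` is a hypothetical helper, and it destroys the data in the target range, so identify and back up the affected file first. The demo writes to a scratch file rather than a disk.

```python
import os
import tempfile

SECTOR_SIZE = 512  # bytes per sector (assumed)


def zero_sectors(path, first_sector, count):
    """Overwrite `count` sectors starting at `first_sector` with zeros.

    DESTRUCTIVE on a real disk: whatever those sectors held is lost.
    The write gives the drive a chance to remap a pending sector.
    """
    data = bytes(count * SECTOR_SIZE)
    offset = first_sector * SECTOR_SIZE
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        # pwrite may write less than requested, so loop until done.
        while written < len(data):
            written += os.pwrite(fd, data[written:], offset + written)
        os.fsync(fd)  # push the write out of the page cache to the device
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Demonstrate on a scratch file of 16 "sectors" filled with 0xFF.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\xff" * (16 * SECTOR_SIZE))
        scratch = f.name
    zero_sectors(scratch, 4, 8)  # one 8-sector block
    with open(scratch, "rb") as f:
        content = f.read()
    os.unlink(scratch)
    print(content[4 * SECTOR_SIZE:12 * SECTOR_SIZE] == bytes(8 * SECTOR_SIZE))
```

Afterwards, rerunning `smartctl -A` should show whether Current_Pending_Sector dropped; as Bruno and Jason note, some firmware never clears it.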
Tim wrote:
[snip]
I have had the same issue with a Western Digital as well. A WD5200, less than a year old, started giving these errors under FC6.
I have since replaced the drive and put the WD in a USB carrier. After two complete wipes, reformats, and tests, no errors were found.
I wonder if it is an issue with SMART and the drives?
Tim:
> I had a system using LVM fail to boot, and when I assessed it by booting from another drive, I got error reports like the above. Wherever the errors were, it was some place that LVM really did not like. I used dd to overwrite the entire drive, to try to force a write to wherever the bad spot was and make the drive fix what it could, and the errors cleared up.
> Nothing else I had tried cleared up the errors. The system was reinstalled without LVM, just to see if the drive would keep working, and it has. It's many months later, and there are no error reports, including when I make deliberate checks.
Robin Laing:
> I have had the same issue with a Western Digital as well. A WD5200, less than a year old, started giving these errors under FC6.
> I have since replaced the drive and put the WD in a USB carrier. After two complete wipes, reformats, and tests, no errors were found.
> I wonder if it is an issue with SMART and the drives?
This was also a WD. I've had three in a row cark it; after the second, I decided that was the last of them I'd buy. The third was given to me. I'm less than impressed by anything they've made, so my bet's on there being something crappy about them (whether real drive faults or badly managed fault detection).