Questionable Status
Robin Laing
Robin.Laing at drdc-rddc.gc.ca
Thu Oct 1 13:09:40 UTC 2009
Tony Nelson wrote:
> On 09-09-23 09:29:56, Gene Poole wrote:
>> I've very recently upgraded 2 of my machines. One machine was
>> upgraded from Fedora 9 to Fedora 11, and the other machine was
>> upgraded from Fedora 10 to Fedora 11. On machine 1 I have 2 hard
>> disks (both Seagates - 500 GB and 1000 GB), on machine 2 I have 1
>> hard disk (Western Digital 320 GB). All of the interfaces are SATA.
>> The questionable status is that on machine 1 the 500 GB drive is
>> showing as failing and on machine 2 the 320 GB drive is showing as
>> failing. Neither drive, under the old releases, showed up as failing.
>> How do I know that these drives are truly failing?
>
> 1) Wait. If the disk is going bad, it will fail.
>
> 2) Run as root `smartctl -A /dev/sdx` (for each sdx) and look at the
> "WHEN_FAILED" column; it will be "-" if not failed.
>
> 3) Run as root `smartctl -a /dev/sdx` (for each sdx) and look at the
> whole output.
>
> 4) Run as root `smartctl -t long /dev/sdx` (for each sdx) and wait
> until the time the test should finish, then view the results with
> `smartctl -l selftest /dev/sdx` (for each sdx) or
> `smartctl -a /dev/sdx` (for each sdx).
>
> See `man smartctl`.
>
> Note that the new disk health monitoring tool "palimpsest" in package
> gnome-disk-utility is panicky and not to be trusted, unless you like
> buying lots of hard drives. It doesn't just look at "WHEN_FAILED", but
> has its own criteria such as nonzero Reallocated_Event_Count, which is
> fairly normal for a modern drive that has been in use for a while.
> Nonzero Current_Pending_Sector or Offline_Uncorrectable values are bad,
> as they mean data loss, though not general drive failure. I recommend
> enabling Automatic Offline Testing with `smartctl -o on /dev/sdx` (for
> each sdx), which will do a surface scan every few hours, giving the
> best chance to repair or recover any sectors that are going bad.
>
Will `smartctl -o on /dev/sdx` (for each sdx) fix the nonzero
Reallocated_Event_Count issue on RAID arrays in a non-destructive way?
Do you have to use the /dev/sdx devices or the /dev/md devices?
Good pointers in the meantime.
--
Robin Laing
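
The checks described in the quoted message (look at the "WHEN_FAILED"
column, and treat a nonzero Current_Pending_Sector or
Offline_Uncorrectable as bad) could be scripted roughly as below. This is
only a sketch: it assumes the usual ten-column attribute table printed by
`smartctl -A`, and the function names (`parse_attributes`, `findings`) are
made up for illustration, not part of smartmontools.

```python
import subprocess
import sys

# Attributes the quoted message singles out as bad when nonzero.
WATCHED = {"Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_attributes(text):
    """Parse the attribute table of `smartctl -A` output into a list of dicts.

    Assumes the ten-column layout whose header line starts with "ID#".
    """
    rows = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("ID#"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # a blank line ends the table
        # RAW_VALUE may contain spaces, so split into at most ten fields.
        fields = line.strip().split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "when_failed": fields[8],
            "raw": fields[9],
        })
    return rows


def leading_int(raw):
    """Return the leading integer of a raw value, or None if there is none."""
    tokens = raw.split()
    try:
        return int(tokens[0]) if tokens else None
    except ValueError:
        return None


def findings(rows):
    """List attributes that have failed or that report lost sectors."""
    out = []
    for row in rows:
        if row["when_failed"] != "-":
            out.append(f'{row["name"]}: WHEN_FAILED is {row["when_failed"]}')
        if row["name"] in WATCHED and leading_int(row["raw"]):
            out.append(f'{row["name"]}: raw value {row["raw"]} is nonzero')
    return out


def main(devices):
    if not devices:
        print("usage: pass one or more devices, e.g. /dev/sda")
        return
    for dev in devices:
        # smartctl's exit status is a bit mask, so it is not checked here.
        result = subprocess.run(["smartctl", "-A", dev],
                                capture_output=True, text=True)
        problems = findings(parse_attributes(result.stdout))
        print(dev, "->", "; ".join(problems) if problems else "nothing flagged")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as root with the drives to check as arguments, for example
/dev/sda /dev/sdb. An empty report only means nothing was flagged by
these two checks; it is not a substitute for the long self-test.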