Questionable Status

Robin Laing Robin.Laing at drdc-rddc.gc.ca
Thu Oct 1 13:09:40 UTC 2009


Tony Nelson wrote:
> On 09-09-23 09:29:56, Gene Poole wrote:
>> I've very recently upgraded 2 of my machines.  One machine was
>> upgraded from Fedora 9 to Fedora 11, and the other machine was 
>> upgraded from Fedora 10 to Fedora 11.  Machine 1 has two hard disks
>> (both Seagates, 500 GB and 1000 GB); machine 2 has one hard disk
>> (Western Digital 320 GB).  All of the interfaces are SATA.
>> The questionable status is that on machine 1 the 500 GB drive is
>> showing as failing and on machine 2 the 320 GB drive is showing as
>> failing.  Neither drive showed up as failing under the old releases.
>> How do I know that these drives are truly failing?
> 
> 1) Wait.  If the disk is going bad, it will fail.
> 
> 2) Run as root `smartctl -A /dev/sdx` (for each sdx) and look at the 
> "WHEN_FAILED" column; it will be "-" if not failed.
> 
> 3) Run as root `smartctl -a /dev/sdx` (for each sdx) and look at the 
> whole output.
> 
> 4) Run as root `smartctl -t long /dev/sdx` (for each sdx) and wait 
> until the time the test should finish, then view the results with 
> `smartctl -l selftest /dev/sdx` (for each sdx) or
> `smartctl -a /dev/sdx` (for each sdx).
> 
> See `man smartctl`.
> 
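As a rough sketch of step 2, the WHEN_FAILED check can be scripted.  The
helper name `failed_attrs` is made up for illustration, and the column
positions assume the usual ten-column attribute table that `smartctl -A`
prints for ATA drives; check the output on your own system first.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# name and WHEN_FAILED value of every attribute whose WHEN_FAILED column
# (assumed to be field 9) is something other than "-".
failed_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" { print $2, $9 }'
}

# Intended use, as root, for each sdx:
#   smartctl -A /dev/sdx | failed_attrs
```

No output means no attribute is marked as failed, now or in the past.
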
> Note that the new disk health monitoring tool "palimpsest" in package 
> gnome-disk-utility is panicky and not to be trusted, unless you like 
> buying lots of hard drives.  It doesn't just look at "WHEN_FAILED", but 
> has its own criteria such as nonzero Reallocated_Event_Count, which is 
> fairly normal for a modern drive that has been in use for a while.
> Nonzero Current_Pending_Sector or Offline_Uncorrectable values are bad,
> as they mean data loss, though not general drive failure.  I recommend
> enabling Automatic Offline Testing with `smartctl -o on /dev/sdx` (for 
> each sdx), which will do a surface scan every few hours, giving the
> best chance to repair or recover any sectors that are going bad.
> 
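To separate the attributes Tony calls worrying from the one he calls
normal, something like the following might do.  It is only a sketch:
`sector_report` is a made-up name, and it assumes the raw value is the
tenth field of each `smartctl -A` attribute line.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and report the
# raw values of the three attributes discussed above.
sector_report() {
    awk '
        # Often nonzero on a drive that has been in use for a while.
        $2 == "Reallocated_Event_Count" {
            printf "%s raw=%s (often nonzero on an aged drive)\n", $2, $10
        }
        # Nonzero here points at sectors that could not be read.
        $2 == "Current_Pending_Sector" || $2 == "Offline_Uncorrectable" {
            status = ($10 + 0 > 0) ? "WARNING: possible data loss" : "ok"
            printf "%s raw=%s %s\n", $2, $10, status
        }
    '
}

# Intended use, as root, for each sdx:
#   smartctl -A /dev/sdx | sector_report
```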

Will `smartctl -o on /dev/sdx` (for each sdx) fix the nonzero
Reallocated_Event_Count issue on RAID arrays in a non-destructive way?
Do you have to use the /dev/sdx devices or the /dev/md devices?

Good pointers in the meantime.

-- 
Robin Laing



