Questionable Status
Tony Nelson
tonynelson at georgeanelson.com
Thu Oct 1 17:54:32 UTC 2009
On 09-10-01 09:09:40, Robin Laing wrote:
> Tony Nelson wrote:
> > On 09-09-23 09:29:56, Gene Poole wrote:
> >> I've very recently upgraded 2 of my machines. One machine was
> >> upgraded from Fedora 9 to Fedora 11, and the other machine was
> >> upgraded from Fedora 10 to Fedora 11. On machine 1 I have 2 hard
> >> disks (both Seagates - 500 GB and 1000 GB), on machine 2 I have
> >> 1 hard disk (Western Digital 320 GB). All of the interfaces are
> >> SATA. The questionable status is that on machine 1 the 500 GB
> >> drive is showing as failing and on machine 2 the 320 GB drive is
> >> showing as failing. Neither drive, under the old releases, showed
> >> up as failing. How do I know that these drives are truly failing?
> >
> > 1) Wait. If the disk is going bad, it will fail.
> >
> > 2) Run as root `smartctl -A /dev/sdx` (for each sdx) and look at
> > the "WHEN_FAILED" column; it will be "-" if not failed.
> >
> > 3) Run as root `smartctl -a /dev/sdx` (for each sdx) and look at
> > the whole output.
> >
> > 4) Run as root `smartctl -t long /dev/sdx` (for each sdx) and wait
> > until the time the test should finish, then view the results with
> > `smartctl -l selftest /dev/sdx` (for each sdx) or `smartctl -a
> > /dev/sdx` (for each sdx).
> >
> > See `man smartctl`.
> >
> > Note that the new disk health monitoring tool "palimpsest" in
> > package gnome-disk-utility is panicky and not to be trusted, unless
> > you like buying lots of hard drives. It doesn't just look at
> > "WHEN_FAILED", but has its own criteria such as nonzero
> > Reallocated_Event_Count, which is fairly normal for a modern drive
> > that has been in use for a while. A nonzero Current_Pending_Sector
> > or Offline_Uncorrectable is bad, as it means data loss, though
> > not general drive failure. I recommend enabling Automatic Offline
> > Testing with `smartctl -o on /dev/sdx` (for each sdx), which will
> > do a surface scan every few hours, giving the best chance to
> > repair or recover any sectors that are going bad.
> >
>
> Will `smartctl -o on /dev/sdx` (for each sdx) fix the nonzero
> Reallocated_Event_Count issue on RAID arrays in a non-destructive
> way?

No, and not for non-RAID disks either. It doesn't "fix"
Reallocated_Event_Count; rather, its purpose is to make
Reallocated_Event_Count go up sooner: as soon as a sector starts to go
bad it will be reallocated if it is still readable, and the sooner that
is attempted, the more likely it is to succeed. A non-zero
Reallocated_Event_Count is not a problem. Whatever says it is a
problem is the real problem. Fix that instead.

Non-zero Current_Pending_Sector is a problem, but RAID should be fixing
that already. I don't know, but I think that enabling Automatic
Offline Testing should cause any uncorrectable sectors to be noticed
and fixed sooner by RAID.
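
A rough sketch of the checks described above, not a substitute for
reading the whole `smartctl -a` output. The function reads `smartctl -A`
output on stdin and prints only attributes whose WHEN_FAILED column is
not "-", plus Current_Pending_Sector or Offline_Uncorrectable when their
raw value is nonzero. The name `smart_worrying` is made up for
illustration, and the column positions assume the usual ten-column ATA
attribute table.

```shell
# Hypothetical helper: filter `smartctl -A` output down to the
# attributes worth worrying about. Assumes the standard layout:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_worrying() {
    awk '
        # Attribute rows start with a numeric ID and have 10+ columns;
        # the header and banner lines are skipped by this test.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if ($9 != "-") {
                # The drive itself reports this attribute as failed.
                print $2 ": WHEN_FAILED=" $9
            } else if (($2 == "Current_Pending_Sector" ||
                        $2 == "Offline_Uncorrectable") && $10 + 0 > 0) {
                # Unreadable sectors: data loss, though not drive failure.
                print $2 ": raw=" $10
            }
        }
    '
}
```

Run as root, for each sdx: `smartctl -A /dev/sdx | smart_worrying`.
No output means nothing matched. A nonzero Reallocated_Event_Count is
deliberately not reported.
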
> Do you have to use the /dev/sdx devices or the /dev/md devices?
...
Automatic Offline Testing must be enabled on an actual ATA hard disk,
so no fake disk such as dm or md. See `man smartctl`.
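
As an illustration only, a loop that enables it on whole disks and
skips everything else might look like the sketch below. The function
name `enable_offline_testing` and the `SMARTCTL` override variable are
made up here, and the `/dev/sd[a-z]` pattern is an assumption: it does
not prove a device is an ATA disk, and it misses names such as
/dev/sdaa.

```shell
# Hypothetical helper: enable Automatic Offline Testing on each whole
# /dev/sdX disk given as an argument. Partitions (/dev/sda1) and
# virtual devices (/dev/md0, /dev/dm-0) are skipped with a warning.
# SMARTCTL is an illustrative override so the loop can be dry-run.
enable_offline_testing() {
    for dev in "$@"; do
        case "$dev" in
            /dev/sd[a-z])
                "${SMARTCTL:-smartctl}" -o on "$dev"
                ;;
            *)
                echo "skipping $dev: not a whole /dev/sdX disk" >&2
                ;;
        esac
    done
}
```

Run as root: `enable_offline_testing /dev/sd[a-z]`. For a dry run,
set `SMARTCTL=echo` first to print the arguments instead of running
smartctl.
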
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson at georgeanelson.com>
' <http://www.georgeanelson.com/>