I've very recently upgraded two of my machines. One machine was upgraded from Fedora 9 to Fedora 11, and the other from Fedora 10 to Fedora 11. Machine 1 has two hard disks (both Seagate, 500 GB and 1000 GB); machine 2 has one hard disk (Western Digital, 320 GB). All of the interfaces are SATA. The puzzle is that on machine 1 the 500 GB drive is showing as failing, and on machine 2 the 320 GB drive is showing as failing. Neither drive showed up as failing under the old releases. How do I know whether these drives are truly failing?
Thanks, Gene Poole
On Wed, 2009-09-23 at 09:29 -0400, Gene Poole wrote:
I've very recently upgraded two of my machines. One machine was upgraded from Fedora 9 to Fedora 11, and the other from Fedora 10 to Fedora 11. Machine 1 has two hard disks (both Seagate, 500 GB and 1000 GB); machine 2 has one hard disk (Western Digital, 320 GB). All of the interfaces are SATA. The puzzle is that on machine 1 the 500 GB drive is showing as failing, and on machine 2 the 320 GB drive is showing as failing. Neither drive showed up as failing under the old releases. How do I know whether these drives are truly failing?
That's because older distributions didn't have automatically enabled disk health monitoring GUI software. Fedora 11 does.
The health monitor reads the SMART data of the disks. This is not a software question; the hard drives themselves report that they are failing.
You can read the raw SMART data with `smartctl -A /dev/sda`, where sda is your disk drive device (you need to run this as root).
On Wed, 23 Sep 2009 16:38:10 +0300 Jussi Lehtola wrote:
That's because older distributions didn't have automatically enabled disk health monitoring GUI software. Fedora 11 does.
At risk of asking a stupid question, where does it provide that monitoring?
I haven't seen anything about SMART on any of my Fedora desktops; what am I missing?
On Wed, 2009-09-23 at 10:05 -0600, Frank Cox wrote:
At risk of asking a stupid question, where does it provide that monitoring?
I haven't seen anything about SMART on any of my Fedora desktops; what am I missing?
A daemon that's started at boot time, reports going into the usual daily report emailed to the root user.
Tim on 09/23/2009 11:25 AM wrote:
A daemon that's started at boot time, reports going into the usual daily report emailed to the root user.
For a good majority of Fedora desktop users, that root mail is never read.
IIRC, DeviceKit provides some SMART notification starting in F11. Palimpsest (gnome-disk-utility) also provides some extended SMART tools.
Michael Cronenworth wrote:
For a good majority of Fedora desktop users, that root mail is never read.
This is not good. Maybe a modification of the install procedure to set an alias so that all mail goes to the first normal user that is set up? Or an option to do this as part of the install? What do you think?
Mikkel
Mikkel on 09/23/2009 11:49 AM wrote:
This is not good. Maybe a modification of the install procedure to set an alias so that all mail goes to the first normal user that is set up? Or an option to do this as part of the install? What do you think?
This is going off-topic of the OP, but this topic is always discussed around Alpha time every 6 months. Nothing ever comes out of the discussions. Feel free to open a bug or provide patches.
On Wed, 2009-09-23 at 11:51 -0500, Michael Cronenworth wrote:
This is going off-topic of the OP, but this topic is always discussed around Alpha time every 6 months. Nothing ever comes out of the discussions. Feel free to open a bug or provide patches.
It is easy to send root's mail to another user using the /etc/aliases file. So the above is not really a problem.
Aaron Konstam
On Wed, 2009-09-23 at 15:55 -0500, Aaron Konstam wrote:
It is easy to send root's mail to another user using the /etc/aliases file. So the above is not really a problem.
---- Assuming that you are not on a cable/phone network that blocks outbound port 25, or that someone actually reads local mail, your assertion would be true, but those are bad assumptions to make for a large portion of users.
Craig
Mikkel wrote:
Michael Cronenworth wrote:
For a good majority of Fedora desktop users, that root mail is never read.
This is not good. Maybe a modification of the install procedure to set an alias so that all mail goes to the first normal user that is set up? Or an option to do this as part of the install? What do you think?
This only helps if users read local mail, which I doubt is much more likely than reading root's mail spool. For desktop installs, there is a proposed feature to not install any MTA by default and to send all cron output and other things that are currently mailed to root to logfiles instead.
http://fedoraproject.org/wiki/Features/NoMTA
For the typical desktop user this will probably be about the same as now. Instead of having unread messages in /var/spool/mail/root, they'll be in /var/log/*. :)
For those of us accustomed to having an MTA installed and sending root's mail somewhere it is read, it will just be a matter of installing the MTA of our choice. (Which for me would be slightly simpler, as I wouldn't have to handle removing sendmail and replacing it with postfix.)
Michael Cronenworth:
For a good majority of Fedora desktop users, that root mail is never read.
Mikkel:
This is not good. Maybe a modification of the install procedure to set an alias so that all mail goes to the first normal user that is set up? Or an option to do this as part of the install? What do you think?
I seem to recall, years ago, that the install routine arranged this with a "Who should receive root's mail?" question. Though, that still leaves a problem with users who never check any local mail.
On Wed, Sep 23, 2009 at 10:05:26 -0600, Frank Cox theatre@sasktel.net wrote:
On Wed, 23 Sep 2009 16:38:10 +0300 Jussi Lehtola wrote:
That's because older distributions didn't have automatically enabled disk health monitoring GUI software. Fedora 11 does.
At risk of asking a stupid question, where does it provide that monitoring?
I haven't seen anything about SMART on any of my Fedora desktops; what am I missing?
The package is smartmontools. The daemon name is smartd. It starts scheduled tests based on the config file (/etc/smartd.conf).
You can also use smartctl to manually look at the current status or start tests.
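As a rough illustration of the manual route, here is a minimal Python sketch that builds the smartctl command lines for a given disk. The function names and dictionary keys are made up for this example, and the `-H` (overall health) option is not mentioned elsewhere in this thread; the device path is whatever your disk actually is.

```python
import subprocess


def smartctl_commands(device):
    """Return smartctl argument lists for common manual checks on one disk."""
    return {
        "health": ["smartctl", "-H", device],            # overall health verdict
        "attributes": ["smartctl", "-A", device],        # raw attribute table
        "start_long_test": ["smartctl", "-t", "long", device],
        "selftest_log": ["smartctl", "-l", "selftest", device],
    }


def run(action, device):
    """Run one of the commands above (needs root) and return (status, output).

    smartctl's exit status is a bitmask, so a non-zero value does not
    necessarily mean the command itself failed; check=True is deliberately
    not used here.
    """
    argv = smartctl_commands(device)[action]
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Only print the commands; actually running them requires root.
    for name, argv in smartctl_commands("/dev/sda").items():
        print(name, "->", " ".join(argv))
```

Calling `run("health", "/dev/sda")` as root would execute the first command; the other keys work the same way.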
Bruno Wolff III on 09/23/2009 12:36 PM wrote:
The package is smartmontools. The daemon name is smartd. It starts scheduled tests based on the config file (/etc/smartd.conf).
You can also use smartctl to manually look at the current status or start tests.
The resulting daemon output is dumped into log files, though, which is not somewhere a typical desktop user expects to look. Sure, you can run smartctl manually, but the output can look alien to a normal user. My recommendation of gnome-disk-utility should be what he's looking for.
On 09-10-01 09:45:06, David Timms wrote:
On 09/24/2009 02:05 AM, Frank Cox wrote:
I haven't seen anything about SMART on any of my Fedora desktops; what am I missing?
You are missing disks with faults ;-) It's a good thing.
Or you may not have Palimpsest (gnome-disk-utility) installed or running. I got it on a new install of F11-Live, but not on an upgrade from F9.
On 09-09-23 09:29:56, Gene Poole wrote:
I've very recently upgraded two of my machines. One machine was upgraded from Fedora 9 to Fedora 11, and the other from Fedora 10 to Fedora 11. Machine 1 has two hard disks (both Seagate, 500 GB and 1000 GB); machine 2 has one hard disk (Western Digital, 320 GB). All of the interfaces are SATA. The puzzle is that on machine 1 the 500 GB drive is showing as failing, and on machine 2 the 320 GB drive is showing as failing. Neither drive showed up as failing under the old releases. How do I know whether these drives are truly failing?
1) Wait. If the disk is going bad, it will fail.
2) Run as root `smartctl -A /dev/sdx` (for each sdx) and look at the "WHEN_FAILED" column; it will be "-" if not failed.
3) Run as root `smartctl -a /dev/sdx` (for each sdx) and look at the whole output.
4) Run as root `smartctl -t long /dev/sdx` (for each sdx) and wait until the time the test should finish, then view the results with `smartctl -l selftest /dev/sdx` (for each sdx) or `smartctl -a /dev/sdx` (for each sdx).
See `man smartctl`.
Note that the new disk health monitoring tool "palimpsest" in package gnome-disk-utility is panicky and not to be trusted, unless you like buying lots of hard drives. It doesn't just look at "WHEN_FAILED", but has its own criteria such as a nonzero Reallocated_Event_Count, which is fairly normal for a modern drive that has been in use for a while. A nonzero Current_Pending_Sector or Offline_Uncorrectable is bad, as either means data loss, though not general drive failure. I recommend enabling Automatic Offline Testing with `smartctl -o on /dev/sdx` (for each sdx), which will do a surface scan every few hours, giving the best chance to repair or recover any sectors that are going bad.
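The checks described above can be sketched in Python. This is only an illustration: the function names are invented, the `SAMPLE` text is a made-up table in the usual `smartctl -A` column layout (not output from any real drive), and the rules simply encode the advice in this message: report anything whose WHEN_FAILED column is not "-", report a nonzero Current_Pending_Sector or Offline_Uncorrectable, and ignore Reallocated_Event_Count.

```python
# Attributes whose nonzero raw value means unreadable sectors (per the advice above).
CRITICAL = {"Current_Pending_Sector", "Offline_Uncorrectable"}

# Hypothetical sample in the usual `smartctl -A` layout; not real drive data.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""


def parse_attributes(text):
    """Parse the attribute table from `smartctl -A` output into dicts."""
    rows = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["ID#", "ATTRIBUTE_NAME"]:
            in_table = True  # header row found; data rows follow
            continue
        if not in_table or len(fields) < 10 or not fields[0].isdigit():
            continue  # skip preamble, blank lines, and anything after the table
        rows.append({"name": fields[1], "when_failed": fields[8], "raw": fields[9]})
    return rows


def assess(rows):
    """Return a list of findings worth worrying about."""
    findings = []
    for row in rows:
        if row["when_failed"] != "-":
            findings.append(f"{row['name']}: WHEN_FAILED={row['when_failed']}")
        elif row["name"] in CRITICAL and row["raw"].isdigit() and int(row["raw"]) > 0:
            findings.append(f"{row['name']}: raw value {row['raw']}")
    return findings


if __name__ == "__main__":
    for finding in assess(parse_attributes(SAMPLE)):
        print(finding)
```

To use it on a real disk, feed `parse_attributes` the text printed by `smartctl -A /dev/sdx` (run as root) instead of `SAMPLE`.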
Tony Nelson wrote:
1) Wait. If the disk is going bad, it will fail.
2) Run as root `smartctl -A /dev/sdx` (for each sdx) and look at the "WHEN_FAILED" column; it will be "-" if not failed.
3) Run as root `smartctl -a /dev/sdx` (for each sdx) and look at the whole output.
4) Run as root `smartctl -t long /dev/sdx` (for each sdx) and wait until the time the test should finish, then view the results with `smartctl -l selftest /dev/sdx` (for each sdx) or `smartctl -a /dev/sdx` (for each sdx).
See `man smartctl`.
Note that the new disk health monitoring tool "palimpsest" in package gnome-disk-utility is panicky and not to be trusted, unless you like buying lots of hard drives. It doesn't just look at "WHEN_FAILED", but has its own criteria such as a nonzero Reallocated_Event_Count, which is fairly normal for a modern drive that has been in use for a while. A nonzero Current_Pending_Sector or Offline_Uncorrectable is bad, as either means data loss, though not general drive failure. I recommend enabling Automatic Offline Testing with `smartctl -o on /dev/sdx` (for each sdx), which will do a surface scan every few hours, giving the best chance to repair or recover any sectors that are going bad.
Will `smartctl -o on /dev/sdx` (for each sdx) fix the nonzero Reallocated_Event_Count issue on RAID arrays in a non-destructive way? Do you have to use the /dev/sdx devices or the /dev/md devices?
Good pointers in the mean time.
On 09-10-01 09:09:40, Robin Laing wrote:
Will `smartctl -o on /dev/sdx` (for each sdx) fix the nonzero Reallocated_Event_Count issue on RAID arrays in a non-destructive way?
No. Nor for non-RAID either. It doesn't "fix" Reallocated_Event_Count -- rather, its purpose is to make Reallocated_Event_Count go up faster: as soon as a sector starts to go bad it will be reallocated if it is still readable, and the sooner the scan finds it, the more likely it still is. A non-zero Reallocated_Event_Count is not a problem. Whatever says it is a problem is the real problem. Fix that instead.
Non-zero Current_Pending_Sector is a problem, but RAID should be fixing that already. I don't know, but I think that enabling Automatic Offline Testing should cause any uncorrectable sectors to be noticed and fixed sooner by RAID.
Do you have to use the /dev/sdx devices or the /dev/md devices?
...
Automatic Offline Testing must be enabled on an actual ATA hard disk, so no fake disk such as dm or md. See `man smartctl`.
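A small Python sketch of that rule, assuming the usual Linux naming where whole disks appear as /dev/sdX or /dev/hdX. The names `is_physical_disk` and `offline_testing_commands` are invented for this example, and the name check is only a heuristic: a /dev/sdX node can also be a USB stick or a hardware RAID volume, so treat the result as a starting point.

```python
import re

# Whole-disk names only: /dev/sda, /dev/sdb, /dev/hda ... (no partitions, md, or dm).
PHYSICAL_DISK = re.compile(r"^/dev/(sd[a-z]+|hd[a-z]+)$")


def is_physical_disk(path):
    """Heuristic: True if the path looks like a whole ATA/SATA disk node."""
    return PHYSICAL_DISK.match(path) is not None


def offline_testing_commands(paths):
    """Build `smartctl -o on` commands for the disks, skipping md/dm devices."""
    return [["smartctl", "-o", "on", p] for p in paths if is_physical_disk(p)]


if __name__ == "__main__":
    candidates = ["/dev/sda", "/dev/sdb", "/dev/md0", "/dev/dm-0", "/dev/sda1"]
    for argv in offline_testing_commands(candidates):
        print(" ".join(argv))  # run these as root
```

For a RAID array, this means issuing the command against each member disk rather than against the /dev/md device.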
Tony Nelson wrote:
On 09-10-01 09:09:40, Robin Laing wrote:
Will `smartctl -o on /dev/sdx` (for each sdx) fix the nonzero Reallocated_Event_Count issue on RAID arrays in a non-destructive way?
No. Nor for non-RAID either. It doesn't "fix" Reallocated_Event_Count -- rather, its purpose is to make Reallocated_Event_Count go up faster: as soon as a sector starts to go bad it will be reallocated if it is still readable, and the sooner the scan finds it, the more likely it still is. A non-zero Reallocated_Event_Count is not a problem. Whatever says it is a problem is the real problem. Fix that instead.
Non-zero Current_Pending_Sector is a problem, but RAID should be fixing that already. I don't know, but I think that enabling Automatic Offline Testing should cause any uncorrectable sectors to be noticed and fixed sooner by RAID.
Do you have to use the /dev/sdx devices or the /dev/md devices?
...
Automatic Offline Testing must be enabled on an actual ATA hard disk, so no fake disk such as dm or md. See `man smartctl`.
With the changes, I was shocked to see the error message when I tried a live DVD on my laptop. It would be worthwhile to have a tool for testing and possibly fixing the problem in a non-destructive way for most users. I guess it is time for an RFE search.