F20 - Unintended consequences of no default MTA - How best to fix

Sun Jan 5 01:17:07 UTC 2014

On Jan 4, 2014, at 2:47 PM, Andre Robatino <robatino at fedoraproject.org> wrote:
> 
> I'd really like to be able to get smartd notifications, in GNOME, without
> having to configure an MTA at all.

Hmmm…
http://gnomeshell.wordpress.com/2011/08/28/manage-the-startup-applications/

This shows Disk Notifications doing this, and some search results seem to indicate it was the default behavior at one time. Fedora 20 GNOME doesn't have this item in startup-applications, and I can't find a gdu-notification-daemon. I do have a gdu-sd-plugin.

Then I find this:
http://code.metager.de/source/history/gnome/Platform/disk-utility/src/notify/gdu-sd-plugin.gnome-settings-plugin.in

It looks like it was cut in 3.4 and has been added back as gdu-sd-plugin.gnome-settings-plugin.in, but I'm not sure how to get at it in the GUI to configure it, or whether it's already enabled by default. In Disks, I have a "Drive Settings" option, but it doesn't include SMART monitoring or notifications. There is a "SMART Data & Self Tests" item, which is "On". I don't know if that means I'll get notifications.

journalctl -xb | grep -i smartd

Shows that it is enabled and my drive has been added to the monitor list. If I remove -b, I get a lot more entries from prior boots, but there's no indication any tests are being done.

> For now, it's easier for me to just check
> gnome-disks regularly, but not as prompt. My last HDD only lasted about 1
> 1/2 years, and at the time its failure affected my ability to work, the
> number of bad sectors was high, but still below the FAIL threshold.

This requires a lot of subjectivity, and the available data shows something like a 60% correlation between the drive's self-assessment and failure. So about 40% of the time, a failure occurs while health still reads as a pass. Correlating individual attributes with failures is also difficult, according to the research. What might be useful is a slider in the monitor settings that lets the user set how verbose the monitoring is. Other than off, the least verbose setting would report any pre-fail attribute that is failing now. The most verbose might notify each time any pre-fail type attribute's value changes by >=2 since the last check.

Value is not the same as the attribute's raw value, which can change by a lot more than one without affecting the value. The raw value is probably too verbose to be useful. And the old-age type attributes don't all give an indication of impending failure.
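
As a rough sketch of what the most verbose setting could look like, assuming two saved copies of the `smartctl -A` attribute table in its usual column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE): smart_diff is a made-up helper name, not an existing tool, and the threshold of 2 is just the figure suggested above.

```shell
# Sketch only: smart_diff is a hypothetical helper, not part of smartmontools.
# $1: older saved `smartctl -A` output, $2: newer one, $3: minimum change (default 2)
smart_diff() {
    old=$1
    new=$2
    min_delta=${3:-2}
    awk -v min="$min_delta" '
        # Skip header lines and anything that is not a Pre-fail attribute row.
        $1 !~ /^[0-9]+$/ || $7 != "Pre-fail" { next }
        # First file: remember the old normalized VALUE (not the raw value).
        FILENAME == ARGV[1] { prev[$1] = $4 + 0; next }
        # Second file: always report an attribute that is failing now.
        $9 == "FAILING_NOW" { print $1, $2, "FAILING_NOW"; next }
        # Otherwise report only if the normalized VALUE moved by at least min.
        ($1 in prev) {
            delta = prev[$1] - ($4 + 0)
            if (delta < 0) delta = -delta
            if (delta >= min) print $1, $2, prev[$1] " -> " ($4 + 0)
        }
    ' "$old" "$new"
}
```

Something like `smartctl -A /dev/sda > new.txt; smart_diff old.txt new.txt` would then print one line per pre-fail attribute worth a notification, and nothing when there is nothing to report. Old-age attributes and the raw value are ignored on purpose, for the reasons above.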

A bigger issue is that the SCT ERC setting for consumer drives is almost certainly mismatched with the default Linux SCSI block layer timeout value. What happens is that the drive can have problems that never result in read errors, because the kernel decides to reset the bus before the drive reports one. For any RAID configuration this is especially tragic, and to my knowledge the installer isn't changing the defaults.

[root@f20s ~]# cat /sys/block/sda/device/timeout
30

Indeed. The kernel times out at 30 seconds, and yet the drive's own error recovery is typically 120 seconds.
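
A minimal sketch of checking for this mismatch, assuming you already know (or guess) the drive's worst-case recovery time: check_timeout is a hypothetical helper, and the 120 second default is only the typical figure mentioned here, not something read from the drive.

```shell
# Sketch only: check_timeout is a hypothetical helper, not an existing tool.
# $1: kernel command timer file, e.g. /sys/block/sda/device/timeout
# $2: assumed worst-case drive error recovery time in seconds (default 120)
check_timeout() {
    timeout_file=$1
    drive_secs=${2:-120}
    # The sysfs file holds the timeout in whole seconds.
    kernel_secs=$(cat "$timeout_file") || return 2
    if [ "$kernel_secs" -le "$drive_secs" ]; then
        echo "mismatch: kernel gives up after ${kernel_secs}s, drive may need ${drive_secs}s"
        return 1
    fi
    echo "ok: kernel waits ${kernel_secs}s, drive may need ${drive_secs}s"
}
```

Run as `check_timeout /sys/block/sda/device/timeout 120`. If it reports a mismatch, the two usual ways out are to raise the kernel timeout (as root, `echo 180 > /sys/block/sda/device/timeout`, which does not persist across reboots) or, on drives that support it, to shorten the drive's recovery time with `smartctl -l scterc,70,70 /dev/sda` (units are tenths of a second).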

> (The
> previous day, there were no bad sectors at all, or any other sign of
> failure.) There wasn't enough time to do a backup before it failed. Having
> Fedora configured by default to give a desktop notification when the number
> of bad sectors increases (not just hitting an arbitrary FAIL threshold)
> would be handy.

I'd file a feature request; I'll support it.

Chris Murphy