I have a 4 disk soft RAID-5 array and I'm receiving a ton of these messages in the system logs:
Dec 31 17:04:38 tibeaux smartd[2384]: Device: /dev/sda, 3 Offline uncorrectable sectors
Dec 31 17:04:39 tibeaux smartd[2384]: Device: /dev/sdd, 48 Currently unreadable (pending) sectors
But cat /proc/mdstat reports:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda[0] sdd[3] sdc[2] sdb[1]
      735351936 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
Best way to fix this, or is it not a problem?
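For anyone comparing the two outputs, here is a minimal Python sketch that tallies the smartd warnings per device and checks whether md still reports every member as up. The function names `summarize_smartd` and `md_all_members_up` are made up for illustration, and the regular expression only covers the two message types quoted above.

```python
import re
from collections import defaultdict

# Matches smartd lines such as:
#   "... smartd[2384]: Device: /dev/sda, 3 Offline uncorrectable sectors"
# Only the two message types quoted in this thread are recognized.
SMARTD_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>/dev/\w+), (?P<count>\d+) "
    r"(?P<kind>Offline uncorrectable|Currently unreadable \(pending\)) sectors"
)

def summarize_smartd(lines):
    """Return {device: {message kind: most recent count}}."""
    summary = defaultdict(dict)
    for line in lines:
        m = SMARTD_RE.search(line)
        if m:
            summary[m["dev"]][m["kind"]] = int(m["count"])
    return dict(summary)

def md_all_members_up(mdstat_text):
    """True if every [UU..] status field in /proc/mdstat text contains no '_'."""
    fields = re.findall(r"\[([U_]+)\]", mdstat_text)
    return bool(fields) and all("_" not in f for f in fields)

log = [
    "Dec 31 17:04:38 tibeaux smartd[2384]: Device: /dev/sda, 3 Offline uncorrectable sectors",
    "Dec 31 17:04:39 tibeaux smartd[2384]: Device: /dev/sdd, 48 Currently unreadable (pending) sectors",
]
mdstat = (
    "md0 : active raid5 sda[0] sdd[3] sdc[2] sdb[1]\n"
    "      735351936 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]\n"
)

print(summarize_smartd(log))
print(md_all_members_up(mdstat))
```

The point of the comparison is that the two sources measure different things: smartd reports what each drive says about its own sectors, while /proc/mdstat only shows whether md has kicked a member out of the array.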
It is a problem. This means that both drives are starting to develop bad sectors. Since you have RAID 5, you can replace the disk and let the array rebuild itself. Or keep using those disks, but add a hot spare disk to the array. I usually replace disks as soon as I see smartd complain about them; if you wait too long, you risk losing your data. The last time I replaced a disk in such a condition, it had over 1600 uncorrectable sectors, making it impossible to transfer the data to the new disk via dd, pvmove, cp... Luckily it was just a proxy/gateway/VPN server, so a reinstall wasn't difficult.
-- Pedro Macedo
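The two options in the reply above (swap a disk, or add a hot spare) can be sketched as mdadm command lines. This is a dry-run sketch only: it builds and prints the commands without executing anything. The replacement device `/dev/sde` is hypothetical, and the helper names `replace_member_cmds` and `hot_spare_cmds` are made up for illustration. The array and member names come from the /proc/mdstat output in the original post.

```python
import shlex

def hot_spare_cmds(array, spare):
    """Adding a device to an array that already has all members makes it a spare."""
    return [["mdadm", "--manage", array, "--add", spare]]

def replace_member_cmds(array, failing, replacement):
    """Add the new disk first, then mark the old one failed and remove it.

    Once the old member is marked failed, md rebuilds onto the spare.
    """
    return hot_spare_cmds(array, replacement) + [
        ["mdadm", "--manage", array, "--fail", failing],
        ["mdadm", "--manage", array, "--remove", failing],
    ]

def render(cmds):
    """Quote each argv list so it can be pasted into a shell."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in cmds]

# /dev/sde is an assumed name for the new disk; check your own system first.
for line in render(replace_member_cmds("/dev/md0", "/dev/sdd", "/dev/sde")):
    print(line)
```

Note that a RAID-5 rebuild has to read every sector of the remaining members, so unreadable sectors on a second drive can interrupt it. That is one more reason to take a backup before failing anything.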
I suspected as much. Two questions:
1) Do the different messages imply they are dying in different ways?
2) Is there any way to salvage either drive, maybe by quarantining the bad sectors or something?
On Sun, Dec 31, 2006 at 21:11:02 -0500, vamythguy vamythguy@gmail.com wrote:
- Do the different messages imply they are dying in different ways?
You may want to keep using the drive with only 3 bad sectors, but you should strongly consider replacing the one with 48 as soon as you can.
- Is there any way to salvage either drive, maybe by quarantining the bad sectors or something?
If you write over the bad sectors, the drive will remap them to spare sectors. The remap won't happen until the drive either gets a successful read of a bad sector or the sector is overwritten.
In theory you might not have lost any data, but because the one drive is offline, you can't be sure what values belong in the bad sectors of the drive that is still in the array.
You should first try to back up what you have now. Then you can try to figure out which files are possibly corrupt (see: http://smartmontools.sourceforge.net/BadBlockHowTo.txt) and decide if it is worth trying to fix them. You may be able to use the data on the failed drive to fix the files.
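As I recall it, the first step in that HowTo is turning the LBA of a bad sector into a filesystem block number with b = (int)((L-S)*512/B), where L is the bad LBA, S is the partition's starting sector and B is the filesystem block size in bytes. A minimal Python sketch of that arithmetic follows. The numbers are invented for illustration, not taken from these drives, and the function name `lba_to_fs_block` is made up. The formula assumes a filesystem sitting directly on a partition; for a sector on an md RAID-5 member you would also have to account for the chunk and parity layout, which this sketch does not attempt.

```python
def lba_to_fs_block(lba, partition_start, fs_block_size, sector_size=512):
    """Map a disk LBA to the filesystem block that contains it.

    lba             -- LBA of the bad sector (for example from a SMART self-test log)
    partition_start -- first sector of the partition (for example from fdisk -lu)
    fs_block_size   -- filesystem block size in bytes
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - partition_start) * sector_size // fs_block_size

# Invented example: partition starts at sector 63, 4096-byte filesystem blocks.
print(lba_to_fs_block(1_000_063, 63, 4096))  # prints 125000
```

The resulting block number is what you then hand to the filesystem's own tools to find out which file, if any, owns it.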
Once you have recovered everything you can, you should run badblocks on the drives (using a livecd is probably easiest) to try to find other bad blocks. Then run smartctl -t long on the drives to see how bad things are. A few reallocated blocks don't necessarily warrant tossing a drive. You need to balance your budget versus the cost of replacing the data you might lose and the extra likelihood of failure suggested by a drive having reallocated sectors.
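To judge "how bad things are" after the tests, the raw values of a few SMART attributes are the usual thing to look at. Below is a minimal Python sketch that pulls them out of the attribute table printed by `smartctl -A`. The sample text is invented for illustration (only the pending count of 48 echoes the log in this thread), and the function name `parse_smart_attributes` is made up. It assumes the usual ten-column table layout with the raw value in the last column.

```python
# Attribute IDs as smartmontools names them in the "smartctl -A" table.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

def parse_smart_attributes(text):
    """Return {attribute name: raw value} for the watched attributes."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED and fields[9].isdigit():
                raw[WATCHED[attr_id]] = int(fields[9])
    return raw

# Invented sample in the shape of "smartctl -A /dev/sdd" output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       48
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(parse_smart_attributes(SAMPLE))
```

Comparing these counts before and after the badblocks run and the long self-test shows whether pending sectors were remapped (the reallocated count goes up) or simply cleared.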
In the future, keep smartd running so that you are warned about bad sectors while only one drive has them and can repair the array before two drives have errors.