failed drive?

dustin kempter dustink at consistentstate.com
Thu Nov 13 22:11:23 UTC 2014


Hi all, I'm having some issues and I'm a little confused. I was checking 
our servers today and saw something strange: cat /proc/mdstat shows that 
one device, md0, is inactive, and I'm not really sure why. I did a bit 
more digging and testing with smartctl, and it says that /dev/sdg (part 
of md0) is failing, with failure expected within 24 hours. But if I run 
df -h, md0 doesn't even show up. I was talking to a friend about it and 
we disagreed: based on what smartctl says, I believe the drive is 
failing but hasn't failed yet, while he doesn't think it's a problem 
with the drive at all. Do you have any thoughts on this? And why would 
md0 suddenly be inactive yet still show two working devices (sdg, sdh)?

*(/proc/mdstat)*
[root@csdatastandby3 bin]# cat /proc/mdstat
Personalities : [raid1] [raid10]
md125 : active raid10 sdf1[5] sdc1[2] sde1[4] sda1[0] sdb1[1] sdd1[3]
       11720655360 blocks super 1.2 512K chunks 2 near-copies [6/6] [UUUUUU]

md126 : active raid1 sdg[1] sdh[0]
       463992832 blocks super external:/md0/0 [2/2] [UU]

md0 : inactive sdh[1](S) sdg[0](S)
       6306 blocks super external:imsm

unused devices: <none>
[root@csdatastandby3 bin]#


*(smartctl)*
[root@csdatastandby3 bin]# smartctl -H /dev/sdg
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-431.17.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   002   002   036    Pre-fail Always   FAILING_NOW 32288

[root@csdatastandby3 bin]#
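
If it would help, I can also pull the full SMART attribute dump and run a 
self-test on the suspect drive. I haven't captured that output here, but 
going by the smartctl man page it would be roughly:

smartctl -a /dev/sdg           (full attribute and error-log report)
smartctl -t long /dev/sdg      (start a long self-test)
smartctl -l selftest /dev/sdg  (check the self-test log once it finishes)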

*(df -h)*
[root@csdatastandby3 bin]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md126p4    404G  4.3G  379G   2% /
tmpfs            16G  172K   16G   1% /dev/shm
/dev/md126p2    936M   74M  815M   9% /boot
/dev/md126p1    350M  272K  350M   1% /boot/efi
/dev/md125       11T  4.2T  6.1T  41% /data
[root@csdatastandby3 bin]#


*(mdadm -D /dev/md0)*
[root@csdatastandby3 bin]# mdadm -D /dev/md0
/dev/md0:
         Version : imsm
      Raid Level : container
   Total Devices : 2

Working Devices : 2


            UUID : 32c1fbb7:4479296b:53c02d9b:666a08f6
   Member Arrays : /dev/md/Volume0

     Number   Major   Minor   RaidDevice

        0       8       96        -        /dev/sdg
        1       8      112        -        /dev/sdh
[root@csdatastandby3 bin]#
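
Since md0 looks like the Intel (imsm) container rather than the array 
itself, and md126 appears to be the RAID1 array built inside it, I could 
also examine the metadata on the member disks and pull details on md126. 
I haven't run these yet, but I believe it would be something like:

mdadm -E /dev/sdg    (examine the IMSM metadata on each member disk)
mdadm -E /dev/sdh
mdadm -D /dev/md126  (details of the RAID1 array inside the container)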


thanks


-dustink