[Solved] Re: FC5 S/W Raid Rebuilding to Infinity (and beyond!)

Nigel Wade nmw at ion.le.ac.uk
Thu Nov 23 09:27:26 UTC 2006


Sean Bruno wrote:
>> You have found yourself in the same situation I found myself in recently. 
>> Actually my situation was slightly different, but the resulting problem is the 
>> same. In my case at re-boot md decided that one partition of a mirror was out of 
>> sync, and so initiated a re-sync with the other partition. However, the 
>> partition which was active contained a bad sector, so the re-sync failed, over 
>> and over and over..., just like yours is doing.
>>
>> In order to fix my system I used the following steps.
>>
>> The first step is to take the offending filesystem offline. Then I copied the 
>> existing partition onto the good disk using dd, with the noerror option so it 
>> would continue past read errors. In my case I knew that the read error was not 
>> part of the actual filesystem in use because it passed fsck. When the copy was 
>> complete I ran fsck on the new filesystem just to be sure it had copied ok.
>>
>> After this I created a new RAID consisting of just the good partition (in my 
>> case the RAID was md1 and the new partition was sda3):
>>   # mdadm -C /dev/md1 --force -n 1 -l 1 /dev/sda3
>>
>> As a temporary fix, until a new disk arrived, I ran
>>    # e2fsck -c -d -f /dev/sdb3
>> to mark bad blocks (sdb3 was the failing partition).
>> Then I ran:
>>    # mdadm --zero-superblock /dev/sdb3
>> to remove the md superblock from the partition so it was no longer part of a RAID.
>>
>> Finally, I used mdadm to add the dodgy partition back into the RAID:
>>
>> # mdadm -a /dev/md1 /dev/sdb3
>>
>> and to grow the RAID to 2 partitions:
>>
>> # mdadm --grow -n 2 /dev/md1
> 
> Thanks for the assistance with this Nigel.  I was able to recover from
> this 'double' failure with your procedure.  I had purchased 2 new disks
> in order to replace the failed drives and I am back up at this time.
> 
> Sean
> 
> 

You may want to do some additional testing to verify the status of the new 
filesystem. In my original message I implied that fsck was sufficient, but as 
Tony quite rightly pointed out, it isn't. On my failing disk I knew that the bad 
block wasn't part of the active filesystem, so a simple copy/fsck was 
sufficient. During the copy there were no errors, and a comparison of the two 
filesystems showed no discrepancies.
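
For reference, a copy-and-compare along these lines can be sketched as below. 
This is only an illustration, not necessarily the exact commands used: the 
function name copy_and_compare is made up, the 4096-byte block size is an 
assumption, and the comparison assumes source and destination are the same 
size (cmp will report a difference if one is longer than the other).

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST, continuing past read errors,
# then check that the two are byte-for-byte identical.
# In practice SRC/DST would be partitions (e.g. /dev/sdb3 and /dev/sda3);
# plain image files also work, which makes it safe to try out first.
copy_and_compare() {
    src=$1
    dst=$2
    # noerror: keep going after a read error instead of aborting
    # sync:    pad short or failed reads with zeros so offsets stay aligned
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync 2>/dev/null || return 2
    # cmp -s is silent and exits 0 only when the contents match exactly;
    # a read error on SRC during the compare also counts as "differ" here
    if cmp -s "$src" "$dst"; then
        echo "identical"
    else
        echo "differ"
        return 1
    fi
}
```

Called as "copy_and_compare /dev/sdb3 /dev/sda3" with the filesystem offline, 
it prints "identical" only if every byte was read and copied intact. Running 
dd without the 2>/dev/null shows the read errors as they happen.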

When you copied your filesystem, did the system generate any error messages? If 
so, you will probably want to find out which file the bad block belonged to, 
assess what impact a corrupted copy of that file might have, and check whether 
you can restore it from a backup.
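
One way to track down the affected file is sketched below. It assumes an 
ext2/ext3 filesystem, and that the sector number in the kernel's I/O error 
message is counted from the start of the whole disk rather than the partition. 
The function names are made up for illustration. The icheck and ncheck 
requests belong to debugfs from e2fsprogs.

```shell
#!/bin/sh
# Hypothetical helper: convert a disk sector number into a filesystem
# block number relative to the start of the partition.
# Arguments: bad_sector partition_start_sector fs_block_size [sector_size]
sector_to_fs_block() {
    bad_sector=$1
    part_start=$2
    fs_block_size=$3
    sector_size=${4:-512}   # assumed 512-byte sectors unless told otherwise
    echo $(( (bad_sector - part_start) * sector_size / fs_block_size ))
}

# Hypothetical helper: report which file owns a filesystem block.
# Needs read access to the device, so normally run as root.
# Arguments: device fs_block
block_to_file() {
    device=$1
    fs_block=$2
    # icheck maps block -> inode; the second output line holds the result
    inode=$(debugfs -R "icheck $fs_block" "$device" 2>/dev/null |
            awk 'NR==2 {print $2}')
    case "$inode" in
        ''|*[!0-9]*)
            # no numeric inode came back: the block is not owned by any file
            echo "block $fs_block is not in use by any file"
            return 1
            ;;
    esac
    # ncheck maps inode -> path name(s)
    debugfs -R "ncheck $inode" "$device" 2>/dev/null
}
```

For example, "sector_to_fs_block 1000063 63 4096" gives 125000, and 
"block_to_file /dev/sdb3 125000" would then look up the owner of that block. 
The partition start sector can be read from "fdisk -lu", and the filesystem 
block size from "tune2fs -l".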

-- 
Nigel Wade, System Administrator, Space Plasma Physics Group,
             University of Leicester, Leicester, LE1 7RH, UK
E-mail :    nmw at ion.le.ac.uk
Phone :     +44 (0)116 2523548, Fax : +44 (0)116 2523555



