RAID & HDD failure recovery

Laurence Vanek lvanek at charter.net
Fri Nov 17 00:32:34 UTC 2006


Bruno Wolff III wrote:
> On Wed, Nov 15, 2006 at 19:15:49 -0600,
>   Laurence Vanek <lvanek at charter.net> wrote:
>   
>> Surprise! Boot hangs, can't find partitions on hda (of course not).
>> Drops me to a simple shell.
>>     
>
> That is an odd error message. The raid devices should have names like
> /dev/md0 and only those names should appear in /etc/fstab. Grub refers
> to drive numbers, not letters, so I am wondering what is giving you that
> message.
>
> There are a couple of things that can cause problems. If you lost the
> only disk that grub was installed on (grub works with raw partitions
> and doesn't know about raid) then you will need to boot from a rescue
> CD to fix things up. Even if you made sure that grub was installed on
> multiple disks you still have to worry about hard drive order as after
> grub starts booting from the MBR of a device it finishes using a named
> (by hard drive order) hard drive and if that drive doesn't have what
> is expected on it (because that drive was lost or drives don't have the
> number expected after one was removed) the system won't boot. Unfortunately
> grub doesn't seem to have a way to say finish booting using the same drive
> you started on. (In my case I only have two drives, so I can just set them
> both to use drive 0 and be able to easily get things working with either
> one drive or with one drive replaced.)
>
> No matter what has happened, you should be able to fix things up by booting
> from the rescue CD. You should be able to repartition the new hda and use
> mdadm to add the partitions back into the appropriate mirrors.
>
>   
Thank you for the advice.

A little more info. The failed drive (hda) is connected to the primary 
HDD controller while the good drive (hdc) is connected to the secondary 
controller. I am guessing that things would have been different if the 
drive on /dev/hdc had failed. I think my original "game plan" might have 
worked.

As it was, the rescue disk could not find a Fedora install (it was
looking only at the new drive), so it couldn't mount a sysimage. That's
the point where I was dropped to the simple shell.

In hindsight, perhaps I should have booted with a live CD like Knoppix
for the recovery. I imagine the "sfdisk -d /dev/hdc | sfdisk
/dev/hda" plan might have worked then.
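
For the record, here is a rough sketch of that plan as I understand it,
written as a dry run: it only prints the commands, it does not touch any
disk. The device names hdc (good) and hda (new) are from my setup. The
md0/md1 to partition 1/2 mapping is an assumption for illustration; the
real layout has to be read from /proc/mdstat on the affected machine.
Reinstalling grub on the new drive is a separate step not shown here.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery commands instead of running them.
# $1 = surviving disk, $2 = replacement disk (hdc and hda in this thread).
# ASSUMPTION: md0 is built from partition 1 and md1 from partition 2;
# check /proc/mdstat for the real mapping before running anything.
recovery_plan() {
    good=$1
    new=$2
    # Copy the partition table from the good disk to the new one.
    echo "sfdisk -d /dev/$good | sfdisk /dev/$new"
    # Add the new partitions back into their mirrors.
    echo "mdadm /dev/md0 --add /dev/${new}1"
    echo "mdadm /dev/md1 --add /dev/${new}2"
    # Watch the resync progress.
    echo "cat /proc/mdstat"
}

recovery_plan hdc hda
```

Run from a live CD as root, the printed lines could be reviewed and then
entered by hand, one at a time, once the device names have been
double-checked.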

Since it's not if but when the next drive will fail, I'm hoping I can
assemble a working plan. If not, I'm afraid I don't see the value in a
RAID1 setup.




