Check your /etc/default/grub, if you use raid 1.

Tom Killian tom.killian at gmail.com
Mon Jul 30 14:55:22 UTC 2012


On Sun, Jul 29, 2012 at 10:02 AM,  Sam Varshavchik
<mrsam at courier-mta.com> wrote:
> There's a long-standing combination of two bugs: the list of rd.md.uuid boot
> parameters generated by anaconda for /etc/default/grub may not include the
> raid uuid of non-stock partitions like /home; and although the ramfs
> initscript autodiscovers all raid volumes present, sometimes (not always,
> I'll estimate 5% of the time) if a uuid is not enumerated in the boot
> parameters, one of the drives in the raid 1 volume may not get assembled at
> boot.
>
> There's probably a third bug in here: mdmonitor should've mailed me when an
> array came up degraded at boot (I suspect that because mdmonitor gets
> started so early in the boot process, not all the moving pieces are there
> for mail delivery to happen). Eventually, you'll boot again with both drives
> in the array somehow, except they'll be out of sync, resulting in massive
> corruption. If you're lucky, you'll boot just with the other drive, and
> discover that your filesystem's contents are weeks/months out of date, and
> maybe you'll be lucky enough to figure out what happened, and switch back to
> the other drive and resync. But, not everyone's so lucky.
>
> This first started happening in F16. It took me a while to figure out the
> cause for an occasional raid assembly failure at boot. Fixed it, and
> forgot about it. Well, looks like the F17 anaconda brought back the broken
> /etc/default/grub, which found its way into my grub.cfg, and I just lost a
> full day, cleaning up this mess.
>
> So, if you use raid 1 and upgraded to F17, you may need to fix this before
> it's too late: put back the missing uuid into /etc/default/grub, and into
> every entry in grub.cfg.
>
> Pissed.

Thanks for the explanation and fix/workaround, Sam.  This happened to
me as well.  I ran fsck on the two mirrors independently and was able
to recover most of the data from the lost+found's.  But I had been
brooding over the root cause until now.
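
For anyone who wants to check their own machine, here is a rough Python
sketch of the comparison Sam describes: collect the array UUIDs that mdadm
reports and see which ones are not listed as rd.md.uuid= parameters in
/etc/default/grub. The function names are made up for illustration, and it
assumes the input text looks like the output of "mdadm --detail --scan" and
that the kernel parameters live on a single GRUB_CMDLINE_LINUX= line. It only
reports what is missing; it does not edit anything.

```python
import re
import shlex


def array_uuids(mdadm_scan_output):
    """Collect array UUIDs from text shaped like `mdadm --detail --scan` output."""
    matches = re.findall(r"\bUUID=([0-9a-fA-F:]+)", mdadm_scan_output)
    return {uuid.lower() for uuid in matches}


def cmdline_uuids(grub_defaults_text):
    """Collect rd.md.uuid= values from the GRUB_CMDLINE_LINUX= line."""
    found = set()
    for line in grub_defaults_text.splitlines():
        line = line.strip()
        if not line.startswith("GRUB_CMDLINE_LINUX="):
            continue
        value = line.split("=", 1)[1]
        # shlex removes the surrounding shell quotes; the inner split()
        # then breaks the quoted string into individual kernel parameters.
        for word in shlex.split(value):
            for arg in word.split():
                if arg.startswith("rd.md.uuid="):
                    found.add(arg.split("=", 1)[1].lower())
    return found


def missing_uuids(mdadm_scan_output, grub_defaults_text):
    """Return array UUIDs that are not enumerated in the boot parameters."""
    return sorted(array_uuids(mdadm_scan_output) - cmdline_uuids(grub_defaults_text))
```

Feed it the scan output (run as root) and the contents of /etc/default/grub.
Anything it returns is a UUID to add to GRUB_CMDLINE_LINUX, and, as Sam
says, to every entry in grub.cfg as well.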

