dmraid comments and a warning

Peter Jones pjones at redhat.com
Tue Feb 7 02:02:06 UTC 2006


On Mon, 2006-02-06 at 13:08 -0700, Dax Kelson wrote:
[...]
> Once a RAID1 (mirror) has been defined and built inside of the BIOS
> utility, you *never ever* want to boot to half of the RAID. If you
> do, and you then go back to booting the whole activated RAID, you get
> massive file corruption.

Well, that's not necessarily true, although for practical purposes it's
currently true with dmraid.  In a nice blue sky world, you've got block
maps of what's in sync, and they've got a timestamp, and one of them is
"correct".  After all, you have to be able to re-sync from somewhere
when one half of the mirror dies.

But right now we can't really do that for most dmraid formats, I think.

> The standard root=LABEL=/ was used on the kernel command line, and what
> happened is that it booted up to one side of the mirror. All the update
> and new-package activity (including a new kernel install, which modified
> grub.conf) happened on just that one side of the mirror.

This should be fixed in the current rawhide tree.

> When I rebooted, GRUB read a garbled grub.conf because at that stage it
> *is* using an 'activated' RAID (via the RAID BIOS support). I couldn't
> boot.

What do you mean by "garbled" here?  From what you've said so far, at
this point you should have two perfectly coherent filesystems -- which
just don't match.  Each of them should have a grub.conf, and both should
be properly formed -- they just won't match each other.

> So I booted to the rescue environment, which did the right thing and
> activated the RAID and it even mounted the filesystems. When I went and
> inspected the files though, anything that got touched while it booted to
> the one side of the mirror was trashed.

So the Really Important Thing about BIOS-based raids is that if you
_ever_ get into the situation where one disk has been written and the
other hasn't, you need to go into your bios and re-sync the disks.  And
unfortunately, it's very difficult to automatically detect that you're
in this situation with bios raid.
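If you want to check by hand whether the two halves have diverged,
about the only generic way is to read both legs and compare them.
Something like this works, though it's slow; the sda1/sdb1 names are
just an example for a two-disk set, and nothing on either disk should
be mounted read-write while you do it:

  # checksum the same partition on each raw disk; differing sums mean
  # the two halves of the mirror no longer match
  dd if=/dev/sda1 bs=1M 2>/dev/null | md5sum
  dd if=/dev/sdb1 bs=1M 2>/dev/null | md5sum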

> --Event Two--
> 
> With the benefit of the experience of event one, I did a new install,
> but this time I let Anaconda's disk druid do the "auto setup" thing and
> create an LVM. I figured that LVM using device mapper and dmraid would
> always "do the right thing" in regards to *always* using the activated
> RAID partitions as the PVs.

What distro were you installing?  AFAIK, both this and your previous
configuration should have worked if you installed from a tree after
January 9th or so.  That'd mean test2 should have been ok.

(I haven't really looked at upgrades yet; hopefully I'll get to that very
soon, even though it's not really possible to be "upgrading" from an FC4
dmraid setup.)

> This seemed to be the case. I installed and booted OK. I verified that I
> was using LVM and inspected the physical volumes using 'pvdisplay'.
> 
> I was greeted with:
> 
> # pvdisplay
>   --- Physical volume ---
>   PV Name               /dev/dm-1
> [snip]
> 
> Looks good! Seeing /dev/dm-1 instead of /dev/mapper was a surprise, but
> I agree with the idea.

Yeah, I hate our naming policies around all of that.
Having /dev/mapper/$MAP_NAME not correspond to the name in /sys/block
is totally bogus, but it's what all of the device-mapper code does right
now.
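If you ever need to work out which map /dev/dm-1 actually is, the minor
number is the giveaway.  Roughly:

  # each map is listed with its (major, minor) pair; the minor number
  # is the N in /dev/dm-N, and the node under /dev/mapper carries the
  # same numbers
  dmsetup ls
  ls -l /dev/mapper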

> On bootup I noticed an error flash by, something to the effect of "LVM
> ignoring duplicate PV".

Ok, so this means one of several possible things:

1) you're using lvm2 < 2.02.01-1
2) there's no entry for the dm device in /etc/blkid.tab
3) for some reason, the priority isn't set on the dm device
in /etc/blkid.tab
4) there are no dm rules in your initrd

I think that's actually the whole list of practical reasons you'd get to
this point, but it's always possible I've overlooked something.
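If you want to rule those out yourself, this is roughly what I'd check
(paths assume a stock rawhide install, and the initrd name has to match
your running kernel):

  # (1) want lvm2 2.02.01-1 or newer
  rpm -q lvm2
  # (2)/(3) is the dm device listed, and does its entry carry a PRI= value?
  grep mapper /etc/blkid.tab
  # (4) unpack the initrd and see whether its init sets up the dm devices
  mkdir /tmp/ird && cd /tmp/ird
  zcat /boot/initrd-$(uname -r).img | cpio -id
  grep -i dm init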

One interesting note is that given any of these, you should be getting
the same disk mounted each time.  Which means there's a good chance that
sda and sdb are both fine; one of them just happens to represent your
machine from 3 weeks ago.  I do still need to add more checking at boot
time to bring it up read-only with a dm-error device as one half of the
mirror, though.  At that point it may even work to try and boot
(read-only of course ;) once you've yanked sdb out.
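For the curious, the sort of thing I have in mind looks roughly like
this; the sizes and names below are completely made up, so don't try it
on data you care about:

  # an all-error device standing in for the yanked/stale leg...
  dmsetup create deadleg --table "0 976773168 error"
  # ...and a read-only mirror whose only real leg is the good disk
  dmsetup create halfraid --readonly --table \
    "0 976773168 mirror core 2 1024 nosync 2 /dev/sda 0 /dev/mapper/deadleg 0"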

> I ran pvdisplay and saw:
> 
> # pvdisplay
>   --- Physical volume ---
>   PV Name               /dev/sda1
> [snip]
> 
> It booted off one half of the mirror. It must have done the same on some
> previous boot.

Do you still have this disk set, or have you wiped it and reinstalled
already?  If you've got it, I'd like to see /etc/blkid.tab from either
disk (both if possible).
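(Just the relevant lines would do -- e.g. the output of something like
the following, with the device names matching your setup:)

  # pull the entries for the raw partitions and the dm device
  grep -e mapper -e sda1 -e sdb1 /etc/blkid.tab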

> There need to be more checks in place to prevent booting off of one
> half of the mirror, or at a minimum only allowing a read-only boot on
> one side of the mirror. Dead systems are no fun. Losing your personal
> data is hell.

Well, we should have the appropriate checks there at this point -- so
I'd be curious to find out exactly which versions you installed with.
It could be that one of the checks was introduced after you installed,
and the "yum update" process caused it to believe it was *not* a raid
system.

(I haven't been extensively checking to make sure every daily rawhide
would work perfectly as an update from the previous one, just that
they'd install if possible...)

> This isn't purely a Linux problem. Any operating system using fake RAID1
> needs to be robust in this regard. I saw a Windows box using 'fake'
> motherboard RAID whose motherboard BIOS got flashed, which reset the
> "Use RAID" setting to 'off'. Then Windows booted off of half the RAID.

That's interesting.  It means there's some way to query the BIOS to tell
if it's installed the int13 "raid" hook or not.  I wish I knew what that
magic is.

> This was noticed, the BIOS setting was turned back on, and a boot was
> attempted. Massive corruption and a dead Windows system were the result.
> To Windows' credit, I haven't seen it accidentally boot off of half the
> RAID as long as the BIOS RAID was turned on and the drivers installed.

Of course you're also not using windows development trees ;)

> The rules are:

> 1. Don't boot off half of the RAID1 in read-write mode

Yeah, we definitely still need some fallback stuff here.

> 2. If rule 1 is violated, don't ever again boot using the RAID1
> - If you can abide by rule 2, you can do so indefinitely

This isn't enforceable in any meaningful way in the software.  In fact,
it's scarcely even detectable currently :/

> 3. There is no way to recover from a violated rule 1 without
> reinstalling.

That's not the case -- you can go into the bios and sync from the
"newer" disk to the older one.  Or if your bios is total junk, you can
boot some other media and (carefully) re-sync each partition with "dd".
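Purely as an illustration, from rescue media that would be something
along these lines, assuming sda is the half with the data you want to
keep and sdb is the stale half; triple-check which disk is which before
running anything like it:

  # clone each partition of the good disk onto the stale one, one by one
  dd if=/dev/sda1 of=/dev/sdb1 bs=1M
  dd if=/dev/sda2 of=/dev/sdb2 bs=1M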

Thanks for the testing and feedback!

-- 
  Peter, wishing there was some way to tell the kernel to forget
  about partitions which have already been scanned...



