F10+dmraid eats puppies! (and ate my system too)
Robert L Cochran
cochranb at speakeasy.net
Sat Dec 13 12:31:38 UTC 2008
Could you have used one or more of dd, ddrescue, and TestDisk to copy
your system to a set of spare drives, and then worked only on the spares
until you had a clear idea of what was wrong? I think that would have
gone a long way toward sparing you some data loss.
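
To make the idea concrete, here is a rough Python sketch of what a dd-style
copy does: read the source block by block, write each block to an image on
a spare drive, and keep a checksum so the copy can be verified afterwards.
The function names and the paths in the usage note are my own placeholders,
not anything from dd or ddrescue. Unlike ddrescue, this sketch does not skip
or retry unreadable blocks, so it only suits a drive that still reads cleanly.

```python
import hashlib


def copy_image(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path to dst_path block by block.

    Returns the SHA-256 hex digest of the data that was read.
    Any read error aborts the copy (no ddrescue-style recovery).
    """
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def file_digest(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of an existing file or image."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Assuming a spare drive mounted at a hypothetical /mnt/spare, calling
copy_image("/dev/sda", "/mnt/spare/sda.img") as root from the rescue CD and
then comparing the result with file_digest("/mnt/spare/sda.img") would
confirm the image matches what was read, before any repair is attempted.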
Graham TerMarsch wrote:
> I ran into this earlier in the week and, after finally getting my machine back
> online, am surprised to see that people aren't making a big stink about
> this... it's got subtle nuances that make it nearly impossible to fix without
> loss of data.
> I've found the following threads/bugs that appear related:
> Here's what happened to me...
> I upgraded from F9 to F10 back on Nov 29th, and things seemed fine. I
> upgraded the kernel last Wednesday, rebooted, and started seeing all sorts of
> crazy weirdness. At first the system wouldn't boot at all, dying on errors
> of "killing init" and "corrupted libraries". I thought it sounded like FS
> corruption, so I booted the rescue CD, ran fsck (which came back clean), and
> then proceeded to re-install some of the packages with the corrupted
> libraries, so I could at least get the machine up and running again.
> After several cycles of "rescue CD, install packages, reboot, fail", I
> decided that even if I could get it running I wasn't going to trust it. Went
> back to the rescue CD, and started backing up files onto other machines on
> the network here.
> I then re-installed the machine, leaving my "/home" and "/usr/local"
> partitions as they were; reformatted everything else, but left those alone.
> Got the system up, but was then presented with the most shocking thing... it
> looked like my machine had basically done time-travel and was now *exactly*
> as it was on November 29th. Files I know I'd edited were missing changes,
> emails were lost, databases were missing data.
> Took me a while to figure it out, but here's what happened...
> When I upgraded from F9 to F10, Anaconda detected my nvidia dmraid mirror and
> installed F10 onto both halves of the mirror. When I rebooted, though, it
> only picked up *ONE HALF* of the mirror... /dev/sda. It had the UUIDs right,
> but it didn't mount /dev/mapper/nvidia_xxxx but mounted sda instead. When
> I did the kernel upgrade this week, *that* mounted sdb. When I reinstalled,
> it *also* mounted sdb, not sda or dmraid.
> When I looked at sda directly, I saw all of my recent changes to files that
> I'd made since the 29th. When I looked at sdb directly, it was a snapshot of
> what my machine looked like on the 29th.
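
One way to catch this kind of mix-up early is to check which device each
filesystem was really mounted from. The sketch below parses text in the
/proc/mounts format and flags mount points whose source is not a
device-mapper node. The function names are my own, and the prefix test is an
assumption about naming: some kernels list the root filesystem as /dev/root
or rootfs, which this check cannot resolve.

```python
def mount_sources(mounts_text):
    """Map each mount point to its source device.

    mounts_text is the content of /proc/mounts: one mount per line,
    whitespace-separated, device first and mount point second.
    """
    sources = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        # A later entry for the same mount point shadows an earlier one.
        sources[mount_point] = device
    return sources


def is_device_mapper(device):
    """True if the device name looks like a device-mapper node."""
    return device.startswith("/dev/mapper/") or device.startswith("/dev/dm-")


def unmapped_mounts(mounts_text, mount_points):
    """Return the given mount points that are NOT backed by device-mapper."""
    sources = mount_sources(mounts_text)
    return [
        point
        for point in mount_points
        if point in sources and not is_device_mapper(sources[point])
    ]
```

On a live system the input would come from open("/proc/mounts").read(). If
unmapped_mounts(text, ["/", "/home"]) returns anything on a machine that is
supposed to be running from the dmraid set, only one half of the mirror is
in use.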
> When we actually manage to get the bug fixed that caused this, anyone who's
> had this problem is potentially going to be in for a bigger world of hurt
> when applying the fix... I don't even think we can (with confidence) just
> nuke one half of the mirror and rebuild based on what's on the other half; how
> do we know which half they've been using? In my case, I'd made ~2wks of
> changes to sda not knowing that I was only using half the mirror, and then
> after updating the kernel got bumped over to sdb and made changes there while
> trying to fix it. Neither one was a mirror of the other, and each one had
> something on it that needed to be preserved. YUCK.
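
When neither half can be trusted as the master, one cautious approach is to
mount both halves read-only and list where they disagree before rebuilding
anything. This is only a sketch under that assumption: the function name and
the mount points in the usage note are hypothetical, and modification times
are a hint about which side is newer, not proof.

```python
import filecmp
import os


def compare_trees(left_root, right_root):
    """Compare two directory trees file by file.

    Returns (only_left, only_right, differing), where the first two are
    sorted lists of relative paths and differing is a list of
    (relative_path, newer_side) tuples with newer_side being
    "left", "right", or "same" according to modification time.
    """

    def list_files(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Skip symlinks and special files; compare regular files only.
                if os.path.isfile(full) and not os.path.islink(full):
                    found.add(os.path.relpath(full, root))
        return found

    left = list_files(left_root)
    right = list_files(right_root)

    differing = []
    for rel in sorted(left & right):
        a = os.path.join(left_root, rel)
        b = os.path.join(right_root, rel)
        # shallow=False forces a byte-by-byte content comparison.
        if not filecmp.cmp(a, b, shallow=False):
            a_time, b_time = os.path.getmtime(a), os.path.getmtime(b)
            if a_time > b_time:
                newer = "left"
            elif b_time > a_time:
                newer = "right"
            else:
                newer = "same"
            differing.append((rel, newer))

    return sorted(left - right), sorted(right - left), differing
```

With the two halves mounted read-only at, say, /mnt/sda_home and
/mnt/sdb_home, compare_trees("/mnt/sda_home", "/mnt/sdb_home") gives a list
of files to merge by hand before either disk is wiped.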
> Once I realized what'd happened to my machine I went into the BIOS and turned
> off the nvidia fakeraid and re-installed directly onto the two drives. That isn't
> what I want, as I'd at least like to have _some_ mirror of my data somewhere,
> but it was the only way I could get this machine running again.
> Be forewarned.... F10+dmraid is *DANGEROUS* right now...