Hi, I have a Fedora 21 system with three md RAID1 arrays. I recently used this system to temporarily access an external array through an LSI controller: I installed the controller, accessed the data on the disks connected to it, and then removed the controller from the system.
Now, all three md RAID arrays are in some type of degraded state:
# cat /proc/mdstat
Personalities : [raid1]
md2 : inactive sdb1[1](S) sdc1[4](S)
      1953519616 blocks super 1.2

md0 : active raid1 sda1[0]
      511988 blocks super 1.0 [2/1] [U_]

md1 : active raid1 sda2[0]
      237566840 blocks super 1.2 [2/1] [U_]
      bitmap: 2/2 pages [8KB], 65536KB chunk
Thankfully the system boots. The inactive array, md2, is mounted at /var/backup, so I just commented it out of /etc/fstab to get the system to boot.
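For reference, the /etc/fstab line I commented out looks roughly like this (the UUID below is a placeholder, not my real one):

# temporarily disabled: md2 is inactive since the LSI controller was removed
#UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /var/backup  ext4  defaults  1 2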
It appears the devices were somehow impacted by the LSI controller, and now the partitions/devices have been reordered.
How do I realign these slices to rebuild the array?
Thanks, Alex
It looks like a device is completely missing? First, collect some information:
blkid > blkid.txt
Then for each md device UUID pair (the md UUID will show up twice in blkid for each of the three arrays):
mdadm -E <dev1> <dev2> >> mdXraidstats.txt
Where X is the md device number from mdstat. That way you get the superblock from each device making up each array into an array-specific text file. They can all go in one file, but it's easier to keep things separate.
journalctl -b -l -o short-monotonic > journal.txt

Also include a copy of your /etc/mdadm/mdadm.conf.
Post those files somewhere and the URL here.
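Something like this, as a sketch of the whole collection step (the member device names below are placeholders; match the real pairs by md UUID in blkid.txt):

# record filesystem and md UUIDs for all block devices
blkid > blkid.txt

# per array, dump the superblock of both members, e.g. for md2:
mdadm -E /dev/sdb1 /dev/sdc1 >> md2raidstats.txt

# boot log with monotonic timestamps, to see why assembly failed
journalctl -b -l -o short-monotonic > journal.txt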
The journal should say why it wasn't assembled. But the main thing is to get copies of all the device superblocks before you do anything else. Be especially careful reading around the internet, where people often get the idea that they need to use -C to recreate an array; that obliterates the old metadata and almost invariably kills their data.
Chris Murphy
Alex writes:
> Hi, I have a Fedora 21 system with three md RAID1 arrays. I recently used this system to temporarily access an external array through an LSI controller: I installed the controller, accessed the data on the disks connected to it, and then removed the controller from the system.
>
> Now, all three md RAID arrays are in some type of degraded state:
>
> # cat /proc/mdstat
> Personalities : [raid1]
> md2 : inactive sdb1[1](S) sdc1[4](S)
>       1953519616 blocks super 1.2
>
> md0 : active raid1 sda1[0]
>       511988 blocks super 1.0 [2/1] [U_]
>
> md1 : active raid1 sda2[0]
>       237566840 blocks super 1.2 [2/1] [U_]
>       bitmap: 2/2 pages [8KB], 65536KB chunk
>
> Thankfully the system boots. The inactive array, md2, is mounted at /var/backup, so I just commented it out of /etc/fstab to get the system to boot.
>
> It appears the devices were somehow impacted by the LSI controller, and now the partitions/devices have been reordered.
>
> How do I realign these slices to rebuild the array?
The first step is to see what
mdadm --detail /dev/md0
says. Repeat for md1, et al…
Each one of your arrays is either missing a drive, or the other drive is marked as faulty.
If it's missing, use --add to add each drive to its array. If mdadm says the drive has failed, try to --remove it, then --add it back in.
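In other words, something like this, with placeholder device names (check blkid or the --detail output for your real ones):

# see which member is missing or marked faulty
mdadm --detail /dev/md0

# if a member is marked faulty, remove it first
mdadm /dev/md0 --remove /dev/sdd1

# then add it back; the resync starts automatically
mdadm /dev/md0 --add /dev/sdd1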
Hi,

> It looks like a device is completely missing? First, collect some information:
>
> blkid > blkid.txt
>
> Then for each md device UUID pair (the md UUID will show up twice in blkid for each of the three arrays):
>
> mdadm -E <dev1> <dev2> >> mdXraidstats.txt
>
> Where X is the md device number from mdstat. That way you get the superblock from each device making up each array into an array-specific text file. They can all go in one file, but it's easier to keep things separate.
>
> journalctl -b -l -o short-monotonic > journal.txt
>
> Also include a copy of your /etc/mdadm/mdadm.conf.
>
> Post those files somewhere and the URL here.
>
> The journal should say why it wasn't assembled. But the main thing is to get copies of all the device superblocks before you do anything else. Be especially careful reading around the internet, where people often get the idea that they need to use -C to recreate an array; that obliterates the old metadata and almost invariably kills their data.
Thanks so much for your help. In the process of disconnecting the LSI controller and the power cable connected to its disks, the power cable for one of the other system drives somehow came loose at the power supply. Using these commands, I was able to figure out that a device really was physically missing.
I panicked a bit, and didn't want to do something stupid (like mdadm -C) before passing it by at least one other person. After reconnecting the power cable, I was able to easily "mdadm --add" the drives back and rebuild both degraded arrays. The /var/backup partition came up automatically.
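Roughly, the fix amounted to this (with placeholder names for the drive that reappeared):

# add the returned member back into each degraded array
mdadm /dev/md0 --add /dev/sdd1
mdadm /dev/md1 --add /dev/sdd2

# then watch the resync progress
cat /proc/mdstat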
Another great mdadm command to know is --detail.
Thanks again guys, Alex
On Tue, Aug 4, 2015 at 6:38 PM, Alex mysqlstudent@gmail.com wrote:
> I panicked a bit, and didn't want to do something stupid (like mdadm -C) before passing it by at least one other person.
I definitely suggest you make a backup of each device superblock and keep it somewhere safe, in the unlikely event two devices in a mirrored array stop cooperating. The saved copies come in handy from time to time, since the on-disk superblocks often can't be read once problems happen.
The other thing is, if you're panicked, it kinda suggests no recent backup. I can't tell you how many sad panda faces I see on many lists, including the linux-raid@ list (all things RAID on Linux, but mainly md/mdadm), and it's like: really, no backup? The brutal response is, if you don't have a backup the data isn't important.
> After reconnecting the power cable, I was able to easily "mdadm --add" the drives back and rebuild both degraded arrays. The /var/backup partition came up automatically.
Great!
> Another great mdadm command to know is --detail.
Yeah it's easy to get -D and -E confused. -D is pointed at the running array and -E is pointed at devices. But they're functional equivalents.
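For example (device paths assumed):

# -D / --detail reads the assembled array
mdadm -D /dev/md0

# -E / --examine reads the md superblock from a member device
mdadm -E /dev/sda1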
Hi,
>> I panicked a bit, and didn't want to do something stupid (like mdadm -C) before passing it by at least one other person.

> I definitely suggest you make a backup of each device superblock and keep it somewhere safe, in the unlikely event two devices in a mirrored array stop cooperating. The saved copies come in handy from time to time, since the on-disk superblocks often can't be read once problems happen.
Do you have any suggestions on how to do this?
> The other thing is, if you're panicked, it kinda suggests no recent backup. I can't tell you how many sad panda faces I see on many lists, including the linux-raid@ list (all things RAID on Linux, but mainly md/mdadm), and it's like: really, no backup? The brutal response is, if you don't have a backup the data isn't important.
I do have a backup, but rebuilding everything from it would be a major inconvenience. No one ever really wants to go through that.
Thanks, Alex
On Tue, Aug 4, 2015 at 9:02 PM, Alex mysqlstudent@gmail.com wrote:
> Hi,
>
>>> I panicked a bit, and didn't want to do something stupid (like mdadm -C) before passing it by at least one other person.
>
>> I definitely suggest you make a backup of each device superblock and keep it somewhere safe, in the unlikely event two devices in a mirrored array stop cooperating. The saved copies come in handy from time to time, since the on-disk superblocks often can't be read once problems happen.
>
> Do you have any suggestions on how to do this?
mdadm -E for each device, and point it at a file using >>. For the purposes of the backup you could write all the devices to one file; it's up to you.
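A minimal sketch, assuming these member device names (substitute your own from blkid):

# append every member's superblock to one backup file
for dev in /dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdc1; do
    mdadm -E "$dev" >> superblock-backup.txt
done

# then copy superblock-backup.txt somewhere off this machine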
Hi,
>>> I definitely suggest you make a backup of each device superblock and keep it somewhere safe, in the unlikely event two devices in a mirrored array stop cooperating. The saved copies come in handy from time to time, since the on-disk superblocks often can't be read once problems happen.

>> Do you have any suggestions on how to do this?

> mdadm -E for each device, and point it at a file using >>. For the purposes of the backup you could write all the devices to one file; it's up to you.
Ah, I thought you were referring to actually storing information from the disk itself, such as the boot sector or MBR, not just a list of the partition info.
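For that part, something like this should work (disk name is a placeholder):

# save a restorable dump of the partition table
sfdisk -d /dev/sda > sda-partition-table.txt

# or a raw copy of the MBR (first 512 bytes: boot code + DOS partition table)
dd if=/dev/sda of=sda-mbr.bin bs=512 count=1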
There's also a great resource on recovering a failed array here:
https://raid.wiki.kernel.org/index.php/RAID_Recovery
Thanks, Alex