[SOLVED] Re: RAID adventures

Tue Jul 20 17:11:31 UTC 2010

  On 07/19/2010 05:09 AM, Roberto Ragusa wrote:
> Konstantin Svist wrote:
>
>> so I sized it down a bit:
>> # mdadm --grow -z 293033472 --backup-file=/root/grow_md0_size.bak /dev/md0
> And there is your error.
> You resized the device without first resizing the filesystem.
>
> The filesystem is *in* the device.
> So if you want to enlarge, you enlarge the device and then the filesystem.
> If you want to shrink, you shrink the filesystem and then the device.
>
> You basically destroyed a few random blocks from the end of the filesystem.
> I don't know how serious the damage is. fsck will tell you.
>
> It is also impossible to undo the error because you have reshaped the RAID
> after the shrinking, so the "lost" blocks are not easily reincludeable
> (with an opposite --grow, I mean).
>
> The small number of missing blocks and their position at the end of the disk
> give you a reasonable confidence that you will save your data.
> I'm not sure if it is better to run an fsck and pray or it is better
> to reenlarge the device with some zeroed blocks to avoid the "read beyond
> partition size" condition (I personally would think about linear concatenation
> of your RAID with another small zeroed partition (or file) just to exceed the
> size of the "contained" filesystem).
>

My reasoning was this: I had just added a new drive and md0 had been
rebuilt, but the filesystem had not been resized yet, which implies that
the end of the drives should have been empty. I'm pretty sure I gave mdadm
the right command, but for whatever reason it didn't do what I asked.
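
For anyone repeating this, here is a rough sketch of the check implied by the "shrink the filesystem first, then the device" rule quoted above. It is not what was run in this thread. It relies on mdadm's -z/--size being the space used from each component device, so a RAID5 array's usable size is (devices - 1) times that value. The helper names raid5_array_kib and fs_fits are made up for illustration, and the dumpe2fs hint assumes an ext2/3/4 filesystem.

```shell
# Sketch only: raid5_array_kib and fs_fits are hypothetical helper names.

# Usable RAID5 capacity in KiB: (devices - 1) * per-device component size.
raid5_array_kib() {
    devices=$1
    component_kib=$2
    echo $(( (devices - 1) * component_kib ))
}

# Succeeds (exit status 0) if a filesystem of fs_kib fits in array_kib.
fs_fits() {
    fs_kib=$1
    array_kib=$2
    [ "$fs_kib" -le "$array_kib" ]
}

# Usage, with the component size from this thread and 4 devices:
#   new_kib=$(raid5_array_kib 4 293033472)
#   fs_fits "$fs_kib" "$new_kib" || echo "shrink the filesystem first"
# For ext2/3/4 (an assumption), fs_kib can be worked out from the
# "Block count" and "Block size" fields printed by: dumpe2fs -h /dev/md0
```

If fs_fits fails, the filesystem has to be shrunk before mdadm --grow -z is given a smaller size.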

After I changed the chunk size back to 64, I told it to resize to max 
size, which should've reverted everything:

# mdadm --grow -z max --backup-file=/root/grow_md0_size_back.bak /dev/md0
mdadm: component size of /dev/md0 has been set to 293033536K
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sda4[0] sdc1[2] sdb1[1]
       293033472 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

After I stopped the array and re-assembled it, only 3 of the 4 drives came
back up (because of a typo in /etc/mdadm.conf); the array had the proper
size, though it was degraded.
I found 2 mistakes: the / was missing from /dev/sdd1 in /etc/mdadm.conf,
and the partition type of /dev/sdd1 was '83 Linux' instead of 'fd Linux raid
autodetect'.
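
A typo like the one in /etc/mdadm.conf is easy to miss by eye, so a small lint pass may help. The sketch below is only an illustration: check_conf_paths is a made-up helper name. It flags any whitespace- or comma-separated token that contains "dev" but is not a plain /dev/... path. It does not understand glob patterns such as /dev/sd*, which mdadm.conf allows, so those would be reported too.

```shell
# Sketch only: check_conf_paths is a hypothetical helper name.
# Reads mdadm.conf-style text on stdin and prints every token that
# mentions "dev" but is not a plain /dev/... path.
check_conf_paths() {
    tr ' \t,' '\n\n\n' \
        | sed 's/^devices=//' \
        | grep 'dev' \
        | grep -v -E '^/dev/[A-Za-z0-9_/-]+$'
}

# Usage:
#   check_conf_paths < /etc/mdadm.conf
# No output means no suspicious device names were found.
# The partition type has to be checked separately, for example in the
# Id column printed by: fdisk -l /dev/sdd
```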

Right now the 4th drive has been added and the array is running recovery.
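
To keep an eye on the recovery without reading the whole of /proc/mdstat each time, something like the following could be used. It is a sketch: recovery_percent is a made-up helper name, and it assumes the progress line contains text of the form "recovery = NN.N%", as current kernels print while a rebuild is running.

```shell
# Sketch only: recovery_percent is a hypothetical helper name.
# Prints the percentage from a "recovery = NN.N%" progress line read on
# stdin; prints nothing if no recovery is in progress.
recovery_percent() {
    sed -n 's/.*recovery *= *\([0-9.]*\)%.*/\1/p'
}

# Usage:
#   recovery_percent < /proc/mdstat
# Fuller per-device state is available from: mdadm --detail /dev/md0
```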