[fwd] md raid1 chokes when one disk is removed
Danny Howard
dannyman at toldme.com
Fri Nov 11 22:16:28 UTC 2005
Hello, Fedorians!
I asked this question over on redhat-list. (Was hoping to ask RedHat
support.) But that list is something of a ghost town. Perhaps someone
on fedora-list can comment on md behaviour when a drive gets pulled?
I'd appreciate figuring this out.
Thanks in advance.
Sincerely,
-danny
----- Forwarded message from Danny Howard <dannyman at toldme.com> -----
Hello,
I am evaluating RHEL, prior to purchase for a new production network.
Our boxes are SuperMicro 6018HT with dual SATA drives.
I like to give my system a bit of added resiliency with RAID1. These
systems have pairs of SATA disks, but no hardware RAID. With FreeBSD, I
can set up a gmirror and have a RAID1 system. (I have documentation on
that at
http://dannyman.toldme.com/2005/01/24/freebsd-howto-gmirror-system/ )
So, for Red Hat, I checked the manual, and thought I'd give the Red Hat
method a shot.
Here's a capture of my Disk Druid:
http://www.flickr.com/photos/dannyman/61643870/
And, here's some info from the running system:
[root@linux ~]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
/dev/md2 / ext3 defaults 1 1
/dev/md0 /boot ext3 defaults 1 2
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
none /proc proc defaults 0 0
none /sys sysfs defaults 0 0
/dev/md1 swap swap defaults 0 0
/dev/hdc /media/cdrom auto pamconsole,fscontext=system_u:object_r:removable_t,exec,noauto,managed 0 0
/dev/fd0 /media/floppy auto pamconsole,fscontext=system_u:object_r:removable_t,exec,noauto,managed 0 0
[root@linux ~]# mount
/dev/md2 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/md0 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
[root@linux ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 super-minor=2
ARRAY /dev/md0 super-minor=0
ARRAY /dev/md1 super-minor=1
[root@linux ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
2032128 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
76011456 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
unused devices: <none>
Sweet! I can "fail" a disk and remove it thus:
mdadm --fail /dev/md0 /dev/sdb1
mdadm --fail /dev/md1 /dev/sdb2
mdadm --fail /dev/md2 /dev/sdb3
[ ... physically remove disk, system is fine ... ]
[ ... put the disk back in, system is fine ... ]
mdadm --remove /dev/md0 /dev/sdb1
mdadm --add /dev/md0 /dev/sdb1
mdadm --remove /dev/md1 /dev/sdb2
mdadm --add /dev/md1 /dev/sdb2
mdadm --remove /dev/md2 /dev/sdb3
mdadm --add /dev/md2 /dev/sdb3
[ ... md2 does a rebuild, but /boot and <swap> are fine -- nice! ... ]
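[Ed.: for scripting checks like the ones above, a degraded array shows up
in /proc/mdstat with an underscore in the status field ("[U_]" or "[_U]")
instead of "[UU]". A minimal sketch, assuming POSIX sh and grep; the
function name is made up for illustration:]

```shell
# check_md: warn about degraded md arrays in mdstat-format text read
# from stdin.  A degraded RAID1 member line ends in "[U_]" or "[_U]"
# (one underscore per missing member) rather than "[UU]".
check_md() {
    # Count lines whose bracketed status field contains an underscore.
    degraded=$(grep -c '\[U*_[U_]*\]')
    if [ "$degraded" -gt 0 ]; then
        echo "WARNING: $degraded degraded md array(s)"
    else
        echo "all md arrays healthy"
    fi
}

# Typical use on a live system:
#   check_md < /proc/mdstat
```

[Ed.: the pattern deliberately ignores lines like "Personalities :
[raid1]", which have brackets but no underscore.]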
Okay, but what if a disk fails on its own?
[root@linux ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
2032128 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
76011456 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
unused devices: <none>
[ ... pull sdb ... ]
[root@linux ~]# cat /proc/mdstat
ata1: command 0x35 timeout, stat 0xd0 host_stat 0x61
ata1: status=0xd0 { Busy }
SCSI error : <0 0 1 0> return code = 0x8000002
Current sdb: sense key Aborted Command
Additional sense: Scsi parity error
end_request: I/O error, dev sdb, sector 156296202
md: write_disk_sb failed for device sdb3
ATA: abnormal status 0xD0 on port 0x1F7
md: errors occurred during superblock update, repeating
ATA: abnormal status 0xD0 on port 0x1F7
ATA: abnormal status 0xD0 on port 0x1F7
ata1: command 0x35 timeout, stat 0x50 host_stat 0x61
[ ... reinsert sdb ... ]
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
2032128 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
76011456 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
unused devices: <none>
I don't like that the system seems to choke when the disk is removed
unexpectedly. Is this intended operation? Do I need to massage my SCSI
subsystem a bit? What's up? :)
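[Ed.: whatever the answer on hot-pull behaviour, the mdadm.conf above
already has a MAILADDR line, so mdadm's monitor mode can at least mail
root when an array goes degraded. A sketch using standard mdadm flags;
on RHEL this is normally started for you by the mdmonitor init script:]

```shell
# Watch all arrays from /etc/mdadm.conf in the background and mail the
# MAILADDR recipient on Fail/DegradedArray events, polling every 60s.
mdadm --monitor --scan --daemonise --delay=60
```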
Thanks for your time.
Sincerely,
-danny
--
http://dannyman.toldme.com/
--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
----- End forwarded message -----