Assistance building a backup server

Mon Mar 9 02:52:22 UTC 2015

On Thu, Mar 5, 2015 at 9:48 AM, Alex Regan <mysqlstudent at gmail.com> wrote:

> I currently have a 3TB backup system using five 1TB disks in RAID5. Restore
> times in case of disk failure are already exceedingly long,

Oh yeah, there are several things you need to check if you're using mdadm-created RAID.

md/stripe_cache_size in sysfs defaults to 256, which is too low. It needs
to be higher, probably 1024, but do some reading to get more specific
advice. Low values cause slow performance in general, and slow rebuilds
in particular.
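
As a rough sketch of checking and raising the value: the helper name set_stripe_cache and the MD_SYSFS override are made up for illustration, and md0 and 1024 are placeholders rather than recommendations. The setting does not survive a reboot, and memory used by the cache grows with the value (per the kernel md documentation, page size times number of member disks times stripe_cache_size).

```shell
# Hypothetical helper: raise stripe_cache_size for one md array.
# MD_SYSFS is an assumed override so the function can be pointed
# at a scratch directory instead of the real /sys/block.
set_stripe_cache() {
    md="$1"      # array name, e.g. md0
    size="$2"    # new value, e.g. 1024
    f="${MD_SYSFS:-/sys/block}/$md/md/stripe_cache_size"
    [ -w "$f" ] || { echo "cannot write $f" >&2; return 1; }
    echo "was: $(cat "$f")"
    echo "$size" > "$f"
    echo "now: $(cat "$f")"
}
```

Run as root, for example: set_stripe_cache md0 1024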

These may also be too low by default, max in particular:
/proc/sys/dev/raid/speed_limit_max
/proc/sys/dev/raid/speed_limit_min
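
A small sketch for inspecting both limits, which are expressed in KiB/s per device. The function name show_raid_limits and the RAID_PROC override are assumptions for illustration, and the value in the sysctl comment is a placeholder.

```shell
# Print both md resync speed limits (KiB/s per device).
# RAID_PROC is an assumed override used only to point the
# function at a scratch directory.
show_raid_limits() {
    dir="${RAID_PROC:-/proc/sys/dev/raid}"
    for name in speed_limit_min speed_limit_max; do
        printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
    done
}

# To raise a limit until the next reboot (as root, placeholder value):
#   sysctl -w dev.raid.speed_limit_max=500000
```
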

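A pernicious problem that's totally non-obvious and comes up on the
linux-raid@ list all the time: mismatched SCT ERC and SCSI command
timer values. The former needs to be shorter than the latter. Both are
per drive, not per array. Note the units: smartctl reports SCT ERC in
tenths of a second, while the command timer is in seconds.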

smartctl -l scterc <dev>
cat /sys/block/sdX/device/timeout
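
The comparison can be sketched as below. The function name erc_below_timeout is made up for illustration, and treating an ERC value of 0 (disabled) as a failure is an assumption of this sketch. The values in the comments are placeholders; a long command timer such as 180 is a commonly suggested workaround for drives without SCT ERC, not something stated in this thread.

```shell
# Succeed if the drive gives up on a bad sector (SCT ERC, in tenths
# of a second) before the kernel's SCSI command timer (in seconds)
# expires. An ERC of 0 means disabled and is treated as a failure.
erc_below_timeout() {
    erc_ds="$1"       # e.g. 70 means 7.0 seconds
    timeout_s="$2"    # contents of /sys/block/sdX/device/timeout
    [ "$erc_ds" -gt 0 ] && [ "$erc_ds" -lt $((timeout_s * 10)) ]
}

# Possible fixes (as root; sdX and the values are placeholders):
#   smartctl -l scterc,70,70 /dev/sdX           # drive supports SCT ERC
#   echo 180 > /sys/block/sdX/device/timeout    # drive does not
```

For example, erc_below_timeout 70 30 succeeds because 7.0 seconds is shorter than 30 seconds.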

And don't forget periodic scrubs:
echo check > /sys/block/mdX/md/sync_action
cat /sys/block/mdX/md/mismatch_cnt
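
A minimal sketch of wrapping the scrub so it only starts when the array is idle. The function name start_scrub and the MD_SYSFS override are assumptions for illustration. Progress shows up in /proc/mdstat, and the mismatch count is meaningful once sync_action reads idle again.

```shell
# Start a scrub on one md array unless a sync action is already running.
# MD_SYSFS is an assumed override so the function can be pointed
# at a scratch directory instead of the real /sys/block.
start_scrub() {
    md="$1"    # array name, e.g. md0
    base="${MD_SYSFS:-/sys/block}/$md/md"
    state=$(cat "$base/sync_action")
    if [ "$state" != "idle" ]; then
        echo "$md busy: $state" >&2
        return 1
    fi
    echo check > "$base/sync_action"
}
```

Run as root, for example from a monthly cron job: start_scrub md0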

-- 
Chris Murphy