Raid vs rsync -

Rick Stevens ricks at alldigital.com
Tue Mar 10 21:12:58 UTC 2015


On 03/10/2015 12:24 PM, Bob Goodwin wrote:
>
>
> On 03/10/15 12:29, Gordon Messmer wrote:
>> On 03/09/2015 11:04 AM, Bob Goodwin wrote:
>>> However I have been wondering if it wouldn't work just as well to
>>> periodically rsync the drive in use with a second drive?
>>
>> I know I'm going to repeat some of what has already been said.  My 2c
>> anyway:
>>
>> No, rsync would not work just as well.
>>
>> Do you want your system to continue functioning when one of your
>> drives fails? If so, then set up RAID1 and make sure you actually get
>> and read email from cron jobs.  In the event of failure, the mdmonitor
>> service will send email to "root" to indicate that a drive needs to be
>> replaced.  The down side is that your data will have no backups.  If a
>> file is accidentally deleted or corrupted, you probably have no recourse.
>>
>> Do you, instead, want multiple levels of online backups?  In that
>> case, there are a handful of backup applications, including rsnapshot,
>> that handle rotation and rsync to provide efficient backups.  If your
>> primary drive fails, you'll deal with the outage while you get a
>> replacement drive, format it, install a system, restore your data,
>> etc, which could be a fairly long process. Instead, you'll gain very
>> coarse file versions and protection from accidental deletions.
>>
>>> Am I going wrong somewhere in my thinking?
>>
>> Thinking that you have to make a choice between the two may be in
>> error.  Depending on the size of your disks and the amount of data on
>> them, you could potentially have both RAID1 and backups.
>>
>> Build a system with a RAID1 mirror on the two drives that uses half of
>> the available space for your system, and half of the space for a
>> separate backup filesystem.  Keeping the backup filesystem separate
>> provides some additional protection against filesystem corruption.
>> It's still possible for some errors to destroy both your system and
>> its backups, but in most cases, you'll get good coverage for the most
>> common failures with this setup.
> .
> Well as I said earlier, I mainly want to have files that I consider
> critical backed up somewhere. I'm not very much concerned about
> equipment failure and downtime. I've been backing stuff up between
> computers using rsync so that I'll always have a fairly recent copy of
> my notes, checking account, genealogy, etc.
>
> Presently most of that data resides on a 1 TB drive in an NFS server
> running SL-6. I have two new WD/black 1 TB drives, that I am going to
> use on the computer I'm working on which is presently running Fedora-21,
> probably not the best choice for the purpose but I thought I'd try it.
>
> I also have a Raid1 samba server [SL-7], mainly to deal with the family
> Apple users [everyone but me]. I have never received any messages from
> that. As it stands I might not know if there was a failure, I've been
> worrying about that!
>
> I am considering everything in the responses I've received ...

As people have remarked, RAID (at least RAID1, RAID5, RAID6 and RAID10)
is a way to make sure that one (or more) drive failures don't kill
your system. A drive can die in one of those arrays and the thing keeps
running (albeit with reduced performance, and without its normal
redundancy, until the failed drive has been replaced and the "rebuild"
process completes).
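
Since a degraded array keeps running, it is easy to miss the failure if
the mdmonitor email never arrives. As a rough sketch (not a replacement
for mdmonitor), the following Python checks text in the format of
/proc/mdstat for arrays whose member map shows a missing drive. The
function name degraded_arrays and the SAMPLE text are made up for
illustration, and the code assumes Linux software RAID (md).

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose member map shows a missing drive."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # Array header lines look like: "md0 : active raid1 sdb1[1] sda1[0]"
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with a member map such as [UU] (healthy)
        # or [U_] (one member missing).
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if status and current and "_" in status.group(1):
            degraded.append(current)
    return degraded


# Hypothetical sample in /proc/mdstat format: md0 healthy, md1 degraded.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0]
      104857600 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real system you would pass in the contents of /proc/mdstat instead
of SAMPLE, for example from a cron job whose output you actually read.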

Depending on the capacity of the individual disks in the RAID array,
these rebuild times can get fairly lengthy, which is why I tend to use
RAID6 when drives go above 1TB. A RAID6 with one failed drive behaves
like a RAID5, so you can tolerate one more drive failure before you
lose data:

	RAID6 minus 1 drive = RAID5
	RAID5 minus 1 drive = Degraded RAID 5 (cannot tolerate another
				failure)

I replace the failed RAID6 drive right away, but because the rebuild
might take a day or two, it's quite possible that a second drive dies
in that period. RAID6 lets the array survive that window of
vulnerability.
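
To get a feel for how long that window is, here is a back-of-the-envelope
estimate in Python. It assumes the rebuild has to write the whole
replacement drive at a steady rate. The 4 TB size and 50 MB/s rate are
assumed figures for illustration only; real rebuilds on a busy array
can be much slower.

```python
def rebuild_hours(drive_bytes, rate_bytes_per_sec):
    """Estimate rebuild time in hours, assuming a constant rebuild rate."""
    if rate_bytes_per_sec <= 0:
        raise ValueError("rebuild rate must be positive")
    return drive_bytes / rate_bytes_per_sec / 3600.0


if __name__ == "__main__":
    # Assumed example: a 4 TB drive rebuilt at a sustained 50 MB/s.
    hours = rebuild_hours(4e12, 50e6)
    print(f"{hours:.1f} hours")
```

Halve the sustained rate and the estimate doubles, which is how a
rebuild ends up taking a day or two.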

RAID0 is a way to spread data across multiple spindles (disk drives) to
improve performance, but it does NOT offer redundancy. If you want both
striping and redundancy, use RAID10 (RAID10 = a RAID0 stripe in which
each member is a RAID1 mirrored pair).
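
The trade-offs between the levels mentioned so far can be summed up in a
few lines of Python. This is a sketch under simplifying assumptions:
all drives are the same size, RAID10 is built from two-way mirror pairs,
and "tolerated failures" means the number of failures the array is
guaranteed to survive (a RAID10 can survive more if the failed drives
are in different pairs). The function name raid_summary is made up for
illustration.

```python
def raid_summary(level, n_drives, drive_tb):
    """Return (usable_tb, guaranteed_tolerated_failures) for an array."""
    if level == "raid0":
        if n_drives < 2:
            raise ValueError("RAID0 needs at least 2 drives")
        return (n_drives * drive_tb, 0)
    if level == "raid1":
        if n_drives < 2:
            raise ValueError("RAID1 needs at least 2 drives")
        # Every drive holds a full copy of the data.
        return (drive_tb, n_drives - 1)
    if level == "raid5":
        if n_drives < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        # One drive's worth of capacity goes to parity.
        return ((n_drives - 1) * drive_tb, 1)
    if level == "raid6":
        if n_drives < 4:
            raise ValueError("RAID6 needs at least 4 drives")
        # Two drives' worth of capacity go to parity.
        return ((n_drives - 2) * drive_tb, 2)
    if level == "raid10":
        if n_drives < 4 or n_drives % 2:
            raise ValueError("this sketch expects an even count of 4+ drives")
        # Stripe across two-way mirror pairs; losing both drives
        # of one pair loses the array.
        return ((n_drives // 2) * drive_tb, 1)
    raise ValueError("unknown RAID level: " + level)


if __name__ == "__main__":
    for level in ("raid0", "raid1", "raid5", "raid6", "raid10"):
        print(level, raid_summary(level, 4, 1.0))
```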

Note that all these RAID levels do is make multiple drives appear to the
operating system as a single physical drive. RAID is NOT a backup.

Regardless of WHERE the RAID array is (another physical machine, the 
machine you're running on, whatever), you have to get data ONTO the RAID
array somehow. rsync (and its various permutations such as rsnapshot,
UDR, etc.) is one way. Dedicated backup programs such as Bacula and 
Amanda are another. If the RAID is on the machine you want to back up,
you could even use tar, cpio or cp to do it.
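
As an illustration of the rsync route, here is a small Python wrapper
that builds an rsync command line. It is only a sketch: the function
names and the paths in the example are hypothetical, and the flags used
are the standard rsync options -a (archive mode), --delete (remove files
from the copy that no longer exist in the source), --dry-run, and
--link-dest (hard-link unchanged files against a previous copy, which
is roughly how rsnapshot-style rotation stays space efficient).

```python
import subprocess


def build_rsync_cmd(src, dest, link_dest=None, dry_run=False):
    """Build an rsync argument list that mirrors src into dest."""
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        # Show what would be transferred without changing anything.
        cmd.append("--dry-run")
    if link_dest:
        # Hard-link files that are unchanged since the previous copy.
        cmd.append("--link-dest=" + link_dest)
    # A trailing slash on the source copies its contents, not the
    # directory itself.
    cmd += [src.rstrip("/") + "/", dest]
    return cmd


def run_backup(src, dest, **kwargs):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(build_rsync_cmd(src, dest, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical paths; only prints the command rather than running it.
    print(build_rsync_cmd("/home/bob", "/mnt/raid/backup", dry_run=True))
```

Note that --delete means an accidental deletion on the source gets
mirrored on the next run, so a plain mirror like this gives you a
second copy, not a history. Keeping dated copies with --link-dest (or
using rsnapshot) covers that case.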

We use Bacula because we have to back up a large number of machines
(well over 200). Bacula makes that management fairly tolerable and you
can define what gets backed up and when (full backups, incrementals,
snapshots, etc.).

The backup media where all the stuff goes is a big storage array (one
filesystem on an HP StoreAll 9730), which (at its core) is a whole lot
of RAID6 arrays.

----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ricks at alldigital.com -
- AIM/Skype: therps2        ICQ: 22643734            Yahoo: origrps2 -
-                                                                    -
-    First Law of Work:                                              -
-    If you can't get it done in the first 24 hours, work nights.    -
----------------------------------------------------------------------

