Raid vs rsync -

Chris Murphy lists at colorremedies.com
Tue Mar 10 21:17:04 UTC 2015


On Tue, Mar 10, 2015 at 2:38 PM, Steven Rosenberg
<stevenhrosenberg at gmail.com> wrote:

> What usually happens is that a file is corrupted, by either man or
> machine, and then that corrupted data goes to your backup, and you are
> screwed.

SDC (silent data corruption) and the propagation of corruption really
is a big problem, especially since rotated media backups, and even the
restore process, create derivatives. Each derivative inherits existing
SDC and accumulates its own over time. A file gets corrupted, then
backed up to one or more backups, eventually replacing all good
copies. And there's no notification of this.
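As a rough illustration of what a "notification" could look like (a
sketch of my own, not something anyone is shipping): keep an
independent checksum manifest and re-verify it before each backup
rotation, so a silently changed file is flagged before it replaces the
last good copy. The manifest file name and paths below are made up.

# Minimal sketch: record SHA-256 hashes once, re-check them later so
# silent corruption is noticed before the next backup rotation.
import hashlib, json, os, sys

MANIFEST = "checksums.json"   # hypothetical manifest file

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Record a hash for every regular file under root."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            manifest[os.path.relpath(p, root)] = sha256(p)
    with open(MANIFEST, "w") as f:
        json.dump(manifest, f, indent=2)

def verify(root):
    """Report files whose current hash no longer matches the manifest."""
    with open(MANIFEST) as f:
        manifest = json.load(f)
    for rel, old in manifest.items():
        p = os.path.join(root, rel)
        if not os.path.exists(p):
            print(f"MISSING  {rel}")
        elif sha256(p) != old:
            # either a deliberate edit or silent corruption; a human
            # (or mtime comparison) has to decide which
            print(f"CHANGED  {rel}")

if __name__ == "__main__":
    # usage: python verify.py build /home   or   python verify.py check /home
    cmd, root = sys.argv[1], sys.argv[2]
    build_manifest(root) if cmd == "build" else verify(root)

Of course this is exactly the bookkeeping Btrfs and ZFS already do for
you at the block level, which is the point of the next paragraph.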

This affects even Btrfs and ZFS if they aren't used exclusively
throughout the chain. If corruption has already happened, Btrfs and
ZFS of course simply maintain and propagate that corruption. Granted,
they should stop additional corruption (or at least notify you of it).
But we need e.g. /home to be on Btrfs to significantly reduce this.

The network is also a source of SDC. It's conceivable to have a Btrfs
source and destination, use rsync between them, and still have SDC
introduced by the network without notice. The BER for networks is
highly variable; it's not a static thing, and it can change just by
pinching a wire. If you follow the specs exactly, you get a BER value
that probably isn't being exceeded, but that's it, no guarantee.

The best guarantee? Btrfs /home, Btrfs backups, and rsync with
checksumming enabled to confirm that what's on the source is actually
what's on the destination. That's slow. There is a feature idea to
give rsync some Btrfs awareness so that this can be optimized, taking
advantage of checksumming work Btrfs has already done rather than
separately computing additional checksums.
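To make the end-to-end check concrete, here's a sketch of the kind of
verification I mean (my own illustration, not the proposed rsync
feature): hash each file on the source and its copy on the
destination and report mismatches. Paths are hypothetical, and a
Btrfs-aware rsync would skip the expensive source-side hashing by
reusing checksums the filesystem already maintains.

# Minimal post-transfer verification sketch: compare SHA-256 hashes
# of every source file against its copy on the destination.
import hashlib, os, sys

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def verify_copy(src_root, dst_root):
    ok = True
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.exists(dst):
                print(f"MISSING   {rel}")
                ok = False
            elif sha256(src) != sha256(dst):
                print(f"MISMATCH  {rel}")   # possible in-flight corruption
                ok = False
    return ok

if __name__ == "__main__":
    # usage: python verify_copy.py /home /mnt/backup/home
    sys.exit(0 if verify_copy(sys.argv[1], sys.argv[2]) else 1)

Reading and hashing every byte on both ends is exactly why this is
slow, and why reusing the filesystem's existing checksums is
attractive.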


-- 
Chris Murphy

