How to make a block-level incremental backup using LVM?

Gordon Messmer yinyang at eburg.com
Fri Dec 14 22:54:55 UTC 2012


The biggest difficulty in answering your question is that you asked 
about a specific method to solve a general problem, without specifying 
any other requirements.  In particular, in order to give you good 
answers we need to know whether you need only a single near-line 
backup, or multiple snapshots, or both of those plus off-site backups 
on tape or on replicated systems.

The short answer is that replication can get you redundancy (which is 
not a backup), rsnapshot can often do a good job of getting very 
space-efficient online snapshots, and Bacula Enterprise is an 
excellent option for backing up data to removable media such as tape. 
You can probably get an ideal backup solution using some combination 
of those systems.  Finding that combination is the complicated part.

Each of those things solves some portion of the problem you're asking 
about.  First, you asked about block-level change tracking.  That's 
exactly what replication requires, and a replicated filesystem 
provides exactly that: it tracks block changes and can efficiently 
transfer those blocks to a remote system.  Alan mentioned ceph, and 
that's probably a great solution.  Your production systems, local and 
remote, should have their data on a replicated volume where at least 
one of the replicas is your backup server.  A single backup server can 
hold replicas of all of the data volumes on all of your production 
systems.  Once that's in place, you'll never have to transfer a full 
backup again (until the backup storage array fails).  Whether you back 
up with rsnapshot or to removable media, you'll back up from the local 
filesystem, and you'll have eliminated the network as a bottleneck for 
backups.
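
As an illustration of the block-change tracking you get for free with 
ceph, here's roughly what an incremental export from an RBD image 
looks like.  (The pool, image, and snapshot names are invented for the 
example; check the rbd man page for the release you actually deploy.)

  # snapshot the image, then export only the blocks that changed
  # since the previous snapshot (names here are hypothetical)
  rbd snap create backup/fileserver@week02
  rbd export-diff --from-snap week01 backup/fileserver@week02 week02.diff

  # replay just those changed blocks into another copy of the image
  rbd import-diff week02.diff archive/fileserver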

If you only need online backups, you may be able to get that with 
rsnapshot.  rsnapshot is fairly good when you're not dealing with very 
large files (such as databases).  If you combine that with ceph, your 
backup system will need one volume to replicate your production data and 
a second volume to back it up.  At this point, you'll have eliminated 
the network bottleneck at a cost of more disk storage (which is fairly 
cheap, compared to the cost of increasing the speed of the network).
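
As a sketch, a minimal rsnapshot configuration backing up that local 
replica might look something like the following.  (The paths are 
invented; note that rsnapshot.conf requires TABs, not spaces, between 
fields.)

  # /etc/rsnapshot.conf fragment -- fields must be TAB-separated
  snapshot_root   /backup/snapshots/

  # keep 7 daily and 4 weekly rotations
  interval        daily   7
  interval        weekly  4

  # back up from the locally mounted replica, not over the network
  backup  /mnt/replica/   localhost/

You'd then run "rsnapshot daily" and "rsnapshot weekly" from cron. 
Unchanged files are hard-linked between rotations, which is where the 
space efficiency comes from.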

If you need offline backups such as tape, Bacula can also back up from 
that locally replicated volume.  Bacula Enterprise can provide the 
other bits you mentioned wanting: a web dashboard and easier 
configuration/management.
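
For a sense of what that configuration looks like, a minimal Director 
job for the replica might be something like this sketch.  (All of the 
resource names here are placeholders I've made up, not anything Bacula 
ships with.)

  # bacula-dir.conf fragment; names are placeholders
  FileSet {
    Name = "ReplicaSet"
    Include {
      Options { signature = MD5 }
      File = /mnt/replica
    }
  }

  Job {
    Name = "ReplicaToTape"
    Type = Backup
    Level = Incremental
    Client = backupserver-fd
    FileSet = "ReplicaSet"
    Schedule = "WeeklyCycle"
    Storage = TapeLibrary
    Pool = Default
    Messages = Standard
  }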

A few more notes follow:

On 12/14/2012 04:42 AM, Fernando Lozano wrote:
> We already have a few TB on file shares (Samba) and mailboxes (Zimbra),
> and just moving those bits around for our weekly full backup is proving
> to be too slow for our Gb network and impossible for the hosted machines
> we use as contingency and off-site backup.  Besides, incremental backups
> are taking too long just scanning the filesystems searching for
> changed files.

If scanning your filesystem takes too long, your storage array is 
probably too slow.  Consider using RAID10 instead of RAID5/6.  Consider 
using SSDs instead of hard drives.  Consider using a fast additional 
drive or array as your ext4 journal.
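
Moving the ext4 journal to a faster device is a standard tune2fs 
operation; roughly (device names are examples, and the filesystem must 
be unmounted while you change its journal):

  # format a small, fast device (an SSD partition, say) as a journal
  mke2fs -O journal_dev /dev/ssd1

  # with /dev/vg0/data unmounted: drop the internal journal, then
  # attach the external one
  tune2fs -O ^has_journal /dev/vg0/data
  tune2fs -J device=/dev/ssd1 /dev/vg0/data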

> Sorry for the long story. The question: could I implement block-level
> backups using dump, dd, and some LVM or ext utility? Maybe using
> inotify? Why does no open source backup tool seem to be doing this?

Mostly because inotify only allows you to track which files have 
changed, and only for files that change while the tracking daemon is 
running.  OS X does something very much like this for Time Machine: a 
small daemon logs changed files from a kernel notification interface. 
The kernel keeps a small notification queue (which Linux does not, as 
far as I know), so even if the daemon stops, or files are modified 
during the boot sequence before the daemon starts, the daemon can 
catch up and keep a complete log for Time Machine to back up.  If any 
of those components detects that the daemon may have missed kernel 
notices, the system falls back to a full scan.

It's not a very complicated system, and it could be duplicated fairly 
simply under Linux, but you'd fall back to full scans much more often, 
since (again, as far as I know) there's no kernel notification queue; 
a full scan would be required every time the tracking daemon starts. 
It doesn't have to wait for the start of a backup, however.  The 
tracking daemon could do the crawl as soon as it starts.
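
To see the raw material such a daemon would work with, inotify-tools 
can log change events for you.  (The path and log file are invented 
for the example, and this only sees events while it's running, which 
is exactly the limitation described above.)

  # log one pathname per modify/create/delete/move event, recursively
  inotifywait -m -r -e modify -e create -e delete -e move \
      --format '%w%f' /srv/data >> /var/log/changed-files.log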

> Would any option allow me to restore an individual file?

Virtually every option does.  The only case in which you can't restore 
an individual file is when you replicate a volume to a system that 
doesn't understand the filesystem/volume contents.
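
With rsnapshot, for instance, each rotation is an ordinary directory 
tree, so restoring one file is just a copy (paths invented):

  # pull yesterday's copy of a single file back out of the snapshots
  cp -a /backup/snapshots/daily.1/localhost/srv/data/report.ods \
      /srv/data/report.ods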


