After a super-fun-time debacle restoring a single file today, I'd like to talk about our backups a bit.
Right now our backups are:
- bacula to a few central servers and then off to tape.
That seems like it is not scaling super-duper well for our size of disk storage. It also seems like it is a wee bit cumbersome to use. :)
In the best of all possible infinite-money worlds I'd love to have enough disk space to offer multiple snapshots of every filesystem and/or a complete disk-to-disk copy with deduping (obnam) or with reverse diffs (rdiff-backup). But let's assume that world is not likely to exist and figure a few things out:
1. Where are we backing up that we don't need to?
2. Are there places that we back up that really would benefit from being a warmer backup, always available in a filesystem somewhere?
3. Is there any good way to couple snapshots with our tape system to make our backups a little simpler to deal with?
4. What level of bare-metal disaster recovery do we actually HAVE with our existing system, and have we ever tested any of those cases?
I do not know when I will get the time to put into fixing any of these things up - but after today it is clearly on my list of things to think about.
-sv
On Mon, 18 Mar 2013 21:25:23 -0400 seth vidal <skvidal@fedoraproject.org> wrote:
> After a super-fun-time debacle restoring a single file today I'd like to talk about our backups a bit.
> Right now our backups are:
> - bacula to a few central servers and then off to tape.
(We also have disk-based backups, but only of some of the more critical stuff.)
> That seems like it is not scaling super-duper well for our size of disk storage. It also seems like it is a wee bit cumbersome to use. :)
Agreed.
> In the best of all possible infinite-money worlds I'd love to have enough disk space to offer multiple snapshots of every filesystem and/or a complete disk-to-disk copy with deduping (obnam) or with reverse diffs (rdiff-backup). But let's assume that world is not likely to exist and figure a few things out:
> - where are we backing up that we don't need to?
So, right now on many of the machines we back up, we are just backing up the entire thing. This is nice in that it means if someone has something in their homedir, or a log file we need is in /var/log, etc., we can get it (at least in theory).
However, we could change that to just target /etc, /srv, and/or any other places that actually hold content rather than OS files.
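A rough sketch of what that narrowing would keep and drop, in Python purely for illustration. The function names are made up, and the include list is only the two directories mentioned above, not an agreed policy. Note that the homedir and /var/log cases from the previous paragraph fall out of the selection unless they are added to the list.

```python
from pathlib import PurePosixPath

# Hypothetical include list: /etc and /srv come from the message above,
# anything else would have to be added per host.
CONTENT_ROOTS = [PurePosixPath("/etc"), PurePosixPath("/srv")]


def is_content_path(path, roots=CONTENT_ROOTS):
    """Return True if path is one of the roots or lives underneath one."""
    p = PurePosixPath(path)
    return any(p == root or root in p.parents for root in roots)


def select_for_backup(paths, roots=CONTENT_ROOTS):
    """Keep only the paths that fall under the include list."""
    return [p for p in paths if is_content_path(p, roots)]


if __name__ == "__main__":
    candidates = [
        "/etc/hosts",
        "/usr/bin/ls",
        "/srv/git/repo",
        "/var/log/messages",
        "/home/someone/notes.txt",
    ]
    for path in select_for_backup(candidates):
        print(path)
```

Matching is done on whole path components, so a directory such as /etcetera is not mistaken for /etc.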
> - are there places that we can backup that really would benefit from
> being a warmer-backup always available in a filesystem somewhere
Possibly fedorahosted, fedorapeople, pkgs?
> - Is there any good way to couple snapshots with our tape system to
> make our backups a little simpler to deal with?
I fear snapshots might make it more complicated.
> - What level of bare metal-disaster-recovery do we actually HAVE with
> our existing system and have we ever tested any of those cases?
I'm not sure we have. Basically it should be: Install new OS, re-puppet, then restore any data/content from backups.
> I do not know when I will get the time to put into fixing any of these things up - but after today it is clearly on my list of things to think about.
Yep. Me too.
kevin
On 18 March 2013 19:25, seth vidal <skvidal@fedoraproject.org> wrote:
> After a super-fun-time debacle restoring a single file today I'd like to talk about our backups a bit.
> Right now our backups are:
> - bacula to a few central servers and then off to tape.
> That seems like it is not scaling super-duper well for our size of disk storage. It also seems like it is a wee bit cumbersome to use. :)
> In the best of all possible infinite-money worlds I'd love to have enough disk space to offer multiple snapshots of every filesystem and/or a complete disk-to-disk copy with deduping (obnam) or with reverse diffs (rdiff-backup). But let's assume that world is not likely to exist and figure a few things out:
> - where are we backing up that we don't need to?
We currently back up the following systems: ask01 bastion01 bastion02 collab02 db-fas01 db01 db04 db05 fas01 hosted-lists01 hosted02 lockbox01 log02 noc01 people03 pkgs01 proxy01 proxy02 releng03 releng04 relepel01
Most of those are quick backups, but a couple of them are long, slow ones. Looking at that list, I think it is more likely that there are things we need to back up and aren't than that we are backing up stuff we shouldn't.
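One way to check that hunch would be to diff the list above against whatever host inventory we trust. A minimal Python sketch follows; the backed-up set is copied from the list above, while the inventory argument and the function name are assumptions for illustration (the real inventory would come from wherever we track hosts).

```python
# Hosts currently backed up, copied from the list above.
BACKED_UP = set("""
ask01 bastion01 bastion02 collab02 db-fas01 db01 db04 db05 fas01
hosted-lists01 hosted02 lockbox01 log02 noc01 people03 pkgs01
proxy01 proxy02 releng03 releng04 relepel01
""".split())


def coverage_gaps(inventory, backed_up=BACKED_UP):
    """Compare a host inventory against the backed-up set.

    Returns (missing, unknown):
      missing - hosts in the inventory that have no backup job
      unknown - hosts with a backup job that the inventory does not list
    """
    inventory = set(inventory)
    missing = sorted(inventory - backed_up)
    unknown = sorted(backed_up - inventory)
    return missing, unknown


if __name__ == "__main__":
    # "example-new01" is a made-up host name, not a real machine.
    missing, unknown = coverage_gaps(BACKED_UP | {"example-new01"})
    print("not backed up:", missing)
    print("backed up but not in inventory:", unknown)
```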
> - are there places that we can backup that really would benefit from
> being a warmer-backup always available in a filesystem somewhere
I would say that it would be quite useful for lots of things. If anything, I would love to have a backup system that backs stuff up to a disk tree per box, and then have tape back up that set of disks, versus just going to disks. It would be easier for us to do disaster recovery by dumping those disks to multiple sites (though it means more dealing with encrypted disks and such).
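To make the "disk tree per box" idea a bit more concrete, here is a small Python sketch of one possible layout: one directory per host, one dated subdirectory per run, with the tape job then only needing to read the staging root. Everything specific here is an assumption and not something we have decided: the /srv/backups root, the function names, and the use of rsync with --link-dest to hard-link unchanged files against the previous run. The sketch only builds the command; it does not run anything.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical staging area on the backup server.
STAGING_ROOT = PurePosixPath("/srv/backups")


def snapshot_dir(host, day, root=STAGING_ROOT):
    """Directory holding one host's copy for one day: <root>/<host>/<YYYY-MM-DD>."""
    return root / host / day.isoformat()


def rsync_command(host, day, previous=None, source="/srv/", root=STAGING_ROOT):
    """Build (but do not run) an rsync command that fills the dated directory.

    If a previous day is given, unchanged files are hard-linked against that
    copy via --link-dest, so each dated tree looks complete on disk.
    """
    cmd = ["rsync", "-a"]
    if previous is not None:
        cmd.append(f"--link-dest={snapshot_dir(host, previous, root)}")
    cmd += [f"{host}:{source}", f"{snapshot_dir(host, day, root)}/"]
    return cmd


if __name__ == "__main__":
    print(" ".join(rsync_command("db01", date(2013, 3, 19), previous=date(2013, 3, 18))))
```

Restoring a single file from a layout like this is a plain copy out of the dated directory, which is the "warmer backup" case; tape would only come into play for older data or for losing the staging disks themselves.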
> - Is there any good way to couple snapshots with our tape system to
> make our backups a little simpler to deal with?
I have seen a couple of methods, but they aren't snapshots in the LVM sense. (I found LVM snapshots more painful to deal with for backups, but I think that was mainly because of very slow disks.)
> - What level of bare metal-disaster-recovery do we actually HAVE with
> our existing system and have we ever tested any of those cases?
We have tested bare metal a couple of times, "tested" in the sense that one of the boxes we back up died and we needed to restore the stuff that was on it.
> I do not know when I will get the time to put into fixing any of these things up - but after today it is clearly on my list of things to think about.
> -sv
infrastructure mailing list
infrastructure@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/infrastructure