[PATCH] Ensure that rsync backups are only running one at a time

Wed Mar 24 17:24:49 UTC 2010

On Wed, 24 Mar 2010, Toshio Kuratomi wrote:

> On Wed, Mar 24, 2010 at 10:54:07AM -0500, Mike McGrath wrote:
> > ---
> >  configs/db/backup-dbs |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/configs/db/backup-dbs b/configs/db/backup-dbs
> > index 83ed6c8..7af66fb 100755
> > --- a/configs/db/backup-dbs
> > +++ b/configs/db/backup-dbs
> > @@ -20,6 +20,8 @@ mv $DEST/$HOSTNAME.new $DEST/$HOSTNAME
> >
> >  # Sync out
> >  for host in db01 db02 db03; do
> > +    # Sleep if any other rsyncs are already running
> > +    while ssh $host "pgrep rsync" | grep -q '[0-9]'; do sleep 10; done
> >      if [ "$host" != $HOSTNAME ]; then
> >          su - dbbackup -c "ssh $host mkdir -p $DEST/$HOSTNAME"
> >          su - dbbackup -c "rsync -azr --bwlimit=5000 -e ssh $DEST/$HOSTNAME/* $host:$DEST/$HOSTNAME/"
> >
> So this means that among the three db hosts, db01, db02, and db03 we'll have
> at most one rsync running at any one time.  That's going to increase the
> time to sync some more.
>
> I think that this code won't quite function properly when more than one
> backup-dbs script runs on a box at a time: what gets transferred to the
> remote host will be a mixture of what was in $DEST/$HOSTNAME when the rsync
> starts and what's in there when the rsync ends.  We would get errors if
> files were removed (i.e. a database is removed while the rsync is still
> processing).  Due to copying the directory prior to rsyncing, I don't think
> we'll get corruption of the actual dump files -- we'll just get dump files
> from two separate runs intermingled on the host being backed up to.
>
> It doesn't look like the scripts are currently stacking on a single host.
> They run every six hours on db01 and db02 (every 12 hours on db03).
>
> So the risk of problems at this time doesn't seem too bad.
>
> +1
>

In the above scenarios we'd at least get an error.  The syncs only take
a few minutes right now.  If one hangs long enough to hit another backup
we'd at least get notified about it.  We'd be notified now, and so far
that hasn't happened, but it's something to think about.  Perhaps we
want to re-think this whole script and just store the data somewhere else
completely.
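
One way to make an overlapping run fail loudly, rather than interleave
dumps from two runs, is a lock taken at the top of the script.  This is
only a minimal sketch: the lock path and the acquire_lock/release_lock
names are hypothetical and not part of backup-dbs today.

```shell
# Hypothetical lock location; an assumption, not something backup-dbs uses.
LOCKDIR="${LOCKDIR:-/var/lock/backup-dbs.lock}"

acquire_lock() {
    # mkdir is atomic: exactly one caller creates the directory, and every
    # other caller fails because it already exists.
    if mkdir "$LOCKDIR" 2>/dev/null; then
        return 0
    fi
    echo "backup-dbs: another run already holds $LOCKDIR" >&2
    return 1
}

release_lock() {
    # Remove the lock directory; ignore the error if it is already gone.
    rmdir "$LOCKDIR" 2>/dev/null
}

# Intended use near the top of the script:
#   acquire_lock || exit 1
#   trap release_lock EXIT
```

A run that dies without reaching the trap (for example on kill -9) leaves
the lock behind, so a stale lock would need manual cleanup.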

The initial idea was that if, say, db1 died, db2 and db3 could take its
place, and since the dumps are already there, downtime is kept to a
minimum.  But if getting the data there is causing us performance issues,
perhaps it's not worth it.  I may also want to tone the rsync throttle
down a bit more.  Something to think about after the freeze.
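
If the throttle does get tuned, pulling the limit into a variable would
make it a one-line change.  A rough sketch, assuming a hypothetical
BWLIMIT variable and rsync_opts helper (neither exists in backup-dbs);
5000 is the value currently used in the script:

```shell
# Hypothetical knob; rsync reads --bwlimit as kilobytes per second.
BWLIMIT="${BWLIMIT:-5000}"

rsync_opts() {
    # Print the option string used for every sync, with the current limit.
    printf '%s' "-azr --bwlimit=$BWLIMIT -e ssh"
}

# Intended use inside the sync loop:
#   su - dbbackup -c "rsync $(rsync_opts) $DEST/$HOSTNAME/* $host:$DEST/$HOSTNAME/"
```

Lowering the limit would then just mean setting, say, BWLIMIT=2500 at
the top of the script or in the cron environment.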

	-Mike