On 11/04/2011 09:46 AM, Richard W.M. Jones wrote:
On Fri, Nov 04, 2011 at 08:51:08AM -0400, Mo Morsi wrote:
> On 11/04/2011 08:36 AM, Richard W.M. Jones wrote:
>> How large is the snap metadata, ie. the stuff that you copy between
>> the machines? How large would it be given, say, a typical
>> database-backed webserver installation where you might have lots of
>> static contents and some database tables?
> One of the nice things that I added to Snap was the ability to
> ignore static content managed by the package management system. For
> example when taking a snapshot of the filesystem, only the files
> modified post installation and the files not tracked by the package
> system will be backed up and restored.
>
> It should be simple enough to expand upon this concept, adding
> additional hooks to call out to in order to determine exactly what
> should be backed up and restored (hooks to be invoked during the
> backup / restoration process are already on the project todo list /
> backlog).
>
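A minimal sketch of the selection rule described above, not Snap's actual code. It assumes the package system can give us a mapping from each packaged path to the checksum recorded at install time (the `packaged` argument is a hypothetical stand-in for querying rpm or dpkg), and the `hooks` parameter is only an illustration of the backup hooks mentioned as a todo item.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_back_up(root, packaged, hooks=()):
    """List the files under root that belong in a snapshot.

    packaged maps a path to the digest recorded at install time
    (hypothetical input; a real tool would ask the package manager).
    hooks are optional callables that take a path and return False
    to veto it.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            recorded = packaged.get(path)
            if recorded is not None and recorded == sha256_of(path):
                continue  # unmodified packaged file, can be reinstalled
            if all(hook(path) for hook in hooks):
                selected.append(path)
    return sorted(selected)
```

Files that are untracked, or tracked but no longer matching their recorded digest, end up in the result; everything else is left for the package manager to restore.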
>> Is the metadata in an ad-hoc format and how hard would it be to turn
>> it into a standard format (probably one that we would standardize
>> ourselves)? Can it be useful in other contexts -- eg. could a system
>> administrator look at the output in order to get a definitive list of
>> the changes made to the machine? Could it be useful for auditing?
>> Could the format be diffed?
>>
> Right now the snapshot is a simple tarball containing the actual
> contents of the snapshot and the metadata in XML files. So for
> example there is a packages.xml file which contains the packages
> which have been recorded, services.xml containing the services and
> associated metadata, etc. We can use this as the basis of the
> standard, easily encapsulating any required information there.
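On the question of whether the format could be diffed, here is a rough sketch of comparing the package lists from two snapshots. The `<package name="..." version="..."/>` layout is an assumption made for illustration only; the thread does not show what packages.xml actually looks like, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_packages(xml_text):
    """Map package name to version from a packages.xml document.

    The element and attribute names are assumed for this sketch,
    not taken from Snap's real schema.
    """
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("version") for p in root.iter("package")}


def diff_packages(old, new):
    """Return (added, removed, changed) names between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return added, removed, changed
```

The same approach would presumably carry over to services.xml, which is part of why a formally specified schema would make auditing tools easier to write.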
Can you give us some numbers -- how big was the tarball for the
migration you did?
2.7MB
This included the mediawiki db dump which was close to 1MB, and
inspecting the snapshot I realize that it can be optimized further to
reduce a bit of unnecessary cruft.
If we had a more formal description, then it could be the basis for a
useful collection of tools.
   snap
                                      puppet manifest
   snap          formal
   -------->     spec for  -------->  sysadmin
   libguestfs-   VM
   based tool                         auditing
                 |      ^
                 |      |
                 +------+
                 p2v, v2v
I think your demonstration only worked with a bit of luck. For v2v we
rewrite a lot of configuration files, install virtio drivers etc. In
terms of a formalized snap description, that process is a kind of
transformation.
Sure thing, the demo was a proof of concept, though quite a bit of
refactoring went into making the tool more modular and pluggable so that
it can be easily extended and adapted to meet whatever snapshot and
restoration needs come up.
Agreed that formal metadata and API definitions would be very useful to
have; I added that as a high-priority item to the TODO list.
How (if at all) does this apply to Windows?
So yes, I have been throwing around different ideas of how to accomplish
different aspects of the backup/restoration process on Windows:
- The repositories bit obviously does not apply, so that snapshot target
can be ignored on those systems.
- Packages would only apply in a very limited manner, eg we can record
what software is installed and what versions, but even then it would
require more end user interaction / intervention in the restoration
process.
- Backing up files should be straightforward, albeit without the support
of a package system to determine which files are redundant. Snap already
supports user selection of which files to back up / restore, and we
can combine this with fixed lists of files to ignore for different
Windows versions.
- Services should be simple enough and will work in the same manner as
on Linux, e.g. on Windows we know how to back up a postgres db, a running
IIS webserver, or whatever other services.
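To make the file-backup point concrete, here is a rough sketch of combining a user's selection with a fixed per-version ignore list. Everything in it is hypothetical: the `IGNORE` table, its version keys and its patterns are made up for illustration and are not lists that Snap ships.

```python
from fnmatch import fnmatchcase

# Hypothetical ignore lists keyed by Windows version (illustrative only).
IGNORE = {
    "windows7": [
        "c:/windows/*",
        "c:/program files/*",
        "c:/pagefile.sys",
        "c:/hiberfil.sys",
    ],
}


def normalize(path):
    """Lower-case a path and use forward slashes so patterns compare evenly."""
    return path.replace("\\", "/").lower()


def select_files(candidates, user_patterns, version):
    """Keep paths the user selected that the version's ignore list allows."""
    wanted = [normalize(p) for p in user_patterns]
    ignored = IGNORE.get(version, [])
    chosen = []
    for path in candidates:
        key = normalize(path)
        if not any(fnmatchcase(key, pattern) for pattern in wanted):
            continue  # the user did not ask for this file
        if any(fnmatchcase(key, pattern) for pattern in ignored):
            continue  # known system file for this Windows version
        chosen.append(path)
    return chosen
```

The ignore list plays the role that the package database plays on Linux: it is a cruder way of saying "this file can be recreated by reinstalling the OS".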
So all in all, snap will be able to work in a consistent manner across
Linux distros, Windows, and even Mac OS X!
-Mo