On Thu, 24 Dec 2020, 8:22 am Chris Murphy, <lists@colorremedies.com> wrote:
(I did read the whole thread but I'm gonna reply three times anyway :D)

On Wed, Dec 23, 2020 at 6:29 AM Patrick O'Callaghan
<pocallaghan@gmail.com> wrote:
>
> I have a directory I use to hold a Windows VM disk image
> (/home/Windows/...), and would like to snapshot it before playing with
> the QEMU settings. However it's already part of the /home subvolume, so
> is there a way of splitting it off on its own without having to create
> a new subvolume and sending the contents over? AFAIK subvolumes can be
> hierarchical so it would seem like a useful thing to be able to convert
> a subtree without all the copying, but the man page doesn't seem to
> address it.

Direct answer:

cd /home
btrfs subvolume create Windows2
chattr +C Windows2
cp Windows/* Windows2/
rm -rf Windows
mv Windows2 Windows

Explanation:

The logic of this is to set the nodatacow attribute on Windows2, which
means overwrites by the guest will overwrite those same areas in the
file in place, rather than being COWed to a new location. There's no
direct COW penalty per se, it's just a redirection. The issue is that
COW can result in fragmentation, and that makes subsequent reads
slower because each fragment has to be sought out. It's a bigger
problem on rotational drives, where both seek latency and rotational
latency are high. For all the libvirt storage pool locations, this is
done for you on default clean Btrfs installs when the pool (directory)
is activated. That covers virt-manager, gnome-boxes and cockpit.
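
To check that chattr +C actually took effect, lsattr shows a 'C' in the
flag column for nodatacow files and directories. A minimal sketch,
assuming a bash-compatible shell; is_nocow is a hypothetical helper
name, and it only parses lsattr's usual "flags path" output:

```shell
# Hypothetical helper: succeed (0) if lsattr shows the C (No_COW) flag on
# PATH, fail (1) if it doesn't, and return 2 if lsattr can't read the flags.
is_nocow() {
  local flags
  # -d reports a directory itself rather than listing its contents.
  flags=$(lsattr -d "$1" 2>/dev/null) || return 2
  # Keep only the first field, e.g. "---------------C------".
  flags=${flags%% *}
  case $flags in
    *C*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_nocow /home/Windows && echo nodatacow`. It returns 2
for a missing path or a file system that doesn't support attributes.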

mv will fail between locations with different nodatacow settings,
hence cp. Starting with Fedora 33, cp makes a reflink copy by default
on both Btrfs and XFS whenever the source and destination support it.
Note that reflink copies cannot cross mount points; that's a VFS
limitation. When a reflink isn't possible, cp falls back to a regular
(full) copy rather than an efficient one.
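
The direct answer above removes Windows even if the copy fails partway.
A slightly more defensive sketch of the same steps chains them with &&,
so the original directory is only deleted after the copy succeeded.
Assumptions: the VM is shut down, dir_to_nocow_subvol and the ".new"
suffix are hypothetical names, and it uses "cp -a DIR/." rather than
"cp Windows/*" so dotfiles, ownership, modes and timestamps come along.
DRY_RUN=1 prints the commands instead of running them:

```shell
# Print commands instead of running them when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: replace directory DIR with a nodatacow subvolume
# holding the same files.
dir_to_nocow_subvol() {
  local dir=${1%/}
  local tmp="$dir.new"   # hypothetical temporary name
  # The && chain stops at the first failure, so DIR is only removed once
  # the copy has succeeded. The copied files inherit +C from the new
  # subvolume; cp does not carry the source files' attributes over.
  run btrfs subvolume create "$tmp" &&
    run chattr +C "$tmp" &&
    run cp -a "$dir/." "$tmp/" &&
    run rm -rf "$dir" &&
    run mv "$tmp" "$dir"
}
```

Review the plan with `DRY_RUN=1 dir_to_nocow_subvol /home/Windows`,
then run it again as root without DRY_RUN.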

In my opinion, the reason to make /home/Windows a subvolume is not so
you can snapshot /home/Windows, but so you can snapshot /home and
*exclude* /home/Windows. Snapshots aren't recursive (for developers
they can be; see the libbtrfsutil C API and Python bindings). So a
snapshot of /home doesn't share your VM images' extents, and the
images stay effectively nodatacow.

Instead what I do is "snapshot" the VM files by reflink copy within
their own directory. And yes they get quite a lot more fragmented.

cp win10.raw win10-updated.raw

Again, it's a reflink copy because (a) that's the cp default on F33
and (b) the source and destination files are both nodatacow, since
we're staying in the same directory. So this cp is fast. And if you
run 'du' on the Windows directory it'll hilariously count these files
twice, but if you use 'btrfs fi du -s' you'll be introduced to some
new terminology: exclusive and shared data.
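
The same "snapshot by reflink copy", with a date in the name, as a
minimal sketch: snapshot_image is a hypothetical helper, it assumes the
image name has an extension (win10.raw -> win10-20201223.raw), and it
spells out --reflink=auto for coreutils builds where that isn't yet
the default:

```shell
# Hypothetical helper: "snapshot" IMAGE by reflink copy into the same
# directory, named BASE-TAG.EXT (win10.raw -> win10-20201223.raw).
# TAG defaults to today's date; an existing file is never overwritten.
snapshot_image() {
  local img=$1 tag=${2:-$(date +%Y%m%d)}
  local dst="${img%.*}-$tag.${img##*.}"
  if [ -e "$dst" ]; then
    echo "snapshot_image: $dst already exists" >&2
    return 1
  fi
  # Shares extents when the file system can; otherwise cp quietly
  # falls back to a full copy.
  cp --reflink=auto -- "$img" "$dst" && printf '%s\n' "$dst"
}
```

Afterwards, compare `du -sh /home/Windows` with
`btrfs filesystem du -s /home/Windows` to see the double counting
versus the exclusive/shared split.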

You might prefer doing snapshots the qcow2 way, although I have no
idea what "qemu-img snapshot" does because I've never used it. I've
only used "qemu-img create -b" to create a new qcow2 file that points
to the original as a backing file. Why? *shrug* I don't know. But I
throw it out there as an option.
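
For the qcow2 route, a minimal sketch of the "qemu-img create -b"
approach: the new qcow2 file takes all guest writes and the original
raw image is only read. make_overlay is a hypothetical name; "-F raw"
assumes the base is a raw image like win10.raw, and newer qemu-img
releases insist on being told the backing format. Don't modify the
base image while an overlay depends on it:

```shell
# Hypothetical helper: create OVERLAY as a qcow2 file backed by the raw
# image BASE. Guest writes land in OVERLAY; BASE is only read.
make_overlay() {
  local base=$1 overlay=$2
  # -F tells qemu-img the backing file's format instead of letting it
  # guess. With no size given, the overlay takes BASE's size.
  qemu-img create -f qcow2 -b "$base" -F raw "$overlay"
}
```

Point the VM at the overlay. To throw the changes away, delete the
overlay; `qemu-img commit overlay.qcow2` writes them back into the
base instead. A relative backing path is resolved relative to the
overlay's directory, so keep both files together or pass absolute
paths.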

Whether you use 'cp' to make a reflink copy (fast, efficient copy) or
run 'btrfs sub snap /home/Windows /home/Windows.20201223' (name them
however you want), there is a gotcha. Those nodatacow extents now
*cannot* be overwritten in place because they are shared with the
snapshot. They are COW again for any write that would be an
overwrite. But it's only a one-time redirect: it'll COW a shared
block to a new location (an exclusive block), and any subsequent
changes to that block will be nodatacow again... unless you make
another snapshot.
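
The snapshot command above, wrapped with a date-stamped name as a
minimal sketch (snapshot_subvol and the DRY_RUN switch are
hypothetical). Per the gotcha, every run shares all of the image's
extents again, so the next overwrite of each block gets COWed once
more:

```shell
# Print commands instead of running them when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: snapshot subvolume SRC as SRC.TAG next to it,
# e.g. /home/Windows -> /home/Windows.20201223. TAG defaults to today.
snapshot_subvol() {
  local src=${1%/} tag=${2:-$(date +%Y%m%d)}
  run btrfs subvolume snapshot "$src" "$src.$tag"
}
```

Deleting snapshots you no longer need
(`btrfs subvolume delete /home/Windows.20201223`) should let those
extents become exclusive again once the deletion has been cleaned up.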

So yeah, no free lunch. There's always a tradeoff. And that's why you
might consider a different way to preserve the images.

Windows NTFS in particular has fairly remarkable write patterns that
result in a lot of fragmentation on any file system, but it is
especially bad on Btrfs. I speculate that the dual journaling
technique it uses just ends up being turned into piles of "appends"
onto a sparse raw file if it's COW. Your best option really is to
fallocate a raw file in a subvolume or directory with chattr +C set,
and not snapshot or reflink copy it. That's very much like what you
get with a raw file on XFS or ext4, with perhaps a tiny bit more
overhead than a conventional LVM LV, which is really your best bet if
performance is the top concern.
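
A minimal sketch of that setup, assuming a hypothetical
create_nocow_image helper: it sets +C on the directory first (the
attribute is only inherited by files created afterwards) and then
preallocates the image with fallocate. On a file system without the
attribute it just warns:

```shell
# Hypothetical helper: preallocate a raw image NAME of SIZE (fallocate
# syntax, e.g. 60G) inside DIR, with No_COW set on DIR beforehand.
create_nocow_image() {
  local dir=$1 name=$2 size=$3
  mkdir -p "$dir" || return 1
  # +C on a directory is inherited only by files created after it is set,
  # so it must come before the image exists. Off Btrfs this may fail.
  if ! chattr +C "$dir" 2>/dev/null; then
    echo "warning: could not set +C on $dir (not Btrfs?)" >&2
  fi
  # Reserve all the space up front instead of growing a sparse file.
  fallocate -l "$size" "$dir/$name" && printf '%s\n' "$dir/$name"
}
```

For example, `create_nocow_image /home/Windows win10.raw 60G`, then
attach win10.raw to the VM as a raw disk.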

Another thing I use in virt-manager is cache mode unsafe. I can't
recommend it because, I mean, it's not safe. But it's a lot faster. So
if you don't care about your data (and I definitely don't) and you
care about performance, unsafe cache mode is awesome. If you do use
it, it's worth finding another way to mitigate the risk of losing
guest writes when power is yanked from the host, which can totally
trash the guest's file system. (I have had it trash NTFS in this
scenario, but I've never once damaged Btrfs in such a guest.) The
unsafeness comes from the host crashing, not the guest. You can
probably get away with force quitting the guest all you want. I do
this all the time because (a) I hate my data and (b) I'm impatient. :D
But yeah, I also back up my data in other ways, so if it goes kaboom,
it's actually fun because I get to use my backups.
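
If you do run guests with unsafe, it helps to know which ones. A rough
sketch that pulls the cache attribute out of libvirt domain XML;
disk_cache_modes is a hypothetical helper, it greps rather than
properly parsing XML, and it assumes libvirt's usual single-quoted
attributes. Disks without a cache attribute (hypervisor default) print
nothing:

```shell
# Hypothetical helper: print the cache='...' value of each disk driver in
# libvirt domain XML read from stdin, one per line. A crude grep, not a
# real XML parser.
disk_cache_modes() {
  grep -o "cache='[a-z]*'" | sed "s/^cache='\(.*\)'\$/\1/"
}
```

For example, `virsh dumpxml win10 | disk_cache_modes`, where win10 is
a hypothetical domain name.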

I am the reason for my backups...

Let me ask you: how will snapshotting and restoration work with nested subvolumes?

Let's say I snapshot my root subvolume, which contains a nested subvolume holding my VM images.

What will happen to that nested subvolume when I restore my root snapshot?