BTRFS/Rollback & Yum Snapshot Plugin

Chris Murphy lists at colorremedies.com
Mon Feb 3 18:26:34 UTC 2014


On Jan 31, 2014, at 10:54 AM, Jorge Fábregas <jorge.fabregas at gmail.com> wrote:
> 
> Ok, I might check that.  All these years I've been using Clonezilla to
> do a "partition image" of my root filesystem before applying major
> updates.  I have the ISO stored on my disk and GRUB2 is configured to
> boot ("loopback") this image (this is great as I don't need to use a USB
> stick or CD anymore).   Of course, using this btrfs snapshot
> functionality with yum is by far way better & sophisticated than my
> current method.  I wished it were more polished. That is, as soon as yum
> snapshots your system, it should create the proper GRUB2 entry (pointing
> to your snapshot volume so you can easily rollback) instead of you doing
> it manually.

Yes, there are challenges, many of which are the result of unanswered questions. For example:

What should the layout be? (See openSUSE's layout for an alternative to Fedora's.) How discoverable should this be for mortal users vs. experts? How far back should rootfs snapshots be bootable? Should snapshots initially be read-only? Maybe. If a snapshot is rw and then booted, it's immediately changed; if it's changed further by an update or by the user, is it really still a snapshot of a file tree in a particular state? No.

If snapshots are read-only, how do we boot them? What system changes are needed for rootfs to always be ro in normal use, and only made rw when there's a system update? Or alternatively, when doing the rollback, do we make a rw snapshot of the ro snapshot, and boot the rw version? And then how do we clean up all of the ensuing snapshots?
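
To make the second option concrete (keep a ro snapshot, and at rollback time boot a rw snapshot of it), here is a rough sketch in Python. It only builds the btrfs command lines, it doesn't run anything. The function names and all paths are made up for illustration; the only thing taken from btrfs itself is that "btrfs subvolume snapshot" takes -r to make a read-only snapshot.

```python
def snapshot_command(source, dest, read_only=False):
    """Build (not run) a 'btrfs subvolume snapshot' command line."""
    cmd = ["btrfs", "subvolume", "snapshot"]
    if read_only:
        cmd.append("-r")  # -r makes the new snapshot read-only
    return cmd + [source, dest]


def rollback_plan(live_root, ro_snapshot, rw_copy):
    """Two steps: a ro snapshot before the update, a rw copy of it to boot later.

    All three paths are hypothetical and supplied by the caller.
    """
    return [
        snapshot_command(live_root, ro_snapshot, read_only=True),
        snapshot_command(ro_snapshot, rw_copy),
    ]


if __name__ == "__main__":
    # Example paths only; adjust to wherever the top level is mounted.
    for cmd in rollback_plan("/mnt/top/root", "/mnt/top/root_ro", "/mnt/top/root_rw"):
        print(" ".join(cmd))
```

This leaves the cleanup question open: every rollback done this way adds one more subvolume that something has to delete eventually.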

Also, the fstab in every snapshot is wrong. An unmodified fstab in a snapshot causes the parent subvolume of all snapshots to be mounted, not the snapshot. There are multiple ways to solve this. It's not so much a technical problem as a matter of determining best practices: imagining many use cases and figuring out the liabilities of each potential solution.
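
One of those ways, purely as an illustration and not something the yum plugin or snapper does as far as I know, is to rewrite the subvol= option in the snapshot's own fstab so it names the snapshot. A minimal Python sketch, assuming a conventional six-field fstab line; the function name, device and subvolume names are hypothetical:

```python
def retarget_fstab_line(line, mount_point, new_subvol):
    """Point the Btrfs entry for mount_point at new_subvol.

    Lines that are comments, malformed, not Btrfs, or for another mount
    point are returned unchanged. Note that a rewritten line is re-joined
    with single spaces, so original column alignment is lost.
    """
    fields = line.split()
    if line.lstrip().startswith("#") or len(fields) < 4:
        return line
    if fields[1] != mount_point or fields[2] != "btrfs":
        return line
    # Drop any existing subvol=/subvolid= so the new one is unambiguous.
    options = [o for o in fields[3].split(",")
               if not o.startswith(("subvol=", "subvolid="))]
    options.append("subvol=" + new_subvol)
    fields[3] = ",".join(options)
    return " ".join(fields)


if __name__ == "__main__":
    # Made-up device and subvolume names.
    print(retarget_fstab_line("UUID=abcd / btrfs subvol=root 0 0", "/", "root_snap"))
```

The liability of this particular approach is that the snapshot is modified after it is taken, which only works if it is rw.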

Also, /boot will quickly contain updated kernels that can't boot old snapshots, because the snapshots only contain older kernel modules. That implies /boot needs snapshotting, or we need to limit snapshot/rollback to maybe just one older kernel. The main holdup for /boot on Btrfs is an old grubby bug, RHBZ#864198. Also, the freedesktop bootloaderspec calls for $BOOT to be a non-snapshottable file system, while also being too small to accumulate enough kernel+initramfs files to keep old snapshots bootable.
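
The kernel/module mismatch is easy to check for. Here is a small Python sketch that, given the file names in /boot and the directory names under a snapshot's /lib/modules, reports which kernels could still boot that snapshot. It assumes kernels are named vmlinuz-<version> and modules live in /lib/modules/<version>; the function names and the version strings in the example are only illustrative.

```python
def kernel_version(boot_file_name):
    """Return the version from a 'vmlinuz-<version>' name, else None."""
    prefix = "vmlinuz-"
    if boot_file_name.startswith(prefix):
        return boot_file_name[len(prefix):]
    return None


def bootable_kernels(boot_file_names, snapshot_module_dirs):
    """Kernel versions in /boot that have matching modules in the snapshot."""
    available = set(snapshot_module_dirs)
    versions = (kernel_version(name) for name in boot_file_names)
    return [v for v in versions if v is not None and v in available]


if __name__ == "__main__":
    # Example listings; on a real system these would come from os.listdir().
    boot = ["vmlinuz-3.12.9-301.fc20.x86_64",
            "vmlinuz-3.11.10-301.fc20.x86_64",
            "initramfs-3.12.9-301.fc20.x86_64.img"]
    modules_in_snapshot = ["3.11.10-301.fc20.x86_64"]
    print(bootable_kernels(boot, modules_in_snapshot))
```

An empty result means the snapshot can't be booted with anything currently in /boot.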

For what it's worth, currently both the yum plugin and snapper presume the parent subvols are the ones persistently used and modified; the snapshots are children and in normal operation aren't ever used. If a rollback is needed, it's the "child" snapshots that are used. This isn't the only way to do it. It's equally valid to snapshot the parent, modify and use the child in normal operation, and roll back to the parent. An advantage is that the parent subvol name and its fstab are already properly in sync; the child snapshot subvolume(s) of course have new names, which are used in the modified fstab.

Note also that openSUSE has a different layout for all of this than Fedora. They make the default subvolume, ID 5 (the top level of the Btrfs file system, the first subvolume, the one that can't be deleted or named), the parent and mount it at /. They then create the following subvolumes:

boot/grub2/x86_64-efi
home
opt
srv
tmp
usr/local
var/crash
var/log
var/opt
var/spool
var/tmp

So which is the more discoverable layout? Well, it depends on one's point of view. The expert who mounts a Fedora install on Btrfs doesn't see the Linux FHS, and is initially confused. They see what looks like two directories: root and home (on Fedora 19 they might also see boot if they opted to put /boot on Btrfs). Because the mount command doesn't show the subvolume that's mounted, the assembly of the on-disk layout into the mounted file system isn't obvious. It only becomes clear once you understand that subvolumes can be (almost completely) independently mounted, and look at /etc/fstab, which shows the subvol= mount option.
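
Since fstab is where the assembly is actually recorded, a few lines of Python can pull out which subvolume ends up at which mount point. This is only a sketch that assumes conventional whitespace-separated fstab fields; the function name and the sample entries are made up.

```python
def subvol_mounts(fstab_text):
    """Map mount point -> subvolume name for Btrfs entries using subvol=."""
    result = {}
    for line in fstab_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and anything too short to have options.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        if fields[2] != "btrfs":
            continue
        for opt in fields[3].split(","):
            if opt.startswith("subvol="):
                result[fields[1]] = opt[len("subvol="):]
    return result


if __name__ == "__main__":
    sample = """\
# made-up example entries
UUID=abcd /     btrfs subvol=root 0 0
UUID=abcd /home btrfs subvol=home 0 0
UUID=ef01 /boot ext4  defaults    1 2
"""
    print(subvol_mounts(sample))
```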

Anyway, the point is, even in the infant stage of Btrfs as a root file system, two distros have two completely different layouts and snapshotting behaviors. I've argued that we need some interdistro conversation on something like an FHS addendum that tackles some standardization or best practices for how to organize such file systems and their snapshots. It probably should also account for LVM thin provisioning, which enables somewhat similar functionality. Or we'll just have to live with the ensuing messiness.

And standardization is better for bootloader development too, so that we don't have all of these different distro patches accounting for six ways to Sunday boot strategies. More like 60 ways…

Not last, not least, is GRUB. It has several pieces, and relevant different behaviors on BIOS and UEFI that I won't bring up here. However, there are some ideas on grub-devel@ about how to make a static grub.cfg dynamically produce entries for Btrfs snapshots, and I think that's more useful than always rewriting grub.cfg.

But yes, I wish it were more polished too.

> 
> Of course, I'm just testing this on a VM.  My main system still runs
> ext4 across the board.  I plan to start using btrfs gradually.  My
> nearest plan is to switch my /backup partition (where my backup drive
> is) to btrfs.  I plan to rsync my /home over there and then create the
> snapshots there (and do some snapshot rotation of course).
> 
>> Do you know if these are read only snapshots?
>> 
>> btrfs sub show /home/yum_20140130172422
> 
> They aren't.  I just checked now and could write to them.

If they are read-only, they have metadata at the Btrfs level that flags them as such, which can be seen with btrfs sub show <subvpath>. They aren't writable even by root. Of course root can just snapshot the ro snapshot, which creates a rw snapshot that can be modified. But if we wanted to audit all of this, we'd know that this happened: a snapshot is a subvolume, each subvolume has a UUID, and a snapshot also references its parent's UUID.
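
As a sketch of what such an audit could look like: given each subvolume's UUID and parent UUID (assumed here to have been collected by hand from btrfs sub show output into simple records), following the parent references shows what a snapshot was made from. The record layout, function name, and UUID values are hypothetical.

```python
def lineage(subvols, uuid):
    """Follow parent UUIDs from uuid back to a subvolume with no known parent.

    subvols is a list of dicts with 'name', 'uuid' and 'parent_uuid' keys
    (parent_uuid is None for a subvolume that isn't a snapshot).
    Returns subvolume names, newest first.
    """
    by_uuid = {s["uuid"]: s for s in subvols}
    chain = []
    seen = set()  # guards against malformed, cyclic input
    while uuid in by_uuid and uuid not in seen:
        seen.add(uuid)
        chain.append(by_uuid[uuid]["name"])
        uuid = by_uuid[uuid]["parent_uuid"]
    return chain


if __name__ == "__main__":
    # Placeholder UUIDs; real ones are much longer.
    subvols = [
        {"name": "root", "uuid": "A", "parent_uuid": None},
        {"name": "snap_ro", "uuid": "B", "parent_uuid": "A"},
        {"name": "snap_rw", "uuid": "C", "parent_uuid": "B"},
    ]
    print(lineage(subvols, "C"))
```

If the parent of a snapshot has been deleted, the chain simply stops at the last subvolume that still exists.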


Chris Murphy


