Technical Spec, better upgrade/rollback control

Colin Walters walters at verbum.org
Mon Feb 24 12:29:29 UTC 2014


On Mon, Feb 24, 2014 at 1:11 AM, Chris Murphy <lists at colorremedies.com> 
wrote:
> 
> Yes. Snapper on openSUSE is doing this already on Btrfs. I'm not sure 
> how it's dealt with on LVM thinp, since /boot has to be outside LVM 
> thinp: while GRUB groks conventional LVM, it doesn't understand thinp 
> yet. GRUB does understand /boot on Btrfs, but Fedora's grubby has a 
> problem with it [1]. I've also been making /var/log a separate 
> subvolume, which makes it immune to rootfs snapshots and rollbacks.
> 

Note that for OSTree, /var/lib/rpm points to /usr/share/rpm, so the RPM 
database is also immutable.  The same goes for /var/lib/yum.

> Is there a good chance of optimizing OSTree to use LVM thinp and Btrfs 
> snapshots instead of hardlinks, while still being in charge of the 
> proper semantic enforcement?
> 

Note that OSTree already uses BTRFS_IOC_CLONE when running on btrfs to 
implement the separate copies of /etc.  (Strictly speaking, this happens 
via the generic g_file_copy(), since glib commit 
https://git.gnome.org/browse/glib/commit/?id=5eba9784979e0b723c05a45cf767046607e4e759 
)
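For illustration only: from a shell, GNU cp's `--reflink=auto` option gives a similar effect. It asks the kernel to clone the file (sharing data extents, as BTRFS_IOC_CLONE does on btrfs) and falls back to an ordinary copy on filesystems that can't clone. This is an analogy, not the call path OSTree itself uses; the temporary directories below are hypothetical stand-ins for a pristine /etc and a deployment's copy of it.

```shell
#!/bin/sh
# Sketch: make an independent copy of a config tree that shares data
# extents with the original when the filesystem supports cloning
# (e.g. btrfs), and falls back to a full copy otherwise.
set -eu

src=$(mktemp -d)   # hypothetical stand-in for the pristine /etc
dst=$(mktemp -d)   # hypothetical stand-in for a deployment's /etc copy

printf 'hostname=demo\n' > "$src/example.conf"

# --reflink=auto: clone if possible, else copy.  -a preserves modes/times.
cp -a --reflink=auto "$src/." "$dst/"

# The copy is a separate inode, so editing it never touches the original.
printf 'hostname=changed\n' > "$dst/example.conf"
cat "$src/example.conf"   # still prints hostname=demo
```

Unlike a hardlink, a clone is a distinct file, which is why it suits /etc: each deployment's copy can be edited freely while unchanged data still shares storage on btrfs.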

Beyond that, though, because /usr is immutable for OSTree, there isn't 
really a big advantage to thinp or btrfs snapshots: a hardlinked copy of 
/usr is already cheap to make.  Just try this right now on your laptop:

# Once for cold cache performance
time cp -al /usr /usr.copy
# And once for hot cache
time cp -al /usr /usr.copy2

For me (and this is a real-world RHEL7 system with a 5.1G /usr):

[root@localhost /]# time cp -al usr usr.copy
real	0m5.199s
user	0m0.220s
sys	0m2.849s
[root@localhost /]# time cp -al usr usr.copy2
real	0m2.245s
user	0m0.166s
sys	0m2.049s

That's fast enough for the use cases I envision, at least for now.  
FS/block snapshots obviously have advantages beyond being instant; for 
example, they don't incur lots of scattered writes to bump the link 
counts (refcounts) of inodes.  But many systems already see scattered 
inode writes periodically, to a lesser degree, because of the default 
relatime mount option anyway.
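To make the refcount point concrete: `cp -al` copies no file data, but every hardlink it creates increments the target inode's link count (st_nlink), and that update is an inode write. A minimal sketch on a throwaway directory; the paths are hypothetical stand-ins for /usr.

```shell
#!/bin/sh
# Sketch: show that `cp -al` shares inodes and bumps their link counts.
set -eu

tree=$(mktemp -d)                       # hypothetical scratch root
mkdir "$tree/usr"
printf 'binary\n' > "$tree/usr/tool"

before=$(stat -c %h "$tree/usr/tool")   # link count before the copy
cp -al "$tree/usr" "$tree/usr.copy"     # hardlink farm: no data copied
after=$(stat -c %h "$tree/usr/tool")    # link count after the copy

# Both names refer to the same inode number.
ino_orig=$(stat -c %i "$tree/usr/tool")
ino_copy=$(stat -c %i "$tree/usr.copy/tool")
echo "links: $before -> $after; same inode: $([ "$ino_orig" = "$ino_copy" ] && echo yes || echo no)"
# prints: links: 1 -> 2; same inode: yes
```

On a real /usr the same increment happens once per regular file, spread across the inode table, which is the scattered-write cost a block- or FS-level snapshot avoids.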

Where FS/block snapshots become *necessary* is when you have 
*uncontrolled writes* to /usr.  For example, with OSTree's hardlink 
model, I cannot allow arbitrary RPM %post code to run: a file edited in 
place would also change every other hardlinked copy of it.  Each script 
has to be carefully audited to break hardlinks via "write new copy, 
rename" instead of doing edits in place.

This auditing is necessary to support local software installation.  We 
don't need it, though, for the "pure replication" model, where *no* RPM 
%post scripts run on client systems; they all run on the build server.

This replication model is where OSTree is strongest right now, and where 
the traditional package model is weakest, so it is what I have mainly 
been emphasizing.

That said, carefully auditing RPM %post scripts, and more generally 
laying the foundations for a package-like system on top of OSTree, are 
very much part of the long-term plans.

> Yes, I also don't consider there to be just one kind of "rollback", 
> since there can be different contexts. A user rolling back their /home 
> doesn't mean rolling back any other user's, or the system. Conversely, 
> rolling back the system doesn't mean rolling back user /home or logs or 
> some other things. 
> 

Definitely.  OSTree doesn't touch /home (note that this is now 
/var/home), so it makes a lot of sense to still have something that's 
more like a backup system, particularly a backup system that knows to 
take a backup before OSTree upgrades.

That's where using BTRFS or thinp in *combination* with OSTree is 
really nice: the total freedom to do whatever you want at the block 
layer means you can choose to put /home (/var/home) on a separate 
partition and take thinp snapshots of it, or use BTRFS's per-subvolume 
RAID to say you want RAID0 for / and RAID1 for /home.

To answer your question another way, then: I'll definitely be quick to 
take advantage of any new APIs added by the storage layer to 
*transparently* make things better for OSTree.  But I don't want to 
mandate any particular partition layout or FS/block-level layout, 
because I think that takes away too much administrator flexibility.




More information about the desktop mailing list