In the cloud meeting today I brought up overlayfs and F25. After discussing it with the engineers closer to the technology, the recommendation is to wait until F26 to move to overlayfs as the default.
I think this will work well because it gives us some time to let people try overlayfs in F25 (we should provide good docs on this) and give us feedback before we make it the default in F26. If the feedback is bad then maybe we won't go with it in F26 at all, but hopefully that won't be the case.
Thoughts?
Dusty
On Wed, Sep 14, 2016 at 12:14 PM, Dusty Mabe dusty@dustymabe.com wrote:
Sounds good to me.
Dusty
_______________________________________________
cloud mailing list
cloud@lists.fedoraproject.org
https://lists.fedoraproject.org/admin/lists/cloud@lists.fedoraproject.org
On Wed, Sep 14, 2016 at 2:45 PM, Jason Brooks jbrooks@redhat.com wrote:
Sounds good to me.
I'm uncertain if this is current or needs an update:
Evaluate overlayfs with docker https://github.com/kubernetes/kubernetes/issues/15867
If the way forward is a non-duplicating cache, then I see a major advantage gone. But that alone isn't enough to promote something else; I'd just say hedge your bets. Pretty much all the reasons why CoreOS switched from Btrfs to overlay have been fixed, although there's an enormous amount of enospc rework landing in kernel 4.8 [1] that will need time to shake out, and if anyone is able to break it, one of the best ways of getting it fixed and avoiding regressions is to write an xfstests [2] test case that cleanly reproduces the problem. The Facebook devs consistently report finding hardware (even the enterprise stuff they use) doing batshit things that Btrfs catches and corrects but other filesystems aren't seeing. As for the slowdowns, mainly due to fragmentation when creating and destroying many snapshots over a short period of time: those could probably be mitigated with garbage collection optimization, and I've had some ideas about that if anyone wants to futz around with it.
The more conservative change is probably XFS + overlayfs, though, since XFS now checksums fs metadata and the journal, which helps catch problems before they get worse.
[1] http://www.spinics.net/lists/linux-btrfs/msg53410.html
[2] semi random example http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=blob;f=tests...
On Wed, Sep 14, 2016 at 2:14 PM, Dusty Mabe dusty@dustymabe.com wrote:
Seems a little conservative, but I'm not opposed.
I've been under the impression that part of the point of the Two Week Release cycle was to be able to deliver new stuff faster and fix issues faster, but playing it safe isn't inherently a bad approach either.
-AdamM
On 09/14/2016 02:32 PM, Adam Miller wrote:
Seems a little conservative, but I'm not opposed.
I've been under the impression that part of the point of the Two Week Release cycle was to be able to deliver new stuff faster and fix issues faster but playing it safe isn't inherently a bad approach either.
When is F26 out?
On Wed, Sep 14, 2016 at 02:47:09PM -0700, Josh Berkus wrote:
When is F26 out?
Next June.
On 14/09/16, Adam Miller wrote:
Seems a little conservative, but I'm not opposed.
I've been under the impression that part of the point of the Two Week Release cycle was to be able to deliver new stuff faster and fix issues faster but playing it safe isn't inherently a bad approach either.
For Two Week Atomic we are not tied to the Fedora 25 release cycle. We can enable it in our release whenever we think it is ready for consumers; it does not have to wait for the F26 release. For example, if it looks to be in good condition a week after the F25 release, we can enable it by default in the next 2WA release.
Kushal
On Thu, Sep 15, 2016 at 01:55:16PM +0530, Kushal Das wrote:
+1
On 09/15/2016 04:25 AM, Kushal Das wrote:
For Two Week Atomic we are not tied to the Fedora 25 release cycle. We can enable it in our release whenever we think it is ready for consumers; it does not have to wait for the F26 release. For example, if it looks to be in good condition a week after the F25 release, we can enable it by default in the next 2WA release.
That is correct, but changing a default like that might be a bad idea. My opinion is that it should happen on a major release boundary.
The user still has the option to choose to use overlayfs if they want.
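For anyone wanting to opt in on F25, a minimal sketch of what that could look like via docker-storage-setup (the variable name and supported values should be double-checked against the docker-storage-setup actually shipped in F25; treat this as an assumption, not a recipe):

```shell
# /etc/sysconfig/docker-storage-setup (hypothetical sketch)
# Ask docker-storage-setup for the overlay driver instead of the
# default devicemapper thin pool.
STORAGE_DRIVER=overlay
```

After editing, one would stop docker, re-run docker-storage-setup, and restart docker; images in the old devicemapper pool are not migrated and would need to be pulled again.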
Dusty
On Thu, Sep 15, 2016, at 09:57 AM, Dusty Mabe wrote:
That is correct, but changing a default like that might be a bad idea. My opinion is that it should happen on a major release boundary.
One thing this impacts is the AH partitioning - it no longer makes sense by default with overlayfs. I think we should probably do exactly the same thing as the Server SIG (and consider doing it for Workstation too), which actually argues for just fixing the Anaconda defaults.
Server thread: https://lists.fedoraproject.org/archives/list/server@lists.fedoraproject.org...
On Fri, Sep 16, 2016 at 7:02 AM, Colin Walters walters@verbum.org wrote:
One thing this impacts is the AH partitioning - it no longer makes sense by default with overlayfs. I think we should probably do exactly the same thing as the Server SIG (and consider doing it for Workstation too), which actually argues for just fixing the Anaconda defaults.
Server thread: https://lists.fedoraproject.org/archives/list/server@lists.fedoraproject.org...
The genesis of that was me pointing to Cloud Atomic ISO's handling; since Server went with pretty much the identical layout, it managed to get slipped in for Alpha. It was a proven layout. [1]
For an Atomic Host overlayfs-based layout, there's nothing within Fedora that's a proven layout. For starters, it could be something much simpler than what CoreOS is doing [2]. If the target installation is a VM, then dropping the LVM stuff makes sense; if it's going to include bare metal, keeping LVM makes sense. I'm a bit unclear on this point, but with a handwave it sorta feels like the Cloud->Container WG is far less interested in the bare metal case, whereas Server is about as interested in bare metal as in the VM and container cases. If that's true, then the CoreOS layout is a decent starting point and just needs some simplification to account for ostree deployments rather than partition priority flipping.
Inode exhaustion?
If the installer is going to create the filesystem used for overlayfs backing storage with ext4, that probably means mkfs.ext4 -i 4096 will need to be used. So how does that get propagated to only AH installs, for both automatic and custom partitioning? Or do we figure out a way to drop custom/manual partitioning from the UI? Or does using XFS mitigate this issue? A quick search turns up no inode exhaustion reports with XFS. The workaround for ext4 has to happen at mkfs time; it's not something that can be changed later.
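To put rough numbers on the -i 4096 difference, here's a back-of-the-envelope calculation (the 120 GiB size is an arbitrary example; 16384 is the stock mkfs.ext4 inode_ratio default):

```shell
# Hypothetical 120 GiB backing filesystem, sizes in bytes.
fs_bytes=$((120 * 1024 * 1024 * 1024))

# mkfs.ext4's default inode_ratio is 16384: one inode per 16 KiB of space.
default_inodes=$((fs_bytes / 16384))

# -i 4096 means one inode per 4 KiB block, i.e. one per block: 4x as many.
dense_inodes=$((fs_bytes / 4096))

echo "default ratio: $default_inodes inodes"
echo "-i 4096:       $dense_inodes inodes"
```

Whether ~7.9M inodes (default) versus ~31.5M (-i 4096) is the right trade-off depends entirely on how many small files the container workloads create.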
Release blocking and custom partitioning?
Once the AH image becomes release blocking, the criterion "The installer must be able to create and install to any workable partition layout using any file system and/or container format combination offered in a default installer configuration." applies. Example bug [3] where this fails right now. Does it make sense for AH installations to somehow be exempt from the requirement that custom partitioning result in successful installations? And what would that look like, criterion-wise (just grant an exception?) or installer-wise (drop the custom UI, or put up warnings upon entering it?)
[1] https://lists.fedoraproject.org/archives/list/server@lists.fedoraproject.org...
[2] https://coreos.com/os/docs/latest/sdk-disk-partitions.html A bit of trivia: this mentions "the GPT priority attribute", which I can find nowhere else. But I rather like the idea of using an xattr on a directory as the hint for which fs tree the bootloader should use, rather than writing out new bootloader configurations.
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1289752
On Fri, Sep 16, 2016 at 10:56 AM, Chris Murphy lists@colorremedies.com wrote:
Inode exhaustion?
I just did some more digging, and also chatted with Eric Sandeen about this. Here's what I've learned:
- Inode exhaustion with mkfs.ext4 defaults can be a real thing with overlayfs [1].
- mkfs.ext4 -i 4096 will make one inode per 4096-byte block, so 1:1, which is a huge number of inodes.
- A different -i value might be more practical most of the time, but if the maximum aren't created at mkfs time and they're exhausted, the fs basically face-plants and no more files can be created; it's only fixable by a) deleting a bunch of files or b) creating a new filesystem with more inodes preallocated.
- mkfs.ext4 hands off the actual creation of the inodes to lazy init at first mount time; it's a lot of metadata being written.
- XFS doesn't have this issue; its inode allocation is dynamic (there are limits, but they can be changed with xfs_growfs).
- XFS now defaults to -m crc=1 and, by extension, -n ftype=1, which overlayfs wants for putting the file type in the directory entry; Fedora 24 had a sufficiently new xfsprogs for this from the get-go.
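Spelled out explicitly, that corresponds to an invocation like the one below (the device path is a hypothetical placeholder, and on a current xfsprogs both options are already the defaults, so passing them is belt and suspenders):

```shell
# Build the mkfs invocation; /dev/vdb is a hypothetical spare device.
# -m crc=1 enables metadata and journal checksums; -n ftype=1 records
# the file type in directory entries, which overlayfs requires.
dev=/dev/vdb
mkfs_cmd="mkfs.xfs -m crc=1 -n ftype=1 $dev"
echo "$mkfs_cmd"
```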
I don't know what the workflow is for creating the persistent storage for the host, or whether this will be Anaconda's role or something else's. If Anaconda, my experience has been that the Anaconda team is reluctant to use non-default mkfs options unless there's a UI toggle for them. You'd need to run all this by them and see if there's a way to do mkfs.ext4 -i 4096 for just Atomic Host installations; there's no point doing that for Workstation installations. Or just use XFS.
[1] https://github.com/coreos/bugs/issues/264 https://github.com/boot2docker/boot2docker/issues/992
On Fri, Sep 16, 2016 at 2:15 PM, Chris Murphy lists@colorremedies.com wrote:
You'd need to run all this by them and see if there's a way to do a mkfs.ext4 -i 4096 for just Atomic Host installations, there's no point doing that for workstation installations. Or just use XFS.
Another possibility is an AH specific /etc/mke2fs.conf file on the installation media only.
[defaults]
	base_features = sparse_super,large_file,filetype,resize_inode,dir_index,ext_attr
	default_mntopts = acl,user_xattr
	enable_periodic_fsck = 0
	blocksize = 4096
	inode_size = 256
	inode_ratio = 16384
By changing inode_ratio to 4096, it achieves the same outcome as -i 4096 without having to pass that flag at mkfs time. And it would only affect filesystems created at installation time (including /boot and / as well as the persistent storage for overlayfs and containers). So... yeah.
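Concretely, the overridden file might look like this (untested sketch; only inode_ratio differs from the stock [defaults] stanza shown above):

```ini
; /etc/mke2fs.conf on the AH install media (hypothetical override)
[defaults]
	base_features = sparse_super,large_file,filetype,resize_inode,dir_index,ext_attr
	default_mntopts = acl,user_xattr
	enable_periodic_fsck = 0
	blocksize = 4096
	inode_size = 256
	inode_ratio = 4096
```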
FWIW, you're basically already using XFS with the dm-thin docker-storage-setup you've got going on right now. It doesn't get mounted anywhere, but
$ sudo docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: fedora--atomic-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
So, just use XFS across the board (plus overlayfs on the persistent storage for containers).
As for Workstation changing filesystems, that's another ball of wax. I'd just say use XFS + overlayfs there too, to keep it simple across the various products in the near term. And then presumably the Workstation folks will want Btrfs once it's sufficiently stable that the kernel team won't freak out even if there's still no Btrfs-specific kernel dev on the team or at Red Hat.
Just in case this poor horse isn't suitably beaten yet.
1. Create four qcow2 files with qemu-img create -f qcow2 *.qcow2 120g. Each qcow2 starts out at 194K (not preallocated).
2. Format each qcow2:
   mkfs.ext4 <dev>
   mkfs.ext4 -i 4096 <dev>
   mkfs.xfs <dev>
   mkfs.btrfs <dev>
3. Mount each fs (mainly to be fair, since ext4 does lazy init) and wait until the qcow2 stops growing.
5.5M -rw-r--r--. 1 qemu qemu 5.9M Sep 19 20:40 bios_btrfs.qcow2
2.1G -rw-r--r--. 1 root root 2.1G Sep 19 20:27 bios_ext4_default.qcow2
7.7G -rw-r--r--. 1 root root 7.7G Sep 19 20:33 bios_ext4_i4096.qcow2
 62M -rw-r--r--. 1 qemu qemu  62M Sep 19 20:40 bios_xfs.qcow2
Btrfs and XFS take seconds to completely initialize. Ext4 defaults took 6 minutes, and with -i 4096 it took 8 minutes to complete lazy init.