On Thu, Nov 5, 2020, 10:03 AM Patrick O'Callaghan <pocallaghan@gmail.com> wrote:
The swapon(8) man page says:

   The  swap  file  implementation in the kernel expects to be able to write
   to the file directly, without the assistance of the filesystem.  This is a
   problem on files with holes or on copy-on-write files on filesystems like Btrfs.

As I'm getting OOM errors when I try to run a VM, it looks like I need
a swap file or partition. I'd prefer to use part of my (large) SSD for
this, but currently it's entirely formatted as BTRFS. Do I need to
resize the BTRFS partition rather than using a swap file?



There are several options depending on the workload.

A swapfile on Btrfs performs the same as a swap partition in my testing. The main difference, other than the limitations listed in 'man 5 btrfs', is that an additional set of steps is needed to make hibernation possible.

The swapfile must not be snapshotted. If it is, COW applies and the swapfile can no longer be activated. There are several ways to avoid this if you want to snapshot root (but not the swapfile): create a /swap subvolume or a /var/swap subvolume, and put the swapfile there. Since Btrfs snapshots are not recursive, either of these prevents a snapshot of root from snapshotting the swapfile.
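As a rough sketch of the subvolume approach, the small Python helper below only prints the commands it would run; it does not execute anything. The file-creation sequence follows the swapfile procedure in 'man 5 btrfs'; the helper name `swapfile_plan`, the file name `swapfile` and the 2G size are my own placeholders, not anything Fedora ships.

```python
import shlex


def swapfile_plan(subvol="/swap", size="2G"):
    """Return the commands (as argument lists) for a NOCOW swapfile on Btrfs.

    Nothing is executed here; the caller decides whether to run them as root.
    The subvolume path, file name and size are illustrative assumptions.
    """
    swapfile = f"{subvol}/swapfile"
    return [
        # A dedicated subvolume keeps the file out of snapshots of root.
        ["btrfs", "subvolume", "create", subvol],
        # Create an empty file first so NOCOW can be set before data exists.
        ["truncate", "-s", "0", swapfile],
        ["chattr", "+C", swapfile],
        # Preallocate the full size; swap files must not have holes.
        ["fallocate", "-l", size, swapfile],
        ["chmod", "600", swapfile],
        ["mkswap", swapfile],
        ["swapon", swapfile],
    ]


if __name__ == "__main__":
    for cmd in swapfile_plan():
        print(shlex.join(cmd))
```

The order matters: the NOCOW attribute only takes effect on an empty file, so `chattr +C` has to come before `fallocate`.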

If you aren't doing any snapshotting at all, then it doesn't matter; you can probably make the swapfile almost anywhere.

I've got some work to do to figure out if there's a more elegant way to do it, and of course to work out the encryption and SELinux implications.

The longer-term story is to create the swap files automatically, on demand, in the proper location. And in that case possibly use zswap instead of swap on zram.

It's a bit esoteric, but there is a way to track page faults via the cgroup memory.stat file under /sys/fs/cgroup. And the zram driver tracks some statistics that could be helpful in spotting situations where zram is filling up with seldom-used dirty pages that are just taking up memory. In those kinds of workloads it's probably more beneficial to use conventional disk-based swap. That's because disk-based swap fully evicts the page, freeing up all of that memory, whereas with zram-based swap we are still consuming some memory for the compressed copy; in effect it's a partial eviction.
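To make the "partial eviction" point concrete, here is a minimal Python sketch that reads the zram driver's mm_stat file and works out how much memory swapping to zram has actually freed. The first three columns of mm_stat are orig_data_size, compr_data_size and mem_used_total per the kernel's zram documentation; the device name zram0 and the helper names are assumptions for illustration.

```python
from collections import namedtuple

# First three columns of /sys/block/<dev>/mm_stat, all in bytes.
# Later columns exist and vary with kernel version, so they are ignored here.
MmStat = namedtuple("MmStat", "orig_data_size compr_data_size mem_used_total")


def parse_mm_stat(text):
    """Parse the whitespace-separated contents of a zram mm_stat file."""
    fields = text.split()
    return MmStat(*(int(value) for value in fields[:3]))


def net_freed(stat):
    """Bytes actually freed: uncompressed data stored minus RAM zram still uses."""
    return stat.orig_data_size - stat.mem_used_total


def read_mm_stat(device="zram0"):
    """Read mm_stat for a device; 'zram0' is an assumed device name."""
    with open(f"/sys/block/{device}/mm_stat") as f:
        return parse_mm_stat(f.read())
```

With disk-based swap the equivalent of mem_used_total is zero, so the whole of orig_data_size is freed. With zram only the difference is.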

One thing to watch out for with new installs: it's possible we will see cases where there's average or below-average RAM and heavy Firefox usage. There it could be easier to go below the low-water threshold on zram-based swap, leading to earlyoom issuing SIGTERM to a Firefox tab. This does get logged.

I think the first recommendation is to create a custom zram-generator configuration and bump the size of the zram device from 50% of RAM to 75% of RAM. While 100% is normally OK, that might be a use case better off with disk-based swap. It just depends on the workload.
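For a feel of what those percentages mean, here is a back-of-the-envelope Python sketch. The 8 GiB of RAM and the 3:1 compression ratio are assumptions picked purely for illustration; real ratios depend heavily on the workload, and the function name is hypothetical.

```python
GIB = 1024 ** 3


def zram_plan(ram_bytes, fraction, compression_ratio):
    """Return (device size, approximate RAM consumed when the device is full).

    compression_ratio is an assumed average (uncompressed / compressed).
    """
    device_size = int(ram_bytes * fraction)
    ram_when_full = int(device_size / compression_ratio)
    return device_size, ram_when_full


if __name__ == "__main__":
    for fraction in (0.5, 0.75, 1.0):
        size, used = zram_plan(8 * GIB, fraction, 3.0)
        print(f"{fraction:.0%} of RAM: device {size / GIB:.1f} GiB, "
              f"about {used / GIB:.1f} GiB of RAM when full")
```

The device size is the amount of uncompressed data it can hold, not the RAM it occupies, which is why a device sized at 100% of RAM is normally OK as long as the data compresses reasonably well.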

Most users should be OK with the defaults. That's what I've been using most of the time for over a year now. But I'm always on the lookout for any issues.

--
Chris Murphy