> For now, kernel developers have made it clear they do not care about
> user space responsiveness. At all. Their concern with kernel
> oom-killer is strictly with keeping the kernel functioning.
This is false. The stated purpose of the OOM killer is not only to keep the
kernel alive. Nor does the fact that the kernel has not yet solved userspace
responsiveness imply that kernel folks do not care. Rather, it means that
they will not solve it on their own, because the kernel does not have all the
information it needs. Kernel folks do care, or we wouldn't have PSI or cgroups.
A userspace solution is needed, but does not need to replace the OOM killer;
cgroups are also a userspace solution. If earlyoom breaks them, it can make
things worse than the status quo.
> Can it be done with cgroupv2 and PSI alone? Unclear.
Of course it can. Just run 100 instances of every stress-ng memory worker in
a podman container with a cgroup memory limit. The system will not hang. Do
the same without the memory limit, and the system will hang within seconds and
never recover. This demonstrates that cgroups work and do what they were
intended to do.
Try it. With a memory limit,
podman run --rm -it --memory=1G fedora bash -c \
  'dnf install -y stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
will use CPU but keep your system responsive. Without the memory limit
(this will hang your system),
podman run --rm -it fedora bash -c \
  'dnf install -y stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
the system hangs and doesn't recover even after 15 minutes. Same thing
with `tail /dev/zero`:
podman run --rm -it --memory=1G fedora tail /dev/zero
activates the OOM killer after three seconds, with
kernel: Memory cgroup out of memory: Killed process 8814 (tail) total-vm:3141408kB, anon-rss:1042028kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:6336512kB
libpod-e061e1cb57dde204632531a556d37efbd51c9ab67346a8bc4d5e26c7301c165b.scope: A process of this unit has been killed by the OOM killer.
kernel: oom_reaper: reaped process 8814 (tail), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
logged in the system journal. You were saying the OOM killer activates too late
and rarely kills the right process? Well, here it activates early enough and
knows exactly what to stop. It is worth trying with ninja and WebKit too.
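
If you repeat these runs and want to check which process the cgroup OOM
killer picked each time, something like the following might help. It is only
a sketch: the function name cgroup_oom_victims is made up for illustration,
and it assumes the kernel message keeps the "Memory cgroup out of memory:
Killed process PID (NAME)" wording shown in the log above, which may differ
between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of podman, systemd, or the kernel):
# read kernel log text on stdin and print "PID NAME" for every
# memory-cgroup OOM kill found in it.
# Assumption: the message wording matches the journal excerpt above.
cgroup_oom_victims() {
    sed -n 's/.*Memory cgroup out of memory: Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Typical use, assuming you are allowed to read the kernel log:
#   journalctl -k --no-pager | cgroup_oom_victims
```

Run against the journal excerpt above, this should report process 8814 (tail)
and ignore the oom_reaper line, since only the "Killed process" message is
matched.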