On 1/6/20 1:18 PM, Chris Murphy wrote:
Hi server@ and cloud@ folks,
There is a system-wide change to enable earlyoom by default on Fedora
Workstation. It came up in today's Workstation working group meeting
that I should give you folks a heads up about opting into this change.
Thanks for the heads up!
The main issue on a workstation, heavy swap leading to an unresponsive
system, is perhaps not as immediately frustrating on a server. But
the consequences of an indefinite hang, or of the kernel oom-killer
triggering (which sends SIGKILL), are perhaps worse.
On the plus side, earlyoom is easy to understand, and its first
attempt is a SIGTERM rather than SIGKILL. It uses oom_score, same as
kernel oom-killer, to determine the victim.
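To make the victim selection concrete, here is a rough sketch of "pick the process with the highest oom_score". It is not earlyoom's actual code; the function name pick_victim and the proc_root parameter are made up for illustration. The only thing it relies on is the kernel's per-process /proc/<pid>/oom_score file.

```python
import os


def pick_victim(proc_root="/proc"):
    """Return (pid, oom_score) for the process with the highest oom_score.

    Returns None if no readable oom_score file is found.
    proc_root is a hypothetical parameter so the logic can be tried
    against a fake directory tree instead of the live /proc.
    """
    best = None
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are processes
        score_path = os.path.join(proc_root, entry, "oom_score")
        try:
            with open(score_path) as f:
                score = int(f.read().strip())
        except (OSError, ValueError):
            continue  # process exited, or the file is unreadable
        if best is None or score > best[1]:
            best = (int(entry), score)
    return best
```

Calling pick_victim() with no arguments scans the live /proc on a Linux host. Ties go to whichever process is listed first, which is an arbitrary choice in this sketch.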
The SIGTERM is issued to the process with the highest oom_score only
if both memory and swap reach 10% free. And SIGKILL is issued to the
process with the highest oom_score once memory and swap reach 5% free.
Those percentages can be tweaked, but the KILL percentage is always
1/2 of the TERM percentage, so it's a bit rudimentary.
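As a minimal sketch of the threshold rule just described (not earlyoom's implementation): the function name decide_signal and its parameters are hypothetical, and treating "reach" as "less than or equal to" is my assumption.

```python
def decide_signal(mem_free_pct, swap_free_pct, term_pct=10.0):
    """Return "SIGTERM", "SIGKILL", or None for the given free percentages.

    The KILL threshold is fixed at half of the TERM threshold, as
    described above. Both memory and swap must be at or below a
    threshold before it applies.
    """
    kill_pct = term_pct / 2.0
    if mem_free_pct <= kill_pct and swap_free_pct <= kill_pct:
        return "SIGKILL"
    if mem_free_pct <= term_pct and swap_free_pct <= term_pct:
        return "SIGTERM"
    return None  # at least one of memory or swap still has headroom
```

With the defaults, decide_signal(9, 8) gives "SIGTERM" and decide_signal(4, 3) gives "SIGKILL", while decide_signal(4, 50) gives None because swap is still mostly free.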
Yeah. Adding more ways to relate SIGTERM to SIGKILL (other than the 1/2) would
One small concern I have is, what if there's no swap? That's probably
uncommon for servers, but I'm not sure about cloud. But in this case,
For cloud at least, it's very common to not have swap. I'd argue that you
don't want servers swapping either, but resources there aren't as elastic, so
you might not be able to burst the way you can in the cloud.
SIGTERM happens at 10% of RAM, which leaves a lot of memory on the
table, and for a server with significant resources it's probably too
high. What about 4%? Maybe still too high? One option I'm thinking of
is a systemd conditional that would not run earlyoom on systems
without a swap device, which would leave these systems no worse off
than they are right now. [i.e. they eventually recover (?),
indefinitely hang (likely), or oom-killer finally kills something]
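For what it's worth, the "is there a swap device?" check itself is simple. The sketch below is only an illustration in Python, not the systemd conditional being proposed; the names has_swap and should_run_earlyoom are hypothetical. It assumes the usual /proc/swaps layout of one header line followed by one line per active swap area.

```python
def has_swap(swaps_text):
    """Return True if the text of /proc/swaps lists at least one swap area."""
    lines = [ln for ln in swaps_text.splitlines() if ln.strip()]
    return len(lines) > 1  # the first non-empty line is the column header


def should_run_earlyoom(swaps_path="/proc/swaps"):
    """Hypothetical policy: only run earlyoom when swap is configured."""
    try:
        with open(swaps_path) as f:
            return has_swap(f.read())
    except OSError:
        return False  # no readable swap table, so treat it as "no swap"
```

Under this policy a swapless cloud image would skip earlyoom entirely and behave exactly as it does today.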
Seems like on these systems it would be nice to make earlyoom send SIGTERM
just before SIGKILL, i.e. try the nice way and then bring in the hammer.
In this case a 1% difference in threshold would be useful, e.g. SIGTERM at
5% and SIGKILL at 4%, or something like that.
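To put numbers on what those percentages mean on a swapless box, here is a small arithmetic sketch. The helper name headroom_gib and the 64 GiB host are hypothetical; it just converts TERM and KILL percentages into absolute free memory.

```python
def headroom_gib(total_ram_gib, term_pct, kill_pct):
    """Return (term_gib, kill_gib): free RAM remaining when each signal fires."""
    if not 0 <= kill_pct <= term_pct <= 100:
        raise ValueError("expected 0 <= kill_pct <= term_pct <= 100")
    term_gib = total_ram_gib * term_pct / 100.0
    kill_gib = total_ram_gib * kill_pct / 100.0
    return term_gib, kill_gib


# Hypothetical 64 GiB host: compare a 10%/5% pair with a 5%/4% pair.
for term, kill in ((10, 5), (5, 4)):
    t, k = headroom_gib(64, term, kill)
    print(f"TERM at {term}% = {t:.2f} GiB free, KILL at {kill}% = {k:.2f} GiB free")
```

On that hypothetical host, 10% means SIGTERM fires with 6.4 GiB still free, whereas 5%/4% would wait until 3.2 GiB and then 2.56 GiB.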
I've been testing earlyoom, nohang, and the kernel oom-killer for > 6
months now, and I think it would be completely sane for Server and
Cloud products to enable earlyoom by default for fc32, while
evaluating other solutions that can be more server oriented (e.g.
nohang, oomd, possibly others) for fc33/fc34. What is clear: this
isn't going to be solved by kernel folks. The kernel oom-killer only
cares about keeping the kernel alive; it doesn't care about user space.
In the cases where this becomes a problem, either the kernel hangs
indefinitely or it sends SIGKILL to your database or whatever is eating
up resources. Whereas at least earlyoom's first attempt is a SIGTERM,
so the process has a chance of quitting gracefully.
There are some concerns; those are in the devel@ thread, and I expect
they'll be adequately addressed or the feature will not pass the FESCo
vote. But as a short term solution while evaluating more sophisticated
solutions, I think this is a good call so I thought I'd just mention
it, in case you folks want to be included in the change.