On Tue, Jan 7, 2020 at 1:48 PM Mark Otaris <mark(a)net-c.com> wrote:
I intended to demonstrate that cgroups can be used to cause the kernel OOM
killer to react appropriately and fast enough, implying that replacing the
OOM killer is not necessary and that replacing it with a userspace OOM killer
that does not account for cgroups can be undesirable. The exact same controls
set with my example commands, and others, can be set with scopes as well,
so this should be applicable.
Okay, interesting. But that’s a statement from just one person, and it has to
be interpreted in the context of what it is confirming; that is, that the OOM
killer is “mainly concerned about kernel survival in low memory situations”,
which is weaker than your claim that “their concern with kernel oom-killer is
strictly with keeping the kernel functioning”. I don’t know if the OOM killer’s
main purpose is to keep the kernel alive (Michal Hocko appears to think so,
maybe others disagree), but it is in any case not an abuse of the OOM killer to
also use it to keep userspace responsive,
The oom killer doesn't keep user space responsive per se, in your
example that's done by cgroups restricting resources. And that's neat,
and necessary to keep making forward progress on. But we don't have
that for unprivileged processes right now, unless the user knows the
secret decoder ring command to use to do this every time they run
something in Terminal; and then have some idea to hint at what
resources are needed for the task to succeed rather than just get
That's maybe the elephant in the room with earlyoom (or one of them):
yes, we've recovered sooner, and the user can hopefully save their data
and reboot. But did their task succeed? No. It got clobbered.
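For concreteness, here is a rough sketch of the kind of per-command limit being discussed. It only builds a `systemd-run --user --scope` command line with the `MemoryMax` and `MemoryHigh` properties; the helper name `scoped_command` and the example limits are made up for illustration, and it assumes cgroup v2 with the memory controller delegated to the user session. It is not the exact command from the earlier message.

```python
import shlex


def scoped_command(command, memory_max="2G", memory_high=None):
    """Return an argv that runs `command` in a transient user scope.

    `scoped_command` is a hypothetical helper. MemoryMax is the hard
    limit; MemoryHigh is an optional throttling threshold below it.
    """
    argv = ["systemd-run", "--user", "--scope", "-p", f"MemoryMax={memory_max}"]
    if memory_high is not None:
        argv += ["-p", f"MemoryHigh={memory_high}"]
    return argv + list(command)


if __name__ == "__main__":
    # Illustrative values only; the right limits depend on the workload.
    argv = scoped_command(["make", "-j8"], memory_max="4G", memory_high="3G")
    print(shlex.join(argv))
```

The user still has to guess the limits up front, which is the usability problem described above.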
and there is no reason to think that
kernel folks are not interested in helping achieve this goal.
I did mean with a kernel-only solution. I've been tracking this issue
for 6-7 months, including the ongoing congestion and kswapd
discussions, so I know they do care broadly about providing some
mechanisms by which user space can better behave. But all of that
requires varying degrees of opt-in, and quite a lot of it involves
considerable work to even understand it, let alone implement it.
advantage I see to earlyoom so far is that it sends SIGTERM before taking
further steps that will kill processes.
Yes, and it happens sooner. Probably not soon enough for many users.
There may be some risk of overpromising and underdelivering: we make
it the default, and then for the vast majority of cases it doesn't
matter, because users have long since been conditioned to just force
power off within a minute or less of the GUI stuttering or freezing up
on them. It is very workload and system specific.
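
As an aside on the SIGTERM-first behaviour mentioned above, here is a minimal sketch of the general "ask politely, then force" pattern. It is not earlyoom's code: the escalation here is driven by a timeout, which is an assumption made for illustration, and `terminate_then_kill` and the grace period are hypothetical names and values.

```python
import signal
import subprocess
import sys


def terminate_then_kill(proc, grace_seconds=5.0):
    """Send SIGTERM, wait up to `grace_seconds`, then fall back to SIGKILL.

    Returns the process's return code (negative signal number on POSIX
    if the process was ended by a signal).
    """
    proc.terminate()  # SIGTERM: gives the process a chance to clean up
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        return proc.wait()


if __name__ == "__main__":
    # A stand-in for some long-running user task.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    code = terminate_then_kill(child, grace_seconds=2.0)
    print("exit status:", code)
```

Either way the task itself is lost; the polite signal only gives it a chance to save state first.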