Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

Sunday, 5 January 2020

On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
...
 On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova
<alpha(a)bookwar.info> wrote:

 > Since in the Change we are not just introducing the earlyoom tool but enabling it
 > with a specific profile, I would add those details here. Something like:
 >
 > "earlyoom service will choose the offending process based on the same oom_score
 > as the kernel uses. It will send a SIGTERM signal at 10% of RAM left, and SIGKILL at 5%"

 I add this information to the summary. Also, I think these numbers may
 need to change to avoid prematurely sending SIGTERM when the system
 has no swap device.
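
 To make the thresholds concrete, here is a minimal Python sketch of the
 decision described above. It is not earlyoom's actual code. It assumes
 that both the available-memory and free-swap percentages must be at or
 below a limit before a signal is sent, that a system with no swap device
 counts as 0% swap free, and that the same percentages apply to swap as to
 RAM. The names parse_meminfo, percent and choose_signal are hypothetical.

```python
import signal


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in KiB."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def percent(part, total):
    """Return part/total as a percentage; 0.0 if total is zero (e.g. no swap)."""
    return 100.0 * part / total if total > 0 else 0.0


def choose_signal(info, term_pct=10.0, kill_pct=5.0):
    """Return SIGKILL, SIGTERM, or None for the given meminfo values.

    Assumption: both memory and swap must be at or below the limit.
    """
    mem = percent(info["MemAvailable"], info["MemTotal"])
    swap = percent(info.get("SwapFree", 0), info.get("SwapTotal", 0))
    if mem <= kill_pct and swap <= kill_pct:
        return signal.SIGKILL
    if mem <= term_pct and swap <= term_pct:
        return signal.SIGTERM
    return None
```

 Under these assumptions a machine without swap reaches the SIGTERM limit
 as soon as available RAM drops to 10%, because the swap condition is
 always satisfied. That is the case where the numbers may need adjusting.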

 > As I understand it, in the current setup we are looking more for a controlled
 > failure scenario than for a solution.

 Yes, it's fair to say this proposal is to make things "less bad". It
 doesn't improve system responsiveness. Once heavy swap starts, the
 system is sluggish, stutters, and briefly stalls. This proposal
 doesn't fix that. There is a lot of room for improvement.

 > Can we get a specific manual on what users are supposed to do once they trigger
 > earlyoom? Does earlyoom help in reporting? Which logs do we need to look at?
 >
 > Maybe add a section in the UX part of the change, or set up a dedicated wiki page?

 The user shouldn't need to do anything differently than if the kernel
 oom-killer had triggered. The system journal will contain messages
 showing what was killed and why:

 Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below
 SIGTERM limits: mem 10 %, swap 10 %
 Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process
 27421 "chrome": badness 305, VmRSS 42 MiB
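
 For anyone who wants to pull these events out of the journal
 programmatically, here is a small Python sketch that parses the "sending
 SIGTERM" message shown above. The pattern is written against that one
 sample only, so treat the message format as an assumption; other earlyoom
 versions may log differently. The name parse_kill_message is hypothetical.

```python
import re

# Pattern derived from the single sample message above (an assumption,
# not a documented log format).
KILL_RE = re.compile(
    r'sending (?P<signal>SIG[A-Z]+) to process (?P<pid>\d+) '
    r'"(?P<name>[^"]*)": badness (?P<badness>\d+), VmRSS (?P<rss_mib>\d+) MiB'
)


def parse_kill_message(message):
    """Return a dict describing the killed process, or None if no match."""
    # Collapse whitespace so a message wrapped across lines still matches.
    match = KILL_RE.search(" ".join(message.split()))
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "signal": fields["signal"],
        "pid": int(fields["pid"]),
        "name": fields["name"],
        "badness": int(fields["badness"]),
        "rss_mib": int(fields["rss_mib"]),
    }
```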

 > Additionally, there was a question during the chat discussion: how will the
 > earlyoom setup work together with OOMPolicy and any other related options of
 > systemd units? Will systemd recognize the OOM event?

 My understanding of systemd's OOMPolicy= behavior is that it looks for
 the kernel's oom-killer messages and acts upon those. Although earlyoom
 uses the same metric (oom_score) as the oom-killer, it does not invoke
 the oom-killer. Therefore systemd probably does not get the proper
 hint to implement OOMPolicy=.

Yes. The kernel reports oom events in the cgroup file memory.events,
and systemd waits for an inotify event on that file; OOMPolicy=stop is
implemented that way. And the OOMPolicy=kill option is "implemented"
by setting memory.oom.group=1 in the kernel [1] and having the kernel
kill all the processes. So systemd is providing a thin wrapper around
the kernel functionality.

If processes are not killed by the kernel but through a signal from
userspace, all of this will not work.
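
To illustrate, here is a minimal Python sketch of the kind of check a
watcher of memory.events can make. It is not systemd's code. It assumes
the cgroup v2 flat-keyed format of one "key value" pair per line and
compares the oom_kill counter between two reads of the file. The names
parse_memory_events and kernel_oom_kills are hypothetical.

```python
def parse_memory_events(text):
    """Parse the contents of a cgroup v2 memory.events file into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def kernel_oom_kills(before, after):
    """Return how many kernel OOM kills happened between two snapshots."""
    return after.get("oom_kill", 0) - before.get("oom_kill", 0)
```

A process terminated by a SIGTERM or SIGKILL sent from userspace does not
go through the kernel's oom-killer, so the counter stays the same and a
watcher like this sees nothing to react to.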

[1]
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-...

Zbyszek

> Fedora needs to discuss how big of a problem that is, whether there's any
> way to mitigate or tolerate it, weighing the pros of earlyoom for a
> short period against the cons of punting this problem for another
> release. This proposal does not intend to step on other superseding
> work in this area, but if it does, it'll be withdrawn.
