https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary == Install the earlyoom package, and enable it by default. This will cause the kernel OOM killer to trigger sooner, but will not affect which process it chooses to kill. The idea is to recover from out-of-memory situations sooner, rather than the typical complete system hang in which the user has no choice but to force a power-off.
== Owner == * Name: [[User:chrismurphy| Chris Murphy]] * Email: bugzilla@colorremedies.com
== Detailed Description == The Workstation working group has discussed "better interactivity in low-memory situations" for some months. In certain use cases, typically compiling, if all RAM and swap are completely consumed, system responsiveness becomes so abysmal that a reasonable user can consider the system "lost" and resorts to forcing a power-off. This is objectively a very bad UX. The broad discussion of this problem, and some ideas for near-term and long-term solutions, is located here:
Recent long discussions on "Better interactivity in low-memory situations"<br> https://pagure.io/fedora-workstation/issue/98<br> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...<br>
All Fedora editions and spins have the in-kernel OOM (out-of-memory) killer enabled. Its concern is keeping the kernel itself functioning; it has no concern for user-space function or interactivity. This proposed change attempts to improve the user experience, in the short term, by triggering the in-kernel process-killing mechanism sooner. Instead of the system becoming completely unresponsive for tens of minutes, hours, or days, the expectation is that an offending process (determined by oom_score, same as now) will be killed off within seconds or a few minutes. This is an incremental improvement in user experience, but admittedly still suboptimal. Additional work is ongoing to improve the user experience further.
Workstation working group discussion specific to enabling earlyoom by default https://pagure.io/fedora-workstation/issue/119
Other in-progress solutions:<br> https://gitlab.freedesktop.org/hadess/low-memory-monitor<br>
Background information on this complicated problem:<br> https://www.kernel.org/doc/gorman/html/understand/understand016.html<br> https://lwn.net/Articles/317814/<br>
== Benefit to Fedora ==
There are two major benefits to Fedora:
* improved user experience: the user more quickly regains control over the system, rather than having to force a power-off in low-memory situations where there's aggressive swapping. Once a system becomes unresponsive, it's completely reasonable for the user to assume the system is lost, and forcing power off carries a high potential for data loss.
* reducing forced power-off as the main workaround will increase data collection, improving our understanding of low-memory situations and how to handle them better
== Scope == * Proposal owners:<br> a. Modify {{code|https://pagure.io/fedora-comps/blob/master/f/comps-f32.xml.in}} to include the earlyoom package for Workstation.<br> b. Modify {{code|https://src.fedoraproject.org/rpms/fedora-release/blob/master/f/80-workstati...}} to include:
<pre>
# enable earlyoom by default on workstation
enable earlyoom.service
</pre>
* Other developers: Restricted to Workstation edition, unless other editions/spins want to opt-in.
* Release engineering: [https://pagure.io/releng/issues #9141] (a check of the impact with Release Engineering is needed)
* Policies and guidelines: N/A * Trademark approval: N/A
== Upgrade/compatibility impact == earlyoom.service will be enabled on upgrade. An upgraded system should exhibit the same behavior as a cleanly installed system.
== How To Test == * Fedora 30/31 users can test today, on any edition or spin:<br> {{code|sudo dnf install earlyoom}}<br> {{code|sudo systemctl enable --now earlyoom}}
And then attempt to cause an out-of-memory situation. Examples:<br> {{code|tail /dev/zero}}<br> {{code|https://lkml.org/lkml/2019/8/4/15}}
* Fedora Workstation 32 (and Rawhide) users will see this service is already enabled. It can be toggled with {{code|sudo systemctl start/stop earlyoom}} where start means earlyoom is running, and stop means earlyoom is not running.
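Once enabled, it may help to confirm the service is active and to watch its periodic reports while applying memory pressure in another terminal; a sketch using standard systemd tooling:

```shell
# Confirm earlyoom is running
systemctl status earlyoom.service

# Follow earlyoom's memory reports in the journal (the report interval
# comes from the -r flag in /etc/default/earlyoom)
journalctl -u earlyoom.service -f
```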
== User Experience == The most egregious instances this change is trying to mitigate:
a. RAM is completely used
b. Swap is completely used
c. System becomes unresponsive to the user as swap thrashing has ensued

With earlyoom disabled, the user often gives up and forces a power-off (in my own testing this condition lasts >30 minutes with no kernel-triggered OOM killer and no recovery). With earlyoom enabled, the system likely still becomes unresponsive, but the OOM killer is triggered in much less time (seconds or a few minutes in my testing, once less than 10% RAM and 10% swap remain).
earlyoom starts sending SIGTERM once both memory and swap are below their respective PERCENT setting, default 10%. It sends SIGKILL once both are below their respective KILL_PERCENT setting, default 5%.
The package includes the configuration file /etc/default/earlyoom, which sets option {{code|-r 60}}, causing a memory report to be entered into the journal every minute.
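For reference, a sketch of what such a configuration file might contain. The EARLYOOM_ARGS variable name and the combined PERCENT,KILL_PERCENT flag syntax are assumptions based on earlyoom's documented options; check {{code|man earlyoom}} against the shipped file:

```shell
# /etc/default/earlyoom (sketch, not the verbatim shipped file)
#
# -r 60    log a memory report to the journal every 60 seconds
# -m 10,5  send SIGTERM below 10% available RAM, SIGKILL below 5%
# -s 10,5  same thresholds for available swap
EARLYOOM_ARGS="-r 60 -m 10,5 -s 10,5"
```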
== Dependencies == The earlyoom package has no dependencies.
== Contingency Plan == * Contingency mechanism: Owner will revert all changes * Contingency deadline: Final freeze * Blocks release? No * Blocks product? No
== Documentation == {{code|man earlyoom}}<br><br> https://www.kernel.org/doc/gorman/html/understand/understand016.html
== Release Notes == The earlyoom service is enabled by default, which will cause the kernel oom-killer to trigger sooner. To revert to previous behavior:<br> {{code|sudo systemctl disable earlyoom.service}}
To customize, see {{code|man earlyoom}}.
Ben Cotton bcotton@redhat.com writes:
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
The OOM killer is a kernel function. I have no opinion on this proposal as it stands, but I would like it to include an explanation of why this requires a service in userspace to fix.
Thanks, --Robbie
Robbie Harwood rharwood@redhat.com writes:
The OOM killer is a kernel function. I have no opinion on this proposal as it stands, but I would like it to include an explanation of why this requires a service in userspace to fix.
Another thought. Wouldn't some of the pain here be alleviated by setting vm.swappiness=0? Currently it seems to be 60, which results in somewhat aggressive swap use; 1 seems better (minimal swapping without disabling), while 0 will disable it for general use (while preserving it for hibernation). This would at least improve the disk thrashing during OOM situations.
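For anyone who wants to experiment with this suggestion, a sketch using standard sysctl tooling (the drop-in filename is arbitrary):

```shell
# Inspect the current value
sysctl vm.swappiness

# Change it for the running system only (root required)
sudo sysctl vm.swappiness=1

# Persist the change across reboots via a sysctl drop-in
echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-swappiness.conf
```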
Thanks, --Robbie
On Friday, January 3, 2020 1:51:00 PM MST Robbie Harwood wrote:
Another thought. Wouldn't some of the pain here be alleviated by setting vm.swappiness=0? Currently it seems to be 60, which results in somewhat aggressive swap use; 1 seems better (minimal swapping without disabling), while 0 will disable it for general use (while preserving it for hibernation). This would at least improve the disk thrashing during OOM situations.
To clarify, according to the Workstation group, hibernation isn't even supported.
Regardless, if this Change is accepted, it should probably be done on a per-spin basis. If the GNOME Spin wants this, that's one thing, but I don't believe this would be a good idea on servers.
On Fri, Jan 3, 2020 at 10:14 PM John M. Harris Jr johnmh@splentity.com wrote:
Regardless, if this Change is accepted, it should probably be done on a per-spin basis. If the GNOME Spin wants this, that's one thing, but I don't believe this would be a good idea on servers.
Yes, and if you read the change wiki page mentioned in the announcement email, it's meant only for Workstation: https://fedoraproject.org/wiki/Changes/EnableEarlyoom#Scope
Restricted to Workstation edition, unless other editions/spins want to opt-in.
And to be precise, there is Fedora Workstation (Edition of Fedora) and then there are other Spins. Workstation is not a spin :)
"John M. Harris Jr" johnmh@splentity.com writes:
To clarify, according to the Workstation group, hibernation isn't even supported.
If that's true - and I don't know how I'd check it, so I didn't - we should revisit enabling swap in the default install, and *definitely* should remove the warning for not having it from anaconda.
Thanks, --Robbie
On Mon, Jan 6, 2020 at 12:11 PM Robbie Harwood rharwood@redhat.com wrote:
"John M. Harris Jr" johnmh@splentity.com writes:
On Friday, January 3, 2020 1:51:00 PM MST Robbie Harwood wrote:
Robbie Harwood rharwood@redhat.com writes:
Ben Cotton bcotton@redhat.com writes:
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
# enable earlyoom by default on workstation enable earlyoom.service </pre>
The OOM killer is a kernel function. I have no opinion on this proposal as it stands, but I would like it to include an explanation of why this requires a service in userspace to fix.
Another thought. Wouldn't some of the pain here be alleviated by setting vm.swappiness=0? Currently it seems to be 60, which results in somewhat aggressive swap use; 1 seems better (minimal swapping without disabling), while 0 will disable it for general use (while preserving it for hibernation). This would at least improve the disk thrashing during OOM situations.
To clarify, according to the Workstation group, hibernation isn't even supported.
If that's true - and I don't know how I'd check it, so I didn't - we should revisit enabling swap in the default install, and *definitely* should remove the warning for not having it from anaconda.
It's not correct that the Workstation working group doesn't want to see it supported; it's a question of whether and to what degree it can be supported, and of making sure users have expectations properly set. I wouldn't want users thinking it'll work because we advertise that it does, and then it eats their data.
Does the hardware support it? Does the hardware properly advertise what it does support? What mechanisms are needed in the kernel and systemd to support it, and what to do when there are bugs that break it? It's not practical for the Fedora kernel team to become responsible for supporting it when it breaks, nor is it practical to block the release on such bugs. The most recent topic I found on this:
Disabling kernel's hibernate support by default, allow re-enabling it with a kernel cmdline option https://lists.fedoraproject.org/archives/list/kernel@lists.fedoraproject.org...
As for swap size options including no swap, and maybe swap-on-ZRAM: https://pagure.io/fedora-workstation/issue/120 https://bugzilla.redhat.com/show_bug.cgi?id=1731978
There are all kinds of useful and necessary discussions to have there (rather than here).
-- Chris Murphy
Chris Murphy lists@colorremedies.com writes:
It's not correct that the Workstation working group doesn't want to see it supported; it's a question of whether and to what degree it can be supported, and of making sure users have expectations properly set. I wouldn't want users thinking it'll work because we advertise that it does, and then it eats their data.
I think enabling it by default very strongly suggests it's supported, regardless of what the intentions are. I have no quarrel with the kernel team in either direction they wish to decide (supported or non), but if it's non-supported, it shouldn't look like it's supported.
As for swap size options including no swap, and maybe swap-on-ZRAM: https://pagure.io/fedora-workstation/issue/120 https://bugzilla.redhat.com/show_bug.cgi?id=1731978
There are all kinds of useful and necessary discussions to have there (rather than here).
The links are appreciated; I was not aware of these discussions and will follow them. However, since we're discussing behavior of the system under heavy load, I think how we handle swap (the thing that makes it slow down when you're low on memory...) is extremely relevant.
Thanks, --Robbie
On Mon, Jan 6, 2020 at 1:14 PM Robbie Harwood rharwood@redhat.com wrote:
The links are appreciated; I was not aware of these discussions and will follow them. However, since we're discussing behavior of the system under heavy load, I think how we handle swap (the thing that makes it slow down when you're low on memory...) is extremely relevant.
It's perhaps the most relevant thing; it's exactly what's to be avoided, because it causes the responsiveness problem in the first place. I just meant that in terms of this feature proposal, there are no swap changes. What to change is elsewhere, and elsewhen. :D
Chris Murphy wrote:
Does the hardware support it? Does the hardware properly advertise what it does support? What mechanisms are needed in the kernel and systemd to support it, and what to do when there are bugs that break it? It's not practical for the Fedora kernel team to become responsible for supporting it when it breaks, nor is it practical to block the release on such bugs.
The biggest issue is that "Secure Boot" (Restricted Boot) mode disables hibernation.
Kevin Kofler
On Fri, Jan 3, 2020 at 1:51 PM Robbie Harwood rharwood@redhat.com wrote:
Another thought. Wouldn't some of the pain here be alleviated by setting vm.swappiness=0?
My sample size is not scientific. But in my testing, I can't tell any difference for the swap-under-pressure case we're testing against. The system is lost in the same amount of time, and still does not recover on its own. Perhaps swappiness matters for the less extreme case of incidental swap usage, which is probably what swap was originally intended for (?), but that's speculation on my part.
The central problem, as I see it: unprivileged applications are *by default* given any memory allocation they request, without any consideration for the health of other processes, even privileged ones.
The kernel oom-killer (with or without earlyoom running), often clobbers things like sshd, systemd-journald, sssd, and even user space programs (Maps, TextEdit, Terminal) that have nothing to do with what's actually eating up CPU and memory. It's pretty atrocious, but I've editorialized plenty about that in the cited threads already.
People smarter than I am are working on more long term solutions for this. This is a bit of a hack, intended to return some sense of control back to the user sooner, so they can save their state (however they define that) and reboot normally, rather than having to force power off. It's unsophisticated and therefore mismatched with a complex problem, but elegant because it's simple, easy to test, easy to remove or disable.
Another plus, only briefly mentioned in the proposal: because the user is far less likely to hard power off, the system journal will have recorded earlyoom's memory report leading up to the OOM, as well as the complete kernel oom-killer output. In many of my tests without earlyoom, forced power-off caused much of this information to be lost, especially what action oom-killer took, assuming it triggered at all; often it didn't, and the system remained wedged and unresponsive for 30+ minutes.
Currently it seems to be 60, which results in somewhat aggressive swap use; 1 seems better (minimal swapping without disabling), while 0 will disable it for general use (while preserving it for hibernation). This would at least improve the disk thrashing during OOM situations.
Sounds right, but in practice I'm not observing this. The two observation perspectives I'm using:
a) GUI responsiveness: ability to drag windows around, scroll in Firefox, type text in textedit, open/save files.
b) remote (ssh) observation with vmstat, iotop, and top
On Friday, January 3, 2020 3:48:50 PM MST Chris Murphy wrote:
There is NO scenario in which hard shutdowns should occur, except battery failure on mobile devices. The state of the system on boot will vary wildly from what you may expect when it is hard powered off. I would suggest using SysRq in such events.
John M. Harris Jr wrote:
There is NO scenario in which hard shutdowns should occur, except battery failure on mobile devices. The state of the system on boot will vary wildly from what you may expect when it is hard powered off. I would suggest using SysRq in such events.
Unfortunately, SysRq does nothing by default on Fedora, for security reasons.
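For completeness, a sketch of how SysRq can be re-enabled for testing. kernel.sysrq is a bitmask, and 1 allows all functions; how far a given distro restricts it by default is a policy choice:

```shell
# See which SysRq functions are currently allowed
cat /proc/sys/kernel/sysrq

# Allow all SysRq functions for this boot (root required)
sudo sysctl kernel.sysrq=1

# Equivalent of Alt+SysRq+s (sync) then Alt+SysRq+b (immediate reboot).
# WARNING: the second line reboots the machine without a clean shutdown.
echo s | sudo tee /proc/sysrq-trigger
echo b | sudo tee /proc/sysrq-trigger
```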
Kevin Kofler
On Fri, Jan 3, 2020 at 3:57 PM John M. Harris Jr johnmh@splentity.com wrote:
There is NO scenario in which hard shutdowns should occur, except battery failure on mobile devices. The state of the system on boot will vary wildly from what you may expect when it is hard powered off. I would suggest using SysRq in such events.
Yes, I know all about sysrq. Everyday ordinary users do not, and I'm not going to teach them about it, because:
a) I already know the outcome: eyes glaze over, then they say "yeah, whatever, I'll just force power off, works fine - except for the data loss"
b) much of the time, I couldn't get to a VT, and sshd was hung or even got killed by oom-killer, so I couldn't use sysrq anyway
c) in the cases where I could issue sysrq+b, responsiveness was so bad it'd take upwards of 15 minutes just to type out the command
So yeah, screw it, I'm pressing the power button
These aren't contrived cases. They are real world. Baremetal and VM reproducible. And unprivileged processes. This is fully discussed in detail in the devel@ thread I reference in the proposal, and I'm not going to repeat myself in this thread.
On Friday, January 3, 2020 4:25:20 PM MST Chris Murphy wrote:
in the cases where I could issue sysrq+b, responsiveness was so bad it'd take upwards of 15 minutes just to type out the command
In that case, I'd suggest waiting the 15 minutes, and then not bogging down your system that badly the next time. This is, really, the best option.
Den lör 4 jan. 2020 kl 01:53 skrev John M. Harris Jr johnmh@splentity.com:
On Friday, January 3, 2020 4:25:20 PM MST Chris Murphy wrote:
in the cases where I could issue sysrq+b, responsiveness was so bad it'd take upwards of 15 minutes just to type out the command
In that case, I'd suggest waiting the 15 minutes, and then not bogging down your system that badly the next time. This is, really, the best option.
*Remembers to be excellent to each other.* Or maybe we should try to make operating systems that actually work under heavy load.
/Andreas
-- John M. Harris, Jr. Splentity
devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Friday, January 3, 2020 11:34:13 PM MST Andreas Tunek wrote:
*Remembers to be excellent to each other.* Or maybe we should try to make operating systems that actually work under heavy load.
If we had something that would "actually work under heavy load" (we do, but it doesn't work for some people), then my advice wouldn't be necessary. However, what I've said is for the safety of your installed system. It should only be followed if the integrity of your data is important to you.
On Fri, Jan 3, 2020 at 5:52 pm, John M. Harris Jr johnmh@splentity.com wrote:
In that case, I'd suggest waiting the 15 minutes, and then not bogging down your system that badly the next time. This is, really, the best option.
I'm going to suggest you stop replying in this thread if you're not interested in responding with productive comments.
The user experience requirement here is "desktop should not hang for 15 minutes when under memory pressure." Your comment indicates that it *should* hang, presumably to punish users for using too much memory. This is so absurd that I don't think you're engaging in good-faith discussion anymore.
On Saturday, January 4, 2020 11:16:24 AM MST Michael Catanzaro wrote:
The user experience requirement here is "desktop should not hang for 15 minutes when under memory pressure." Your comment indicates that it *should* hang, presumably to punish users for using too much memory. This is so absurd that I don't think you're engaging in good-faith discussion anymore.
Whether or not it should or should not is irrelevant. I don't see much of an alternative than what seems to be a "hang", honestly. It has nothing to do with something to "punish" users, it's to get the system to a state where you can `sync` and reboot.
On Sat, Jan 4, 2020 at 11:20 AM John M. Harris Jr johnmh@splentity.com wrote:
Whether or not it should or should not is irrelevant. I don't see much of an alternative than what seems to be a "hang", honestly. It has nothing to do with something to "punish" users, it's to get the system to a state where you can `sync` and reboot.
The point of this feature proposal is precisely to get the system into a state where they can save their work and do a proper reboot. It's safer, less esoteric, and more reliable than sysrq+b.
It cannot become the user's burden to know the kernel is still doing something when there's zero feedback and zero control. When will the system recover on its own? An hour? A day? A week? I can tell you that in my test case it was consistently stuck for >30 minutes. I let it go that long, many times, only to demonstrate that it's not a temporary hang, and that users are acting rationally when they force power off.
On Fri, Jan 3, 2020 at 1:35 PM Robbie Harwood rharwood@redhat.com wrote:
The OOM killer is a kernel function.
Yes, this is indicated in the summary.
I have no opinion on this proposal as it stands, but I would like it to include an explanation of why this requires a service in userspace to fix.
The 2nd full paragraph of the "Detailed Description" section describes the intended kernel function. What I don't say there is that kernel folks don't consider the oom-killer behavior broken; it's working exactly as intended. It keeps the kernel alive. Its concern has nothing to do with keeping user space responsive.
On Fri, Jan 3, 2020 at 2:19 PM Ben Cotton bcotton@redhat.com wrote:
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
I'd like to see this enabled on all Fedora variants by default. This seems to be generally useful for workstations and servers...
-- 真実はいつも一つ!/ Always, there's only one truth!
On Friday, January 3, 2020, Neal Gompa ngompa13@gmail.com wrote:
On Fri, Jan 3, 2020 at 2:19 PM Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
I'd like to see this enabled on all Fedora variants by default. This seems to be generally useful for workstations and servers.
The idea might be, but the implementation is not. Using a percentage to decide "almost out of memory" is going to hurt on systems with large amounts of memory, be it a 32 GB desktop or a 2 TB server. You'd have plenty of memory left and it starts killing processes ...
On Fri, Jan 3, 2020 at 3:41 PM drago01 drago01@gmail.com wrote:
On Friday, January 3, 2020, Neal Gompa ngompa13@gmail.com wrote:
On Fri, Jan 3, 2020 at 2:19 PM Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
I'd like to see this enabled on all Fedora variants by default. This seems to be generally useful for workstations and servers.
The idea might be, but the implementation is not. Using a percentage to decide "almost out of memory" is going to hurt on systems with large amounts of memory, be it a 32 GB desktop or a 2 TB server. You'd have plenty of memory left and it starts killing processes ...
I agree this is the most significant liability of the proposal right now. I mention it in: https://pagure.io/fedora-workstation/issue/119#comment-618480
And I will add this caveat to the proposal, because at the moment I think we need a satisfactory work around in order to proceed with enabling this feature.
drago01 wrote:
The idea might be, but the implementation is not. Using a percentage to decide "almost out of memory" is going to hurt on systems with large amounts of memory, be it a 32 GB desktop or a 2 TB server. You'd have plenty of memory left and it starts killing processes ...
And it will lead to the opposite problem on systems with lots of swap (e.g., my desktop PC that still follows the swap = 2 * RAM recommendation and hence has 16 GiB of RAM and 32 GiB of swap): processes can do a lot of swap thrashing in 32 * (1-10%) = 28.8 GiB of swap and still ruin the system interactivity.
OOM handling really needs to use some interactivity metric, and I think this can almost certainly be done reliably only in the kernel, because userspace processes do not even get to run if the interactivity is already ruined.
Kevin Kofler
On Sat, Jan 4, 2020 at 2:33 AM Vitaly Zaitsev via devel devel@lists.fedoraproject.org wrote:
On 03.01.2020 22:27, Neal Gompa wrote:
and servers...
Admins will be very happy when such user-space killer will kill for example PgSQL database server and cause DB corruption or loss of banking transactions.
This is already happening anyway. The idea is that earlyoom will just do it slightly earlier so we have a responsive system when the failures happen. Unlike a lot of the other options, earlyoom is just doing what the kernel does, just slightly earlier so that the system doesn't become unresponsive. That is *hugely* valuable for sysadmins to be able to recover the systems without power cycling. As a sysadmin myself, I *hate* power cycling servers because it takes forever and it's a lot bigger loss of productivity (and potentially money!).
On Saturday, January 4, 2020, Neal Gompa ngompa13@gmail.com wrote:
On Sat, Jan 4, 2020 at 2:33 AM Vitaly Zaitsev via devel devel@lists.fedoraproject.org wrote:
On 03.01.2020 22:27, Neal Gompa wrote:
and servers...
Admins will be very happy when such user-space killer will kill for example PgSQL database server and cause DB corruption or loss of banking transactions.
This is already happening anyway. The idea is that earlyoom will just do it slightly earlier so we have a responsive system when the failures happen. Unlike a lot of the other options, earlyoom is just doing what the kernel does, just slightly earlier so that the system doesn't become unresponsive.
That is *hugely* valuable for sysadmins to be able to recover the systems without power cycling. As a sysadmin myself, I *hate* power cycling servers because it takes forever and it's a lot bigger loss of productivity (and potentially money!).
Except that slightly earlier is way too early on systems which have lots of memory (see mails from before).
And if a server runs into an OOM situation, your software is either broken (leaking) or you didn't allocate enough resources for your use case.
So the fix is not OOM killing nor power cycling, but to either allocate more memory if it is a VM, or buy more if it is a hardware server (or fix the memory leak in your software).
As for the desktop case, running web browsers in a cgroup to keep them in check would solve most real-world problems - other common desktop apps don't use enough memory to cause such issues (unless your system is really memory constrained, but then the "buy more memory" solution is the better fix).
And btw we should really update the minimum memory requirements in our documentation; the current ones have nothing to do with reality (if you want a pleasant user experience).
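Concretely, the cgroup idea above can be expressed with stock systemd resource control. This is only a sketch: MemoryMax= and MemorySwapMax= are real systemd resource-control settings, but the unit name, path, and limits below are illustrative, not a tested recipe:

```
# Hypothetical drop-in: ~/.config/systemd/user/app-firefox@.service.d/memory.conf
# Caps the browser's cgroup memory; values here are examples only.
[Service]
MemoryMax=4G
MemorySwapMax=2G
```

For a quick one-off test, `systemd-run --user --scope -p MemoryMax=4G firefox` achieves roughly the same thing without a drop-in file.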
Let's keep this desktop-focused, since the proposal does not affect Server edition.
On Sat, Jan 4, 2020 at 12:48 pm, drago01 drago01@gmail.com wrote:
As for the desktop case, running web browsers in a cgroup to keep them in check would solve most real-world problems - other common desktop apps don't use enough memory to cause such issues (unless your system is really memory constrained, but then the "buy more memory" solution is the better fix).
The last time I saw my desktop hang due to a web browser using too much memory was 2015.
The freezes I've encountered in the past five years were all related to software development:
* When compiling large software projects, it's possible to run out of RAM either when building lots of files in parallel, or when linking
* GNOME Builder runs ctags, and ctags likes to use dozens of GB of RAM to index large software projects. I think it sometimes gets into a loop where it just allocates more and more RAM until the desktop dies
On Sat, Jan 04, 2020 at 12:15:20PM -0600, Michael Catanzaro wrote:
Let's keep this desktop-focused, since the proposal does not affect Server edition.
On Sat, Jan 4, 2020 at 12:48 pm, drago01 drago01@gmail.com wrote:
As for the desktop case, running web browsers in a cgroup to keep them in check would solve most real-world problems - other common desktop apps don't use enough memory to cause such issues (unless your system is really memory constrained, but then the "buy more memory" solution is the better fix).
The last time I saw my desktop hang due to a web browser using too much memory was 2015.
just FTR, this happens relatively frequently for me. Some websites seem to cause Firefox to swap itself into nirvana. Sometimes within a short time but sometimes it takes longer. I've come back from a lunch break a few times to a desktop swapping itself to death.
Not yet fully identified but it does happen.
Cheers, Peter
The freezes I've encountered in the past five years were all related to software development:
- When compiling large software projects, it's possible to run out of RAM
either when building lots of files in parallel, or when linking
- GNOME Builder runs ctags, and ctags likes to use dozens of GB of RAM to
index large software projects. I think it sometimes gets into a loop where it just allocates more and more RAM until the desktop dies
devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
I am not using a swap partition at all; the system always hangs when OOM, but sometimes also at just less than 20%
On Tue, Jan 7, 2020 at 8:43 AM Peter Hutterer peter.hutterer@who-t.net wrote:
On Sat, Jan 04, 2020 at 12:15:20PM -0600, Michael Catanzaro wrote:
Let's keep this desktop-focused, since the proposal does not affect Server edition.
On Sat, Jan 4, 2020 at 12:48 pm, drago01 drago01@gmail.com wrote:
As for the desktop case, running web browsers in a cgroup to keep them in check would solve most real-world problems - other common desktop apps don't use enough memory to cause such issues (unless your system is really memory constrained, but then the "buy more memory" solution is the better fix).
The last time I saw my desktop hang due to a web browser using too much memory was 2015.
just FTR, this happens relatively frequently for me. Some websites seem to cause Firefox to swap itself into nirvana. Sometimes within a short time but sometimes it takes longer. I've come back from a lunch break a few times to a desktop swapping itself to death.
Not yet fully identified but it does happen.
Cheers, Peter
The freezes I've encountered in the past five years were all related to software development:
- When compiling large software projects, it's possible to run out of RAM
either when building lots of files in parallel, or when linking
- GNOME Builder runs ctags, and ctags likes to use dozens of GB of RAM to
index large software projects. I think it sometimes gets into a loop where it just allocates more and more RAM until the desktop dies
On Tue, Jan 07, 2020 at 08:57:04AM +0200, Damian Ivanov wrote:
I am not using a swap partition at all; the system always hangs when OOM, but sometimes also at just less than 20%
This might be https://gitlab.gnome.org/GNOME/gnome-shell/issues/1981 or one of the duplicates.
Zbyszek
On Saturday, January 4, 2020 4:48:04 AM MST drago01 wrote:
And btw we should really update the minimum memory requirements in our documentation; the current ones have nothing to do with reality (if you want a pleasant user experience).
That is not necessary, at all. I'm running Fedora on a 2009 Core 2 Duo system with 2 GiB of RAM, and have not had any issues after disabling the compositor in Plasma. Similarly, my daily driver is an X200 Tablet with 4 GiB of RAM. These amounts are more than sufficient for most users, and Firefox has never led my system to an OOM event. The only time I've ever run into an OOM on this system was while compiling some poorly written software (that I wrote, years ago).
On Sat, Jan 4, 2020 at 4:48 AM drago01 drago01@gmail.com wrote:
On Saturday, January 4, 2020, Neal Gompa ngompa13@gmail.com wrote:
On Sat, Jan 4, 2020 at 2:33 AM Vitaly Zaitsev via devel devel@lists.fedoraproject.org wrote:
On 03.01.2020 22:27, Neal Gompa wrote:
and servers...
Admins will be very happy when such user-space killer will kill for example PgSQL database server and cause DB corruption or loss of banking transactions.
This is already happening anyway. The idea is that earlyoom will just do it slightly earlier so we have a responsive system when the failures happen. Unlike a lot of the other options, earlyoom is just doing what the kernel does, just slightly earlier so that the system doesn't become unresponsive.
That is *hugely* valuable for sysadmins to be able to recover the systems without power cycling. As a sysadmin myself, I *hate* power cycling servers because it takes forever and it's a lot bigger loss of productivity (and potentially money!).
Except that slightly earlier is way too early on systems which have lots of memory (see mails from before).
It might be. And it might need to be tweaked. Perhaps 6% for SIGTERM and 3% for SIGKILL. Or even 5% and 2.5%. For sure using a percentage of RAM and swap is too simplistic. But it's easy for users to understand. Something more sophisticated, based on kernel pressure stall information would likely be better, and folks are working on that.
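For clarity, the percentage scheme being debated amounts to something like the following simplified sketch. This is an illustration only: the real earlyoom polls /proc/meminfo in a loop, the 10%/5% SIGTERM/SIGKILL thresholds are assumed defaults as discussed in this thread, and both RAM and swap must be low before it acts:

```python
# Simplified sketch of earlyoom-style percentage thresholds.
# Assumptions: SIGTERM at <= 10% available, SIGKILL at <= 5%,
# and a signal is sent only when BOTH RAM and swap are below the threshold.

SIGTERM_PCT = 10
SIGKILL_PCT = 5

def action(mem_avail, mem_total, swap_free, swap_total):
    """Return the signal to send ("SIGTERM"/"SIGKILL") or None.

    Sizes may be in any consistent unit (e.g. KiB from /proc/meminfo).
    """
    mem_pct = 100 * mem_avail / mem_total
    # No swap device: treat swap as 0% free, i.e. already exhausted.
    swap_pct = 100 * swap_free / swap_total if swap_total else 0
    if mem_pct <= SIGKILL_PCT and swap_pct <= SIGKILL_PCT:
        return "SIGKILL"
    if mem_pct <= SIGTERM_PCT and swap_pct <= SIGTERM_PCT:
        return "SIGTERM"
    return None
```

Note the consequence visible in the sketch: while a swap device still has plenty of free space, nothing is killed even when RAM is nearly exhausted, which matches the no-swap vs. swap observations elsewhere in this thread.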
And if a server runs into an OOM situation, your software is either broken (leaking) or you didn't allocate enough resources for your use case.
So the fix is not OOM killing nor power cycling, but to either allocate more memory if it is a VM, or buy more if it is a hardware server (or fix the memory leak in your software).
That's not a fix either; it's a workaround that papers over the problem. Same as earlyoom, except RAM costs money, and may not be an option due to hardware limitations. A modern operating system needs to know better than to allow unprivileged processes to take down the whole system.
And btw we should really update the minimum memory requirements in our documentation; the current ones have nothing to do with reality (if you want a pleasant user experience).
Can you be more specific?
On getfedora.org it reads: Fedora requires a minimum of 20GB disk, 2GB RAM, to install and run successfully. Double those amounts is recommended.
On Saturday, January 4, 2020 11:31:49 AM MST Chris Murphy wrote:
A modern operating system needs to know better than to allow unprivileged processes to take down the whole system.
Well, you can configure quotas if you really want, but the idea is that it's YOUR COMPUTER, and you should be able to use it however you like. If you want to run software that requires more RAM than your system has, you can do that, and it will run, just not well.
On Sat, Jan 4, 2020 at 7:32 PM Chris Murphy lists@colorremedies.com wrote:
On Sat, Jan 4, 2020 at 4:48 AM drago01 drago01@gmail.com wrote:
On Saturday, January 4, 2020, Neal Gompa ngompa13@gmail.com wrote:
On Sat, Jan 4, 2020 at 2:33 AM Vitaly Zaitsev via devel devel@lists.fedoraproject.org wrote:
On 03.01.2020 22:27, Neal Gompa wrote:
and servers...
Admins will be very happy when such user-space killer will kill for example PgSQL database server and cause DB corruption or loss of banking transactions.
This is already happening anyway. The idea is that earlyoom will just do it slightly earlier so we have a responsive system when the failures happen. Unlike a lot of the other options, earlyoom is just doing what the kernel does, just slightly earlier so that the system doesn't become unresponsive.
That is *hugely* valuable for sysadmins to be able to recover the systems without power cycling. As a sysadmin myself, I *hate* power cycling servers because it takes forever and it's a lot bigger loss of productivity (and potentially money!).
Except that slightly earlier is way too early on systems which have lots of memory (see mails from before).
It might be. And it might need to be tweaked. Perhaps 6% for SIGTERM and 3% for SIGKILL. Or even 5% and 2.5%. For sure using a percentage of RAM and swap is too simplistic. But it's easy for users to understand. Something more sophisticated, based on kernel pressure stall information would likely be better, and folks are working on that.
Yes, that would be a way better metric than a percent value, which is either too close to full RAM or too early if you have lots of RAM. 6% of 4 GB is ~246 MB, while for 32 GB it's almost 2 GB - killing processes while you have 2 GB left is just wasteful.
And if a server runs into an OOM situation, your software is either broken (leaking) or you didn't allocate enough resources for your use case.
So the fix is not OOM killing nor power cycling, but to either allocate more memory if it is a VM, or buy more if it is a hardware server (or fix the memory leak in your software).
That's not a fix either; it's a workaround that papers over the problem. Same as earlyoom, except RAM costs money, and may not be an option due to hardware limitations. A modern operating system needs to know better than to allow unprivileged processes to take down the whole system.
I think you misunderstood me. Yes, the OS should behave better than this, but if you are running a server you don't want your DB or web server to be unreachable because the system ran out of memory - the only way to "fix" that is to provide enough resources. No amount of OOM killing would help you here. The system may be up, but not the server process the machine is running for ...
And btw we should really update the minimum memory requirements in our documentation; the current ones have nothing to do with reality (if you want a pleasant user experience).
Can you be more specific?
On getfedora.org it reads: Fedora requires a minimum of 20GB disk, 2GB RAM, to install and run successfully. Double those amounts is recommended.
I simply do not think 2 GB is sufficient; the "recommended double", i.e. 4 GB, should be the "required", and drop the double part altogether. A modern desktop with apps on top will not run well enough on 2 GB; let's stop pretending it does. But anyway, that's off topic as it is not part of the proposal.
On Sat, Jan 4, 2020 at 2:30 PM drago01 drago01@gmail.com wrote:
On Sat, Jan 4, 2020 at 7:32 PM Chris Murphy lists@colorremedies.com wrote:
It might be. And it might need to be tweaked. Perhaps 6% for SIGTERM and 3% for SIGKILL. Or even 5% and 2.5%. For sure using a percentage of RAM and swap is too simplistic. But it's easy for users to understand. Something more sophisticated, based on kernel pressure stall information would likely be better, and folks are working on that.
Yes, that would be a way better metric than a percent value, which is either too close to full RAM or too early if you have lots of RAM. 6% of 4 GB is ~246 MB, while for 32 GB it's almost 2 GB - killing processes while you have 2 GB left is just wasteful.
If there's a swap device, that won't happen. The case where SIGTERM really happens at 10% RAM free is when there's no swap device. And even though the no-swap configuration is not a default, and the installer explicitly recommends against it right now (as in, if you try to do such an installation, it warns you), it is a configuration we allow, and I happen to know it's somewhat common among developers with lots of RAM, expressly because swap thrashing, even to SSD, results in such poor UX.
Consider the following 'vmstat 10' while doing a compile:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b    swpd    free  buff  cache    si    so    bi    bo    in    cs us sy id wa st
6 11 4168060 1821580 40 736604 30234 10841 46533 13805 19230 29799 74 12 1 13 0
At this time, the GUI was completely unresponsive, not even the mouse arrow moves, for about 1 minute. Seemingly plenty of RAM and swap, and idle CPU. But rather heavy swap in and out.
10  9 4459648  200912    40 569260 11218 18856 28846 19997 15164 35256 28  9  9 53  0
 6  8 4207328  807092    40 636156 26205 16744 35472 18287 20179 34087 62 12  3 23  0
At these two lines, the mouse arrow is stuttering, the GUI is very sluggish, even unresponsive much of the time.
Jan 04 15:37:18 fmac.local earlyoom[4896]: mem avail: 1212 of 7865 MiB (15 %), swap free: 4807 of 8195 MiB (58 %)
Near the same time. The system is nowhere near either RAM or swap exhaustion. But swap si/so are high. This is an SSD, BTW.
Can I get to the compile and force quit? Eventually, it would take a couple minutes. But good progress is being made with the compile during this whole time.
earlyoom doesn't SIGTERM this compile until 20 minutes of this behavior. With default settings. So it really isn't solving the sluggish, stuttering problem. But what does happen, is it SIGTERMs the compile before the system gets to a state where essentially all of the work is only swap in and swap out, and no other work is being done.
Here is the output (2 week expiration) https://pastebin.com/0iZHNjg7
Retest with no swap at all, and yes, compile gets a SIGTERM when free memory gets to 10% (because swap is already considered to be 0% free, since it doesn't exist). But also? The system isn't under any swap io duress. The system is completely responsive throughout.
This is why we see developers giving up on swap partitions entirely. swap-on-ZRAM might be a compromise. That's related issue #120.
That's not a fix either; it's a workaround that papers over the problem. Same as earlyoom, except RAM costs money, and may not be an option due to hardware limitations. A modern operating system needs to know better than to allow unprivileged processes to take down the whole system.
I think you misunderstood me. Yes, the OS should behave better than this, but if you are running a server you don't want your DB or web server to be unreachable because the system ran out of memory - the only way to "fix" that is to provide enough resources. No amount of OOM killing would help you here. The system may be up, but not the server process the machine is running for ...
Perhaps, but two points:
a. this feature is for Workstation. If the Server working group wants to give it a go, that's up to them. But they may prefer experimenting with more server-oriented user-space oom daemons like recent versions of oomd. And for that use case, Facebook (and others) have investigated this and find that avoiding OOM, even by process killing, is far less bad than the system hanging itself. As in, better for recovery and better for limited sysadmin resources. There's a video about it from the recent All Systems Go conference.
b. earlyoom does SIGTERM first. I have yet to see a single process (in hundreds of tests, which is really nothing, and also not a scientific sample) that doesn't respond to SIGTERM, such that SIGKILL is needed.
And btw we should really update the minimum memory requirements in our documentation; the current ones have nothing to do with reality (if you want a pleasant user experience).
Can you be more specific?
On getfedora.org it reads: Fedora requires a minimum of 20GB disk, 2GB RAM, to install and run successfully. Double those amounts is recommended.
I simply do not think 2 GB is sufficient; the "recommended double", i.e. 4 GB, should be the "required", and drop the double part altogether. A modern desktop with apps on top will not run well enough on 2 GB; let's stop pretending it does. But anyway, that's off topic as it is not part of the proposal.
Workstation working group recently bumped this from 1 GB minimum, 2 GB recommended. We're considering VMs with these numbers. As a comparative point of reference, Windows 10 64-bit is also 2 GB minimum.
On Saturday, January 4, 2020 2:29:11 PM MST drago01 wrote:
A modern desktop with apps on top will not run well enough on 2 GB; let's stop pretending it does.
This is simply not the case. It may be for GNOME, but I haven't tested that. It definitely is not the case for Plasma.
John M. Harris Jr wrote:
This is simply not the case. It may be for GNOME, but I haven't tested that. It definitely is not the case for Plasma.
… unless you want to run KMail/Akonadi on it. :-)
But yes, Plasma itself works fine with 2 GiB (I haven't actually tested with less than 4 GiB, but you wrote you did and I believe you there), most applications should work too, and if you need an e-mail client, you can run a lightweight one such as Trojitá (Qt-based, fast, and requires less than 50 MiB exclusive memory here, with over 50000 mails in my IMAP inbox – if I start scrolling through the inbox, it loads old metadata on demand, growing the memory usage to still less than 100 MiB).
And my Core 2 Duo with 4 GiB RAM definitely works fine with Plasma (desktop environment), Falkon (web browser), Trojitá (e-mail client), Krusader (file manager), etc.
Kevin Kofler
I think this would be a really big improvement for workstation and other desktop spins; the handling of out-of-memory situations has been a consistent pain point on Linux. However, may I ask why EarlyOOM was chosen over something like NoHang [1]? I am a bit concerned that EarlyOOM's heuristics may be too coarse, as it does not take into account the newly-added PSI metrics [2][3] that other projects like NoHang, oomd, and low-memory-monitor utilize. For example, if the system is thrashing, but swap is not full, to my knowledge EarlyOOM will not see a problem; however, it would be visible via PSI.
To be clear, I'd rather have something in time for 32 to improve OOM handling than wait several release cycles for the ideal solution to be ready. I'm simply curious about what problems, if any, were encountered with the other potential candidates.
[1] https://github.com/hakavlad/nohang [2] https://facebookmicrosites.github.io/psi/docs/overview [3] https://www.kernel.org/doc/html/latest/accounting/psi.html
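For anyone curious what those PSI metrics actually look like, here is a minimal sketch of reading them. It assumes Linux >= 4.20 with PSI enabled; the parsing follows the documented /proc/pressure/memory format ("some" = at least one task stalled on memory, "full" = all non-idle tasks stalled):

```python
# Sketch: parse the PSI memory metrics from /proc/pressure/memory,
# whose lines look like:
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
#   full avg10=0.00 avg60=0.00 avg300=0.00 total=0

def parse_psi(text):
    """Return {"some": {...}, "full": {...}} mapping metric names to floats."""
    metrics = {}
    for line in text.splitlines():
        kind, *fields = line.split()
        metrics[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return metrics

def memory_pressure():
    """Read the live metrics; returns None if PSI is unavailable on this kernel."""
    try:
        with open("/proc/pressure/memory") as f:
            return parse_psi(f.read())
    except OSError:
        return None
```

A monitor like nohang or oomd can then act when, say, the "full" avg10 value stays above some threshold, rather than waiting for free-memory percentages to hit bottom.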
On Fri, Jan 3, 2020 at 4:13 PM Tom Seewald tseewald@gmail.com wrote:
I think this would be a really big improvement for workstation and other desktop spins; the handling of out-of-memory situations has been a consistent pain point on Linux. However, may I ask why EarlyOOM was chosen over something like NoHang [1]?
The developer/maintainer of nohang (hakavlad) recommended earlyoom.
I am a bit concerned that EarlyOOM's heuristics may be too coarse, as it does not take into account the newly-added PSI metrics [2][3] that other projects like NoHang, oomd, and low-memory-monitor utilize. For example, if the system is thrashing, but swap is not full, to my knowledge EarlyOOM will not see a problem, however it would be visible via PSI.
Yep, I wonder about this myself and mention it in the two issues we're tracking. It's likely true that there will be workloads where earlyoom doesn't help, but equally important is it doesn't make things worse (or really weird). It's a complicated problem, and part of the challenge is how to go about making an incremental improvement while avoiding regressions.
We also have another issue, #120, to look at making swap smaller, since oversized swap absolutely exacerbates the swap thrashing problem. Once a system is swap thrashing, responsiveness is already cratering.
To be clear, I'd rather have something in time for 32 to improve OOM handling than wait several release cycles for the ideal solution to be ready. I'm simply curious about what problems, if any, were encountered with the other potential candidates.
Those discussions you'll find in #98, including oomd and low-memory-monitor. https://pagure.io/fedora-workstation/issue/98#comment-590690
On Fri, Jan 3, 2020 at 11:12 pm, Tom Seewald tseewald@gmail.com wrote:
I think this would be a really big improvement for workstation and other desktop spins; the handling of out-of-memory situations has been a consistent pain point on Linux. However, may I ask why EarlyOOM was chosen over something like NoHang [1]? I am a bit concerned that EarlyOOM's heuristics may be too coarse, as it does not take into account the newly-added PSI metrics [2][3] that other projects like NoHang, oomd, and low-memory-monitor utilize. For example, if the system is thrashing, but swap is not full, to my knowledge EarlyOOM will not see a problem; however, it would be visible via PSI.
We've been working closely with Alexey, the maintainer of nohang, on this proposal. He has recommended using either earlyoom or nohang as the two best choices over other available options (e.g. oomd, or low-memory-monitor). I'm not completely certain why earlyoom was chosen over nohang, but I think simplicity and code maturity were likely important considerations in the final choice.
nohang has experimented with PSI, but it actually isn't using PSI metrics by default because they've proven to be less effective than hoped for. In theory, using an interactivity measure like PSI should provide for the best results, but in practice it just hasn't worked out well.
In our experiments, low-memory-monitor is currently significantly worse at handling OOM conditions (as has been noted elsewhere in this thread). Although we're likely to enable low-memory-monitor in Workstation, we would use it only for advisory memory pressure notifications (GMemoryMonitor).
Michael
Michael Catanzaro wrote:
nohang has experimented with PSI, but it actually isn't using PSI metrics by default because they've proven to be less effective than hoped for. In theory, using an interactivity measure like PSI should provide for the best results, but in practice it just hasn't worked out well.
I think this really needs to be handled entirely in the kernel to be effective, because if the interactivity is already down the drain, your userspace PSI monitor will not get to run at all in a reasonable timeframe.
I think that to ensure interactivity, the kernel needs to synchronously check the interactivity metrics each and every time it gets a swap-in request, and fail the request (and kill the process, most likely) if the requesting process is known to hurt interactivity too much with its previous requests. Anything asynchronous will just not work, because asynchronous event handlers stop working when the interactivity is too poor.
Kevin Kofler
On 03.01.2020 20:18, Ben Cotton wrote:
Workstation working group has discussed "better interactivity in low-memory situations" for some months. In certain use cases, typically compiling, if all RAM and swap are completely consumed, system responsiveness becomes so abysmal that a reasonable user can consider the system "lost", and resorts to forcing a power off. This is objectively a very bad UX.
I'm strongly against adding of any user-space OOM killers to Fedora default images. Users should explicitly enable them only when needed.
1. Such applications run with super-user privileges and have full access to all private memory of all processes and sensitive user data. This is a huge security risk.
2. It can easily kill KDE/Gnome shell or VM hypervisor, which can cause data loss.
3. Some implementations are killing all processes with the same name[1] and their developers think that this is a feature.
4. Super-user daemons should not touch user space at all. A real user-space OOM killer should run with the privileges of the same user, using systemd user units.
[1]: https://gitlab.freedesktop.org/hadess/low-memory-monitor/issues/8
On Sat, 04/01/2020 at 08:27 +0100, Vitaly Zaitsev via devel wrote:
I'm strongly against adding of any user-space OOM killers to Fedora default images. Users should explicitly enable them only when needed.
Just my 2 cents: I tested early versions of earlyoom and had a weird experience with it: it was killing not Chromium or Chromium's processes, but tiny processes which it probably shouldn't have. I guess it could easily kill the dnf process as well.
I am skeptical too about enabling such things by default, but at the same time it would be nice to test this massively.
On Sat, Jan 4, 2020 at 12:45 AM ego.cordatus@gmail.com wrote:
On Sat, 04/01/2020 at 08:27 +0100, Vitaly Zaitsev via devel wrote:
I'm strongly against adding of any user-space OOM killers to Fedora default images. Users should explicitly enable them only when needed.
Just my 2 cents: I tested early versions of earlyoom and had a weird experience with it: it was killing not Chromium or Chromium's processes, but tiny processes which it probably shouldn't have. I guess it could easily kill the dnf process as well.
I am skeptical too about enabling such things by default, but at the same time it would be nice to test this massively.
earlyoom uses oom_score to determine the victim process to SIGTERM and SIGKILL, the same metric used by the kernel oom-killer. I too have seen the kernel oom-killer inexplicably invoked on processes that should not be targets: sssd, sshd, and even once systemd-journald. This is very weird, and I don't have an explanation for why any process with a score of 0 is getting killed before the dozens of processes with a much higher score, and yet I've seen it. It's suspicious.
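As a sketch of that selection rule (not earlyoom's actual source): both the kernel and earlyoom rank candidates by oom_score, which the kernel exposes per process at /proc/&lt;pid&gt;/oom_score.

```python
# Sketch of the victim-selection rule shared by the kernel oom-killer
# and earlyoom: pick the process with the highest oom_score.
import os

def read_oom_scores(proc="/proc"):
    """Map pid -> oom_score for all running processes (Linux only)."""
    scores = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(f"{proc}/{entry}/oom_score") as f:
                scores[int(entry)] = int(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
    return scores

def pick_victim(scores):
    """Return the pid with the highest oom_score, or None if empty."""
    return max(scores, key=scores.get) if scores else None

# e.g. pick_victim({27421: 305, 1: 0, 4896: 2}) -> 27421
```

This is only the ranking step; the real earlyoom also applies its --avoid/--prefer adjustments before choosing.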
The nice thing about earlyoom, even though it's a hammer? It's a small hammer. It's not going to go on a wrecking ball spree. It can, and likely will, be backed out as other solutions become more useful. And the documentation reflects its oversimplification of a complex problem.
Seems like this bug:
**Kills multiple processes at once** https://github.com/rfjakob/earlyoom/issues/121
but according to github it's fixed now.
On Fri, 3 Jan 2020, 20:19 Ben Cotton, bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
Since in the Change we are not introducing just the earlyoom tool but enabling it with a specific profile, I would add those details here. Something like:
"The earlyoom service will choose the offending process based on the same oom_score the kernel uses. It will send a SIGTERM signal at 10% of RAM left, and SIGKILL at 5%."
== Owner ==
- Name: [[User:chrismurphy| Chris Murphy]]
- Email: bugzilla@colorremedies.com
== Detailed Description == The Workstation working group has discussed "better interactivity in low-memory situations" for some months. In certain use cases, typically compiling, if all RAM and swap are completely consumed, system responsiveness becomes so abysmal that a reasonable user can consider the system "lost" and resorts to forcing a power off. This is objectively a very bad UX. The broad discussion of this problem, and some ideas for near term and long term solutions, is located here:
Recent long discussions on "Better interactivity in low-memory situations"<br> https://pagure.io/fedora-workstation/issue/98<br> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...<br>
Fedora editions and spins have the in-kernel OOM (out-of-memory) manager enabled. The manager's concern is keeping the kernel itself functioning. It has no concern for user space function or interactivity. This proposed change attempts to improve the user experience, in the short term, by triggering the in-kernel process killing mechanism sooner. Instead of the system becoming completely unresponsive for tens of minutes, hours, or days, the expectation is that an offending process (determined by oom_score, same as now) will be killed off within seconds or a few minutes. This is an incremental improvement in user experience, but admittedly still suboptimal. There is additional work ongoing to improve the user experience further.
Workstation working group discussion specific to enabling earlyoom by default https://pagure.io/fedora-workstation/issue/119
Other in-progress solutions:<br> https://gitlab.freedesktop.org/hadess/low-memory-monitor<br>
Background information on this complicated problem:<br> https://www.kernel.org/doc/gorman/html/understand/understand016.html<br> https://lwn.net/Articles/317814/<br>
== Benefit to Fedora ==
There are two major benefits to Fedora:
- improved user experience by more quickly regaining control over one's system, rather than having to force power off in low-memory situations where there's aggressive swapping. Once a system becomes unresponsive, it's completely reasonable for the user to assume the system is lost, but that includes a high potential for data loss.
- reducing forced poweroff as the main workaround will increase data collection, improving understanding of low-memory situations and how to handle them better
As I understand it, in the current setup we are looking more for a controlled failure scenario than for a solution.
Can we get a specific manual on what users are supposed to do once they trigger earlyoom? Does earlyoom help in reporting? Which logs do we need to look at?
Maybe add a section in the UX part of the change, or set up a dedicated wiki page?
== Scope ==
- Proposal owners:
a. Modify {{code|https://pagure.io/fedora-comps/blob/master/f/comps-f32.xml.in}} to include the earlyoom package for Workstation.<br> b. Modify {{code| https://src.fedoraproject.org/rpms/fedora-release/blob/master/f/80-workstati... }} to include:
<pre>
# enable earlyoom by default on workstation
enable earlyoom.service
</pre>
- Other developers:
Restricted to Workstation edition, unless other editions/spins want to opt-in.
- Release engineering: [https://pagure.io/releng/issues #9141] (an impact check with Release Engineering is needed)
- Policies and guidelines: N/A
- Trademark approval: N/A
== Upgrade/compatibility impact == earlyoom.service will be enabled on upgrade. An upgraded system should exhibit the same behaviors as a clean installed system.
== How To Test ==
- Fedora 30/31 users can test today, any edition or spin:<br>
{{code|sudo dnf install earlyoom}}<br> {{code|sudo systemctl enable --now earlyoom}}
And then attempt to cause an out of memory situation. Examples:<br> {{code|tail /dev/zero}}<br> {{code|https://lkml.org/lkml/2019/8/4/15}}
- Fedora Workstation 32 (and Rawhide) users will see this service is already enabled. It can be toggled with {{code|sudo systemctl start/stop earlyoom}}, where start means earlyoom is running and stop means earlyoom is not running.
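As a gentler alternative to {{code|tail /dev/zero}}, a small bounded allocator (a hypothetical helper, not part of the earlyoom package) makes it possible to approach the thresholds deliberately. Raise the cap toward your RAM size only on a machine you are willing to thrash:

```python
# Allocate memory in fixed-size chunks up to a hard cap, then hold it.
# With a cap near your RAM size this approximates the low-memory state
# described in the test instructions; the small default keeps an
# accidental run harmless.
import time

def hog(cap_mib=64, chunk_mib=16, hold_seconds=0):
    """Allocate up to cap_mib MiB in chunk_mib pieces; return the chunks."""
    chunks = []
    while len(chunks) * chunk_mib < cap_mib:
        # bytearray gives a writable buffer of really allocated bytes
        chunks.append(bytearray(chunk_mib * 1024 * 1024))
    time.sleep(hold_seconds)
    return chunks

if __name__ == "__main__":
    held = hog(cap_mib=64, chunk_mib=16)
    print(f"holding {len(held) * 16} MiB")
```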
== User Experience == The most egregious instance this change is trying to mitigate:
a. RAM is completely used
b. Swap is completely used
c. System becomes unresponsive to the user as swap thrashing has ensued
With earlyoom disabled, the user often gives up and forces power off (in my own testing this condition lasts >30 minutes, with no kernel-triggered oom killer and no recovery).
With earlyoom enabled, the system likely still becomes unresponsive, but the oom killer is triggered in much less time (seconds or a few minutes in my testing, once less than 10% RAM and 10% swap remain).
earlyoom starts sending SIGTERM once both memory and swap are below their respective PERCENT setting, default 10%. It sends SIGKILL once both are below their respective KILL_PERCENT setting, default 5%.
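The rule above can be sketched as a tiny function. This is a simplification assuming one shared threshold pair for memory and swap; the real earlyoom reads /proc/meminfo and allows the memory and swap percentages to be set separately (see man earlyoom):

```python
# Simplified sketch of earlyoom's trigger rule (not its real source):
# act only when BOTH free memory and free swap fall below a threshold.
SIGTERM_PERCENT = 10.0   # package default
SIGKILL_PERCENT = 5.0    # package default

def action(mem_free_pct, swap_free_pct):
    """Return 'SIGKILL', 'SIGTERM', or None for the given free percentages."""
    if mem_free_pct <= SIGKILL_PERCENT and swap_free_pct <= SIGKILL_PERCENT:
        return "SIGKILL"
    if mem_free_pct <= SIGTERM_PERCENT and swap_free_pct <= SIGTERM_PERCENT:
        return "SIGTERM"
    return None
```

Note the "both" condition: with plenty of swap still free, nothing happens even when RAM is nearly full. On a system with no swap device the swap side of the condition is trivially satisfied, which is why the thresholds may need tuning there, as discussed elsewhere in this thread.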
The package includes configuration file /etc/default/earlyoom which sets option {{code|-r 60}} causing a memory report to be entered into the journal every minute.
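For reference, the packaged {{code|/etc/default/earlyoom}} is a short environment file along these lines (contents hedged from memory of the Fedora package; check the installed file, as exact contents may differ between versions):

```
# /etc/default/earlyoom
# -r 60: log a memory report to the journal every 60 seconds
EARLYOOM_ARGS="-r 60"
```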
== Dependencies == The earlyoom package has no dependencies.
== Contingency Plan ==
- Contingency mechanism: Owner will revert all changes
- Contingency deadline: Final freeze
- Blocks release? No
- Blocks product? No
== Documentation == {{code|man earlyoom}}<br><br> https://www.kernel.org/doc/gorman/html/understand/understand016.html
== Release Notes == Earlyoom service is enabled by default, which will cause kernel oom-killer to trigger sooner. To revert to previous behavior:<br> {{code|sudo systemctl disable earlyoom.service}}
And to customize see {{code|man earlyoom}}.
Additionally, there was a question during the chat discussion: how will the earlyoom setup work together with OOMPolicy and any other related options of systemd units? Will systemd recognize the OOM event?
On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova alpha@bookwar.info wrote:
Since in the Change we are not introducing just the earlyoom tool but enable it with a specific profile I would add those details here. Smth like:
"earlyoom service will choose the offending process based on the same oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, and SIGKILL on 5%"
I added this information to the summary. Also, I think these numbers may need to change to avoid prematurely sending SIGTERM when the system has no swap device.
As I understand in the current setup we are looking more for a controlled failure scenario rather than for a solution.
Yes, it's fair to say this proposal is to make things "less bad". It doesn't improve system responsiveness. Once heavy swap starts, the system is sluggish, stutters, and briefly stalls. This proposal doesn't fix that. There is a lot of room for improvement.
Can we get a specific manual, what users supposed to do, once they trigger the earlyoom? Does earlyoom help in reporting? Which logs we need to look at?
Maybe add a section in UX part of the change, or setup a dedicated wiki page?
The user shouldn't need to do anything differently than if the kernel oom-killer had triggered. The system journal will contain messages showing what was killed and why:
Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below SIGTERM limits: mem 10 %, swap 10 %
Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process 27421 "chrome": badness 305, VmRSS 42 MiB
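For anyone scripting around these messages, the victim can be pulled out of a journal line with a regex. This is a sketch keyed to the sample above; the message format belongs to earlyoom and may change between versions:

```python
# Extract the victim details from an earlyoom "sending SIG..." journal
# line (format taken from the sample messages quoted in this thread).
import re

LINE = ('Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to '
        'process 27421 "chrome": badness 305, VmRSS 42 MiB')

PATTERN = re.compile(
    r'sending (SIG\w+) to process (\d+) "([^"]+)": '
    r'badness (\d+), VmRSS (\d+) MiB')

def parse_kill_line(line):
    """Return a dict describing the kill, or None if the line doesn't match."""
    m = PATTERN.search(line)
    if not m:
        return None
    sig, pid, name, badness, rss = m.groups()
    return {"signal": sig, "pid": int(pid), "name": name,
            "badness": int(badness), "vmrss_mib": int(rss)}
```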
Additionally, there was a question during the chat discussion: how the earlyoom setup will work together with OOMPolicy and any other related options of systemd units? Will systemd recognize the OOM event?
My understanding of systemd OOMPolicy= behavior is that it looks for the kernel's oom-killer messages and acts upon those. While earlyoom uses the same metric (oom_score) as the oom-killer, it does not invoke the oom-killer. Therefore systemd probably does not get the proper hint to implement OOMPolicy=.
Fedora needs to discuss how big of a problem that is, whether there's any way to mitigate it or tolerate it, weighing the pros of earlyoom for a short period versus the cons of punting this problem for another release. This proposal does not intend to step on other superseding work in this area, but if it does, it'll be withdrawn.
On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova alpha@bookwar.info wrote:
Since in the Change we are not introducing just the earlyoom tool but enable it with a specific profile I would add those details here. Smth like:
"earlyoom service will choose the offending process based on the same oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, and SIGKILL on 5%"
I add this information to the summary. Also, I think these numbers may need to change to avoid prematurely sending SIGTERM when the system has no swap device.
As I understand in the current setup we are looking more for a controlled failure scenario rather than for a solution.
Yes, it's fair to say this proposal is to make things "less bad". It doesn't improve system responsiveness. Once heavy swap starts, the system is sluggish, stutters, and briefly stalls. This proposal doesn't fix that. There is a lot of room for improvement.
Can we get a specific manual, what users supposed to do, once they trigger the earlyoom? Does earlyoom help in reporting? Which logs we need to look at?
Maybe add a section in UX part of the change, or setup a dedicated wiki page?
The user shouldn't need to do anything differently than if the kernel oom-killer had triggered. The system journal will contain messages showing what was killed and why:
Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below SIGTERM limits: mem 10 %, swap 10 % Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process 27421 "chrome": badness 305, VmRSS 42 MiB
Additionally, there was a question during the chat discussion: how the earlyoom setup will work together with OOMPolicy and any other related options of systemd units? Will systemd recognize the OOM event?
My understanding of systemd OOMPolicy= behavior, is it looks for the kernel's oom-killer messages and acts upon those. Whereas earlyoom uses the same metric (oom_score) as the oom-killer, it does not invoke the oom-killer. Therefore systemd probably does not get the proper hint to implement OOMPolicy=
Yes. The kernel reports oom events in the cgroup file memory.events, and systemd waits for an inotify event on that file; OOMPolicy=stop is implemented that way. And the OOMPolicy=kill option is "implemented" by setting memory.oom.group=1 in the kernel [1] and having the kernel kill all the processes. So systemd is providing a thin wrapper around the kernel functionality.
If processes are not killed by the kernel but through a signal from userspace, all of this will not work.
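The detection path described above can be sketched: the kernel bumps counters in the cgroup's memory.events file, systemd watches that file with inotify, and parsing it is trivial. The sample content below is synthetic; real files live under /sys/fs/cgroup:

```python
# Parse cgroup v2 memory.events content into a dict of counters.
def parse_memory_events(text):
    events = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            events[key] = int(value)
    return events

# Synthetic sample of /sys/fs/cgroup/<unit>/memory.events:
sample = "low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n"
counters = parse_memory_events(sample)
# A nonzero oom_kill counter means the kernel killed something in this
# cgroup. A SIGKILL sent from userspace by earlyoom never increments
# these counters, which is why OOMPolicy= cannot observe earlyoom's actions.
```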
[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-int...
Zbyszek
Fedora need to discuss how big of a problem that is, if there's anyway to mitigate it, or tolerate it, weighing the pros of earlyoom for a short period, versus the cons of punting this problem for another release. This proposal does not intend to step on other superseding work in this area, but if it does, it'll be withdrawn.
On Sun, Jan 5, 2020 at 10:18 AM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova alpha@bookwar.info wrote:
Since in the Change we are not introducing just the earlyoom tool but enable it with a specific profile I would add those details here. Smth like:
"earlyoom service will choose the offending process based on the same oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, and SIGKILL on 5%"
I add this information to the summary. Also, I think these numbers may need to change to avoid prematurely sending SIGTERM when the system has no swap device.
As I understand in the current setup we are looking more for a controlled failure scenario rather than for a solution.
Yes, it's fair to say this proposal is to make things "less bad". It doesn't improve system responsiveness. Once heavy swap starts, the system is sluggish, stutters, and briefly stalls. This proposal doesn't fix that. There is a lot of room for improvement.
Can we get a specific manual, what users supposed to do, once they trigger the earlyoom? Does earlyoom help in reporting? Which logs we need to look at?
Maybe add a section in UX part of the change, or setup a dedicated wiki page?
The user shouldn't need to do anything differently than if the kernel oom-killer had triggered. The system journal will contain messages showing what was killed and why:
Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below SIGTERM limits: mem 10 %, swap 10 % Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process 27421 "chrome": badness 305, VmRSS 42 MiB
Additionally, there was a question during the chat discussion: how the earlyoom setup will work together with OOMPolicy and any other related options of systemd units? Will systemd recognize the OOM event?
My understanding of systemd OOMPolicy= behavior, is it looks for the kernel's oom-killer messages and acts upon those. Whereas earlyoom uses the same metric (oom_score) as the oom-killer, it does not invoke the oom-killer. Therefore systemd probably does not get the proper hint to implement OOMPolicy=
Yes. The kernel reports oom events in the cgroup file memory.events, and systemd waits for an inotify event on that file; OOMPolicy=stop is implemented that way. And the OOMPolicy=kill option is "implemented" by setting memory.oom.group=1 in the kernel [1] and having the kernel kill all the processes. So systemd is providing a thin wrapper around the kernel functionality.
If processes are not killed by the kernel but through a signal from userspace, all of this will not work.
I grepped /usr/lib/systemd and /etc/systemd for "OOM" on my workstation, and it seems that only the OOMScoreAdjust option is used in the installed systemd units. And this option will be respected by earlyoom.
Since on workstation we don't tweak OOMPolicy on the unit level, I'd say we can leave the tweaking to the system administrators: when there is a need to adjust the OOMPolicy of a service, administrators would need to tweak or disable the earlyoom service as well.
But I'd like to understand better the difference between _default_ OOM-event and _default_ earlyoom-event:
Afaik DefaultOOMPolicy is set to "stop", which means if one of the processes in the service is killed by OOM, other processes from the same service are gracefully stopped by systemd.
What is the default behavior of the systemd service on external SIGTERM/SIGKILL signal sent to the process by earlyoom?
[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-int...
Zbyszek
On Sun, Jan 05, 2020 at 12:29:40PM +0100, Aleksandra Fedorova wrote:
On Sun, Jan 5, 2020 at 10:18 AM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova alpha@bookwar.info wrote:
Since in the Change we are not introducing just the earlyoom tool but enable it with a specific profile I would add those details here. Smth like:
"earlyoom service will choose the offending process based on the same oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, and SIGKILL on 5%"
I add this information to the summary. Also, I think these numbers may need to change to avoid prematurely sending SIGTERM when the system has no swap device.
As I understand in the current setup we are looking more for a controlled failure scenario rather than for a solution.
Yes, it's fair to say this proposal is to make things "less bad". It doesn't improve system responsiveness. Once heavy swap starts, the system is sluggish, stutters, and briefly stalls. This proposal doesn't fix that. There is a lot of room for improvement.
Can we get a specific manual, what users supposed to do, once they trigger the earlyoom? Does earlyoom help in reporting? Which logs we need to look at?
Maybe add a section in UX part of the change, or setup a dedicated wiki page?
The user shouldn't need to do anything differently than if the kernel oom-killer had triggered. The system journal will contain messages showing what was killed and why:
Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below SIGTERM limits: mem 10 %, swap 10 % Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process 27421 "chrome": badness 305, VmRSS 42 MiB
Additionally, there was a question during the chat discussion: how the earlyoom setup will work together with OOMPolicy and any other related options of systemd units? Will systemd recognize the OOM event?
My understanding of systemd OOMPolicy= behavior, is it looks for the kernel's oom-killer messages and acts upon those. Whereas earlyoom uses the same metric (oom_score) as the oom-killer, it does not invoke the oom-killer. Therefore systemd probably does not get the proper hint to implement OOMPolicy=
Yes. The kernel reports oom events in the cgroup file memory.events, and systemd waits for an inotify event on that file; OOMPolicy=stop is implemented that way. And the OOMPolicy=kill option is "implemented" by setting memory.oom.group=1 in the kernel [1] and having the kernel kill all the processes. So systemd is providing a thin wrapper around the kernel functionality.
If processes are not killed by the kernel but through a signal from userspace, all of this will not work.
I grepped /usr/lib/systemd and /etc/systemd for "OOM" on my workstation and it seems that we have only OOMScoreAdjust option used in the installed systemd units. And this option will be respected by earlyoom.
Since on workstation we don't use tweaking of the OOMPolicy on the unit level, I'd say we can leave the tweaking to the system administrators: when there is need to adjust OOMPolicy of a service, administrators would need to tweak or disable earlyoom service as well.
Having "conflicts" between things, in the sense that using one feature means that another feature needs to be disabled, is always an option. But it's never a very good option. I think that it isn't too important to keep OOMPolicy= working, since it's a new and relatively unused thing. Nevertheless, it would be nice to find a way to avoid this and support both features at the same time. This thread 'til now is mostly about establishing whether there really is a conflict (it seems yes) and whether there is some easy way to avoid it (not sure yet...). I think we should explore that before settling on the easy but suboptimal answer.
But I'd like to understand better the difference between _default_ OOM-event and _default_ earlyoom-event:
Afaik DefaultOOMPolicy is set to "stop", which means if one of the processes in the service is killed by OOM, other processes from the same service are gracefully stopped by systemd.
What is the default behavior of the systemd service on external SIGTERM/SIGKILL signal sent to the process by earlyoom?
It depends on which of the processes is killed. If the main process is killed with SIGTERM, systemd will consider this a normal successful termination. If the main process is killed with SIGKILL, systemd will consider this a failure. (Both of those cases can be modified by SuccessExitStatus=.) If some random subprocess is killed, systemd will not care at all. So in general, just killing the main process with SIGTERM results at least in systemd reporting successful termination when it shouldn't.
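One practical consequence of the above: a service whose main process might be SIGKILLed by earlyoom can opt into being brought back, since SIGKILL counts as a failed exit. An illustrative drop-in (a hypothetical example, not part of this proposal):

```
# /etc/systemd/system/example.service.d/restart.conf (illustrative only)
[Service]
# Restart after a failed exit such as SIGKILL of the main process;
# a clean SIGTERM termination does not trigger a restart.
Restart=on-failure
```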
Zbyszek
On Sun, Jan 5, 2020 at 2:18 AM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
My understanding of systemd OOMPolicy= behavior, is it looks for the kernel's oom-killer messages and acts upon those. Whereas earlyoom uses the same metric (oom_score) as the oom-killer, it does not invoke the oom-killer. Therefore systemd probably does not get the proper hint to implement OOMPolicy=
Yes. The kernel reports oom events in the cgroup file memory.events, and systemd waits for an inotify event on that file; OOMPolicy=stop is implemented that way. And the OOMPolicy=kill option is "implemented" by setting memory.oom.group=1 in the kernel [1] and having the kernel kill all the processes. So systemd is providing a thin wrapper around the kernel functionality.
If processes are not killed by the kernel but through a signal from userspace, all of this will not work.
The gotcha on the desktop with the kernel oom-killer is that by the time it's needed, it's way past too late. And it may never trigger.
The central problem to be solved isn't even what does the OOM killing or when: it's the ridiculously bad system responsiveness during heavy swap usage.
My top criticism of the feature proposal is that it doesn't address the responsiveness problem head on. It just reduces the duration of badness. And the reduction isn't nearly enough.
One thing that helps the heavy swap problem, today? A much smaller swap partition. In fact, no swap partition alleviates the problem entirely, but of course that has other consequences (that the working group is discussing in #120).
On Sun, Jan 5, 2020 at 12:39 AM Chris Murphy lists@colorremedies.com wrote:
On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova alpha@bookwar.info wrote:
Since in the Change we are not introducing just the earlyoom tool but enable it with a specific profile I would add those details here. Smth like:
"earlyoom service will choose the offending process based on the same oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, and SIGKILL on 5%"
I add this information to the summary. Also, I think these numbers may need to change to avoid prematurely sending SIGTERM when the system has no swap device.
As I understand in the current setup we are looking more for a controlled failure scenario rather than for a solution.
Yes, it's fair to say this proposal is to make things "less bad". It doesn't improve system responsiveness. Once heavy swap starts, the system is sluggish, stutters, and briefly stalls. This proposal doesn't fix that. There is a lot of room for improvement.
Can we get a specific manual, what users supposed to do, once they trigger the earlyoom? Does earlyoom help in reporting? Which logs we need to look at?
Maybe add a section in UX part of the change, or setup a dedicated wiki page?
The user shouldn't need to do anything differently than if the kernel oom-killer had triggered. The system journal will contain messages showing what was killed and why:
Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below SIGTERM limits: mem 10 %, swap 10 % Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process 27421 "chrome": badness 305, VmRSS 42 MiB
I wonder, how am I as a user going to be informed about the earlyoom event? I assume abrt will recognize the crash? Will it be easily visible from the abrt report that it was the OOM?
The concern is: if we enable such a service, will we get a large amount of vague bug reports from users who don't understand what has happened? Can we make it somehow easier to debug?
Additionally, there was a question during the chat discussion: how the earlyoom setup will work together with OOMPolicy and any other related options of systemd units? Will systemd recognize the OOM event?
My understanding of systemd OOMPolicy= behavior, is it looks for the kernel's oom-killer messages and acts upon those. Whereas earlyoom uses the same metric (oom_score) as the oom-killer, it does not invoke the oom-killer. Therefore systemd probably does not get the proper hint to implement OOMPolicy=
Fedora need to discuss how big of a problem that is, if there's anyway to mitigate it, or tolerate it, weighing the pros of earlyoom for a short period, versus the cons of punting this problem for another release. This proposal does not intend to step on other superseding work in this area, but if it does, it'll be withdrawn.
-- Chris Murphy
On Sun, Jan 5, 2020 at 4:43 AM Aleksandra Fedorova alpha@bookwar.info wrote:
I wonder, how I as a user going to be informed about the earlyoom-event?
Same as a kernel oom-killer event. Primary source is the journal.
But well before either earlyoom sends SIGTERM or the kernel oom-killer kills something, the user will know something is wrong, because system responsiveness will be stuttering or even already intermittently hanging. Earlyoom is not aggressively clobbering things, except on system configurations that have no swap device. That configuration probably needs some earlyoom tweaking, and we're looking at that; but then those folks also aren't experiencing much reduced system responsiveness in these cases, because their system can't heavily swap.
I assume abrt will recognize the crash? Will it be easily visible from the abrt report that it was the OOM?
No. It's not a crash. Earlyoom sends SIGTERM first, and only sends SIGKILL if the process doesn't respond to SIGTERM in time. And the kernel oom-killer doesn't result in an abrt report either.
The concern is: if we enable such a service, will we get large amount of vague bug reports from users who don't understand what has happened. Can we make it somehow easier to debug?
Unless further real world testing uncovers something very new and different from my testing (entirely possible, but I can't estimate that probability), there won't be a measurable increase in bug reports related to this.
Based on my limited testing (I've done around 200+ tests of oom-killer, earlyoom SIGTERM (never have seen a SIGKILL), and nohang; and perhaps 80 of those tests involved forced power off during heavy swap, compile and system use) there really isn't anything that requires the user to get involved.
Also, there isn't a bug here per se. It's a series of intentional designs colliding into a deeply problematic user experience for the desktop: the "operating system", i.e. Fedora Workstation providing the kernel, systemd, a bunch of services, libraries, and policies, permits an unprivileged process to ask for essentially unlimited resources and overcommit the system, *and* then heavy swap use results in compromised system responsiveness and control.
Earlyoom doesn't change any of that; it just selects a process for SIGTERM much sooner than the kernel oom-killer would. And that only stops the bad experience, by stopping the resource-hogging process. It isn't actually fixing anything. It is in some sense an act of desperation that's been a long time coming. Arguably, earlyoom isn't aggressive enough and doesn't stop the badness soon enough.
On Sun, Jan 5, 2020 at 12:43 PM Aleksandra Fedorova alpha@bookwar.info wrote:
I wonder, how I as a user going to be informed about the earlyoom-event? I assume abrt will recognize the crash? Will it be easily visible from the abrt report that it was the OOM?
The concern is: if we enable such a service, will we get large amount of vague bug reports from users who don't understand what has happened. Can we make it somehow easier to debug?
FWIW, the behavior on Android is very close to what is proposed here. If your application exceeds the amount of available memory, it simply closes right in front of your eyes. No explanation, nothing, it's just gone (might be different on latest Android versions). The same thing would happen with EarlyOOM - some application would disappear.
I agree it would be nice to inform the user before or at least after. Windows can do it - they show a notification roughly saying "Your system is running out of memory and some application might get closed". (At least they used to in the old days, I haven't run out of memory for a long time, and I don't know whether Windows 10 behaves the same way). But I think it should not be a stopper for the proposal as it is. Even without the notification the user experience is improved over the default behavior.
On 2020-01-06 18:31, Kamil Paral wrote:
FWIW, the behavior on Android is very close to what is proposed here. If your application exceeds the amount of available memory, it simply closes right in front of your eyes. No explanation, nothing, it's just gone (might be different on latest Android versions). The same thing would happen with EarlyOOM - some application would disappear.
The analogy is not completely fair. On Android, applications are designed to be started and stopped by the system, and they are supposed to save their entire state so that when restarted nothing has apparently happened, from the point of view of the user. (Many applications are badly written, but that's another story...) And we are talking about background applications, on a system where only one application is in the foreground (only very recently can you have two applications in the foreground). Finally, it is the applications that are stopped (by asking them nicely through an event), not general system processes; Android would never kill a wpa_supplicant process, for example.
Android has a concept of "cache" of background applications, they are there, if possible, just to have them back very quickly; it is similar to how Linux keeps dirty disk content in RAM and pushes it to disk when RAM must be freed.
Regards.
On Mon, Jan 6, 2020 at 8:52 PM Roberto Ragusa mail@robertoragusa.it wrote:
On 2020-01-06 18:31, Kamil Paral wrote:
FWIW, the behavior on Android is very close to what is proposed here. If
your application exceeds the amount of available memory, it simply closes right in front of your eyes. No explanation, nothing, it's just gone (might be different on latest Android versions). The same thing would happen with EarlyOOM - some application would disappear.
The analogy is not completely fair. On Android, applications are designed to be started and stopped by the system, and they are supposed to save their entire state so that when restarted, from the user's point of view, nothing has apparently happened. (Many applications are badly written, but that's another story...) And we are talking about background applications, on a system where only one application is in the foreground (only very recently can you have two applications in the foreground). Finally, it is the applications that are stopped (by asking them nicely through an event), not general system processes; Android would never kill a wpa_supplicant process, for example.
Android has a concept of a "cache" of background applications: they are kept around, if possible, just to bring them back very quickly; it is similar to how Linux keeps dirty disk content in RAM and pushes it to disk when RAM must be freed.
Sure, Android is quite a different world, I'm not saying the comparison applies 1:1. But the end effect is similar - a window just disappears. And it's even less obvious why it happened, because you don't have any swap and therefore you haven't gone through any performance degradation. That's all I wanted to note. (Also, I'm skeptical about the app saving state before being killed, because it has already run out of memory and can't function properly. But let's not go off-topic.)
On Mon, 6 Jan 2020, 18:32 Kamil Paral, kparal@redhat.com wrote:
On Sun, Jan 5, 2020 at 12:43 PM Aleksandra Fedorova alpha@bookwar.info wrote:
I wonder: how am I as a user going to be informed about the earlyoom event? I assume abrt will recognize the crash? Will it be easily visible from the abrt report that it was the OOM?
The concern is: if we enable such a service, will we get a large number of vague bug reports from users who don't understand what has happened? Can we make it somehow easier to debug?
FWIW, the behavior on Android is very close to what is proposed here. If your application exceeds the amount of available memory, it simply closes right in front of your eyes. No explanation, nothing, it's just gone (might be different on latest Android versions). The same thing would happen with EarlyOOM - some application would disappear.
I agree it would be nice to inform the user before or at least after. Windows can do it - they show a notification roughly saying "Your system is running out of memory and some application might get closed". (At least they used to in the old days, I haven't run out of memory for a long time, and I don't know whether Windows 10 behaves the same way). But I think it should not be a stopper for the proposal as it is. Even without the notification the user experience is improved over the default behavior.
I am not convinced that it is an improvement to be honest.
UX before: system works, I run a heavy application, the system starts to hang, I understand that there is an issue, I can kill the app or reboot, which gives me a clean and working system.
UX after: system works, no visible problems. Suddenly a random app disappears, no errors or crashes are reported to me. It might be that my active app is killed, then I know that something happened, but what if a background process is killed? Maybe my messenger app?
I am going to keep working in my main app without noticing that I lost something, not knowing that I need to take action. And my system now runs in a weird state, and can stay there for days, which will lead to more weird and nonreproducible errors later.
The "hang" of a system was the feedback user got from the system that there is something wrong. Not ideal, but at least there was something. With this feature we don't solve the issue, we remove the "bad" feedback, and we don't provide any replacement for it making memory problem completely invisible.
Is it really a good UX?
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Tue, Jan 7, 2020 at 12:08 AM Aleksandra Fedorova alpha@bookwar.info wrote:
On Mon, 6 Jan 2020, 18:32 Kamil Paral, kparal@redhat.com wrote:
On Sun, Jan 5, 2020 at 12:43 PM Aleksandra Fedorova alpha@bookwar.info wrote:
I wonder: how am I as a user going to be informed about the earlyoom event? I assume abrt will recognize the crash? Will it be easily visible from the abrt report that it was the OOM?
The concern is: if we enable such a service, will we get a large number of vague bug reports from users who don't understand what has happened? Can we make it somehow easier to debug?
FWIW, the behavior on Android is very close to what is proposed here. If your application exceeds the amount of available memory, it simply closes right in front of your eyes. No explanation, nothing, it's just gone (might be different on latest Android versions). The same thing would happen with EarlyOOM - some application would disappear.
I agree it would be nice to inform the user before or at least after. Windows can do it - they show a notification roughly saying "Your system is running out of memory and some application might get closed". (At least they used to in the old days, I haven't run out of memory for a long time, and I don't know whether Windows 10 behaves the same way). But I think it should not be a stopper for the proposal as it is. Even without the notification the user experience is improved over the default behavior.
I am not convinced that it is an improvement to be honest.
UX before: system works, I run a heavy application, the system starts to hang, I understand that there is an issue, I can kill the app or reboot, which gives me a clean and working system.
UX after: system works, no visible problems. Suddenly a random app disappears, no errors or crashes are reported to me. It might be that my active app is killed, then I know that something happened, but what if a background process is killed? Maybe my messenger app?
This comparison is not accurate.
1. In the "UX before" case, it's unfair that you're comparing against user intervention to kill the app, rather than against the oom-killer.
2. The oom-killer reports to the journal. earlyoom reports to the journal. They're the same.
3. Quite a lot of errors and crashes are only ever reported to the journal.
4. In the "UX after" case (i.e. with earlyoom running), the system starts to hang, you should understand there's an issue, and recovery shouldn't take quite as long - but you'll still wish the system hadn't become hung up in the first place.
5. The app is quit with SIGTERM, not killed.
6. The kernel oom-killer can kill background processes too.
I am going to keep working in my main app without noticing that I lost something, not knowing that I need to take action. And my system now runs in a weird state, and can stay there for days, which will lead to more weird and nonreproducible errors later.
No different than with the oom-killer, assuming you're willing to wait for it to take action. If you force power off instead, there's some chance you're still going to do that with earlyoom, because the responsivity problem has more to do with congestion as a result of heavy swapping.
The "hang" of a system was the feedback user got from the system that there is something wrong. Not ideal, but at least there was something. With this feature we don't solve the issue, we remove the "bad" feedback, and we don't provide any replacement for it making memory problem completely invisible.
Is it really a good UX?
Insofar as aggravation is definitely not good UX, I'd say for sure it's better to reduce user aggravation. They will still experience the hang. It just won't last quite as long, and yet it still will be too long.
On Tue, Jan 7, 2020 at 8:09 AM Aleksandra Fedorova alpha@bookwar.info wrote:
UX before: system works, I run a heavy application, the system starts to hang, I understand that there is an issue, I can kill the app or reboot, which gives me a clean and working system.
UX after: system works, no visible problems. Suddenly a random app disappears, no errors or crashes are reported to me. It might be that my active app is killed, then I know that something happened, but what if a background process is killed? Maybe my messenger app?
Or actually:
UX before: system works, I run a heavy application, system starts to hang, I can't even move my mouse, the application doesn't respond to Alt+F4, I wait patiently for a few minutes then give up and hard-reboot
UX after: system works, I run a heavy application, system starts to hang, I can't even move my mouse, the application doesn't respond to Alt+F4, I wait patiently for a few minutes then the application disappears and I have a functional system again
In your example you forget that swap needs to be filled almost to full before earlyoom starts reacting. That takes time, during which the system responsiveness is abysmal. The UX difference happens only after you've already suffered through a serious responsiveness degradation, and the only difference is the end state, *if* you've managed to wait long enough for earlyoom to kick in (which happens earlier than the kernel OOM killer and with better results regarding which process gets killed, according to Chris).
On Tue, Jan 7, 2020 at 11:23 am, Kamil Paral kparal@redhat.com wrote:
In your example you forget that swap needs to be filled almost to full before earlyoom starts reacting. That takes time, during which the system responsiveness is abysmal. The UX difference happens only after you've already suffered through a serious responsiveness degradation, and the only difference is the end state, *if* you've managed to wait long enough for earlyoom to kick in (which happens earlier than the kernel OOM killer and with better results regarding which process gets killed, according to Chris).
Right, we understand this. earlyoom (or a systemd-level OOM solution) is only half the solution. The other half will be fixing swap. That will probably require (a) reducing the amount of swap created by anaconda, and/or (b) swap on zram.
On 1/5/20 12:38 AM, Chris Murphy wrote:
On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova alpha@bookwar.info wrote:
Since in the Change we are not just introducing the earlyoom tool but enabling it with a specific profile, I would add those details here. Something like:
"The earlyoom service will choose the offending process based on the same oom_score the kernel uses. It will send a SIGTERM signal when 10% of RAM is left, and a SIGKILL when 5% is left."
I added this information to the summary. Also, I think these numbers may need to change to avoid prematurely sending SIGTERM when the system has no swap device.
I read that sentence in a different way: "earlyoom will make only 90% of your RAM available, so it is effectively using 10% of your RAM".
On my 32GB laptop that means 3.2GB of RAM becomes unusable. And on my 64GB machine I'm being robbed of 6.4GB. Wow.
How low can these numbers be pushed? Even 3% would be 1GB out of 32GB.
Regards.
On Mon, Jan 6, 2020 at 4:57 AM Roberto Ragusa mail@robertoragusa.it wrote:
On 1/5/20 12:38 AM, Chris Murphy wrote:
On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova alpha@bookwar.info wrote:
Since in the Change we are not just introducing the earlyoom tool but enabling it with a specific profile, I would add those details here. Something like:
"The earlyoom service will choose the offending process based on the same oom_score the kernel uses. It will send a SIGTERM signal when 10% of RAM is left, and a SIGKILL when 5% is left."
I added this information to the summary. Also, I think these numbers may need to change to avoid prematurely sending SIGTERM when the system has no swap device.
I read that sentence in a different way: "earlyoom will make only 90% of your RAM available, so it is effectively using 10% of your RAM".
On my 32GB laptop that means 3.2GB of RAM becomes unusable. And on my 64GB machine I'm being robbed of 6.4GB. Wow.
How low can these numbers be pushed? Even 3% would be 1GB out of 32GB.
What you say is only true in the case of systems with no swap. That's mentioned in the proposal. If swap is being used, for sure essentially all of your RAM is being used, so it's swap that's the determining factor. If you don't have swap, yes RAM becomes the determining factor and I agree that on systems with a lot of RAM, 10% is too high.
The ideal scenario is to not run earlyoom at all on systems that do not have a swap device. They're not going to run into the responsivity problem anyway, which is a direct consequence of heavy swapping.
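The threshold behavior discussed in this subthread (SIGTERM when both available RAM and free swap fall below 10%, SIGKILL below 5%, and RAM alone deciding when there is no swap device) can be sketched roughly as follows. This is an illustration of the behavior as described in the thread, not earlyoom's actual C code:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key: value kB' lines into a dict of ints (kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        info[key.strip()] = int(rest.split()[0])
    return info

def oom_action(meminfo, term_pct=10, kill_pct=5):
    """Return 'SIGKILL', 'SIGTERM', or None, per the thresholds discussed above."""
    mem_pct = 100 * meminfo["MemAvailable"] / meminfo["MemTotal"]
    swap_total = meminfo.get("SwapTotal", 0)
    if swap_total:
        swap_pct = 100 * meminfo["SwapFree"] / swap_total
        gate = max(mem_pct, swap_pct)   # both RAM and swap must be low to act
    else:
        gate = mem_pct                  # no swap device: RAM alone decides
    if gate <= kill_pct:
        return "SIGKILL"
    if gate <= term_pct:
        return "SIGTERM"
    return None
```

With swap present, nothing happens until swap is nearly full too, which is why the responsiveness degradation precedes any action.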
On Fri, Jan 03, 2020 at 02:18:40PM -0500, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
Hi, I'll throw another idea out here, in the hope that people can provide insight. It's something I wanted to look into for a while, but I admit to not having done any research myself, so the approach might be totally useless...
What about using the memory controller for user units to allocate memory resources between the processes in the user session? Thanks to recent developments, the gnome session uses separate systemd units (and thus separate cgroups) for various services. We could set attributes like memory.low for "the basic components of the user session", and on the other hand, memory.swap.max for "the payload", i.e. various user processes on top.
Doing something like this effectively would most likely require some changes to how we assign processes to cgroups. I still get some processes in "wrong" cgroups:
│ ├─gnome-shell-wayland.service
│ │ ├─1501571 /usr/bin/gnome-shell
│ │ ├─1501606 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth /run/user/1000/.mutter-Xwaylandauth.SCXID0 -listen 4 -listen 5 -displayfd 6
│ │ ├─1501713 ibus-daemon --panel disable -r --xim
│ │ ├─1501718 /usr/libexec/ibus-dconf
│ │ ├─1501719 /usr/libexec/ibus-extension-gtk3
│ │ ├─1501724 /usr/libexec/ibus-x11 --kill-daemon
│ │ ├─1501980 /usr/libexec/ibus-engine-simple
│ │ ├─1503586 /usr/lib64/firefox/firefox
│ │ ├─1503691 /usr/lib64/firefox/firefox -contentproc -childID 2 -isForBrowser ...
│ │ ├─1503701 /usr/lib64/firefox/firefox -contentproc -childID 3 -isForBrowser ...
│ │ ├─1503747 /usr/lib64/firefox/firefox -contentproc -childID 4 -isForBrowser ...
│ │ ├─1520219 bwrap --args 35 telegram-desktop --
│ │ ├─1520229 bwrap --args 35 xdg-dbus-proxy --args=37
│ │ ├─1520230 xdg-dbus-proxy --args=37
│ │ ├─1520232 bwrap --args 35 telegram-desktop --
│ │ ├─1520233 /app/bin/Telegram --
│ │ ├─1540753 pavucontrol
...
(and firefox and anything-running-as-flatpak would be the prime candidates to split out into their own cgroups and build resource limits around...)
The cgroup hierarchy is mostly flat (most user services get cgroups directly in the root of the user tree under /sys/fs/cgroup/user.slice/user-nnn.slice/user@nnn.service/). To make resource assignment effective, I would like to see a "basic.slice" (name TBD) that would gather various "core" stuff like gnome-shell-wayland.service, dbus-broker.service, and whatever other services the graphical session depends on. This would get minimum memory protections and such.
Then there should be a "payload.slice", and underneath that all the non-essential services and everything the user starts from the terminal.
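A hypothetical sketch of what such slice units might look like. The slice names come from the message above (and are explicitly TBD); the values are illustrative, not recommendations. MemoryLow= and MemorySwapMax= are the systemd settings backing the cgroup memory.low and memory.swap.max attributes mentioned earlier:

```ini
# basic.slice - core session components ("basic.slice" is a placeholder name)
[Slice]
# Protect this slice from memory reclaim up to the given amount.
MemoryLow=512M

# payload.slice - non-essential user processes (placeholder name)
[Slice]
# Cap how much swap the user "payload" may consume.
MemorySwapMax=2G
```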
What I *don't know* is: how much overhead enabling the memory controller has, and whether those resource limits would actually have the desired effect (and more generally, how they would best be set).
Zbyszek
On Sat, Jan 4, 2020 at 11:38 am, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
What about using the memory controller for user units to allocate memory resources between the processes in the user session? Thanks to recent developments, the gnome session uses separate systemd units (and thus separate cgroups) for various services. We could set attributes like memory.low for "the basic components of the user session", and on the other hand, memory.swap.max for "the payload", i.e. various user processes on top.
This looks interesting. I'd love to see more serious discussion of this proposal. Carving out dedicated memory for essential desktop processes seems like something we should be able to do in 2020.
Michael
On Sat, 2020-01-04 at 12:17 -0600, Michael Catanzaro wrote:
On Sat, Jan 4, 2020 at 11:38 am, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
What about using the memory controller for user units to allocate memory resources between the processes in the user session? Thanks to recent developments, the gnome session uses separate systemd units (and thus separate cgroups) for various services. We could set attributes like memory.low for "the basic components of the user session", and on the other hand, memory.swap.max for "the payload", i.e. various user processes on top.
This looks interesting. I'd love to see more serious discussion of this proposal. Carving out dedicated memory for essential desktop processes seems like something we should be able to do in 2020.
And it seems like it is: in the issue about this whole topic, some implemented solutions were mentioned: https://github.com/Nefelim4ag/Ananicy
But not further commented at least on pagure. https://pagure.io/fedora-workstation/issue/98#comment-615424
Which I think is quite sad, as those seem to be a much better way to handle these things. Having a daemon that assigns cgroups to processes lets the kernel do its thing, keeps us all sane, and keeps the system reasonably responsive.
I guess the important question here is: does it really prevent hanging, and what's the origin of the hanging? Is it that the kernel starts to swap and therefore eats up all CPU time, or is it the programs in the foreground that suddenly all try to get their piece of memory back, forcing kswapd onto the CPU?
My guess would be the latter, but I'm sure the group who did the research on this topic has a better insight into this.
On Fr, 03.01.20 14:18, Ben Cotton (bcotton@redhat.com) wrote:
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
Hmm, are we sure this is something we want to have in the default install? Is the code really good enough for that?
Looking at the sources very superficially I see a couple of problems:
1. Waking up all the time in 100ms intervals? We generally try to avoid waking the CPU up all the time if nothing happens. Saving power and things.
2. New code using system() in the year 2020? Really?
3. Fixed size buffers and implicit, undetected, truncation of strings at various places (for example, when formatting the shell string to pass to system()).
But more importantly: are we sure this actually operates the way we should? i.e. PSI is really what should be watched. It is not interesting who uses how much memory and triggering kills on that. What matters is to detect when the system becomes slow due to that, i.e. *latencies* introduced due to memory pressure and that's what PSI is about, and hence what should be used.
But even if we'd ignore that: in order to fight latencies, one should watch latencies. OOM killing per process is just not appropriate on a systemd system: all our system services (and a good chunk of our user services too) are sorted neatly into cgroups, and we really should kill them as a whole, not just individual processes inside them. systemd manages that today, and makes exceptions configurable via OOMPolicy=, and with your earlyoom stuff you break that.
This looks like second-guessing the kernel memory management folks at a place where one can only lose, while at the same time breaking correct OOM reporting by the kernel via cgroups and such.
Also: what precisely is this even supposed to do? Replace the algorithm for detecting *when* to go on a kill rampage? Or actually replace the algorithm selecting *what* to kill during a kill rampage?
If it's the former (which the name of the project suggests: _early_oom), then at the most basic the tool should let the kernel do the killing, i.e. "echo f > /proc/sysrq-trigger". That way the reporting via cgroups isn't fucked, systemd can still do its thing, and the kernel can kill per cgroup rather than per process...
Anyway, this all sounds very very fishy to me. Not thought through to the end, and I am pretty sure this is something the kernel memory management folks should give a blessing to. Second-guessing the kernel like that is just a bad idea if you ask me.
I mean, yes, the OOM killer might not be that great currently, but this sounds like something to fix in kernel land, and if that doesn't work out for some reason because kernel devs can't agree, then do it as fallback in userspace, but with sound input from the kernel folks, and the blessing of at least some of the kernel folks.
Lennart
-- Lennart Poettering, Berlin
On Mon, Jan 6, 2020 at 5:08 AM Lennart Poettering mzerqung@0pointer.de wrote:
I mean, yes, the OOM killer might not be that great currently, but this sounds like something to fix in kernel land, and if that doesn't work out for some reason because kernel devs can't agree, then do it as fallback in userspace, but with sound input from the kernel folks, and the blessing of at least some of the kernel folks.
I agree that the implementation may need some work, but one thing should be clear: it is not a winning strategy to wait for the kernel folks to fix this. They have for all practical purposes given up on this problem, and are not going to solve the issue for us.
On Mon, Jan 6, 2020 at 3:08 AM Lennart Poettering mzerqung@0pointer.de wrote:
Looking at the sources very superficially I see a couple of problems:
- Waking up all the time in 100ms intervals? We generally try to avoid waking the CPU up all the time if nothing happens. Saving power and things.
I agree. What do you think is a reasonable interval? Given that earlyoom won't SIGTERM until both 10% memory free and 10% swap free, and that will take at least some seconds, what about an interval of 3 seconds?
But more importantly: are we sure this actually operates the way we should? i.e. PSI is really what should be watched. It is not interesting who uses how much memory and triggering kills on that. What matters is to detect when the system becomes slow due to that, i.e. *latencies* introduced due to memory pressure and that's what PSI is about, and hence what should be used.
Earlyoom is a short term stop gap while a more sophisticated solution is still maturing. That being low-memory-monitor, which does leverage PSI.
But even if we'd ignore that: in order to fight latencies, one should watch latencies. OOM killing per process is just not appropriate on a systemd system: all our system services (and a good chunk of our user services too) are sorted neatly into cgroups, and we really should kill them as a whole, not just individual processes inside them. systemd manages that today, and makes exceptions configurable via OOMPolicy=, and with your earlyoom stuff you break that.
OOMPolicy= depends on the kernel oom-killer, which is extremely reluctant to trigger at all. Consistently in my testing, the vast majority of the time, kernel oom-killer takes > 30 minutes to trigger. And it may not even kill the worst offender, but rather something like sshd. A couple of times, I've seen it kill systemd-journald. That's not a small problem.
earlyoom first sends SIGTERM. It's not different from the user saying, enough of this, let's just gracefully quit the offending process. Only if the problem continues to get worse is SIGKILL sent.
This looks like second-guessing the kernel memory management folks at a place where one can only lose, while at the same time breaking correct OOM reporting by the kernel via cgroups and such.
It is intended to be a substitute for the user hitting the power button. It's not intended as a substitute for the OS, as a whole, improving its user advocacy to do the right thing in the first place, which currently it doesn't.
For now, kernel developers have made it clear they do not care about user space responsiveness. At all. Their concern with the kernel oom-killer is strictly with keeping the kernel functioning. And the congestion that results from heavy simultaneous page-in and page-out also appears not to be a concern for kernel developers; it's a well-known problem, and they haven't made any breakthrough in this area.
So it's really going to need to be user space managed, leveraging PSI and cgroupv2. And that's the next step.
Also: what precisely is this even supposed to do? Replace the algorithm for detecting *when* to go on a kill rampage? Or actually replace the algorithm selecting *what* to kill during a kill rampage?
a. It's never a kill rampage.
b. When: it first uses SIGTERM at 10% remaining for both memory and swap, and SIGKILL at 5%. In hundreds of tests I've never seen earlyoom use SIGKILL; so far everything responds fairly immediately to SIGTERM. But I'm also testing with well-behaved programs, nothing malicious. And that's intentional. This problem is actually far worse if it were malicious.
c. What: same as the kernel oom-killer. It uses oom_score.
It isn't replacing anything. It's acting as a user advocate by approximating what a reasonable user would do, SIGTERM. The user can't do this themselves because during heavy swap system responsivity is already lost, before we're even close to OOM.
You're right, someone should absolutely solve the responsivity problem. Kernel folks have clearly ceded this. Can it be done with cgroupv2 and PSI alone? Unclear.
If it's the former (which the name of the project suggests: _early_oom), then at the most basic the tool should let the kernel do the killing, i.e. "echo f > /proc/sysrq-trigger". That way the reporting via cgroups isn't fucked, systemd can still do its thing, and the kernel can kill per cgroup rather than per process...
That would be a kill rampage. sysrq+f issues SIGKILL and definitely results in data loss, always. earlyoom uses SIGTERM as a first step, which is a much more conservative first attempt.
Anyway, this all sounds very very fishy to me. Not thought through to the end, and I am pretty sure this is something the kernel memory management folks should give a blessing to. Second-guessing the kernel like that is just a bad idea if you ask me.
There's no first or second guessing. The kernel oom-killer is strictly responsible for maintaining enough resources for the kernel. Not system responsivity. The idea of user space oom management is to take user space priorities into account, which kernel folks have rather intentionally stayed out of answering.
I mean, yes, the OOM killer might not be that great currently, but this sounds like something to fix in kernel land, and if that doesn't work out for some reason because kernel devs can't agree, then do it as fallback in userspace, but with sound input from the kernel folks, and the blessing of at least some of the kernel folks.
The kernel oom-killer works exactly as intended and designed to work. It does not give either one or two shits about user space. It cares only about proper kernel function. And to that end it's working 100% effectively, near as I can tell.
The mistake most people are making is the idea the kernel oom-killer is intended as a user space, let alone end user, advocate. That is what earlyoom and other user space oom managers are trying to do.
I do rather like your idea from some months ago, about moving to systems that have a much smaller swap during normal use, and only creating+activating a large swap at hibernation time. And after resuming from hibernation, deactivating the large swap. That way during normal operation, only "incidental" swap is allowed, and heavy swap very quickly has nowhere to go. And tied to that, a way to restrict the resources unprivileged processes get, rather than being allowed to overcommit - something low-memory-monitor attempts to achieve.
On Mo, 06.01.20 08:51, Chris Murphy (lists@colorremedies.com) wrote:
On Mon, Jan 6, 2020 at 3:08 AM Lennart Poettering mzerqung@0pointer.de wrote:
Looking at the sources very superficially I see a couple of problems:
- Waking up all the time in 100ms intervals? We generally try to avoid waking the CPU up all the time if nothing happens. Saving power and things.
I agree. What do you think is a reasonable interval? Given that earlyoom won't SIGTERM until both 10% memory free and 10% swap free, and that will take at least some seconds, what about an interval of 3 seconds?
None. Use PSI. It wakes you up only when pressure stalls reach the thresholds you declare. Which basically means you never steal CPU on an idle system; you never cause a wakeup whatsoever.
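The PSI mechanism Lennart describes works by writing a trigger specification into /proc/pressure/memory and then polling the file descriptor; the kernel only wakes the process when the stall threshold is crossed. A rough sketch, with illustrative threshold values (requires a Linux kernel with PSI enabled, 4.20+; the parsing helper just decodes the standard PSI line format):

```python
import select

def parse_psi(line):
    """Parse one PSI line, e.g. 'some avg10=0.12 avg60=0.05 avg300=0.01 total=420'."""
    kind, *fields = line.split()
    return kind, {k: float(v) for k, v in (f.split("=") for f in fields)}

def wait_for_memory_pressure(path="/proc/pressure/memory"):
    """Block until the kernel reports a memory stall - no periodic wakeups."""
    # Trigger: notify when tasks are stalled on memory ('some') for a total
    # of 150 ms within any 1 s window. Both values are illustrative.
    with open(path, "r+") as f:
        f.write("some 150000 1000000")
        poller = select.poll()
        poller.register(f, select.POLLPRI)
        poller.poll()  # sleeps until the threshold is crossed
```

Contrast this with a 100 ms polling loop: the trigger-based approach costs nothing while the system is idle.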
But more importantly: are we sure this actually operates the way we should? i.e. PSI is really what should be watched. It is not interesting who uses how much memory and triggering kills on that. What matters is to detect when the system becomes slow due to that, i.e. *latencies* introduced due to memory pressure and that's what PSI is about, and hence what should be used.
Earlyoom is a short term stop gap while a more sophisticated solution is still maturing. That being low-memory-monitor, which does leverage PSI.
Yes, l-m-m is great. If we can deploy l-m-m today already, why isn't it good enough for earlyoom?
But even if we'd ignore that: in order to fight latencies, one should watch latencies. OOM killing per process is just not appropriate on a systemd system: all our system services (and a good chunk of our user services too) are sorted neatly into cgroups, and we really should kill them as a whole, not just individual processes inside them. systemd manages that today, and makes exceptions configurable via OOMPolicy=, and with your earlyoom stuff you break that.
OOMPolicy= depends on the kernel oom-killer, which is extremely reluctant to trigger at all. Consistently in my testing, the vast majority of the time, kernel oom-killer takes > 30 minutes to trigger. And it may not even kill the worst offender, but rather something like sshd. A couple of times, I've seen it kill systemd-journald. That's not a small problem.
Well, that sounds as if OOMScoreAdjust= of these services should be tweaked. In journald we use OOMScoreAdjust=-250 and in udevd OOMScoreAdjust=-1000.
If journald is still killed too likely, we can certainly bump it to -900 or so, please file a bug.
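For reference, a bump like the one suggested above would be a drop-in such as the following (path and value illustrative; OOMScoreAdjust= maps to the kernel's /proc/&lt;pid&gt;/oom_score_adj, where -1000 disables OOM killing entirely):

```ini
# /etc/systemd/system/systemd-journald.service.d/oom.conf (hypothetical)
[Service]
# Make journald a much less likely OOM victim than ordinary processes.
OOMScoreAdjust=-900
```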
earlyoom first sends SIGTERM. It's not different from the user saying, enough of this, let's just gracefully quit the offending process. Only if the problem continues to get worse is SIGKILL sent.
This sounds as if you want low-memory-monitor, but for all services, right?
Sounds like something that is relatively easily implementable in systemd though, in a much better way, i.e. hooked to PSI...
For now, kernel developers have made it clear they do not care about user space responsiveness. At all. Their concern with kernel
References to this? I mean, the kernel developers are not a single person, they tend to have different opinions...
Also: what precisely is this even supposed to do? Replace the algorithm for detecting *when* to go on a kill rampage? Or actually replace the algorithm selecting *what* to kill during a kill rampage?
a. It's never a kill rampage.
it calls kill(), which I call a "kill rampage"...
It isn't replacing anything. It's acting as a user advocate by approximating what a reasonable user would do, SIGTERM. The user can't do this themselves because during heavy swap system responsivity is already lost, before we're even close to OOM.
You're right, someone should absolutely solve the responsiveness problem. Kernel folks have clearly ceded this. Can it be done with cgroupv2 and PSI alone? Unclear.
Sounds like someone needs to do their homework, if this is "unclear"?
I mean, you basically admit here that this isn't really figured out to the end. Maybe let's give this a bit more time and figure things out a bit more, instead of rushing earlyoom in?
Adopting something now, at a point we already clearly know that PSI is how this should be done sounds very wrong to me.
That would be a killing rampage. sysrq+f issues SIGKILL and definitely results in data loss, always. Earlyoom uses SIGTERM as a first step, which is a much more conservative first attempt.
But it sends SIGKILL next? Why? Why not sysrq+f triggered from userspace for that?
I must say the idea that there are effectively multiple process babysitters now, which both want to decide when to terminate services sounds very wrong to me...
I mean, wouldn't this all be solved much nicer, much more future proof, if someone would just do what l-m-m does as part of systemd service management, i.e. let's say an option StopOnMemoryPressure= that watches PSI and terminates services *cleanly* when needed, i.e. goes through ExecStop= and such?
And you know what, PSI is precisely defined to be used for purposes like this, we already have experience with it (see l-m-m) and a patch adding this to systemd isn't really that hard either...
I do rather like your idea from some months ago, about moving to systems that have a much smaller swap during normal use, and only creating+activating a large swap at hibernation time. And after resuming from hibernation, deactivating the large swap. That way during normal operation, only "incidental" swap is allowed, and heavy swap very quickly has nowhere to go. And tied to that, a way to restrict the resources unprivileged processes get, rather than being allowed to overcommit - something low-memory-monitor attempts to achieve.
Memory paging doesn't just mean swapping, i.e. writing stuff to and reading stuff from a swap partition of some kind. Paging also means that program code is memory mapped from binary files and can be loaded into memory and unloaded any time since it can be re-read whenever it is needed. Thus, you should be able to get under memory pressure even without swap simply because the program code of the programs you run needs to be paged in/out all the time from your rootfs, rather than from a swap partition...
Anyway, still not convinced having this is a good idea. There's a lot of homework to be done first...
Lennart
-- Lennart Poettering, Berlin
On Mo, 06.01.20 17:47, Lennart Poettering (mzerqung@0pointer.de) wrote:
On Mo, 06.01.20 08:51, Chris Murphy (lists@colorremedies.com) wrote:
On Mon, Jan 6, 2020 at 3:08 AM Lennart Poettering mzerqung@0pointer.de wrote:
Looking at the sources very superficially I see a couple of problems:
- Waking up all the time in 100ms intervals? We generally try to avoid waking the CPU up all the time if nothing happens. Saving power and things.
I agree. What do you think is a reasonable interval? Given that earlyoom won't SIGTERM until both 10% memory free and 10% swap free, and that will take at least some seconds, what about an interval of 3 seconds?
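The 10%/10% trigger condition mentioned here can be sketched by parsing /proc/meminfo. This is a simplified illustration of earlyoom's documented condition, not its actual code; treating a swapless system as "swap depleted" is an assumption:

```python
def low_memory(meminfo_text, mem_pct=10, swap_pct=10):
    """Sketch of earlyoom's trigger: act only when BOTH available RAM
    and free swap fall below the configured percentage thresholds."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest:
            fields[key] = int(rest.split()[0])  # values are in kB
    mem_low = fields["MemAvailable"] * 100 < fields["MemTotal"] * mem_pct
    # Assumption: a system with no swap counts as "swap depleted".
    swap_total = fields.get("SwapTotal", 0)
    swap_low = swap_total == 0 or fields["SwapFree"] * 100 < swap_total * swap_pct
    return mem_low and swap_low
```

In real use this would be fed the contents of /proc/meminfo on each poll interval; because both thresholds must be crossed, nothing is signaled while swap still has room.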
None. Use PSI. It wakes you up only when pressure stalls reach the thresholds you declare. Which basically means you never steal the CPUs on an idle system, you never cause a wakeup whatsoever.
But more importantly: are we sure this actually operates the way we should? i.e. PSI is really what should be watched. It is not interesting who uses how much memory and triggering kills on that. What matters is to detect when the system becomes slow due to that, i.e. *latencies* introduced due to memory pressure and that's what PSI is about, and hence what should be used.
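For reference, a minimal sketch of the PSI interface being referred to. It assumes Linux >= 4.20 for /proc/pressure and >= 5.2 for trigger support, and writing a trigger typically requires root; the threshold values are made up:

```python
import select

# /proc/pressure/memory lines look like:
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
#   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
def parse_psi(text):
    """Parse PSI output into {"some": {...}, "full": {...}}."""
    out = {}
    for line in text.splitlines():
        kind, *pairs = line.split()
        out[kind] = {k: float(v) for k, v in (p.split("=") for p in pairs)}
    return out

def wait_for_pressure(path="/proc/pressure/memory",
                      stall_us=150_000, window_us=1_000_000):
    """Block until memory stalls exceed stall_us within window_us.
    The kernel wakes us via POLLPRI, so there is no periodic polling
    at all on an idle system."""
    with open(path, "r+b", buffering=0) as f:
        f.write(b"some %d %d" % (stall_us, window_us))
        poller = select.poll()
        poller.register(f, select.POLLPRI)
        poller.poll()  # sleeps until the kernel fires the trigger
```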
Earlyoom is a short term stop gap while a more sophisticated solution is still maturing. That being low-memory-monitor, which does leverage PSI.
Yes, l-m-m is great. If we can deploy l-m-m today already, why isn't it good enough for earlyoom?
Oops, sorry. I mean GMemoryMonitor. I assumed l-m-m and GMemoryMonitor was the same thing, but they aren't. I am not sure about l-m-m, haven't looked at it in detail.
GMemoryMonitor = great l-m-m = no idea
Lennart
-- Lennart Poettering, Berlin
On Mon, Jan 6, 2020 at 5:47 pm, Lennart Poettering mzerqung@0pointer.de wrote:
Yes, l-m-m is great. If we can deploy l-m-m today already, why isn't it good enough for earlyoom?
GMemoryMonitor is the GLib API that's implemented using low-memory-monitor's D-Bus API.
In practice, using it for OOM killing is not that great, though. We've rejected this approach because the OOM killing is causing serious problems and there are no plans to fix it: https://gitlab.freedesktop.org/hadess/low-memory-monitor/issues/8. Therefore most likely we'll use l-m-m only for advisory memory pressure notifications.
On Mon, Jan 6, 2020 at 5:47 pm, Lennart Poettering mzerqung@0pointer.de wrote:
Sounds like someone needs to do their homework, if this is "unclear"?
I mean, you basically admit here that this isn't really figured out to the end. Maybe let's give this a bit more time and figure things out a bit more, instead of rushing earlyoom in?
Adopting something now, at a point we already clearly know that PSI is how this should be done sounds very wrong to me.
I think it would absolutely be reasonable to defer from F32 -> F33 if we have concrete plans to use that delay to implement an OOM solution. E.g. if you or someone else wanted to throw together a systemd-level solution:
Sounds like something that is relatively easily implementable in systemd though, in a much better way, i.e. hooked to PSI...
I mean, wouldn't this all be solved much nicer, much more future proof, if someone would just do what l-m-m does as part of systemd service management, i.e. let's say an option StopOnMemoryPressure= that watches PSI and terminates services *cleanly* when needed, i.e. goes through ExecStop= and such?
And you know what, PSI is precisely defined to be used for purposes like this, we already have experience with it (see l-m-m) and a patch adding this to systemd isn#t really that hard either...
So again, the problem with PSI so far is that we haven't gotten it to work well. If systemd can make it work well, that would be super lovely. Sounds like that would also avoid continuous wakeups, which would be very nice.
I don't think anybody would object to a systemd-level solution. If it's part of systemd, there would no longer be concerns about architecture or code quality, and it'd feel much less hackish. We would want to test it to ensure responsiveness is comparable to what earlyoom would offer, of course.
Michael
On Mo, 06.01.20 11:22, Michael Catanzaro (mcatanzaro@gnome.org) wrote:
So I talked to Tejun Heo about this (kernel cgroups maintainer, working for facebook with the people who did the PSI stuff, kernel mm guy). Here's the gist:
- earlyoom might be OK as a short-term stopgap if people really want to hurry something, as long as it watches only swap depletion (which it pretty much does already). But it should then also determine what to kill taking the swap use into account and little else (which it apparently does not). This doesn't make any sense to have though if there is no swap.
- Don't bother with the OOM score the kernel calculates for processes, it doesn't take the swap use into account. That said, do take the configurable OOM score *adjustment* into account, so that processes which set that are respected, i.e. journald, udevd, and such (or in other words, ignore /proc/$PID/oom_score, but respect /proc/$PID/oom_score_adj).
- going down to 100ms poll intervals is a bad idea, 1s is sufficient, maybe higher.
- facebook is working on making oomd something that just works for everyone, they are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"), when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their expressive intention to make this something that also works for desktop stuff and requires no further tuning. they also will do the systemd work necessary. time frame: half a year, maybe one year, but no guarantees.
- oomd currently polls some parameters in time intervals too, still. They are working on getting rid of that too, so that everything is event based via PSI. Given their own focus on servers it's not a primary goal, but still a goal.
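The oom_score_adj advice above (ignore the kernel's computed oom_score, respect the adjustment, rank by swap use) can be sketched like this. A hypothetical illustration only, not oomd or earlyoom code; the input format and the -100 protection cutoff are assumptions:

```python
def eligible_victims(procs, protect_below=-100):
    """Rank kill candidates by swap use, skipping protected services.

    `procs` maps pid -> (oom_score_adj, swap_kb); the format is a
    hypothetical input, as is the protect_below cutoff. Services that
    lower their oom_score_adj (journald at -250, udevd at -1000) are
    never picked; remaining candidates are ordered by swap use,
    heaviest first."""
    candidates = [(swap, pid) for pid, (adj, swap) in procs.items()
                  if adj > protect_below]
    return [pid for swap, pid in sorted(candidates, reverse=True)]
```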
Or in other words: oomd is the way to go in the long run, developed alongside the kernel features backing it. You can use it already if you like, but there are still too many knobs for generic deployment. earlyoom might be a valid temporary stopgap if you want to hurry this.
(And now I hope I paraphrased everything he said more or less correctly...)
if you want to know more about fb's oomd: https://cfp.all-systems-go.io/ASG2019/talk/DQX3DH/
(but before this will enter systemd it's gonna be dumbed down, i.e, less configuration, more "just works")
Lennart
-- Lennart Poettering, Berlin
On Mon, Jan 6, 2020 at 7:10 PM Lennart Poettering mzerqung@0pointer.de wrote:
- going down to 100ms poll intervals is a bad idea, 1s is sufficient, maybe higher.
According to the project readme, the query interval is 100ms only if the lack of free RAM starts to get severe. Otherwise the interval is claimed to be longer. I haven't checked the code, though.
On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering mzerqung@0pointer.de wrote:
On Mo, 06.01.20 11:22, Michael Catanzaro (mcatanzaro@gnome.org) wrote:
So I talked to Tejun Heo about this (kernel cgroups maintainer, working for facebook with the people who did the PSI stuff, kernel mm guy). Here's the gist:
earlyoom might be OK as short time stopgap if people really want to hurry something, as long as it watches only swap depletion (which it pretty much does already). But it should then also determine what to kill taking the swap use into account and little else (which it apparently does not). This doesn't make any sense to have though if there is no swap.
Don't bother with the OOM score the kernel calculates for processes, it doesn't take the swap use into account. That said, do take the configurable OOM score *adjustment* into account, so that processes which set that are respected, i.e. journald, udevd, and such (or in other words, ignore /proc/$PID/oom_score, but respect /proc/$PID/oom_score_adj).
going down to 100ms poll intervals is a bad idea, 1s is sufficient, maybe higher.
facebook is working on making oomd something that just works for everyone, they are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"), when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their expressive intention to make this something that also works for desktop stuff and requires no further tuning. they also will do the systemd work necessary. time frame: half a year, maybe one year, but no guarantees.
oomd currently polls some parameters in time intervals too, still. They are working on getting rid of that too, so that everything is event based via PSI. Given their own focus on servers it's not a primary goal, but still a goal.
Or in other words: oomd is the way to go in the long run, developed alongside the kernel features backing it. You can use it already if you like, but there are still too many knobs for generic deployment. earlyoom might be a valid temporary stopgap if you want to hurry this.
(And now I hope I paraphrased everything he said more or less correctly...)
Thanks for all of that, and it's consistent with the research and discussion the working group have done in the past 6 months on this subject. What I can't estimate is whether oomd or lmm will be better long term for Fedora Workstation, or if there's an advantage of them co-existing.
And yes the idea is to go a little faster. Earlyoom is easy to take out. And I have no problem with it coming out in fc33 if oomd or (more likely) lmm are ready by then.
On Mon, Jan 6, 2020 at 11:53 am, Chris Murphy lists@colorremedies.com wrote:
And yes the idea is to go a little faster. Earlyoom is easy to take out. And I have no problem with it coming out in fc33 if oomd or (more likely) lmm are ready by then.
Brainstorming: if a systemd-level solution were to be ready in the F33 or F34 timeframe, I'd be OK with just waiting for that. We've had 31 Fedora releases without earlyoom and one or two more isn't the end of the world. Seems easier than installing earlyoom on everybody's computers and then calling "backsies!" a year later.
Of course we would need to monitor progress at the systemd level to make sure this solution is advancing as desired, and fall back to plans for earlyoom if things get off track.
Michael
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone, they are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"), when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their expressive intention to make this something that also works for desktop stuff and requires no further tuning. they also will do the systemd work necessary. time frame: half a year, maybe one year, but no guarantees.
Asking around, I understand oomd only operates at the cgroup level, i.e. it kills an entire cgroup at once, not individual processes. So I understand this would also depend on GNOME-level work to ensure individual applications get launched in their own systemd scopes, yes?
Michael
On Mon, Jan 06, 2020 at 02:53:13PM -0600, Michael Catanzaro wrote:
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for
everyone, they are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"), when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their expressive intention to make this something that also works for desktop stuff and requires no further tuning. they also will do the systemd work necessary. time frame: half a year, maybe one year, but no guarantees.
Asking around, I understand oomd only operates at the cgroup level, i.e. it kills an entire cgroup at once, not individual processes. So I understand this would also depend on GNOME-level work to ensure individual applications get launched in their own systemd scopes, yes?
I wanted to ask about this too... but didn't know where ;) As of today, gnome-shell in F31 seems to start almost everything as separate systemd user scopes:
- various services started automatically like /usr/libexec/gsd-power, /usr/libexec/gsd-sound, etc.
- flatpaks (this seems to be new, I had them running under gnome-shell-wayland.service last week!)
Stuff started from the run dialog (alt-f2) and from the overview still seems to land in gnome-shell-wayland.service, but maybe this is fixed in gnome-shell 3.35?
Another issue is that things that are started through the gnome terminal also land in gnome-terminal-server.service. They need to get their own scopes to make resource allocation robust.
It seems we're quite close! Do we just need to wait for another gnome release and then we'll have everything nicely segregated?
Zbyszek
On Tue, 2020-01-07 at 09:47 +0000, Zbigniew Jędrzejewski-Szmek wrote:
On Mon, Jan 06, 2020 at 02:53:13PM -0600, Michael Catanzaro wrote:
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for
everyone, they are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"), when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their expressive intention to make this something that also works for desktop stuff and requires no further tuning. they also will do the systemd work necessary. time frame: half a year, maybe one year, but no guarantees.
Asking around, I understand oomd only operates at the cgroup level, i.e. it kills an entire cgroup at once, not individual processes. So I understand this would also depend on GNOME-level work to ensure individual applications get launched in their own systemd scopes, yes?
I wanted to ask about this too... but didn't know where ;) As of today, gnome-shell in F31 seems to start almost everything as separate systemd user scopes:
various services started automatically like /usr/libexec/gsd-power, /usr/libexec/gsd-sound, etc.
flatpaks (this seems to be new, I had them running under gnome-shell-wayland.service last week!)
Hmm, pretty sure flatpaks have always created their own scopes.
Stuff started from the run dialog (alt-f2) and from the overview still seems to land in gnome-shell-wayland.service, but maybe this is fixed in gnome-shell 3.35?
This should have changed with the gnome-shell 3.34.2 update in Fedora 31. It may be that it has not reached rawhide yet though.
Another issue is that things that are started through the gnome terminal also land in gnome-terminal-server.service. They need to get their own scopes to make resource allocation robust.
Do you think we should just place each VT into its own scope?
That seems like a reasonable start in principle, though graphical applications launched from the terminal may still not be moved into their own scope then.
It seems we're quite close! Do we just need to wait for another gnome release and then we'll have everything nicely segregated?
Likely not perfect, but hopefully close enough for many purposes :)
Benjamin
On Tue, Jan 07, 2020 at 11:07:49AM +0100, Benjamin Berg wrote:
On Tue, 2020-01-07 at 09:47 +0000, Zbigniew Jędrzejewski-Szmek wrote:
On Mon, Jan 06, 2020 at 02:53:13PM -0600, Michael Catanzaro wrote:
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for
everyone, they are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"), when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their expressive intention to make this something that also works for desktop stuff and requires no further tuning. they also will do the systemd work necessary. time frame: half a year, maybe one year, but no guarantees.
Asking around, I understand oomd only operates at the cgroup level, i.e. it kills an entire cgroup at once, not individual processes. So I understand this would also depend on GNOME-level work to ensure individual applications get launched in their own systemd scopes, yes?
I wanted to ask about this too... but didn't know where ;) As of today, gnome-shell in F31 seems to start almost everything as separate systemd user scopes:
various services started automatically like /usr/libexec/gsd-power, /usr/libexec/gsd-sound, etc.
flatpaks (this seems to be new, I had them running under gnome-shell-wayland.service last week!)
Hmm, pretty sure flatpaks have always created their own scopes.
I'm quoting from my mail from this same thread:
│ ├─gnome-shell-wayland.service
│ │ ├─1501571 /usr/bin/gnome-shell
│ │ ├─1501606 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth /run/user/1000/.mutter-Xwaylandauth.SCXID0 -listen 4 -listen 5 -displayfd 6
│ │ ├─1501713 ibus-daemon --panel disable -r --xim
│ │ ├─1501718 /usr/libexec/ibus-dconf
│ │ ├─1501719 /usr/libexec/ibus-extension-gtk3
│ │ ├─1501724 /usr/libexec/ibus-x11 --kill-daemon
│ │ ├─1501980 /usr/libexec/ibus-engine-simple
│ │ ├─1503586 /usr/lib64/firefox/firefox
│ │ ├─1503691 /usr/lib64/firefox/firefox -contentproc -childID 2 -isForBrowser ...
│ │ ├─1503701 /usr/lib64/firefox/firefox -contentproc -childID 3 -isForBrowser ...
│ │ ├─1503747 /usr/lib64/firefox/firefox -contentproc -childID 4 -isForBrowser ...
│ │ ├─1520219 bwrap --args 35 telegram-desktop --
│ │ ├─1520229 bwrap --args 35 xdg-dbus-proxy --args=37
│ │ ├─1520230 xdg-dbus-proxy --args=37
│ │ ├─1520232 bwrap --args 35 telegram-desktop --
│ │ ├─1520233 /app/bin/Telegram --
│ │ ├─1540753 pavucontrol
...
So maybe a bug? I'll keep watching if it happens again.
Stuff started from the run dialog (alt-f2) and from the overview still seems to land in gnome-shell-wayland.service, but maybe this is fixed in gnome-shell 3.35?
This should have changed with the gnome-shell 3.34.2 update in Fedora 31. It may be that it has not reached rawhide yet though.
I'm still on gnome-shell-3.34.1-4.fc31.x86_64. I'll try the latest version.
Another issue is that things that are started through the gnome terminal also land in gnome-terminal-server.service. They need to get their own scopes to make resource allocation robust.
Do you think we should just place each VT into its own scope?
Yes. Everything starting at the shell (or whatever command is configured as the "payload") should get its own scope, with a separate set of resources from gnome-terminal-server.service.
That seems like a reasonable start in principle, though graphical applications launched from the terminal may still not be moved into their own scope then.
I think it is OK. After all, starting graphical applications from the terminal is a special case. If desired, the user may run 'systemd-run --user foo' if they want to segregate it. (Actually, we might teach some apps to put themselves into a scope when started from a command line. This makes sense for stuff like firefox, but also screen/tmux and others. But I consider this a completely separate issue.)
Zbyszek
It seems we're quite close! Do we just need to wait for another gnome release and then we'll have everything nicely segregated?
Likely not perfect, but hopefully close enough for many purposes :)
Benjamin
On Tue, 2020-01-07 at 10:21 +0000, Zbigniew Jędrzejewski-Szmek wrote:
On Tue, Jan 07, 2020 at 11:07:49AM +0100, Benjamin Berg wrote:
On Tue, 2020-01-07 at 09:47 +0000, Zbigniew Jędrzejewski-Szmek wrote:
I wanted to ask about this too... but didn't know where ;) As of today, gnome-shell in F31 seems to start almost everything as separate systemd user scopes:
- various services started automatically like /usr/libexec/gsd-
power, /usr/libexec/gsd-sound, etc.
- flatpaks (this seems to be new, I had them running under gnome-shell-wayland.service last week!)
Hmm, pretty sure flatpaks have always created their own scopes.
I'm quoting from my mail from this same thread:
│ ├─gnome-shell-wayland.service
│ │ ├─1501571 /usr/bin/gnome-shell
│ │ ├─1501606 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth /run/user/1000/.mutter-Xwaylandauth.SCXID0 -listen 4 -listen 5 -displayfd 6
│ │ ├─1501713 ibus-daemon --panel disable -r --xim
│ │ ├─1501718 /usr/libexec/ibus-dconf
│ │ ├─1501719 /usr/libexec/ibus-extension-gtk3
│ │ ├─1501724 /usr/libexec/ibus-x11 --kill-daemon
│ │ ├─1501980 /usr/libexec/ibus-engine-simple
│ │ ├─1503586 /usr/lib64/firefox/firefox
│ │ ├─1503691 /usr/lib64/firefox/firefox -contentproc -childID 2 -isForBrowser ...
│ │ ├─1503701 /usr/lib64/firefox/firefox -contentproc -childID 3 -isForBrowser ...
│ │ ├─1503747 /usr/lib64/firefox/firefox -contentproc -childID 4 -isForBrowser ...
│ │ ├─1520219 bwrap --args 35 telegram-desktop --
│ │ ├─1520229 bwrap --args 35 xdg-dbus-proxy --args=37
│ │ ├─1520230 xdg-dbus-proxy --args=37
│ │ ├─1520232 bwrap --args 35 telegram-desktop --
│ │ ├─1520233 /app/bin/Telegram --
│ │ ├─1540753 pavucontrol
...
(Oh, what is the command to get this output?)
This is not what I am seeing here. My gnome-shell cgroup only contains the gnome-shell, Xwayland and ibus. And I have separate .scope units for Telegram (flatpak-org.telegram.desktop-2162569.scope), evolution (gnome-launched-org.gnome.Epiphany.desktop-2162690.scope), …
And I am pretty sure that flatpak/bwrap has always taken care of scoping flatpaks correctly.
Benjamin
So maybe a bug? I'll keep watching if it happens again.
Stuff started from the run dialog (alt-f2) and from the overview still seems to land in gnome-shell-wayland.service, but maybe this is fixed in gnome-shell 3.35?
This should have changed with the gnome-shell 3.34.2 update in Fedora 31. It may be that it has not reached rawhide yet though.
I'm still on gnome-shell-3.34.1-4.fc31.x86_64. I'll try the latest version.
Another issue is that things that are started through the gnome terminal also land in gnome-terminal-server.service. They need to get their own scopes to make resource allocation robust.
Do you think we should just place each VT into its own scope?
Yes. Everything starting at the shell (or whatever command is configured as the "payload") should get its own scope, with a separate set of resources from gnome-terminal-server.service.
That seems like a reasonable start in principle, though graphical applications launched from the terminal may still not be moved into their own scope then.
I think it is OK. After all, starting graphical applications from the terminal is a special case. If desired, the user may run 'systemd-run --user foo' if they want to segregate it. (Actually, we might teach some apps to put themselves into a scope when started from a command line. This makes sense for stuff like firefox, but also screen/tmux and others. But I consider this a completely separate issue.)
Zbyszek
It seems we're quite close! Do we just need to wait for another gnome release and then we'll have everything nicely segregated?
Likely not perfect, but hopefully close enough for many purposes :)
Benjamin
On Tue, 2020-01-07 at 11:44 +0100, Benjamin Berg wrote:
On Tue, 2020-01-07 at 10:21 +0000, Zbigniew Jędrzejewski-Szmek wrote:
I'm quoting from my mail from this same thread:
│ ├─gnome-shell-wayland.service
│ │ ├─1501571 /usr/bin/gnome-shell
│ │ ├─1501606 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth /run/user/1000/.mutter-Xwaylandauth.SCXID0 -listen 4 -listen 5 -displayfd 6
│ │ ├─1501713 ibus-daemon --panel disable -r --xim
│ │ ├─1501718 /usr/libexec/ibus-dconf
│ │ ├─1501719 /usr/libexec/ibus-extension-gtk3
│ │ ├─1501724 /usr/libexec/ibus-x11 --kill-daemon
│ │ ├─1501980 /usr/libexec/ibus-engine-simple
│ │ ├─1503586 /usr/lib64/firefox/firefox
│ │ ├─1503691 /usr/lib64/firefox/firefox -contentproc -childID 2 -isForBrowser ...
│ │ ├─1503701 /usr/lib64/firefox/firefox -contentproc -childID 3 -isForBrowser ...
│ │ ├─1503747 /usr/lib64/firefox/firefox -contentproc -childID 4 -isForBrowser ...
│ │ ├─1520219 bwrap --args 35 telegram-desktop --
│ │ ├─1520229 bwrap --args 35 xdg-dbus-proxy --args=37
│ │ ├─1520230 xdg-dbus-proxy --args=37
│ │ ├─1520232 bwrap --args 35 telegram-desktop --
│ │ ├─1520233 /app/bin/Telegram --
│ │ ├─1540753 pavucontrol
...
(Oh, what is the command to get this output?)
Aha, systemd-cgls, and this should be a normal F31:
│ │ ├─gnome-shell-wayland.service
│ │ │ ├─2160536 /usr/bin/gnome-shell
│ │ │ ├─2160575 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth /run/user/1000/.mutter-Xwaylandauth.4R9ED0 -listen 4 -listen 5 -displayfd 6
│ │ │ ├─2160744 ibus-daemon --panel disable -r --xim
│ │ │ ├─2160754 /usr/libexec/ibus-dconf
│ │ │ ├─2160755 /usr/libexec/ibus-extension-gtk3
│ │ │ ├─2160759 /usr/libexec/ibus-x11 --kill-daemon
│ │ │ └─2160998 /usr/libexec/ibus-engine-simple
Benjamin
On 1/7/20 11:07 AM, Benjamin Berg wrote:
On Tue, 2020-01-07 at 09:47 +0000, Zbigniew Jędrzejewski-Szmek wrote:
On Mon, Jan 06, 2020 at 02:53:13PM -0600, Michael Catanzaro wrote:
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for
everyone, they are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"), when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their expressive intention to make this something that also works for desktop stuff and requires no further tuning. they also will do the systemd work necessary. time frame: half a year, maybe one year, but no guarantees.
Asking around, I understand oomd only operates at the cgroup level, i.e. it kills an entire cgroup at once, not individual processes. So I understand this would also depend on GNOME-level work to ensure individual applications get launched in their own systemd scopes, yes?
I wanted to ask about this too... but didn't know where ;) As of today, gnome-shell in F31 seems to start almost everything as separate systemd user scopes:
various services started automatically like /usr/libexec/gsd-power, /usr/libexec/gsd-sound, etc.
flatpaks (this seems to be new, I had them running under gnome-shell-wayland.service last week!)
Hmm, pretty sure flatpaks have always created their own scopes.
Stuff started from the run dialog (alt-f2) and from the overview still seems to land in gnome-shell-wayland.service, but maybe this is fixed in gnome-shell 3.35?
This should have changed with the gnome-shell 3.34.2 update in Fedora 31. It may be that it has not reached rawhide yet though.
Just had a look at awesome; all applications seem to be in the same cgroup, according to systemd-cgtop. Thus if the whole cgroup were killed, that means rather than stopping firefox if it uses too much memory, my whole session would be terminated.
Another issue is that things that are started through the gnome terminal also land in gnome-terminal-server.service. They need to get their own scopes to make resource allocation robust.
Do you think we should just place each VT into its own scope?
That seems like a reasonable start in principle, though graphical applications launched from the terminal may still not be moved into their own scope then.
It seems we're quite close! Do we just need to wait for another gnome release and then we'll have everything nicely segregated?
Likely not perfect, but hopefully close enough for many purposes :)
Benjamin
On Mo, 06.01.20 14:53, Michael Catanzaro (mcatanzaro@gnome.org) wrote:
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone, they are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"), when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their expressive intention to make this something that also works for desktop stuff and requires no further tuning. they also will do the systemd work necessary. time frame: half a year, maybe one year, but no guarantees.
Asking around, I understand oomd only operates at the cgroup level, i.e. it kills an entire cgroup at once, not individual processes. So I understand this would also depend on GNOME-level work to ensure individual applications get launched in their own systemd scopes, yes?
That would be a good idea, yes. But there'd be a knob for that in the unit files.
I mean, OOMPolicy= currently can be set to "stop", "kill" or "continue", where "stop" means "when a process of service X is OOM killed, attempt to shutdown all of X in a friendly way"; "kill" means "when a process of service X is OOM killed, forcibly kill all other processes of X too"; "continue" means "if a process of service X is OOM killed, do nothing else".
The expectation here is that most services will want "stop", but services that are more "application servers" than an individual service (think: apache with its cgi scripts, or crond with its cronjobs) would set OOMPolicy=continue, since if one of their jobs misbehaves they probably should continue running.
But yeah, the focus where things are going are clearly towards making a cgroup the unit that is managed as a whole.
Lennart
-- Lennart Poettering, Berlin
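The OOMPolicy= values described above could be exercised with a drop-in like the following sketch (the unit name is hypothetical; OOMPolicy= itself is the real systemd knob):

```ini
# /etc/systemd/system/my-job-runner.service.d/oom.conf
# Hypothetical "application server"-style unit, per Lennart's example.
[Service]
# Keep the service itself running even if one of its jobs is OOM-killed.
OOMPolicy=continue
```

After editing, `systemctl daemon-reload` would be needed for the drop-in to take effect.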
On Tue, 2020-01-07 at 11:28 +0100, Lennart Poettering wrote:
On Mo, 06.01.20 14:53, Michael Catanzaro (mcatanzaro@gnome.org) wrote:
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone. They are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"); when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their express intention to make this something that also works for desktop stuff and requires no further tuning. They also will do the systemd work necessary. Time frame: half a year, maybe one year, but no guarantees.
Asking around, I understand oomd only operates at the cgroup level, i.e. it kills an entire cgroup at once, not individual processes. So I understand this would also depend on GNOME-level work to ensure individual applications get launched in their own systemd scopes, yes?
That would be a good idea, yes. But there'd be a knob for that in the unit files.
I mean, OOMPolicy= currently can be set to "stop", "kill" or "continue", where "stop" means "when a process of service X is OOM killed, attempt to shutdown all of X in a friendly way"; "kill" means "when a process of service X is OOM killed, forcibly kill all other processes of X too"; "continue" means "if a process of service X is OOM killed, do nothing else".
Yep, changing the OOMPolicy was considered at first. But creating new scopes for spawned children is simple enough and it also solves some other issues (e.g. not killing children when gnome-shell is restarted).
The expectation here is that most services will want "stop", but services that are more "application servers" than an individual service (think: apache with its cgi scripts, or crond with its cronjobs) would set OOMPolicy=continue, since if one of their jobs misbehaves they probably should continue running.
But yeah, the focus where things are going are clearly towards making a cgroup the unit that is managed as a whole.
Benjamin
Hi,
[resend this older message for the list]
On Mon, 2020-01-06 at 14:53 -0600, Michael Catanzaro wrote:
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone. They are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"); when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their express intention to make this something that also works for desktop stuff and requires no further tuning. They also will do the systemd work necessary. Time frame: half a year, maybe one year, but no guarantees.
Asking around, I understand oomd only operates at the cgroup level, i.e. it kills an entire cgroup at once, not individual processes. So I understand this would also depend on GNOME-level work to ensure individual applications get launched in their own systemd scopes, yes?
Even if that is the case, on F31 (with GNOME 3.34.2) we do place most user processes into separate scopes[1]. This is not perfect, because it currently only affects processes launched by gnome-shell, gnome-settings-daemon and gnome-session. So everything spawned by e.g. nautilus (easily fixable) or the terminal may still end up in its parent's scope.
But, I would say the cgroup separation is pretty much good enough already. So even if it is a requirement, I would not worry about it beyond making sure that some applications like nautilus get fixed.
Benjamin
[1] They are named gnome-launched-X-Y.scope and get bound to the lifetime of the session using a drop-in. Personally I also added a drop-in to limit memory consumption for Evolution that way. It tends to just disappear sometimes now. Which is kind of neat but it would be nice to also get a notification.
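A sketch of what such a per-scope drop-in might look like (the path, the truncated-name drop-in directory, and the 2G figure are all illustrative assumptions, not the actual configuration described above):

```ini
# ~/.config/systemd/user/gnome-launched-evolution-.scope.d/memory.conf
# Hypothetical drop-in limiting an application's scope; the real scope
# name follows the gnome-launched-X-Y.scope pattern mentioned above.
[Scope]
# Hard cap: the kernel OOM-kills within this cgroup once it is exceeded,
# which matches the "it tends to just disappear" behavior described.
MemoryMax=2G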
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- oomd currently polls some parameters in time intervals too, still. They are working on getting rid of that too, so that everything is event based via PSI. Given their own focus on servers it's not a primary goal, but still a goal.
Alexey seems really pessimistic about PSI. It looks like he expects any solution based on PSI will fail:
https://pagure.io/fedora-workstation/issue/98#comment-619086
So that seems like the most important problem right now. Looks like Benjamin has already solved the problem of isolating apps into separate systemd scopes. Alexey's concern about browser tabs is similarly solvable. But if PSI in general is too difficult to configure, this plan isn't going to work.
On Di, 07.01.20 09:27, Michael Catanzaro (mcatanzaro@gnome.org) wrote:
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering mzerqung@0pointer.de wrote:
- oomd currently polls some parameters in time intervals too, still. They are working on getting rid of that too, so that everything is event based via PSI. Given their own focus on servers it's not a primary goal, but still a goal.
Alexey seems really pessimistic about PSI. It looks like he expects any solution based on PSI will fail:
https://pagure.io/fedora-workstation/issue/98#comment-619086
So that seems like the most important problem right now. Looks like Benjamin has already solved the problem of isolating apps into separate systemd scopes. Alexey's concern about browser tabs is similarly solvable. But if PSI in general is too difficult to configure, this plan isn't going to work.
Well, I personally certainly trust Tejun to deliver if he says he'll deliver. He has a pretty good track record, and it's his explicit goal to make this stuff work.
Lennart
-- Lennart Poettering, Berlin
On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone. They are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"); when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their express intention to make this something that also works for desktop stuff and requires no further tuning. They also will do the systemd work necessary. Time frame: half a year, maybe one year, but no guarantees.
Looks like PSI-based OOM killing doesn't work without swap. Therefore oomd can't be considered a universal solution. Quite a lot of developers have workstations with a decent amount of RAM, ~64GiB, and do not use swap at all. Bare-metal servers are likewise mixed, depending on workload, and in the cloud it's rare for swap to exist.
https://github.com/facebookincubator/oomd/issues/80
We think earlyoom can be adjusted to work well for both the swap and no swap use cases.
On Wed, 2020-01-08 at 12:24 -0700, Chris Murphy wrote:
On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone. They are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"); when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their express intention to make this something that also works for desktop stuff and requires no further tuning. They also will do the systemd work necessary. Time frame: half a year, maybe one year, but no guarantees.
Looks like PSI-based OOM killing doesn't work without swap. Therefore oomd can't be considered a universal solution. Quite a lot of developers have workstations with a decent amount of RAM, ~64GiB, and do not use swap at all. Bare-metal servers are likewise mixed, depending on workload, and in the cloud it's rare for swap to exist.
https://github.com/facebookincubator/oomd/issues/80
We think earlyoom can be adjusted to work well for both the swap and no swap use cases.
But so can oomd; after all, they are willing to implement a plugin that uses the MemAvailable heuristic. It just won't be available in the short term.
In principle, I think what we are trying to achieve here is to keep the system mostly responsive from a user perspective. This seems to imply keeping pages in main memory that belong to "important" processes.
Should oomd not manage to do this well enough out of the box, then I see two main methods we have to improve things:
* Aggressively kill when we think important pages might get evicted
  - earlyoom does this based on MemAvailable
  - oomd plugin could do the same if deemed the right thing
* Actively protect important processes[1]:
  - set MemoryMin, MemoryLow on important units
  - limit "normal" processes more, e.g. MemoryHigh for applications
  - in the long run: adjust the OOMScore/MemoryHigh dynamically based on whether the user is interacting with an application at the time
earlyoom does the first and has the big advantage that it can be shipped in F32. However, it is not clear to me that this aggressive heuristic is actually better overall. And even if it is, we would likely still move it into oomd in the long run.
Finally, for F32 we might already be able to improve things quite a lot simply by setting a few configuration options in GNOME systemd units.
Benjamin
[1] I do not know how well this works, so it may be nice if people experimented with it[2]. For GNOME you can easily add a systemd drop-in for various services. e.g. to protect the shell (in a wayland session) simply do:
$ systemctl edit --user gnome-shell-wayland.service

[Service]
MemoryMin=250M
MemoryLow=500M
Which I suspect should already help a lot in many scenarios.
[2] Unfortunately, I guess that such measurements may be skewed a lot on systems that use swap, due to unrelated lags; see e.g. Jan Grulich's mail from earlier today titled "Lagging system with latest kernels".
On Thu, Jan 9, 2020 at 5:58 AM Benjamin Berg bberg@redhat.com wrote:
On Wed, 2020-01-08 at 12:24 -0700, Chris Murphy wrote:
On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone. They are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"); when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their express intention to make this something that also works for desktop stuff and requires no further tuning. They also will do the systemd work necessary. Time frame: half a year, maybe one year, but no guarantees.
Looks like PSI-based OOM killing doesn't work without swap. Therefore oomd can't be considered a universal solution. Quite a lot of developers have workstations with a decent amount of RAM, ~64GiB, and do not use swap at all. Bare-metal servers are likewise mixed, depending on workload, and in the cloud it's rare for swap to exist.
https://github.com/facebookincubator/oomd/issues/80
We think earlyoom can be adjusted to work well for both the swap and no swap use cases.
But so can oomd; after all, they are willing to implement a plugin that uses the MemAvailable heuristic. It just won't be available in the short term.
In principle, I think what we are trying to achieve here is to keep the system mostly responsive from a user perspective. This seems to imply keeping pages in main memory that belong to "important" processes.
Right, not merely clobbering a process the user ostensibly wants to run and complete. They just don't want it to take over.
Should oomd not manage to do this well enough out of the box, then I see two main methods we have to improve things:
- Aggressively kill when we think important pages might get evicted
- earlyoom does this based on MemAvailable
- oomd plugin could do the same if deemed the right thing
- Actively protect important processes[1]:
- set MemoryMin, MemoryLow on important units
- limit "normal" processes more e.g. MemoryHigh for applications
- in the long run: adjust the OOMScore/MemoryHigh dynamically based on whether the user is interacting with an application at the time
earlyoom does the first and has the big advantage that it can be shipped in F32. However, it is not clear to me that this aggressive heuristic is actually better overall. And even if it is, we would likely still move it into oomd in the long run.
I agree, although the decisions made in this release cycle can really only be made based on what we know now. Earlyoom has a chance of making this a better experience in the case where something really should be OOM-killed, just sooner than the kernel's oom-killer would have done it. It doesn't solve the unresponsiveness problem that happens once RAM is full but before swap reaches 10%. In any case, it's not going on a process kill spree, and it's not going to magically free up a system every time it's under swap duress.
I've got cases where a system is under significant duress with only 50% swap use - earlyoom does nothing for that.
Finally, for F32 we might already be able to improve things quite a lot simply by setting a few configuration options in GNOME systemd units.
Maybe. What are the risks? Is it fair to characterize this as more of an optimization of existing functionality than a feature? That's a technical question. Of course, if this improves responsiveness of the system while under swap thrashing, it's definitely a marketable feature!
On-going related discussions on linux-mm@ list
user space unresponsive, followup: lsf/mm congestion https://marc.info/?t=157842912200003&r=1&w=2
The gist here is that the kernel is working as expected. The process is asking for resources that don't exist, and the kernel can't really assume either the workload or the user's intent or wish. Maybe they want the system to finish the task even if it's unusable?
OOM killer not nearly aggressive enough? https://marc.info/?t=157842987000001&r=1&w=2
Not clear where it's stuck, reclaim or waiting on a lock - more info needed.
--- Chris Murphy
On Mi, 08.01.20 12:24, Chris Murphy (lists@colorremedies.com) wrote:
On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone. They are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"); when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their express intention to make this something that also works for desktop stuff and requires no further tuning. They also will do the systemd work necessary. Time frame: half a year, maybe one year, but no guarantees.
Looks like PSI-based OOM killing doesn't work without swap. Therefore oomd can't be considered a universal solution. Quite a lot of developers have workstations with a decent amount of RAM, ~64GiB, and do not use swap at all. Bare-metal servers are likewise mixed, depending on workload, and in the cloud it's rare for swap to exist.
https://github.com/facebookincubator/oomd/issues/80
We think earlyoom can be adjusted to work well for both the swap and no swap use cases.
Isn't earlyoom also watching the swap metrics only?
Lennart
-- Lennart Poettering, Berlin
On Fri, Jan 10, 2020 at 2:05 AM Lennart Poettering mzerqung@0pointer.de wrote:
On Mi, 08.01.20 12:24, Chris Murphy (lists@colorremedies.com) wrote:
On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone. They are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"); when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their express intention to make this something that also works for desktop stuff and requires no further tuning. They also will do the systemd work necessary. Time frame: half a year, maybe one year, but no guarantees.
Looks like PSI-based OOM killing doesn't work without swap. Therefore oomd can't be considered a universal solution. Quite a lot of developers have workstations with a decent amount of RAM, ~64GiB, and do not use swap at all. Bare-metal servers are likewise mixed, depending on workload, and in the cloud it's rare for swap to exist.
https://github.com/facebookincubator/oomd/issues/80
We think earlyoom can be adjusted to work well for both the swap and no swap use cases.
Isn't earlyoom also watching the swap metrics only?
No, memory free and swap free, as a percentage. Super simplistic. If there is no swap, then the percent only applies to MemAvailable.
https://pagure.io/fedora-workstation/issue/119#comment-619749
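As a rough sketch of that check (this is not earlyoom's actual code; the 10% defaults match its documented thresholds, and the parser here is deliberately simplified):

```python
def low_memory(meminfo_text, mem_min_percent=10, swap_min_percent=10):
    """Approximate earlyoom's trigger condition: act when both
    MemAvailable and SwapFree fall below their percentage thresholds.
    With no swap configured, only the MemAvailable check applies."""
    kb = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            kb[key] = int(rest.split()[0])  # values are in kB
    mem_pct = 100 * kb["MemAvailable"] / kb["MemTotal"]
    if kb.get("SwapTotal", 0) == 0:
        return mem_pct <= mem_min_percent
    swap_pct = 100 * kb["SwapFree"] / kb["SwapTotal"]
    return mem_pct <= mem_min_percent and swap_pct <= swap_min_percent

# Synthetic /proc/meminfo excerpt: 5% of RAM available, no swap.
sample = "MemTotal: 16000000 kB\nMemAvailable: 800000 kB\nSwapTotal: 0 kB\nSwapFree: 0 kB"
print(low_memory(sample))  # prints True
```

On a real system the input would come from reading /proc/meminfo in a loop.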
On Wed, Jan 8, 2020 at 8:25 PM Chris Murphy lists@colorremedies.com wrote:
On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering mzerqung@0pointer.de wrote:
- facebook is working on making oomd something that just works for everyone. They are in the final rounds of canonicalizing the configuration so that it can just work for all workloads without tuning. The last bits for this to be deployable are currently being done on the kernel side ("iocost"); when that's in, they'll submit oomd (or simplified parts of it) to systemd, so that it's just there and works. It's their express intention to make this something that also works for desktop stuff and requires no further tuning. They also will do the systemd work necessary. Time frame: half a year, maybe one year, but no guarantees.
Looks like PSI-based OOM killing doesn't work without swap. Therefore oomd can't be considered a universal solution. Quite a lot of developers have workstations with a decent amount of RAM, ~64GiB, and do not use swap at all. Bare-metal servers are likewise mixed, depending on workload, and in the cloud it's rare for swap to exist.
https://github.com/facebookincubator/oomd/issues/80
We think earlyoom can be adjusted to work well for both the swap and no swap use cases.
How? On a system with 64GB of RAM and no swap, all it currently does is reduce the amount of usable memory significantly.
On Wednesday, January 8, 2020 12:24:23 PM MST Chris Murphy wrote:
Looks like PSI-based OOM killing doesn't work without swap. Therefore oomd can't be considered a universal solution. Quite a lot of developers have workstations with a decent amount of RAM, ~64GiB, and do not use swap at all. Bare-metal servers are likewise mixed, depending on workload, and in the cloud it's rare for swap to exist.
While it's true that systems without swap exist, they're not common, except in the case of virtual machines, where it depends heavily on the vendor.
On Wed, Jan 15, 2020 at 6:40 PM John M. Harris Jr johnmh@splentity.com wrote:
On Wednesday, January 8, 2020 12:24:23 PM MST Chris Murphy wrote:
Looks like PSI-based OOM killing doesn't work without swap. Therefore oomd can't be considered a universal solution. Quite a lot of developers have workstations with a decent amount of RAM, ~64GiB, and do not use swap at all. Bare-metal servers are likewise mixed, depending on workload, and in the cloud it's rare for swap to exist.
While it's true that systems without swap exist, they're not common, except in the case of virtual machines, where it depends heavily on the vendor.
Nearly all of the servers in $DAYJOB's fleet (thousands of servers) have no swap. In my experience, it's an increasingly common thing.
For now, kernel developers have made it clear they do not care about user space responsiveness. At all. Their concern with kernel oom-killer is strictly with keeping the kernel functioning.
This is false. The stated purpose of the OOM killer is not only to keep the kernel alive. Nor does the fact that the kernel has not solved userspace responsiveness yet imply that kernel folks do not care. Rather, it means that they will not solve it on their own, because the kernel does not have all the information it needs. Kernel folks do care, or we wouldn’t have PSI or cgroups. A userspace solution is needed, but it does not need to replace the OOM killer; cgroups are also a userspace solution, and if earlyoom breaks them, it can make things worse than the status quo.
Can it be done with cgroupv2 and PSI alone? Unclear.
Of course it can. Just run 100 instances of every stress-ng memory worker in a podman container with a cgroup memory limit: the system will not hang. Do the same without the memory limit: the system will hang within seconds and never recover. This demonstrates that cgroups work and do the things they were intended to do.
Try it. With a memory limit,
podman run --rm -it --memory=1G fedora bash -c 'dnf install -y stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
will use CPU but keep your system responsive. Without the memory limit (this will hang your system),
podman run --rm -it fedora bash -c 'dnf install -y stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
the system hangs and doesn’t recover after 15 minutes. Same thing with `tail /dev/zero`:
podman run --rm -it --memory=1G fedora tail /dev/zero
activates the OOM killer after three seconds, with
kernel: Memory cgroup out of memory: Killed process 8814 (tail) total-vm:3141408kB, anon-rss:1042028kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:6336512kB oom_score_adj:0
systemd[943]: libpod-e061e1cb57dde204632531a556d37efbd51c9ab67346a8bc4d5e26c7301c165b.scope: A process of this unit has been killed by the OOM killer.
kernel: oom_reaper: reaped process 8814 (tail), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
logged in the system journal. You were saying the OOM killer activates too late and rarely kills the right process? Well, here it activates early enough and knows exactly what to stop. It is worth trying with ninja and WebKit too.
On Tue, Jan 7, 2020 at 5:27 am, Mark Otaris mark@net-c.com wrote:
Try it. With a memory limit,
podman run --rm -it --memory=1G fedora bash -c 'dnf install -y stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
will use CPU but keep your system responsive. Without the memory limit (this will hang your system),
podman run --rm -it fedora bash -c 'dnf install -y stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
the system hangs and doesn’t recover after 15 minutes.
I don't think we can use this, though; or at least, I don't see how. systemd allows limiting the memory accessible to a scope, but it doesn't allow carving out memory for one particular scope that is not to be accessible to other scopes. So I don't see a way to use these memory limits to ensure sufficient memory remains available to critical session processes. (Am I missing something?)
On Tue, Jan 07, 2020 at 09:19:47AM -0600, Michael Catanzaro wrote:
On Tue, Jan 7, 2020 at 5:27 am, Mark Otaris mark@net-c.com wrote:
Try it. With a memory limit,
podman run --rm -it --memory=1G fedora bash -c 'dnf install -y stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
will use CPU but keep your system responsive. Without the memory limit (this will hang your system),
podman run --rm -it fedora bash -c 'dnf install -y stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
the system hangs and doesn’t recover after 15 minutes.
I don't think we can use this, though; or at least, I don't see how. systemd allows limiting the memory accessible to a scope, but it doesn't allow carving out memory for one particular scope that is not to be accessible to other scopes. So I don't see a way to use these memory limits to ensure sufficient memory remains available to critical session processes. (Am I missing something?)
systemd is just a proxy for the kernel here. The kernel allows memory.min to be set, which is defined as [1]
Hard memory protection. If the memory usage of a cgroup is within its effective min boundary, the cgroup’s memory won’t be reclaimed under any conditions.
There is also memory.low which is weaker:
Best-effort memory protection. If the memory usage of a cgroup is within its effective low boundary, the cgroup’s memory won’t be reclaimed unless there is no reclaimable memory available in unprotected cgroups.
I think that a combination of those two settings could be sufficient to give us appropriate memory protection for a graphical session. I envision the limits being set at machine boot using some simple formula based on the total RAM available and the desktop environment in use.
[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-int...
Zbyszek
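A toy version of such a boot-time formula might look like this (the percentages and floor values are invented placeholders, purely to illustrate the shape of the idea, not a proposed policy):

```python
def session_memory_protection(total_ram_mb):
    # Invented example policy: reserve a slice of RAM for the graphical
    # session's cgroup, with floors so small machines still get something.
    memory_min = max(250, total_ram_mb * 2 // 100)  # hard protection (memory.min)
    memory_low = max(500, total_ram_mb * 5 // 100)  # best-effort (memory.low)
    return {"MemoryMin": f"{memory_min}M", "MemoryLow": f"{memory_low}M"}

# e.g. for a 16 GiB machine
print(session_memory_protection(16384))
```

The resulting values could then be written into a drop-in like the gnome-shell-wayland.service example earlier in the thread.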
On Mon, Jan 6, 2020 at 10:28 PM Mark Otaris mark@net-c.com wrote:
For now, kernel developers have made it clear they do not care about user space responsiveness. At all. Their concern with kernel oom-killer is strictly with keeping the kernel functioning.
This is false. The stated purpose of the OOM killer is not only to keep the kernel alive.
https://lore.kernel.org/linux-fsdevel/20200104090955.GF23195@dread.disaster....
Nor does the fact the kernel has not solved userspace responsiveness yet imply that kernel folks do not care. Rather, it means that they will not solve it on their own because the kernel does not have all the information it needs. Kernel folks do care, or we wouldn’t have PSI or cgroups.
OK.
Can it be done with cgroupv2 and PSI alone? Unclear.
Of course it can. Just run 100 instances of every stress-ng memory worker in a podman container with a cgroup memory limit. The system will not hang.
a. Not everything is running or will run in a container; b. To what degree cgroups and PSI, making no other changes, solves/avoids the problem under discussion is workload dependent. That's stated in the last part of the email response I reference above.
Of course, Fedora Workstation is a general-purpose operating system. It's untenable to have workload-specific operating systems. By what mechanism is the workload categorized? And by what mechanism is the system dynamically (re)configured?
Try it. With a memory limit,
podman run --rm -it --memory=1G fedora bash -c 'dnf install -y stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
When I ask the question "can it be done", I'm asking to have my cake and eat it too. I'm not asking for an example of running something that doesn't implode the system; I know about that. I want to compile something, and have the system figure out the resources it can give to that task, without killing it, and without impacting the responsiveness of my computer.
On Tue, Jan 7, 2020 at 8:55 AM Chris Murphy lists@colorremedies.com wrote:
On Mon, Jan 6, 2020 at 10:28 PM Mark Otaris mark@net-c.com wrote:
For now, kernel developers have made it clear they do not care about user space responsiveness. At all. Their concern with kernel oom-killer is strictly with keeping the kernel functioning.
This is false. The stated purpose of the OOM killer is not only to keep the kernel alive.
https://lore.kernel.org/linux-fsdevel/20200104090955.GF23195@dread.disaster....
Sorry, that's a long email and set of threads. As it relates to the above, the phrase to search for is: "This is indeed the case."
I intended to demonstrate that cgroups can be used to cause the kernel OOM killer to react appropriately and fast enough, implying that replacing the OOM killer is not necessary and that replacing it by a userspace OOM killer that does not account for cgroups can be undesirable. The exact same controls set with my example commands, and others, can be set with scopes as well, so this should be applicable.
https://lore.kernel.org/linux-fsdevel/20200104090955.GF23195@dread.disaster....
Okay, interesting. But that’s a statement from just one person, and it has to be interpreted in the context of what it is confirming; that is, that the OOM killer is “mainly concerned about kernel survival in low memory situations”, which is weaker than your claim that “their concern with kernel oom-killer is strictly with keeping the kernel functioning”. I don’t know if the OOM killer’s main purpose is to keep the kernel alive (Michal Hocko appears to think so, maybe others disagree), but it is in any case not an abuse of the OOM killer to also use it to keep userspace responsive, and there is no reason to think that kernel folks are not interested in helping achieve this goal. The only advantage I see to earlyoom so far is that it sends SIGTERM before taking further steps that will kill processes.
On Tue, Jan 7, 2020 at 1:48 PM Mark Otaris mark@net-c.com wrote:
I intended to demonstrate that cgroups can be used to cause the kernel OOM killer to react appropriately and fast enough, implying that replacing the OOM killer is not necessary and that replacing it by a userspace OOM killer that does not account for cgroups can be undesirable. The exact same controls set with my example commands, and others, can be set with scopes as well, so this should be applicable.
https://lore.kernel.org/linux-fsdevel/20200104090955.GF23195@dread.disaster....
Okay, interesting. But that’s a statement from just one person, and it has to be interpreted in the context of what it is confirming; that is, that the OOM killer is “mainly concerned about kernel survival in low memory situations”, which is weaker than your claim that “their concern with kernel oom-killer is strictly with keeping the kernel functioning”. I don’t know if the OOM killer’s main purpose is to keep the kernel alive (Michal Hocko appears to think so, maybe others disagree), but it is in any case not an abuse of the OOM killer to also use it to keep userspace responsive,
The oom killer doesn't keep user space responsive per se; in your example that's done by cgroups restricting resources. And that's neat, and necessary to keep making forward progress on. But we don't have that for unprivileged processes right now, unless the user knows the secret decoder ring command to use every time they run something in Terminal; and then they have to have some idea what resources the task needs to succeed, rather than just get clobbered anyway.
That's maybe the elephant in the room with earlyoom (or one of them), yes we've recovered sooner, the user can hopefully save their data and reboot. But did their task succeed? No. It got clobbered.
and there is no reason to think that kernel folks are not interested in helping achieve this goal.
I did mean with a kernel only solution. I've been tracking this issue for 6-7 months including the congestion and kswapd discussions on-going, so I know they do care broadly about providing some mechanisms by which user space can better behave. But all of that requires varying degrees of opt-in, and quite a lot of it involves considerable work to even understand it, let alone implement it.
The only advantage I see to earlyoom so far is that it sends SIGTERM before taking further steps that will kill processes.
Yes, and it happens sooner. Probably not soon enough for many users. There may be some risk of overpromising and underdelivering: making it the default when, for the vast majority of cases, it doesn't matter, because users are long since conditioned to just force power off within a minute or less of the GUI stuttering or freezing up on them. It is very workload- and system-specific.
On Mon, Jan 6, 2020 at 11:07 am, Lennart Poettering mzerqung@0pointer.de wrote:
Hmm, are we sure this is something we want to have in the default install? Is the code really good enough for that?
Looking at the sources very superficially I see a couple of problems:
- Waking up all the time in 100ms intervals? We generally try to avoid waking the CPU up all the time if nothing happens. Saving power and things.
Is there a way to check memory usage without periodic wakeups?
In WebKit we wake up every 5s to check memory usage if we saw low memory usage on the last wakeup, or every 1s if it was high, with a scale in between. It would be good to experiment with the timings and see how large the polling interval can get before it becomes too large to prevent system lockups. (The WebKit timings are designed for cache clearing, not for maintaining system responsiveness, so I wouldn't trust those.)
New code using system() in the year 2020? Really?
Fixed-size buffers and implicit, undetected truncation of strings in various places (for example, when formatting the shell string to pass to system()).
Thanks. The code review is much appreciated. If we're going to be running a superuser daemon, then we need to be confident that it doesn't do these dangerous things. And these choices do raise quality concerns about what might be lurking in the rest of the code as well.
But more importantly: are we sure this actually operates the way we should? i.e. PSI is really what should be watched. It is not interesting who uses how much memory and triggering kills on that. What matters is to detect when the system becomes slow due to that, i.e. *latencies* introduced due to memory pressure and that's what PSI is about, and hence what should be used.
My understanding is that experiments with PSI have indicated that it's hard to make it work well in practice. Alexey (hakavlad) has investigated this topic extensively, and his conclusion was:
"PSI-based process killing should not be used by default, because this topic is still poorly understood and we don’t know what thresholds are desirable for most users: it’s hard to find good default values."
https://pagure.io/fedora-workstation/issue/98#comment-615425
Details at: https://github.com/rfjakob/earlyoom/issues/100
So: already considered, but rejected for now.
But even if we'd ignore that (to fight latencies, one should watch latencies): OOM killing per process is just not appropriate on a systemd system: all our system services (and a good chunk of our user services too) are sorted neatly into cgroups, and we really should kill them as a whole and not just individual processes inside them. systemd manages that today, and makes exceptions configurable via OOMPolicy=, and with your earlyoom stuff you break that.
This looks like second-guessing the kernel memory management folks at a place where one can only lose, and at the same time breaking correct OOM reporting by the kernel via cgroups and stuff.
I think it's very clear at this point that this is extremely unlikely to be fixed at the kernel level. If that changes, great, but in the meantime we need a userspace solution to prevent Fedora from locking up. The Workstation WG doesn't have much (any?) kernel development experience, and we're aware that historical discussions on fixing this issue at the kernel level have concluded negatively, so we're limiting our interest to userspace solutions.
I think everybody would be happy to hold off on userspace solutions if a kernel solution is in the works. I'd love to see kernel devs acknowledge the issue, using the same test cases that we're using (either 'ninja build' WebKit or simply 'tail /dev/zero'), and propose a real concrete solution. But I'm not going to hold my breath for that. My understanding is that previous discussions have concluded that the kernel OOM is designed to ensure enough memory remains available to the kernel, and that userspace is responsible for determining how to keep userspace responsive.
Also: what precisely is this even supposed to do? Replace the algorithm for detecting *when* to go on a kill rampage? Or actually replace the algorithm selecting *what* to kill during a kill rampage?
earlyoom is restricted to the former, although in the future we might be interested in doing the latter as well, either by enhancing earlyoom or switching to another tool, e.g. to prevent sshd or journald from being killed.
If it's the former (which the name of the project suggests, _early_oom), then at the most basic the tool should let the kernel do the killing, i.e. "echo f > /proc/sysrq-trigger". That way the reporting via cgroups isn't fucked, and systemd can still do its thing, and the kernel can kill per cgroup rather than per process...
Problem is that letting the kernel do the work can cause data loss. earlyoom needs to handle process termination itself so that it can send SIGTERM first, instead of jumping straight to SIGKILL and corrupting who knows what.
Michael
On Mo, 06.01.20 10:09, Michael Catanzaro (mcatanzaro@gnome.org) wrote:
Is there a way to check memory usage without periodic wakeups?
PSI. It measures latency though. Which is the right thing to measure here... You can configure thresholds there and it wakes you up when those are hit. Thus userspace doesn't have to poll at all...
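A minimal sketch of both modes, assuming a kernel built with CONFIG_PSI (4.20 or later); the trigger threshold values in the comment are examples, not recommendations:

```shell
# Read current memory pressure from PSI. On the "some" line, avg10 is the
# share of time over the last 10 seconds that at least one task was stalled
# waiting for memory.
if [ -r /proc/pressure/memory ]; then
    psi_line=$(head -n 1 /proc/pressure/memory)
else
    psi_line="PSI not available on this kernel"
fi
echo "$psi_line"
# Instead of polling, a privileged monitor can register a trigger and block
# in poll(2) until it fires, e.g. "wake me when tasks have been stalled for
# 150ms total within any 1s window":
#   echo "some 150000 1000000" > /proc/pressure/memory
```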
In WebKit we wake up every 5s to check memory usage if we saw low memory usage on the last wakeup, or every 1s if it was high, with a scale in between. It would be good to experiment with the timings and see how large the polling interval can get before it becomes too large to prevent system lockups. (The WebKit timings are designed for cache clearing, not for maintaining system responsiveness, so I wouldn't trust those.)
Watch things with PSI.
New code using system() in the year 2020? Really?
Fixed-size buffers and implicit, undetected truncation of strings in various places (for example, when formatting the shell string to pass to system()).
Thanks. The code review is much appreciated. If we're going to be running a superuser daemon, then we need to be confident that it doesn't do these dangerous things. And these choices do raise quality concerns about what might be lurking in the rest of the code as well.
BTW, this should not be a root daemon anyway. It only needs one cap: CAP_SYS_KILL. Hence, drop privs to some user of its own, and keep that one cap. Use AmbientCapabilities= in the unit file.
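As a sketch of that suggestion (the directive names are real systemd options, while the exact capability set is an assumption; note that the signal-sending capability is spelled CAP_KILL in capabilities(7), not CAP_SYS_KILL):

```ini
# Hypothetical drop-in: /etc/systemd/system/earlyoom.service.d/sandbox.conf
[Service]
# Run under a private, dynamically allocated user instead of root...
DynamicUser=yes
# ...keeping only the capability needed to signal other users' processes.
CapabilityBoundingSet=CAP_KILL
AmbientCapabilities=CAP_KILL
```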
My understanding is that experiments with PSI have indicated that it's hard to make it work well in practice. Alexey (hakavlad) has investigated this topic extensively, and his conclusion was:
"PSI-based process killing should not be used by default, because this topic is still poorly understood and we don’t know what thresholds are desirable for most users: it’s hard to find good default values."
If things are poorly understood, then understand them better... Don't just adopt some stuff that isn't much better understood either...
But even if we'd ignore that (to fight latencies, one should watch latencies): OOM killing per process is just not appropriate on a systemd system: all our system services (and a good chunk of our user services too) are sorted neatly into cgroups, and we really should kill them as a whole and not just individual processes inside them. systemd manages that today, and makes exceptions configurable via OOMPolicy=, and with your earlyoom stuff you break that.
This looks like second-guessing the kernel memory management folks at a place where one can only lose, and at the same time breaking correct OOM reporting by the kernel via cgroups and stuff.
I think it's very clear at this point that this is extremely unlikely to be fixed at the kernel level. If that changes, great, but in the meantime we need a userspace solution to prevent Fedora from locking up. The Workstation WG doesn't have much (any?) kernel development experience, and we're aware that historical discussions on fixing this issue at the kernel level have concluded negatively, so we're limiting our interest to userspace solutions.
Well, it's not that the kernel folks wouldn't provide you with some tools to improve the situation, see PSI...
Also: what precisely is this even supposed to do? Replace the algorithm for detecting *when* to go on a kill rampage? Or actually replace the algorithm selecting *what* to kill during a kill rampage?
earlyoom is restricted to the former, although in the future we might be interested in doing the latter as well, either by enhancing earlyoom or switching to another tool, e.g. to prevent sshd or journald from being killed.
These services should set OOMScoreAdjust= to something sensible. journald and udevd do that; maybe sshd should too... (it's a bit harder for sshd, since it needs to undo the setting for its sessions again, as OOM scores are propagated down the process tree)
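As an illustration, a drop-in like the following would protect sshd; OOMScoreAdjust= is a real systemd directive, but the value is an arbitrary example and, as noted, sessions spawned by sshd would inherit it unless they reset it:

```ini
# Hypothetical drop-in: /etc/systemd/system/sshd.service.d/oomscore.conf
[Service]
# Negative values make the OOM killer less likely to pick this process.
OOMScoreAdjust=-900
```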
If it's the former (which the name of the project suggests, _early_oom), then at the most basic the tool should let the kernel do the killing, i.e. "echo f > /proc/sysrq-trigger". That way the reporting via cgroups isn't fucked, and systemd can still do its thing, and the kernel can kill per cgroup rather than per process...
Problem is that letting the kernel do the work can cause data loss. earlyoom needs to handle process termination itself so that it can send SIGTERM first, instead of jumping straight to SIGKILL and corrupting who knows what.
Well, then tell systemd to do it for you... Use the D-Bus call GetUnitByPID() and then issue StopUnit().
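Those two calls can also be exercised from a shell via busctl. The bus name, object path, and method signatures below are systemd's documented D-Bus API; the PID and the commented-out unit name are made-up examples:

```shell
# Resolve the unit that owns a PID, then ask systemd to stop that unit as a
# whole (systemd sends SIGTERM first and escalates per the unit's settings).
pid=1234   # illustrative PID
if command -v busctl >/dev/null 2>&1; then
    busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager GetUnitByPID u "$pid" \
        || echo "no unit found for PID $pid (or no system bus)"
    # With the returned unit name:
    #   busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
    #       org.freedesktop.systemd1.Manager StopUnit ss "<unit>" "replace"
else
    echo "busctl not available on this system"
fi
```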
Lennart
-- Lennart Poettering, Berlin
On Fri, Jan 3, 2020 at 8:20 PM Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
I've read the whole thread (phew!) and I support the proposal. The user experience is improved and I don't see any substantial disadvantages (power management etc. can hopefully be fine-tuned). Of course the code should be well inspected by someone knowledgeable, if it's going to run with high privileges. And if there are serious candidates with a better approach (e.g. something from systemd), it might make sense to delay this and wait a while. OTOH, if verifying the code and setting it up is not that much work, those candidates can *replace* earlyoom in the future, and no delay is necessary. Overall +1 from me.
Yesterday I updated my Rawhide and wondered why `dnf autoremove` would want to remove earlyoom, only to discover that the soft dependency in earlyoom was dropped [1]; hence nothing requires earlyoom and DNF is free to remove the package (and it is possibly no longer installed on upgraded systems).
Therefore I wonder what the status of EarlyOOM is. Should I let the package go? If not, then the situation should be fixed somehow, probably either by reverting the revert or by adding the dependency to fedora-release, as was proposed elsewhere.
Vít
[1] https://src.fedoraproject.org/rpms/earlyoom/c/a6d0f45a3524830642a4120704e8d2...
Dne 03. 01. 20 v 20:18 Ben Cotton napsal(a):
https://fedoraproject.org/wiki/Changes/EnableEarlyoom
== Summary == Install earlyoom package, and enable it by default. This will cause the kernel oomkiller to trigger sooner, but will not affect which process it chooses to kill off. The idea is to recover from out of memory situations sooner, rather than the typical complete system hang in which the user has no other choice but to force power off.
== Owner ==
- Name: [[User:chrismurphy| Chris Murphy]]
- Email: bugzilla@colorremedies.com
== Detailed Description == The Workstation working group has discussed "better interactivity in low-memory situations" for some months. In certain use cases, typically compiling, if all RAM and swap are completely consumed, system responsiveness becomes so abysmal that a reasonable user can consider the system "lost" and resorts to forcing a power off. This is objectively a very bad UX. The broad discussion of this problem, and some ideas for near-term and long-term solutions, is located here:
Recent long discussions on "Better interactivity in low-memory situations"<br> https://pagure.io/fedora-workstation/issue/98<br> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...<br>
Fedora editions and spins have the in-kernel OOM (out-of-memory) manager enabled. The manager's concern is keeping the kernel itself functioning; it has no concern for user space function or interactivity. This proposed change attempts to improve the user experience, in the short term, by triggering the in-kernel process-killing mechanism sooner. Instead of the system becoming completely unresponsive for tens of minutes, hours, or days, the expectation is that an offending process (determined by oom_score, same as now) will be killed off within seconds or a few minutes. This is an incremental improvement in user experience, but admittedly still suboptimal. There is additional ongoing work to improve the user experience further.
Workstation working group discussion specific to enabling earlyoom by default https://pagure.io/fedora-workstation/issue/119
Other in-progress solutions:<br> https://gitlab.freedesktop.org/hadess/low-memory-monitor<br>
Background information on this complicated problem:<br> https://www.kernel.org/doc/gorman/html/understand/understand016.html<br> https://lwn.net/Articles/317814/<br>
== Benefit to Fedora ==
There are two major benefits to Fedora:
- improved user experience by more quickly regaining control over one's system, rather than having to force power off in low-memory situations with aggressive swapping. Once a system becomes unresponsive, it's completely reasonable for the user to assume the system is lost, and forcing power off carries a high potential for data loss.
- reducing forced poweroff as the main workaround will increase data collection, improving understanding of low-memory situations and how to handle them better
== Scope ==
- Proposal owners:
a. Modify {{code|https://pagure.io/fedora-comps/blob/master/f/comps-f32.xml.in}} to include the earlyoom package for Workstation.<br> b. Modify {{code|https://src.fedoraproject.org/rpms/fedora-release/blob/master/f/80-workstati... to include:
<pre>
# enable earlyoom by default on workstation
enable earlyoom.service
</pre>
- Other developers:
Restricted to Workstation edition, unless other editions/spins want to opt-in.
- Release engineering: [https://pagure.io/releng/issues #9141] (an impact check with Release Engineering is needed)
- Policies and guidelines: N/A
- Trademark approval: N/A
== Upgrade/compatibility impact == earlyoom.service will be enabled on upgrade. An upgraded system should exhibit the same behaviors as a clean installed system.
== How To Test ==
- Fedora 30/31 users can test today, any edition or spin:<br>
{{code|sudo dnf install earlyoom}}<br> {{code|sudo systemctl enable --now earlyoom}}
And then attempt to cause an out-of-memory situation. Examples:<br> {{code|tail /dev/zero}}<br> {{code|https://lkml.org/lkml/2019/8/4/15}}
- Fedora Workstation 32 (and Rawhide) users will see this service is
already enabled. It can be toggled with {{code|sudo systemctl start/stop earlyoom}} where start means earlyoom is running, and stop means earlyoom is not running.
== User Experience == The most egregious instances this change is trying to mitigate:
a. RAM is completely used
b. Swap is completely used
c. System becomes unresponsive to the user as swap thrashing has ensued
--> with earlyoom disabled, the user often gives up and forces power off (in my own testing this condition lasts >30 minutes with no kernel-triggered oom killer and no recovery)
--> with earlyoom enabled, the system likely still becomes unresponsive, but the oom killer is triggered in much less time (seconds or a few minutes in my testing, once less than 10% of RAM and 10% of swap remain)
earlyoom starts sending SIGTERM once both memory and swap are below their respective PERCENT setting, default 10%. It sends SIGKILL once both are below their respective KILL_PERCENT setting, default 5%.
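The rule as described can be sketched against /proc/meminfo. This is a simplification for illustration (the real daemon polls continuously and picks a victim by oom_score; the thresholds are the stated defaults):

```shell
# Compute the same percentages earlyoom compares against PERCENT (10) and
# KILL_PERCENT (5): available memory and free swap, relative to their totals.
mem_total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
swap_total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
mem_pct=$(( 100 * mem_avail / mem_total ))
# With no swap configured, treat swap as exhausted so memory alone decides.
if [ "$swap_total" -gt 0 ]; then
    swap_pct=$(( 100 * swap_free / swap_total ))
else
    swap_pct=0
fi
if [ "$mem_pct" -le 5 ] && [ "$swap_pct" -le 5 ]; then
    action=SIGKILL
elif [ "$mem_pct" -le 10 ] && [ "$swap_pct" -le 10 ]; then
    action=SIGTERM
else
    action=none
fi
echo "memory ${mem_pct}% available, swap ${swap_pct}% free: would send ${action}"
```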
The package includes the configuration file /etc/default/earlyoom, which sets the option {{code|-r 60}}, causing a memory report to be written to the journal every minute.
== Dependencies == The earlyoom package has no dependencies.
== Contingency Plan ==
- Contingency mechanism: Owner will revert all changes
- Contingency deadline: Final freeze
- Blocks release? No
- Blocks product? No
== Documentation == {{code|man earlyoom}}<br><br> https://www.kernel.org/doc/gorman/html/understand/understand016.html
== Release Notes == The earlyoom service is enabled by default, which will cause the kernel oom-killer to trigger sooner. To revert to the previous behavior:<br> {{code|sudo systemctl disable earlyoom.service}}
And to customize see {{code|man earlyoom}}.
On Tue, Aug 4, 2020 at 10:45 am, Vít Ondruch vondruch@redhat.com wrote:
Yesterday I updated my Rawhide and wondered why `dnf autoremove` would want to remove earlyoom, only to discover that the soft dependency in earlyoom was dropped [1]; hence nothing requires earlyoom and DNF is free to remove the package (and it is possibly no longer installed on upgraded systems).
Therefore I wonder what the status of EarlyOOM is. Should I let the package go? If not, then the situation should be fixed somehow, probably either by reverting the revert or by adding the dependency to fedora-release, as was proposed elsewhere.
We're tracking this problem in https://pagure.io/fedora-workstation/issue/138 and https://bugzilla.redhat.com/show_bug.cgi?id=1814306. It's high priority for Workstation, but it's blocked on dnf. We've been working around it in an ad-hoc way, differently for each package and in every release. In this case, I removed our original workaround in https://src.fedoraproject.org/rpms/earlyoom/pull-request/2 because we intended to replace it with a new workaround, https://src.fedoraproject.org/fork/catanzaro/rpms/fedora-release/c/a0df346ba.... However, we decided the new workaround was a little outrageous and we would just wait for a dnf fix instead. In the meantime, if you want to keep earlyoom, don't use autoremove. earlyoom will still get pulled in on upgrades to F32 due to the old workaround, but it's not currently being pulled in on upgrades to F33.
Michael
Dne 04. 08. 20 v 16:05 Vitaly Zaitsev via devel napsal(a):
On 04.08.2020 15:48, Michael Catanzaro wrote:
In the meantime, if you want to keep earlyoom, don't use autoremove.
sudo dnf mark install earlyoom
I think "don't use autoremove" is the better suggestion ATM, because I don't really want to keep earlyoom on the system in case there is systemd-oomd or whatever the successor should be.
Vít
On Tue, Aug 4, 2020 at 8:46 AM Vít Ondruch vondruch@redhat.com wrote:
Dne 04. 08. 20 v 16:05 Vitaly Zaitsev via devel napsal(a):
On 04.08.2020 15:48, Michael Catanzaro wrote:
In the meantime, if you want to keep earlyoom, don't use autoremove.
sudo dnf mark install earlyoom
I think "don't use autoremove" is the better suggestion ATM, because I don't really want to keep earlyoom on the system in case there is systemd-oomd or whatever the successor should be.
systemd-oomd is coming along, but it'll most likely be Fedora 34, possibly Fedora 35.
Hopefully there will be a way to do some kind of "rebase" where recommended things are favored on upgrades (to a new release version) without having to obsolete them, and without applying this to every update within a given release.
On 04.08.2020 16:45, Vít Ondruch wrote:
I think "don't use autoremove" is the better suggestion ATM, because I don't really want to keep earlyoom on the system in case there is systemd-oomd or whatever the successor should be.
You can always easily swap one package to another:
sudo dnf swap earlyoom systemd-oomd --allowerasing
Dne 04. 08. 20 v 20:58 Vitaly Zaitsev via devel napsal(a):
On 04.08.2020 16:45, Vít Ondruch wrote:
I think "don't use autoremove" is the better suggestion ATM, because I don't really want to keep earlyoom on the system in case there is systemd-oomd or whatever the successor should be.
You can always easily swap one package to another:
sudo dnf swap earlyoom systemd-oomd --allowerasing
I know I can swap packages and what not, but primarily I want to keep my system in "default" state, mostly following the changes Fedora contributors are proposing. So if the proposal is to have earlyoom installed by default, then it is surprising it might not be installed. This situation should be fixed generally without me changing anything. And that is the reason I bumped this thread.
Vít
On Tue, Aug 4, 2020 at 7:49 AM Michael Catanzaro mcatanzaro@gnome.org wrote:
In the meantime, it will get pulled in on upgrades to F32 due to the old workaround, but it's not currently being pulled in on upgrades to F33.
Should we go back to the old workaround for F33? Madness for one more release? And then drop the madness once there's a dnf solution?
By the way... just in case it matters, in https://src.fedoraproject.org/fork/catanzaro/rpms/fedora-release/c/a0df346ba... line 562 should be `zram-generator-defaults`; `zram` will be obsoleted by `zram-generator-defaults` in F33.
On Tue, Aug 4, 2020 at 10:31 am, Chris Murphy lists@colorremedies.com wrote:
Should we go back to the old workaround for F33? Madness for one more release? And then drop the madness once there's a dnf solution?
We could, but we have installed so many other things that it's becoming quite hard to keep track of them all, and if we're going to have a workaround for any one package I would recommend we use the same workaround for them all. And that's the merge request I have above. And for that to work, we would need to require that anyone touching comps also add a corresponding Recommends: in fedora-release. That would be unfortunate.
I'd rather have a proper dnf fix in place for F33.
Dne 04. 08. 20 v 21:38 Michael Catanzaro napsal(a):
On Tue, Aug 4, 2020 at 10:31 am, Chris Murphy lists@colorremedies.com wrote:
Should we go back to the old workaround for F33? Madness for one more release? And then drop the madness once there's a dnf solution?
We could, but we have installed so many other things that it's becoming quite hard to keep track of them all, and if we're going to have a workaround for any one package I would recommend we use the same workaround for them all. And that's the merge request I have above. And for that to work, we would need to require that anyone touching comps also add a corresponding Recommends: in fedora-release. That would be unfortunate.
Wouldn't it be better to replace this part of comps with soft dependencies? I don't quite understand why we haven't dropped comps (at least for the use case of installing a basic OS) now that we have soft dependencies in RPM.
Admittedly, the soft dependencies would be installed repeatedly compared to comps, but now you are asking DNF to actually install the content of comps repeatedly. So there won't be a difference in the end.
Vít
I'd rather have a proper dnf fix in place for F33.
devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Tuesday, August 4, 2020 1:45:52 AM MST Vít Ondruch wrote:
Yesterday, I have updated my Rawhide and wondered why `dnf autoremove` would want to remove earlyoom just to discover that soft dependency in earlyoom was dropped [1] and hence nothing requires earlyoom and DNF is free to remove this package (and it is possibly not installed anymore on upgraded systems).
Therefore I wonder what is the status of EarlyOOM. Should I let the package go? If not, then the situation should be fixed somehow, probably either by reverting the revert or adding the dependency into fedora-release as was proposed elsewhere.
Generally, if you let the package go, your system won't suffer from your processes getting killed needlessly. This is likely a benefit, so I don't know if this is really a bug.