On Fri, Jan 3, 2020 at 1:51 PM Robbie Harwood <rharwood(a)redhat.com> wrote:
> Another thought. Wouldn't some of the pain here be alleviated by
My sample size is not scientific, but in my testing I can't tell any
difference in the swap-under-pressure case we're testing against. The
system is lost in the same amount of time, and it still does not
recover on its own. Perhaps swappiness matters for the less extreme
case of incidental swap usage, which is probably what swap was
originally intended for (?), but that's speculation on my part.
The central problem, as I see it, is that unprivileged applications *by
default* are given whatever memory allocation they request, without any
consideration for the health of other processes, even privileged
The kernel oom-killer (with or without earlyoom running) often
clobbers things like sshd, systemd-journald, sssd, and even user space
programs (Maps, TextEdit, Terminal) that have nothing to do with
what's actually eating up CPU and memory. It's pretty atrocious, but
I've editorialized plenty about that in the cited threads already.
People smarter than I am are working on longer-term solutions for
this. This proposal is a bit of a hack, intended to return some control
to the user sooner, so they can save their state (however they define
that) and reboot normally, rather than having to force power off. It's
unsophisticated and therefore mismatched with a complex problem, but
elegant because it's simple, easy to test, and easy to remove or
disable.
Another plus, which I only briefly mention in the proposal, is that
because the user is far less likely to hard power off, their system
journal will almost certainly have recorded earlyoom's memory report
leading up to the oom, as well as the complete kernel oom-killer output.
In many of my tests without earlyoom, by contrast, a forced power off
caused much of this information to be lost, especially what action the
oom-killer took, assuming it triggered at all. Often it didn't, and the
system remained wedged and unresponsive for 30+ minutes.
> Currently it seems to be 60, which results in
> somewhat aggressive swap use; 1 seems better (minimal swapping without
> disabling), while 0 will disable it for general use (while preserving it
> for hibernation). This would at least improve the disk thrashing during
Sounds right, but in practice I'm not observing this. The two
observation perspectives I'm using:
a) GUI responsiveness: ability to drag windows around, scroll in
Firefox, type text in TextEdit, open/save files.
b) remote (ssh) observation with vmstat, iotop, and top
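
For anyone who wants to repeat the remote observation in (b) without
watching vmstat by hand, here is a rough sketch of how the same numbers
could be sampled from /proc. It is not what I ran for the tests above.
The function names (parse_meminfo, swap_used_kib, snapshot) are made up
for illustration, and it assumes a Linux system that exposes
/proc/meminfo and /proc/sys/vm/swappiness.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts with no unit suffix.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kib(fields):
    """Swap currently in use, derived from SwapTotal and SwapFree."""
    return fields["SwapTotal"] - fields["SwapFree"]


def snapshot():
    """Read one sample from the live system (hypothetical helper, Linux only)."""
    fields = parse_meminfo(Path("/proc/meminfo").read_text())
    swappiness = int(Path("/proc/sys/vm/swappiness").read_text())
    return {
        "swappiness": swappiness,
        # MemAvailable is absent on very old kernels, hence .get()
        "mem_available_kib": fields.get("MemAvailable"),
        "swap_used_kib": swap_used_kib(fields),
    }
```

Calling snapshot() once a second from an ssh session and printing the
result would give a crude timeline of available memory and swap use
leading up to the point where the system stops responding.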