I'm not a developer, nor do I pretend to understand the nuances of memory management.
But I signed up for this list just to say "thanks" to all the devs and others
who are finally discussing what I consider to be one of the biggest problems with Linux
on the desktop.
My experience with desktop Linux distros on SSDs, when a few processes start to leak
memory or when I launch a new program while the system is already at its limits, is a full
system hang: the mouse only occasionally moves, jerkily, and I can't switch to a
virtual terminal. I recently learned the SysRq trick (Alt+SysRq+F) to invoke the OOM killer,
but I personally think that the kernel should deal with this, not the user. As unfortunate as
it is for the OOM killer to have to randomly kill something, I am of the opinion that the OS
should *never* lock up, period. I would strongly prefer that one application get killed
rather than lose all my applications and working data to a forced hard reboot.
I don't know if this helps or not, but anecdotally I started to see this issue *after*
SSDs became more common, i.e. I don't think I ever experienced it with spinning rust.
Maybe it has something to do with the vastly faster I/O of an SSD, which lets the system
saturate RAM before the OOM killer has time to react?
Also, I've run relatively low-memory KVM guests on a VPS under very high load,
and they never lock up. The OOM killer does occasionally kick in, but the affected daemon
or systemd service restarts and it's amazingly undramatic. The issue seems to occur
only with Xorg (and, I imagine, Wayland) and "desktop" usage.
As for the randomness of the OOM killer, couldn't it be made to take
into account the PID and/or how long a process has been running? Normally Xorg (and, I
assume, the Wayland equivalents) gets started before the other desktop programs that tend
to consume a lot of memory. So if a process has a higher PID and/or has been running for
less time, give it a higher killability score.
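To make the suggestion above a bit more concrete, here is a minimal, hypothetical Python sketch. It is not how the kernel actually scores processes; the `Proc` fields, the `kill_score` weighting (up to double the memory-based score for the newest process) and the sample numbers are all assumptions made up for illustration. It uses running time rather than PID, since PIDs wrap around and so aren't a reliable indicator of age.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    name: str
    rss_mb: int       # resident memory in MiB (hypothetical sample data)
    uptime_s: float   # seconds the process has been running


def kill_score(p: Proc, oldest_uptime_s: float) -> float:
    """Hypothetical score: memory use, boosted for recently started processes."""
    # 0.0 for the oldest process, close to 1.0 for one that just started.
    youth = 1.0 - (p.uptime_s / oldest_uptime_s) if oldest_uptime_s > 0 else 0.0
    # Assumed weight: a brand-new process scores up to twice its memory use.
    return p.rss_mb * (1.0 + youth)


def pick_victim(procs: list[Proc]) -> Proc:
    """Return the process this toy heuristic would kill first."""
    oldest = max(p.uptime_s for p in procs)
    return max(procs, key=lambda p: kill_score(p, oldest))


procs = [
    Proc(pid=1200, name="Xorg", rss_mb=400, uptime_s=3600),
    Proc(pid=2400, name="browser", rss_mb=1500, uptime_s=3000),
    Proc(pid=9100, name="leaky-app", rss_mb=1200, uptime_s=60),
]
print(pick_victim(procs).name)  # prints "leaky-app"
```

Going by memory alone, the browser would be chosen; weighting by age picks the recently started, leaking program instead, and the long-running Xorg scores lowest.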
In my experience on a system with 8GB of RAM and an SSD, the amount of swap space makes no
difference. I've tried no swap, 2GB, 8GB, etc., and it still hangs under high memory
usage. I've also tried tuning a lot of sysctl parameters, such as vm.swappiness,
vm.vfs_cache_pressure, and vm.min_free_kbytes, to no avail.
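For anyone trying to reproduce this, a minimal sketch (Python, Linux only) that reads the current values of the parameters mentioned above from /proc/sys, where each dotted sysctl name corresponds to a file path. The helper names `sysctl_path` and `read_sysctl` are made up for this example; it only reads values, and changing them needs root (e.g. via `sysctl -w`).

```python
from pathlib import Path
from typing import Optional


def sysctl_path(name: str) -> Path:
    """Map a simple dotted sysctl name to its file under /proc/sys."""
    # e.g. "vm.swappiness" -> /proc/sys/vm/swappiness
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> Optional[str]:
    """Return the current value as text, or None if it can't be read."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        # Missing file, no permission, or not running on Linux.
        return None


for name in ("vm.swappiness", "vm.vfs_cache_pressure", "vm.min_free_kbytes"):
    print(f"{name} = {read_sysctl(name)}")
```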
Don't know if this helps, but here are some additional discussions of Linux
unresponsiveness under low memory situations from a layman's perspective:
- osnews.com/story/130117/kde-usability-and-productivity-are-we-there-yet/ (in the comments)
- unix.stackexchange.com/questions/373312/oom-killer-doesnt-work-properly-l...
- bbs.archlinux.org/viewtopic.php?id=233843
- askubuntu.com/questions/432809/why-is-kswapd0-running-on-a-computer-with-...
- unix.stackexchange.com/questions/24625/how-to-completely-disable-swap/246...
Thanks again to everyone for looking into this!