On Mon, Aug 12, 2019 at 12:30 AM Benjamin Kircher wrote:
> On 11. Aug 2019, at 23:05, Chris Murphy <lists(a)colorremedies.com> wrote:
>> I think that at the point at which the mouse pointer has frozen, the user has
>> no practical means of controlling or interacting with the system; it's
>> a failure.
>>
>> In the short term, is it reasonable and possible to get the oom
>> killer to trigger sooner and thereby avoid the system becoming
>> unresponsive in the first place? The oom score for almost all processes
>> is 0, and niced processes have their oom score increased. I'm not
>> seeing levers to control how aggressive it is, only a way of hinting
>> at which processes are more readily subject to being killed. In
>> fact, the oom killer effectively requires that swap be completely
>> consumed, and if swap is on anything other than a fast SSD, swapping
>> creates its own performance problems way before oom can come to the
>> rescue. I think I just argued against my own question.
> Yes, you just did :-)
>
> From what I understand from this LKML thread, fast swap on NVMe is only part of the
> issue (or adds to the issue). The kernel really, really tries hard not to OOM kill anything
> and to keep the system going. And this overcommitment is where it eventually gets
> unresponsive to the extent that the machine needs to be hard rebooted.
>
> The LKML thread also mentions that user-space OOM handling could help.
>
> But what about cgroups? Isn't there a systemd utility that helps me wrap processes in
> resource-constrained groups? Something along the lines of
>
> $ systemd-run -p MemoryLimit=1G firefox
>
> (Not tested.) I imagine that a well-behaved program will handle a bad malloc by ending
> itself gracefully.
>
> BTW, this happens not only on Linux. I'm used to dealing with quite big files during my day
> job, and if you accidentally write some… em… very unsophisticated code that attempts to
> read the entire file into memory at once, you can experience the same behavior on recent
> macOS, too. You're left with nothing to do but force-reboot your machine.
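(To be specific about the "hinting" I mentioned above: the per-process knob is
/proc/<pid>/oom_score_adj, and systemd can set it at launch. A rough, untested
example, with ./example-program standing in for the real workload:

$ systemd-run -p OOMScoreAdjust=500 ./example-program

All that does is make the process a preferred victim once the oom killer
finally runs; it doesn't make the killer run any sooner.)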
If I just run the example program, let's say with systemd MemoryLimit
set to the MemAvailable value from /proc/meminfo, the program is still
going to try to bust out of that and fail. The failure reason is also
non-obvious. Yes, this is definitely an improvement in that the system
isn't taken down with it.
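Spelled out, that scenario is roughly the following (untested, and
./example-program is just a placeholder):

$ avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)    # value is in kB
$ systemd-run -p MemoryLimit="${avail}K" ./example-program

And the limit is only a snapshot of MemAvailable at launch time, which is
already stale a moment later.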
How to do this automatically? Could there be a mechanism for the
system and the requesting application to negotiate resources?
One reality is that the system isn't a good estimator of
responsiveness from the user's point of view. Anytime swap is under
significant pressure (what's the definition of significant?), the
system is effectively lost, *if* this is a desktop system (including
laptops). In the example case, once swap is being heavily used on
either the SSD or on ZRAM, the mouse pointer is frozen 50%-90% of the
time. It's not a usable system, well before swap is full. How does the
system learn that a light swap rate is OK, but a heavy swap rate will
lead to an angry user? And even a heavy swap rate might be OK on NVMe,
or on a server.
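The rate is at least observable by hand, even if the threshold for "angry
user" isn't; something like (off the top of my head, untested):

$ vmstat 5                                # watch the si/so columns: memory swapped in/out per interval
$ grep -E '^pswp(in|out)' /proc/vmstat    # raw page counters since boot; the delta over time is the rate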
Right now the only lever to avoid swap is to not create a swap
partition at installation time. Or to create a smaller one instead of
a 1:1 ratio with RAM. Or to use a 1/4-RAM-sized swap on ZRAM. A
consequence of each of these alternatives is that hibernation can't be
used. Fedora already explicitly does not support hibernation, but
strictly speaking that only means we don't block release on
hibernation-related bugs. Fedora does still create a swap that meets
the minimum size for hibernation, and also inserts the required
'resume' kernel parameter to locate the hibernation image at the next
boot. So we kinda sorta do support it.
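(For reference, the 1/4-RAM swap-on-ZRAM variant is only a few commands to try
by hand; roughly, from memory and untested:

$ sudo modprobe zram num_devices=1
$ echo "$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 4 ))K" | sudo tee /sys/block/zram0/disksize
$ sudo mkswap /dev/zram0
$ sudo swapon -p 100 /dev/zram0    # higher priority than any disk-backed swap

But of course none of that gives you a hibernation target.)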
Another reality is that the example program also doesn't have a good
way of estimating the resources it needs. It has some levers that just
aren't being used by default, including a -l option which reads "do not
start new jobs if the load average is greater than N". But that's
different from the program asking "tell me the box sizes you can use",
the system supplying a matching box, and the program working within it.
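(To make that concrete: if the example workload were a parallel build with
make or ninja, which both have such a flag, the invocation would be something
like

$ make -j$(nproc) -l$(nproc)    # or: ninja -j $(nproc) -l $(nproc)

and note it throttles on CPU load average, not on memory, so it doesn't
really address the problem above.)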