On Sun, Aug 11, 2019 at 1:02 PM Jan Kratochvil jan.kratochvil@redhat.com wrote:
On Sun, 11 Aug 2019 20:54:28 +0200, Chris Murphy wrote:
and likely experiences data loss and possibly even file system corruption as a direct consequence of having to force power off on the machine because for all practical purposes normal control has been lost.
Not really; this is what a journaling filesystem is there for.
Successful journal replay obviates the need for fsck; it has nothing to do with avoiding corruption. And in any case, anything the user is working on that isn't already saved and committed to stable media isn't going to survive the poweroff.
But then there can still be application-level data corruption if an application does not handle its sudden termination properly. That should be rare, but IIRC I did see it, for example, with Firefox.
I think that at the point where the mouse pointer has frozen and the user has no practical means of controlling or interacting with the system, it's a failure.
In the short term, is it reasonable and possible to get the oom killer to trigger sooner and thereby avoid the system becoming unresponsive in the first place? The oom score for almost all processes is 0, and niced processes have their oom score increased. I'm not seeing levers to control how aggressive it is, only a way of hinting at which processes can be more readily subject to being killed. In fact, a requirement of the oom killer is that swap is completely consumed, and if swap is on anything other than a fast SSD, swapping creates its own performance problems way before oom can be a rescuer. I think I just argued against my own question.
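
For anyone who wants to look at the scores mentioned above, here is a minimal sketch, assuming a Linux system that exposes /proc/<pid>/oom_score and /proc/<pid>/oom_score_adj. The function names and the proc_root parameter are made up for illustration. It only reads the per-process hints; it does nothing to make the oom killer trigger sooner.

```python
import os


def read_oom_values(pid, proc_root="/proc"):
    """Return (oom_score, oom_score_adj) for a pid, or None if unreadable."""
    values = []
    for name in ("oom_score", "oom_score_adj"):
        path = os.path.join(proc_root, str(pid), name)
        try:
            with open(path) as f:
                values.append(int(f.read().strip()))
        except (OSError, ValueError):
            # Process exited, permission denied, or unexpected file content.
            return None
    return tuple(values)


def list_oom_scores(proc_root="/proc"):
    """Return [(pid, oom_score, oom_score_adj)], highest oom_score first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        vals = read_oom_values(entry, proc_root)
        if vals is not None:
            rows.append((int(entry), vals[0], vals[1]))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Show the ten processes the kernel currently rates as most killable.
    for pid, score, adj in list_oom_scores()[:10]:
        print(f"{pid:>7} oom_score={score} oom_score_adj={adj}")
```

Writing a value between -1000 and 1000 to a process's oom_score_adj is the hinting mechanism: it shifts which process gets picked, not when the kill happens.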