Making sense of OOM killer messages?

Tue Nov 29 19:38:14 UTC 2005

At 3:53 PM +1100 11/29/05, Steffen Kluge wrote:
>On Mon, 2005-11-28 at 21:09 +0800, John Summerfied wrote:
>> I'm not sure it's useful to know that :-( In my experience (which
>> includes 2.6 kernels that are supposed to do this better) the killed
>> process is generally an innocent bystander.
>
>The process that triggered the OOM condition is probably just as
>innocent. There isn't always a "culprit", and if there is, it isn't easy
>to spot. The OOM killer tries to do the most sensible thing by killing
>less active processes that still yield a fair amount of released memory.
>
>BTW, the OpenBSD folks reckon there is no reasonable or fair way of
>dealing with an OOM situation and take the easy way out: they simply
>halt the whole system...

I don't know how Linux does it, but my understanding is that (old) Unix
used to kill the process that made the failing request.  In most cases, a
runaway process has exhausted the memory and is still requesting more, so
it is the likeliest to be the one killed each time an allocation fails.
Normally the runaway process will be killed after only a few tries, and
most of the system will usually still be running.  The system may even
recover if monitoring processes are watching to restart important daemons
(such monitoring processes normally aren't allocating any memory, so they
won't be killed).

Killing less active processes seems like a terrible heuristic, as the
runaway process will likely be the very last process killed.
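
To make the comparison concrete, here is a toy sketch of the two policies.
It is not how any real kernel scores processes: the Proc fields, the
"memory held divided by recent activity" score, and the process table are
all made up for illustration, and it assumes the runaway process is always
the one whose request fails.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    rss_mb: int         # memory that killing this process would release
    recent_allocs: int  # recent allocation requests (a stand-in for "activity")


def victim_requester(procs, requester):
    """Policy 1: kill whichever process made the failing request."""
    return requester


def victim_least_active(procs, requester):
    """Policy 2 (hypothetical score): favour idle processes holding memory."""
    return max(procs, key=lambda p: p.rss_mb / (1 + p.recent_allocs))


def kills_until_runaway_dies(policy, procs, runaway):
    """Apply a policy repeatedly; return victim names in the order killed."""
    alive = list(procs)
    kills = []
    while runaway in alive:
        # Assumption: the runaway is the requester on every failure.
        victim = policy(alive, requester=runaway)
        alive.remove(victim)
        kills.append(victim.name)
    return kills


# Invented process table: one runaway plus four ordinary processes.
RUNAWAY = Proc("runaway", rss_mb=3000, recent_allocs=1000)
PROCS = [
    Proc("db", rss_mb=800, recent_allocs=5),
    Proc("httpd", rss_mb=200, recent_allocs=20),
    Proc("sshd", rss_mb=5, recent_allocs=0),
    Proc("cron", rss_mb=4, recent_allocs=0),
    RUNAWAY,
]

if __name__ == "__main__":
    print(kills_until_runaway_dies(victim_requester, PROCS, RUNAWAY))
    print(kills_until_runaway_dies(victim_least_active, PROCS, RUNAWAY))
```

With these invented numbers, the first policy removes the runaway on the
first kill, while the second works through every other process before it
reaches the runaway.  A different score or a different process table could
give a different order.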
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>