Hi all,
I've just had oom killer toast a process. From the messages can I tell how much memory (real/swap) this process was using before it was blasted?
kernel: Normal free:3304kB min:3756kB low:4692kB high:5632kB active:343712kB inactive:466528kB present:901120kB pages_scanned:2328996 all_unreclaimable? yes
kernel: protections[]: 0 0 0
HighMem free:512kB min:512kB low:640kB high:768kB active:568648kB inactive:567380kB present:1179072kB pages_scanned:508296 all_unreclaimable? no
protections[]: 0 0 0
kernel: DMA: 1*4kB 0*8kB 2*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 68kB
kernel: Normal: 0*4kB 1*8kB 0*16kB 19*32kB 0*64kB 15*128kB 3*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3304kB
kernel: HighMem: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 512kB
kernel: Swap cache: add 5952419, delete 5951922, find 42444162/42881561, race 35+213
kernel: Free swap: 0kB
kernel: 524144 pages of RAM
kernel: 294768 pages of HIGHMEM
kernel: 5277 reserved pages
kernel: 2594451 pages shared
kernel: 497 pages swap cached
kernel: Out of Memory: Killed process 15156 (httpd).
kernel: oom-killer: gfp_mask=0xd0
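For what it's worth, the 2.6 OOM report above doesn't break memory usage down per process, so the answer can't be read straight out of those messages. While a process is still alive, though, its virtual and resident sizes can be read from /proc/<pid>/status. Below is a minimal Python sketch of that idea; per-process swap usage isn't exposed separately in /proc on these kernels, so only VmSize and VmRSS are shown.

    #!/usr/bin/env python
    # Minimal sketch: list VmSize (virtual) and VmRSS (resident) for every
    # process, read from /proc/<pid>/status. Values are reported in kB.
    import os

    def vm_fields(pid):
        fields = {}
        try:
            for line in open("/proc/%s/status" % pid):
                if line.startswith("Vm"):
                    key, value = line.split(":", 1)
                    fields[key] = value.strip()      # e.g. "12345 kB"
        except IOError:
            pass                                     # process exited meanwhile
        return fields

    for pid in sorted((p for p in os.listdir("/proc") if p.isdigit()), key=int):
        f = vm_fields(pid)
        if f:
            print("%6s  VmSize: %-12s VmRSS: %s"
                  % (pid, f.get("VmSize", "-"), f.get("VmRSS", "-")))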
Naoki wrote:
Hi all,
I've just had oom killer toast a process. From the messages can I tell how much memory (real/swap) this process was using before it was blasted?
I'm not sure it's useful to know that:-( In my experience (which includes 2.6 kernels that are supposed to do this better) the killed process is generally an innocent bystander.
Yes, it's very hard to track down the cause of the memory problem in this situation. Even just the output of ps from before the killing took place would be very helpful. When memory + swap is all used, it's very difficult to log in and run these sorts of commands.
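One low-tech way around that is to have such a snapshot already on disk before trouble starts. A minimal sketch, assuming it is launched from an init script at boot and that /var/log/ps-snapshots.log is an acceptable place to write (both the path and the interval are just example choices):

    #!/usr/bin/env python
    # Minimal sketch: append a timestamped "ps" snapshot to a log file every
    # minute, so per-process memory usage from just before an OOM kill can be
    # recovered after the fact.
    import os, time

    LOGFILE = "/var/log/ps-snapshots.log"   # example path, pick your own
    INTERVAL = 60                           # seconds between snapshots

    while True:
        listing = os.popen("ps axo pid,vsz,rss,pcpu,comm --sort=-rss").read()
        log = open(LOGFILE, "a")
        log.write("==== %s ====\n%s\n" % (time.ctime(), listing))
        log.close()
        time.sleep(INTERVAL)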
On Mon, 2005-11-28 at 21:09 +0800, John Summerfied wrote:
Naoki wrote:
Hi all,
I've just had oom killer toast a process. From the messages can I tell how much memory (real/swap) this process was using before it was blasted?
I'm not sure it's useful to know that:-( In my experience (which includes 2.6 kernels that are supposed to do this better) the killed process is generally an innocent bystander.
Mark "Naoki" Rogers / VP - Systems Engineering, Systems
ValueCommerce Co., Ltd.
Tokyo Bldg 4F, 3-32-7 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
Tel. +81.3.3817.8995  Fax. +81.3.3812.4051
mailto:naoki@valuecommerce.co.jp
On Mon, 2005-11-28 at 21:09 +0800, John Summerfied wrote:
I'm not sure it's useful to know that:-( In my experience (which includes 2.6 kernels that are supposed to do this better) the killed process is generally an innocent bystander.
The process that triggered the OOM condition is probably just as innocent. There isn't always a "culprit", and if there is, it isn't easy to spot. The OOM killer tries to do the most sensible thing by killing less active processes that still yield a fair amount of released memory.
BTW, the OpenBSD folks reckon there is no reasonable or fair way of dealing with an OOM situation and take the easy way out: they simply halt the whole system...
Cheers Steffen.
At 3:53 PM +1100 11/29/05, Steffen Kluge wrote:
On Mon, 2005-11-28 at 21:09 +0800, John Summerfied wrote:
I'm not sure it's useful to know that:-( In my experience (which includes 2.6 kernels that are supposed to do this better) the killed process is generally an innocent bystander.
The process that triggered the OOM condition is probably just as innocent. There isn't always a "culprit", and if there is, it isn't easy to spot. The OOM killer tries to do the most sensible thing by killing less active processes that still yield a fair amount of released memory.
BTW, the OpenBSD folks reckon there is no reasonable or fair way of dealing with an OOM situation and take the easy way out: they simply halt the whole system...
I don't know how Linux does it, but my understanding is that (old) Unix used to kill the process that made the failing request. In most cases, a runaway process has exhausted the memory, and is still making requests for more memory, so it is the likeliest to be killed each time the OOM killer zaps something. Normally the runaway process will be killed after only a few tries, and usually most of the system will still be running. The system may even recover, if there are monitoring processes looking to restart important daemons (such monitoring processes normally aren't allocating any memory and won't be killed).
Killing less active processes seems like a terrible heuristic, as the runaway process will likely be the very last process killed.
____________________________________________________________________
TonyN.:'                       mailto:tonynelson@georgeanelson.com
      '                              http://www.georgeanelson.com/
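For processes that really must survive, such as the monitoring/restart daemons mentioned above, 2.6 kernels that expose /proc/<pid>/oom_adj let you take a process out of the OOM killer's consideration by writing -17 (OOM_DISABLE) there. A minimal sketch, run as root, using sshd's pidfile purely as an example (the pidfile path is an assumption, not anything the kernel mandates):

    #!/usr/bin/env python
    # Minimal sketch: exempt one daemon from the OOM killer on 2.6 kernels
    # that provide /proc/<pid>/oom_adj. Writing -17 (OOM_DISABLE) means the
    # kernel will not pick this process as a victim. Must be run as root.
    PIDFILE = "/var/run/sshd.pid"           # example: keep sshd alive so you
                                            # can still log in afterwards
    pid = open(PIDFILE).read().strip()
    open("/proc/%s/oom_adj" % pid, "w").write("-17\n")
    print("OOM killing disabled for pid %s" % pid)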
I agree with the BSD folks that there is no fair way to decide which process should be killed. However, I'd _much_ rather have a system come back missing a process than a halted box.
The question, though, is how to diagnose the cause of the memory issue.
If the OOM killer doesn't give any information about process state before it starts killing, then it's hard to track down the root cause.
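A variation on the periodic ps snapshot mentioned earlier in the thread is to only dump detail when memory actually gets tight, so the log stays small and the dump lands as close to the OOM event as possible. A minimal sketch, with the threshold and log path chosen arbitrarily:

    #!/usr/bin/env python
    # Minimal sketch: watch MemFree + SwapFree in /proc/meminfo and write a
    # process listing to a log as soon as they drop below a threshold, giving
    # a record of process state from just before the OOM killer has to act.
    import os, time

    THRESHOLD_KB = 32 * 1024                # dump when < 32 MB free overall
    LOGFILE = "/var/log/oom-watch.log"      # example path

    def free_kb():
        free = 0
        for line in open("/proc/meminfo"):
            if line.startswith("MemFree:") or line.startswith("SwapFree:"):
                free += int(line.split()[1])    # second field is the kB value
        return free

    while True:
        if free_kb() < THRESHOLD_KB:
            listing = os.popen("ps axo pid,vsz,rss,comm --sort=-rss").read()
            log = open(LOGFILE, "a")
            log.write("==== %s ====\n%s\n" % (time.ctime(), listing))
            log.close()
        time.sleep(5)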
On Tue, 2005-11-29 at 15:53 +1100, Steffen Kluge wrote:
On Mon, 2005-11-28 at 21:09 +0800, John Summerfied wrote:
I'm not sure it's useful to know that:-( In my experience (which includes 2.6 kernels that are supposed to do this better) the killed process is generally an innocent bystander.
The process that triggered the OOM condition is probably just as innocent. There isn't always a "culprit", and if there is, it isn't easy to spot. The OOM killer tries to do the most sensible thing by killing less active processes that still yield a fair amount of released memory.
BTW, the OpenBSD folks reckon there is no reasonable or fair way of dealing with an OOM situation and take the easy way out: they simply halt the whole system...