Hi,
I recently moved an httpd mirror from i386 to x86_64 hardware. Everything but the memory capacity was scaled up (100MBit to 1GBit, Athlon to Opteron, etc.).
The old server is running FC1/i386 and the new one got FC2/x86_64. Although both distros ship httpd in the same upstream version (2.0.50), the memory footprint is quite different, and it causes the oom-killer to strike on the new server:
old (FC1/i386):
root      3329  0.0  0.2  26900  2408 ?  S  Aug29  0:01 /usr/sbin/httpd
apache   32746  0.0  0.5  28388  5984 ?  S  06:01  0:03  \_ /usr/sbin/httpd
apache    6206  0.0  0.5  28320  5968 ?  S  09:31  0:02  \_ /usr/sbin/httpd
apache    8185  0.0  0.5  28452  6080 ?  S  10:09  0:04  \_ /usr/sbin/httpd
apache   10293  0.0  0.5  28452  6056 ?  S  11:07  0:03  \_ /usr/sbin/httpd
apache   10378  0.0  0.5  28396  5984 ?  S  11:09  0:03  \_ /usr/sbin/httpd
[...]
new (FC2/x86_64):
root      2177  0.0  1.5 189664 15556 ?  S  Aug30  0:00 /usr/sbin/httpd
apache    2273  0.0  1.7 193076 17984 ?  S  Aug30  0:03  \_ /usr/sbin/httpd
apache    2282  0.0  1.7 193052 17996 ?  S  Aug30  0:04  \_ /usr/sbin/httpd
apache    3125  0.0  1.7 193052 17992 ?  S  Aug30  0:03  \_ /usr/sbin/httpd
apache    3993  0.0  1.7 193052 18060 ?  S  01:27  0:03  \_ /usr/sbin/httpd
apache    3994  0.0  1.7 193052 17976 ?  S  01:27  0:02  \_ /usr/sbin/httpd
apache    7193  0.0  1.7 193052 17992 ?  S  05:01  0:02  \_ /usr/sbin/httpd
Is this normal for i386 -> x86_64? Anyone running httpd on x86_64 to compare numbers?
Thanks!
On Tue, Aug 31, 2004 at 04:05:50PM +0200, Axel Thimm wrote:
Hi,
I recently moved an httpd mirror from i386 to x86_64 hardware. Everything but the memory capacity was scaled up (100MBit to 1GBit, Athlon to Opteron, etc.).
The old server is running FC1/i386 and the new one got FC2/x86_64. Although both distros ship httpd in the same upstream version (2.0.50), the memory footprint is quite different, and it causes the oom-killer to strike on the new server:
I see the same thing here; I hadn't looked into it before, though.
It's interesting: if you look at the mappings of the httpd process on x86_64, for each mmapped object there is an extra region mapped with PROT_NONE, which you don't see on i686. I presume this is counted in the VmSize calculation - it adds up to about 100MB of address space on the system I tested.
e.g.
2a9b033000   12K r-xp /usr/lib64/libpanel.so.5.3
2a9b036000 1012K ---p /usr/lib64/libpanel.so.5.3
2a9b133000   16K rw-p /usr/lib64/libpanel.so.5.3
vs
00d57000   12K r-xp /usr/lib/libpanel.so.5.3
00d5a000    4K rw-p /usr/lib/libpanel.so.5.3
is this libc behaviour by design? Jakub?
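(For anyone who wants to total those PROT_NONE regions on their own box, a rough sketch follows; the sample data is just the libpanel excerpt above, and on a live system you would feed it real pmap output or a parsed /proc/<pid>/maps instead.)

```shell
# Sum the sizes of PROT_NONE ("---p") mappings from pmap-style output.
# pmap_sample is the libpanel.so.5.3 excerpt quoted above; replace it
# with real output on a live system, e.g.: pmap <pid of an httpd child>
pmap_sample='2a9b033000   12K r-xp /usr/lib64/libpanel.so.5.3
2a9b036000 1012K ---p /usr/lib64/libpanel.so.5.3
2a9b133000   16K rw-p /usr/lib64/libpanel.so.5.3'

total_none=$(printf '%s\n' "$pmap_sample" |
  awk '$3 == "---p" { sub(/K$/, "", $2); sum += $2 } END { print sum + 0 }')
echo "PROT_NONE total: ${total_none}K"
```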
[...]
Axel.Thimm at ATrpms.net
On Thu, Sep 02, 2004 at 09:23:10PM +0100, Joe Orton wrote:
I see the same thing here, not looked into it before though.
It's interesting: if you look at the mappings of the httpd process on x86_64, for each mmapped object there is an extra region mapped with PROT_NONE, which you don't see on i686. I presume this is counted in the VmSize calculation - it adds up to about 100MB of address space on the system I tested.
e.g.
2a9b033000   12K r-xp /usr/lib64/libpanel.so.5.3
2a9b036000 1012K ---p /usr/lib64/libpanel.so.5.3
2a9b133000   16K rw-p /usr/lib64/libpanel.so.5.3
vs
00d57000   12K r-xp /usr/lib/libpanel.so.5.3
00d5a000    4K rw-p /usr/lib/libpanel.so.5.3
is this libc behaviour by design? Jakub?
Well, not glibc, but binutils. The thing is, x86-64 ELF has a 1MB page size, while i386 ELF has 4KB, so x86-64 binaries and shared libraries must be usable even when the kernel uses a 1MB page size.
The gap between the RE and RW segments is there so that the library occupies less memory (eats fewer 4KB pages).
If something counts in PROT_NONE mappings into the process size, it should be fixed.
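(As a sanity check, the arithmetic works out against the libpanel example quoted above; the addresses are from the maps excerpt and the 1MB figure is the max page size just described.)

```shell
# Map addresses of libpanel.so.5.3 from the x86_64 maps excerpt above.
re_start=$((0x2a9b033000))   # start of the r-xp (RE) segment
rw_start=$((0x2a9b133000))   # start of the rw-p (RW) segment
re_size=$((12 * 1024))       # 12K of mapped text
maxpage=$((0x100000))        # 1MB maximum ELF page size on x86-64

gap=$(( rw_start - re_start - re_size ))
echo "PROT_NONE gap: $(( gap / 1024 ))K"   # the 1012K ---p region
echo "RE->RW distance: $(( (rw_start - re_start) / maxpage )) max page(s)"
```

The RW segment starts exactly one maximum page after the RE segment, and the 1012K PROT_NONE region is simply what is left of that 1MB after the 12K of text.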
Jakub
On Thu, Sep 02, 2004 at 04:34:20PM -0400, Jakub Jelinek wrote:
[...]
Well, not glibc, but binutils. The thing is, x86-64 ELF has 1MB pagesize, while i386 ELF 4KB, so the x86-64 binaries and shared libraries must be usable even when kernel uses 1MB pagesize.
The gap in between RE and RW segment is there so that the library occupies less memory (eats less 4KB pages).
If something counts in PROT_NONE mappings into the process size, it should be fixed.
Such mappings are certainly counted in the VmSize reported by /proc/pid/status; it looks like that figure (mm->total_vm) is used in the OOM killer heuristics and also in the 'ps' VSZ output. I couldn't argue otherwise, since it is VM space which is being "used".
But if these mappings are not really consuming memory, then they are not necessarily the cause of your problems, Axel. It does mean that if you *do* run out of memory, then the httpd processes are more likely to get OOM killed on x86_64 than on i386.
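(To make the ps <-> /proc correspondence concrete: VSZ in ps is VmSize from /proc/<pid>/status and RSS is VmRSS. A small parsing sketch; the sample values are lifted from the x86_64 ps listing at the top of the thread, and on a live box you would read the real status file of an httpd child instead.)

```shell
# VmSize (ps VSZ) counts all mappings, PROT_NONE included; VmRSS is what
# is actually resident.  sample_status uses figures from the thread's ps
# listing; on a live system read /proc/<pid>/status instead.
sample_status='VmSize:   193052 kB
VmRSS:     17992 kB'

vmsize=$(printf '%s\n' "$sample_status" | awk '/^VmSize:/ { print $2 }')
vmrss=$(printf '%s\n' "$sample_status" | awk '/^VmRSS:/  { print $2 }')
echo "VSZ=${vmsize}kB RSS=${vmrss}kB resident=$(( vmrss * 100 / vmsize ))%"
```

Only about a tenth of that VSZ is actually resident, which is why a huge VSZ by itself need not mean memory pressure.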
joe
On Fri, Sep 03, 2004 at 11:20:43AM +0100, Joe Orton wrote:
[...]
Such mappings are certainly counted in the VmSize reported by /proc/pid/status, which looks like it (mm->total_vm) is used in the OOM killer heuristics and also in the 'ps' VSZ output. I couldn't argue otherwise since it is vm space which is being "used".
But if these mappings are not really consuming memory, then they are not necessarily the cause of your problems, Axel. It does mean that if you *do* run out of memory, then the httpd processes are more likely to get OOM killed on x86_64 than on i386.
Well, I can only argue phenomenologically: the same httpd config under FC1/i386/1GB could easily serve the default 150 MaxClients. Migrating to FC2/x86_64/1GB (i.e. the same amount of RAM, but i386->x86_64 and FC1->FC2) requires me to reduce MaxClients to below 50 to keep the machine from running OOM.
The VM numbers are roughly eight times as big, and perhaps that figure is irrelevant as discussed above, but the RSS is also three times higher than on the old server, and that is probably what hurts (20MB RSS times 150 MaxClients is already 3GB, whereas previously 6MB times 150 came to under 1GB).
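(Taking one apache child from each ps listing at the top of the thread - PIDs 10293 and 3125 - the ratios come out close to that; the figures are from the listings and the arithmetic is just a sanity check.)

```shell
# FC1/i386 child (pid 10293) vs FC2/x86_64 child (pid 3125), in kB.
old_vsz=28452; old_rss=6056
new_vsz=193052; new_rss=17992

vsz_ratio=$(awk -v o="$old_vsz" -v n="$new_vsz" 'BEGIN { printf "%.1f", n/o }')
rss_ratio=$(awk -v o="$old_rss" -v n="$new_rss" 'BEGIN { printf "%.1f", n/o }')
full_load=$(awk -v r="$new_rss" 'BEGIN { printf "%.1f", r*150/1048576 }')
echo "VSZ ratio: ${vsz_ratio}x  RSS ratio: ${rss_ratio}x"
echo "150 MaxClients x ${new_rss}kB RSS = ~${full_load}GB"
```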
How can I debug the memory consumption on this box? Which figures are the ones to look for and which ones do accumulate for the OOM killer?
On Fri, Sep 03, 2004 at 12:40:53PM +0200, Axel Thimm wrote:
How can I debug the memory consumption on this box? Which figures are the ones to look for and which ones do accumulate for the OOM killer?
IMHO the best approach would be to install the 32-bit and 64-bit httpd side by side, configure them identically (apart from the port number), keep downloading the same page from each, and grab /proc/<pid>/maps from both processes.
Jakub
Actually, I don't see RSS numbers particularly different between i686 and x86_64.
For a server under ab load for a static page, I get an RSS of ~10.5Mb on amd64 and ~9.5Mb on i686, which seems like a reasonable difference. This is comparing like for like on RHEL3 U3; I don't have a Hammer box running FC2 to compare.
You are comparing active servers, not idle ones, right? The RSS of an idle server is not interesting, since it may be half swapped out.
joe
On Fri, Sep 03, 2004 at 12:32:42PM +0100, Joe Orton wrote:
Actually, I don't see RSS numbers particularly different between i686 and x86_64.
For a server under ab load for a static page, I get an RSS of ~10.5Mb on amd64, and ~9.5Mb on i686, which seems like a reasonable difference. This is comparing like for like on RHEL3 U3, I don't have a Hammer box running FC2 to compare.
In my case this was 20MB vs 6MB.
You are comparing active servers not idle ones, right? RSS of an idle server is not interesting since it may be half swapped out.
No, the numbers were from actual full workload scenarios.
It need not be an i386 vs x86_64 issue; it could also be an FC1->FC2 issue (including the kernel 2.4->2.6 change).
On Fri, Sep 03, 2004 at 02:54:12PM +0200, Axel Thimm wrote:
It need not be an i386 vs x86_64 issue; it could also be an FC1->FC2 issue (including the kernel 2.4->2.6 change).
I see roughly the same RSS and VSZ between FC1 and FC2 on x86 boxes. I presume you have exactly the same set of modules installed and loaded on both servers?
joe
On Fri, Sep 03, 2004 at 05:16:51PM +0100, Joe Orton wrote:
On Fri, Sep 03, 2004 at 02:54:12PM +0200, Axel Thimm wrote:
It need not be an i386 vs x86_64 issue; it could also be an FC1->FC2 issue (including the kernel 2.4->2.6 change).
I see roughly the same RSS and VSZ between FC1 and FC2 on x86 boxes. I presume you have exactly the same set of modules installed&loaded on both servers?
I am using the default set of modules that FC1 and FC2 respectively come preconfigured with; I only add vhosts on top of the default config.
Is there such a difference between the FC1 and FC2 default module sets, and in the memory they consume?
On Fri, Sep 03, 2004 at 06:46:18AM -0400, Jakub Jelinek wrote:
On Fri, Sep 03, 2004 at 12:40:53PM +0200, Axel Thimm wrote:
How can I debug the memory consumption on this box? Which figures are the ones to look for and which ones do accumulate for the OOM killer?
IMHO best would be to install 32-bit and 64-bit httpd side by side, configure it the same (with a different port number), keep downloading the same page from it and try to grab /proc/<pid>/maps from both processes.
It turns out that memory gets consumed and not returned to the system independently of httpd (the oom-killer just strikes there first).
On an FC2/x86_64 system (Tyan S2880 with only one processor) with an untainted 2.6.8-1.521 kernel and 1GB RAM, simple compilations can eat up all the memory. I trimmed such a system down to basic networking to find which processes were locking the memory, and no userland process is holding it. Yet almost all memory is flagged as "used" (with negligible buffers and cache).
Is this a kernel memory leak? Any other information I should collect?
(I still cannot judge whether the kernel 2.4->2.6 change or the i386->x86_64 architecture change is responsible, for lack of machines with other combinations.)
# free
             total       used       free     shared    buffers     cached
Mem:       1027016    1022600       4416          0        992       7288
-/+ buffers/cache:    1014320      12696
Swap:      2047992       4496    2043496
# vmstat -a
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa
 0  0   4496   4352   4548   6556    1    1   399    80 1517   162  2  2 88  8
# cat /proc/meminfo
MemTotal:      1027016 kB
MemFree:          4352 kB
Buffers:          1008 kB
Cached:           7316 kB
SwapCached:       1148 kB
Active:           6528 kB
Inactive:         4536 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      1027016 kB
LowFree:          4352 kB
SwapTotal:     2047992 kB
SwapFree:      2043496 kB
Dirty:             236 kB
Writeback:           0 kB
Mapped:           5296 kB
Slab:            14388 kB
Committed_AS:   535496 kB
PageTables:     494900 kB
VmallocTotal: 536870911 kB
VmallocUsed:      1568 kB
VmallocChunk: 536869323 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB
# ps uaxwwf
USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START  TIME COMMAND
root         1  0.0  0.0   3472   428 ?     S    Sep12  0:01 init [3]
root         2  0.0  0.0      0     0 ?     SWN  Sep12  0:00 [ksoftirqd/0]
root         3  0.0  0.0      0     0 ?     SW<  Sep12  0:00 [events/0]
root         4  0.0  0.0      0     0 ?     SW<  Sep12  0:00  \_ [khelper]
root         5  0.0  0.0      0     0 ?     SW<  Sep12  0:00  \_ [kacpid]
root        30  0.0  0.0      0     0 ?     SW<  Sep12  0:00  \_ [kblockd/0]
root        44  0.0  0.0      0     0 ?     SW   Sep12  0:00  \_ [pdflush]
root        45  0.0  0.0      0     0 ?     SW   Sep12  0:02  \_ [pdflush]
root        47  0.0  0.0      0     0 ?     SW<  Sep12  0:00  \_ [aio/0]
root       186  0.0  0.0      0     0 ?     SW<  Sep12  0:00  \_ [ata/0]
root        31  0.0  0.0      0     0 ?     SW   Sep12  0:00 [khubd]
root        46  0.0  0.0      0     0 ?     SW   Sep12  0:01 [kswapd0]
root       151  0.0  0.0      0     0 ?     SW   Sep12  0:00 [kseriod]
root       188  0.0  0.0      0     0 ?     SW   Sep12  0:00 [scsi_eh_0]
root       189  0.0  0.0      0     0 ?     SW   Sep12  0:00 [scsi_eh_1]
root       204  0.0  0.0      0     0 ?     SW   Sep12  0:00 [kjournald]
root       339  0.0  0.0   2336   216 ?     S<   Sep12  0:00 udevd
root       896  0.0  0.0      0     0 ?     SW   Sep12  0:00 [kjournald]
root       897  0.0  0.0      0     0 ?     SW   Sep12  0:00 [kjournald]
root       898  0.0  0.0      0     0 ?     SW   Sep12  0:00 [kjournald]
root       899  0.0  0.0      0     0 ?     SW   Sep12  0:00 [kjournald]
root      1637  0.0  0.0      0     0 ?     SW<  Sep12  0:00 [krfcommd]
root      1946  0.0  0.0  18104   748 ?     S    Sep12  0:00 /usr/sbin/sshd
root      5189  0.0  0.1  37540  1056 ?     S    02:04  0:00  \_ sshd: root@pts/0
root      5195  0.0  0.0  45656  1020 pts/0 S    02:04  0:00  |   \_ -bash
root      5255  0.0  0.1 104764  1892 pts/0 S    02:04  0:00  |       \_ gkrellm
root     29075  0.0  0.0  44836   500 pts/0 S    02:38  0:00  |       \_ sleep 10
root      6119  0.0  0.0  37284  1020 ?     S    02:19  0:00  \_ sshd: root@pts/1
root      6133  0.0  0.1  45656  1120 pts/1 S    02:19  0:00  |   \_ -bash
root     29079  0.0  0.0  44476   924 pts/1 S    02:38  0:00  |       \_ /bin/sh ./memory.sh
root     29083  0.0  0.0   5228   784 pts/1 R    02:38  0:00  |           \_ ps uaxwwf
root      6193  0.0  0.0  37284  1020 ?     S    02:20  0:00  \_ sshd: root@pts/2
root      6212  0.0  0.1  45656  1136 pts/2 S    02:20  0:00  |   \_ -bash
root     29077  0.0  0.1  35936  1932 ?     S    02:38  0:00  \_ sshd: bin [priv]
sshd     29078  0.0  0.1  19448  1120 ?     S    02:38  0:00      \_ sshd: bin [net]
root      2542  0.0  0.0   2344   272 tty1  S    Sep12  0:00 /sbin/mingetty tty1
root      2543  0.0  0.0   2344   272 tty2  S    Sep12  0:00 /sbin/mingetty tty2
root      2544  0.0  0.0   2344   272 tty3  S    Sep12  0:00 /sbin/mingetty tty3
root      2545  0.0  0.0   2344   276 tty4  S    Sep12  0:00 /sbin/mingetty tty4
root      2546  0.0  0.0   2344   276 tty5  S    Sep12  0:00 /sbin/mingetty tty5
root      2547  0.0  0.0   2344   276 tty6  S    Sep12  0:00 /sbin/mingetty tty6
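(One oddity stands out in that meminfo dump: PageTables is nearly half the machine's RAM, far more than every process RSS combined - on an otherwise idle 1GB box that would normally be a few MB. Pulling out the key fields, with the values copied from the dump above:)

```shell
# Key /proc/meminfo fields from the dump above, in kB.
memtotal=1027016; memfree=4352
buffers=1008; cached=7316; slab=14388; mapped=5296
pagetables=494900

used=$(( memtotal - memfree ))
echo "used:            ${used} kB"
echo "buffers+cached:  $(( buffers + cached )) kB"
echo "slab+mapped:     $(( slab + mapped )) kB"
echo "PageTables:      ${pagetables} kB ($(( pagetables * 100 / memtotal ))% of RAM)"
```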
As a follow-up, this looks similar to the following bugzilla reports:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=131251
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=131414
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132180
The latter two may be leaking due to the SG_IO/bio_uncopy_user memory leak:
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.8.1/2.6.8.1-m...
But the system in question has no devices driven by the SCSI layer (no SATA/SCSI/CD-ROM attached, only plain old IDE), and the memory leak occurs when rebuilding 2.6.8-1.521 kernels or kernel modules.
On Mon, Sep 13, 2004 at 03:05:44AM +0200, Axel Thimm wrote:
[...]
On Mon, Sep 13, 2004 at 11:36:28AM +0200, Axel Thimm wrote:
On Mon, Sep 13, 2004 at 03:05:44AM +0200, Axel Thimm wrote:
On Fri, Sep 03, 2004 at 06:46:18AM -0400, Jakub Jelinek wrote:
On Fri, Sep 03, 2004 at 12:40:53PM +0200, Axel Thimm wrote:
How can I debug the memory consumption on this box? Which figures are the ones to look for and which ones do accumulate for the OOM killer?
IMHO best would be to install 32-bit and 64-bit httpd side by side,
It turns out that memory gets consumed and not returned back to the system independent of httpd (the oom-killer just strikes there first).
The bug has now been identified as a kernel memory leak in 2.6.8-1.521 and 2.6.8-1.541 on x86_64 systems running 32-bit apps in IA32_EMULATION mode.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132947
It is easily reproducible, and I wonder why there have been so few reports of this on the net (kernel 2.6.8-1.521 was released a month ago). Probably most x86_64 users run 64-bit only and don't observe the leak.
Any references/fixes are more than welcome. :)
[...]