I recently set up a server on the Internet in a fairly default FC5T2 configuration. I had it running kernel 2.6.15-1.1955_FC5 for 9 days when I noticed that it had apparently run out of kernel memory. All processes were heavily swapped and the OOM killer was killing processes (by the time I noticed the problem, most services had been killed).
The machine is a P3 with 256M of RAM. It has full net access with no firewall (iptables is used in a fairly default configuration, but there is nothing in front of the machine protecting it).
Is there a known kernel memory leak in 2.6.15-1.1955_FC5?
The machine in question has BIND, Postfix, Amavis + clamav, and Postgrey installed. However, as it doesn't yet have an MX record pointing to it, there has been little use (some DNS traffic is all that it would get).
I tried rebooting the machine and it crashed. I guess that the OOM killer killed something that was needed for a reboot. Tomorrow when I get it running again I will upgrade it to the latest kernel. Let me know if there are any tests I should perform if this happens again (unfortunately I can't get a net-dump server in there but disk-dump is an option I guess).
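If the problem recurs, one low-effort way to gather evidence is to log a few /proc/meminfo fields at intervals, so that growth in kernel slab memory can be told apart from ordinary page-cache use. The following is only a sketch: the function names, the choice of fields, and the 10-minute interval are assumptions made for illustration, not something suggested elsewhere in this thread.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: value_in_kB} dict."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def snapshot(path="/proc/meminfo",
             fields=("MemFree", "Buffers", "Cached", "Slab")):
    """Read the selected fields; a field missing on this kernel maps to None."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {name: info.get(name) for name in fields}


def log_samples(count, interval_seconds=600):
    """Print `count` timestamped snapshots (hypothetical 10-minute default)."""
    for i in range(count):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, snapshot(), flush=True)
        if i + 1 < count:
            time.sleep(interval_seconds)
```

Calling `log_samples(144)` with its output redirected to a file would cover roughly a day. A steadily rising `Slab` figure on an idle machine would point at the kernel rather than at a user process, and slabtop could then show which cache is growing.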
> Is there a known kernel memory leak in 2.6.15-1.1955_FC5?
Is this the stock kernel? Does it have any 3rd party modules installed?
I ask, because I ran into a memory leak with the 1955 LSPP kernel, with madwifi and nvidia installed. Steve Grubb later said he's found some sort of leak, and it will be fixed. I am now running stock 1977 kernel, will wait and see if it leaks.
On Monday 27 February 2006 02:52, Ivan Gyurdiev ivg2@cornell.edu wrote:
> > Is there a known kernel memory leak in 2.6.15-1.1955_FC5?
> Is this the stock kernel? Does it have any 3rd party modules installed?
Stock kernel with no 3rd party modules.
> I ask, because I ran into a memory leak with the 1955 LSPP kernel, with madwifi and nvidia installed.
That's a different thing. My machine in question has no wifi and no X.
> Steve Grubb later said he's found some sort of leak, and it will be fixed. I am now running stock 1977 kernel, will wait and see if it leaks.
I'm now running that version too. I guess I'll have an idea of how good it is in about 9 days' time. Of course there is the possibility that the memory leak was triggered by some sort of DoS attack, in which case the attacker might just target someone else's machine and mine might stay working even with a buggy kernel. :(
On Sunday 26 February 2006 19:29, Russell Coker wrote:
> > Steve Grubb later said he's found some sort of leak, and it will be fixed. I am now running stock 1977 kernel, will wait and see if it leaks.
> I'm now running that version too. I guess I'll have an idea of how good it is in about 9 days' time.
Depending on your system you might not make it 9 days. The best way to avoid the memory leak is to disable audit (auditctl -e 0), especially if it's a remote computer. The OOM killer might zap sshd.
> Of course there is the possibility that the memory leak was triggered by some sort of DoS attack, in which case the attacker might just target someone else's machine and mine might stay working even with a buggy kernel. :(
No, it was 2 mallocs not being freed on some syscalls. The lspp.10 kernel is building right now and I'll update the yum repo tomorrow morning. I might put the kernel at http://people.redhat.com/sgrubb/files/lspp in a few minutes, where it can be retrieved and installed manually.
-Steve
> No, it was 2 mallocs not being freed on some syscalls. lspp.10 kernel is building right now and I'll update the yum repo tomorrow am. I might put the kernel at http://people.redhat.com/sgrubb/files/lspp in a few minutes where it can be retrieved and installed manually.
All right, I think the stock kernel (+madwifi/nvidia) does not appear to leak memory, although Mono does (or whatever Mono is running). It eats 20% CPU right when I need it most, and then memory usage slowly climbs to 880 MB used over a few days. Killing mono fixes the problem. I'm not very happy with the resources it takes to run Mono, especially since I don't really use f-spot or beagle [ people are way too disorganized, why can't they just put their files where they can find them? ]
I will re-verify the leak and check what exactly mono is running, then file a bug. I will test the lspp.10 kernel afterwards to see if that leak is fixed.
On Thursday 02 March 2006 12:52, Ivan Gyurdiev ivg2@cornell.edu wrote:
> > No, it was 2 mallocs not being freed on some syscalls. lspp.10 kernel is building right now and I'll update the yum repo tomorrow am. I might put the kernel at http://people.redhat.com/sgrubb/files/lspp in a few minutes where it can be retrieved and installed manually.
> Allright, I think the stock kernel (+madwifi/nvidia) does not appear to leak memory, although Mono does (or whatever mono is running)... it eats
Below is the output of running uptime immediately after free on two occasions. The buffers, cached, and free memory numbers have decreased. Does this indicate a leak? The machine is totally idle, I have had a single ssh session open for all this time, no-one else has logged in, and it's not being used for any server tasks apart from light DNS serving.
[root@othello ~]# free
             total       used       free     shared    buffers     cached
Mem:        255196     250824       4372          0      48520      17900
-/+ buffers/cache:     184404      70792
Swap:      1048568        120    1048448
You have new mail in /var/spool/mail/root
[root@othello ~]# uptime
 18:45:13 up 2 days, 21:44,  1 user,  load average: 0.07, 0.03, 0.01

[root@othello ~]# free
             total       used       free     shared    buffers     cached
Mem:        255196     251544       3652          0       8852      13836
-/+ buffers/cache:     228856      26340
Swap:      1048568        120    1048448
You have new mail in /var/spool/mail/root
[root@othello ~]# uptime
 09:12:44 up 3 days, 12:12,  1 user,  load average: 0.18, 0.11, 0.03
[root@othello ~]# uname -a
Linux othello 2.6.15-1.1986.2.1_FC5.lspp.10 #1 Sun Feb 26 19:07:03 EST 2006 i686 i686 i386 GNU/Linux
[root@othello ~]#
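To make the comparison concrete, the arithmetic behind the "-/+ buffers/cache" line can be reproduced from the two samples. The short sketch below uses only the figures from the output above; the function and field names are made up for illustration. Note that "used minus buffers and cache" covers user processes as well as kernel allocations, so growth in that figure is consistent with a kernel leak but does not prove one.

```python
# Figures in kB, copied from the two `free` runs; uptime converted to minutes.
samples = [
    {"uptime_min": (2 * 24 + 21) * 60 + 44,  # up 2 days, 21:44
     "used": 250824, "free": 4372, "buffers": 48520, "cached": 17900},
    {"uptime_min": (3 * 24 + 12) * 60 + 12,  # up 3 days, 12:12
     "used": 251544, "free": 3652, "buffers": 8852, "cached": 13836},
]


def used_excluding_cache(sample):
    """Memory in use that is not reclaimable buffers or page cache."""
    return sample["used"] - sample["buffers"] - sample["cached"]


def growth_kb_per_hour(first, second):
    """Average growth of non-cache usage between two samples, in kB/hour."""
    hours = (second["uptime_min"] - first["uptime_min"]) / 60
    delta = used_excluding_cache(second) - used_excluding_cache(first)
    return delta / hours


if __name__ == "__main__":
    for s in samples:
        print(s["uptime_min"], "min:", used_excluding_cache(s), "kB non-cache")
    print("growth: %.0f kB/hour" % growth_kb_per_hour(samples[0], samples[1]))
```

The two computed values match the 184404 and 228856 that `free` reports, an increase of 44452 kB in a little under 14.5 hours on an idle machine.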
On Friday 03 March 2006 09:57, Russell Coker russell@coker.com.au wrote:
> Below is the output of running uptime immediately after free on two occasions. The buffers, cached, and free memory numbers have decreased. Does this indicate a leak? The machine is totally idle, I have had a single ssh session open for all this time, no-one else has logged in, and it's not being used for any server tasks apart from light DNS serving.
My machine in question has just run out of memory and crashed. It had been operating OK until I tried to copy a set of Fedora ISO files to its NFS share. I conclude that the bugs which Steve fixed were not the ones that afflict my machine.
I'm now upgrading it to 2.6.15-1.2009.4.2_FC5. I'll see if that fixes it.
On Sunday 05 March 2006 13:58, Russell Coker russell@coker.com.au wrote:
> On Friday 03 March 2006 09:57, Russell Coker russell@coker.com.au wrote:
> > Below is the output of running uptime immediately after free on two occasions. The buffers, cached, and free memory numbers have decreased. Does this indicate a leak? The machine is totally idle, I have had a single ssh session open for all this time, no-one else has logged in, and it's not being used for any server tasks apart from light DNS serving.
> My machine in question has just run out of memory and crashed. It had been operating OK until I tried to copy a set of Fedora ISO files to its NFS share. I conclude that the bugs which Steve fixed were not the ones that afflict my machine.
> I'm now upgrading it to 2.6.15-1.2009.4.2_FC5. I'll see if that fixes it.
The machine in question has been running 2.6.15-1.2032.2.3_FC5.lspp.12 for over 8 days now with no sign of a memory leak (and incidentally the load on the machine has increased). I believe that 2.6.15-1.2032.2.3_FC5.lspp.12 has fixed the memory leak problems I experienced.