OOM killer on bapp02

Pierre-Yves Chibon pingou at pingoured.fr
Tue Dec 9 16:17:12 UTC 2014


On Tue, 2014-12-09 at 09:42 +0100, Adrian Reber wrote:
> On Tue, Nov 25, 2014 at 07:24:32AM -0700, Kevin Fenzi wrote:
> > On Tue, 25 Nov 2014 10:27:19 +0100
> > Adrian Reber <adrian at lisas.de> wrote:
> > > The OOM killer on bapp02 has terminated a few mirrormanager crawler
> > > processes. It seems it needs more memory or the number of parallel
> > > crawlers has to be further limited.
> > 
> > Well, it's got 16GB now... I can bump it to 24 without too much
> > trouble. Will of course need a freeze break... 
> > 
> > We are currently doing 60 threads. We could cut it down, but I guess
> > I'd say let's try more memory first.
> 
> As dmesg on bapp02 has no timestamps it is hard to tell when the last
> crawler was terminated because of OOM. Looking at different log files it
> seems, however, that some crawler processes are terminated without
> finishing correctly. Especially mirrors which take a long time to crawl
> are not examined completely. So maybe it would be a good thing to
> decrease the number of parallel crawls to avoid OOM situations.

I just managed to reproduce it while running the umdl script. Maybe it
didn't like having the crawler and umdl running at the same time?
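
If we do end up lowering the number of parallel crawls, the cap could look
roughly like the sketch below. This is only an illustration, not the actual
mirrormanager crawler code: crawl_all, crawl_one and max_parallel are
hypothetical names, and 20 is an arbitrary value, not a tested limit.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(mirrors, crawl_one, max_parallel=20):
    """Run crawl_one(mirror) for every mirror, at most max_parallel at once.

    crawl_one is a hypothetical callable standing in for whatever crawls
    a single mirror; max_parallel=20 is an arbitrary example value.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() keeps results in input order and re-raises the first
        # exception from a worker when its result is consumed.
        return list(pool.map(crawl_one, mirrors))
```

A bounded pool only limits how many crawls run concurrently. Whether that is
enough to avoid the OOM killer depends on how much memory each crawl needs,
which I have not measured.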


Pierre

