[PATCH] haproxy & mirrorlist processes

Wed May 12 01:35:35 UTC 2010

On Tue, 11 May 2010, Toshio Kuratomi wrote:

> On Tue, May 11, 2010 at 02:48:23PM -0500, Matt Domsch wrote:
> > The mirrorlists are falling over - haproxy keeps marking app servers
> > as down, and some requests are getting HTTP 503 Server Temporarily
> > Unavailable responses.  This happens every 10 minutes, for 2-3
> > minutes, as several thousand EC2 instances request the mirrorlist
> > again.
> >
> > For reference, we're seeing a spike of over 2000 simultaneous requests
> > across our 6 proxy and 4 app servers, occurring every 10 minutes,
> > dropping back down to under 20 simultaneous requests in between.
> >
> > Trying out several things.
> >
> > 1) increase number of mirrorlist WSGI processes on each app server
> >    from 45 to 100.  This is the maximum number of simultaneous
> >    mirrorlist requests that each server can serve.  I've tried this
> >    value on app01, and running this many still keeps the
> >    mirrorlist_server back end (which fork()s on each connection)
> >    humming right along.  I think this is safe.  Increasing much beyond
> >    this though, the app servers will start to swap, which we must
> >    avoid.  We can watch the swapping, and if it starts, lower this
> >    value somewhat.  The value was 6 just a few days ago, which wasn't
> >    working either.
> >
> >    This gives us 400 slots to work with on the app servers.
> >
> This seems okay as a temporary measure but we won't want this as a permanent
> fix unless we can get more RAM for the app servers or separate app servers
> for the mirrorlist processes.
>
> The reason is that running close to swap means that we don't have room to
> grow the other services if they need it, increase the mirrorlist processes
> if we need even more slots, or add new services.
>
> +1
>
>
> > 2) try limiting the number of connections from each proxy server to
> >    each app server, to 25 per.  Right now we're seeing a max of
> >    between 60 and 135 simultaneous requests from each proxy server to
> >    each app server.  All those over 25 will get queued by haproxy and
> >    then served as app server instances become available.  I did this
> >    on proxy03, and it really helped out the app servers and kept them
> >    humming.  There were still some longish response times (some >30
> >    seconds).
> >
> >    We're still oversubscribing app server slots here, but oddly not
> >    by as much as you'd think, as proxy03 is taking 40% of the
> >    incoming requests itself for some reason.
> >
> This does seem like a good thing to try and then decide if we want it
> permanently.
>
> +1
>
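
The oversubscription mentioned in item 2 can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted above (6 proxies, 4 app servers, maxconn 25, 100 WSGI processes per app server); the constant and function names are made up for illustration and do not come from any config file.

```python
# Figures quoted in the thread; all names here are illustrative only.
PROXIES = 6               # proxy servers running haproxy
APP_SERVERS = 4           # app servers behind them
MAXCONN_PER_PROXY = 25    # per-server maxconn proposed in item 2
WSGI_PROCESSES = 100      # mirrorlist processes per app server (item 1)


def offered_per_app_server(proxies=PROXIES, maxconn=MAXCONN_PER_PROXY):
    """Worst-case simultaneous connections one app server can be sent."""
    return proxies * maxconn


def oversubscription_ratio(proxies=PROXIES, maxconn=MAXCONN_PER_PROXY,
                           slots=WSGI_PROCESSES):
    """Worst-case offered connections divided by available WSGI slots."""
    return offered_per_app_server(proxies, maxconn) / slots


# Total WSGI slots across all app servers ("400 slots" in item 1).
total_slots = APP_SERVERS * WSGI_PROCESSES

if __name__ == "__main__":
    print("slots in total:", total_slots)
    print("worst case per app server:", offered_per_app_server())
    print("oversubscription ratio:", oversubscription_ratio())
```

The worst case assumes every proxy fills its 25 connections at once. Since proxy03 reportedly takes 40% of the incoming requests by itself, the other proxies are less likely to reach their caps, which fits the observation that the real oversubscription is smaller than the worst case.
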
> > 3) bump the haproxy timeout up to 60 seconds.  5 seconds (the global
> >    default) is way too low when we get the spikes.  This was causing
> >    haproxy to think app servers were down, and start sending load to
> >    the other app servers, which would then overload, and then start
> >    sending to the first backup server, ...  Let's be nicer.  If during
> >    a spike it takes 60 seconds to get an answer, or be told HTTP 503,
> >    so be it.
> >
> 60 seconds seems a bit long when something does happen to a single
> server that should take it out of rotation for a bit.  We aren't likely to
> purposefully be doing things that take down an app server during change freeze
> but it's probably not a good idea to be quite this high in the long run.
> Something to do for now but tweak some after the release?
>
> +1
>
> > 4) have haproxy use all the backup servers when all the app servers
> >    are marked down.  Right now it sends all the requests to a single
> >    backup server, and if that's down, all to the next backup server,
> >    etc.  We know one server can't handle the load (even 4 aren't
> >    really), so don't overload a single backup either.
> >
> +1
>
> > 5) the default mirrorlist_server listen backlog is only 5, meaning
> >    that at most 5 WSGI clients get queued up if all the children are
> >    busy.  To handle spikes, bump that to 300 (though it's limited by
> >    the kernel to 128 by default).  This was the intent, but the code
> >    was buggy.
> >
> +1
>
> > 6) bug fix to mirrorlist_server to not ignore SIGCHLD.  Amazing this
> >    ever worked in the first place.  This should resolve the problem
> >    where mirrorlist_server slows down and memory grows over time.
> >
> +1
>
> >
> > diff --git a/modules/haproxy/files/haproxy.cfg b/modules/haproxy/files/haproxy.cfg
> > index 6e538ed..5a6fda0 100644
> > --- a/modules/haproxy/files/haproxy.cfg
> > +++ b/modules/haproxy/files/haproxy.cfg
> > @@ -43,15 +43,17 @@ listen  fp-wiki 0.0.0.0:10001
> >
> >  listen  mirror-lists 0.0.0.0:10002
> >      balance hdr(appserver)
> > -    server  app1 app1:80 check inter 5s rise 2 fall 3
> > -    server  app2 app2:80 check inter 5s rise 2 fall 3
> > -    server  app3 app3:80 check inter 5s rise 2 fall 3
> > -    server  app4 app4:80 check inter 5s rise 2 fall 3
> > -    server  app5 app5:80 backup check inter 10s rise 2 fall 3
> > -    server  app6 app6:80 backup check inter 10s rise 2 fall 3
> > -    server  app7 app7:80 check inter 5s rise 2 fall 3
> > -    server  bapp1 bapp1:80 backup check inter 5s rise 2 fall 3
> > +    timeout connect 60s
> > +    server  app1 app1:80 check inter 5s rise 2 fall 3 maxconn 25
> > +    server  app2 app2:80 check inter 5s rise 2 fall 3 maxconn 25
> > +    server  app3 app3:80 check inter 5s rise 2 fall 3 maxconn 25
> > +    server  app4 app4:80 check inter 5s rise 2 fall 3 maxconn 25
> > +    server  app5 app5:80 backup check inter 10s rise 2 fall 3 maxconn 25
> > +    server  app6 app6:80 backup check inter 10s rise 2 fall 3 maxconn 25
> > +    server  app7 app7:80 check inter 5s rise 2 fall 3 maxconn 25
> > +    server  bapp1 bapp1:80 backup check inter 5s rise 2 fall 3 maxconn 25
> >      option  httpchk GET /mirrorlist
> > +    option  allbackups
> >
> >  listen  pkgdb 0.0.0.0:10003
> >      balance hdr(appserver)
> > diff --git a/modules/mirrormanager/files/mirrorlist-server.conf b/modules/mirrormanager/files/mirrorlist-server.conf
> > index fd7cf98..482f7af 100644
> > --- a/modules/mirrormanager/files/mirrorlist-server.conf
> > +++ b/modules/mirrormanager/files/mirrorlist-server.conf
> > @@ -7,7 +7,7 @@ Alias /publiclist /var/lib/mirrormanager/mirrorlists/publiclist/
> >          ExpiresDefault "modification plus 1 hour"
> >  </Directory>
> >
> > -WSGIDaemonProcess mirrorlist user=apache processes=45 threads=1 display-name=mirrorlist maximum-requests=1000
> > +WSGIDaemonProcess mirrorlist user=apache processes=100 threads=1 display-name=mirrorlist maximum-requests=1000
> >
> >  WSGIScriptAlias /metalink /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
> >  WSGIScriptAlias /mirrorlist /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
> >
> >
> > From 45d401446bfecba768fdf4f26409bf291172f7bc Mon Sep 17 00:00:00 2001
> > From: Matt Domsch <Matt_Domsch at dell.com>
> > Date: Mon, 10 May 2010 15:23:57 -0500
> > Subject: [PATCH 1/2] mirrorlist_server: set request_queue_size earlier
> >
> > While the docs say that request_queue_size can be a per-instance
> > value, in reality it's used during ForkingUnixStreamServer __init__,
> > meaning it needs to override the default class attribute instead.
> >
> > Moving this up means that connections aren't blocking after about 5
> > are already running (default), and mirrorlist_client can now connect
> > in ~200us like one would expect, rather than seconds or tens of
> > seconds like we were seeing when lots (say, 40+) clients were
> > connecting simultaneously.
> > ---
> >  mirrorlist-server/mirrorlist_server.py |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
> > index 8825a1a..2ade357 100755
> > --- a/mirrorlist-server/mirrorlist_server.py
> > +++ b/mirrorlist-server/mirrorlist_server.py
> > @@ -725,6 +725,7 @@ def sighup_handler(signum, frame):
> >      signal.signal(signal.SIGHUP, sighup_handler)
> >
> >  class ForkingUnixStreamServer(ForkingMixIn, UnixStreamServer):
> > +    request_queue_size = 300
> >      def finish_request(self, request, client_address):
> >          signal.signal(signal.SIGHUP, signal.SIG_IGN)
> >          BaseServer.finish_request(self, request, client_address)
> > @@ -815,7 +816,6 @@ def main():
> >      signal.signal(signal.SIGHUP, sighup_handler)
> >      signal.signal(signal.SIGCHLD, signal.SIG_IGN)
> >      ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
> > -    ss.request_queue_size = 300
> >      ss.serve_forever()
> >
> >      try:
> > --
> > 1.7.0.1
> >
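
The reason patch 1/2 works can be reproduced in isolation. The sketch below is not the mirrorlist_server code: it uses Python 3's socketserver module (the successor of the SocketServer module the patch targets) and a TCPServer on loopback in place of the UnixStreamServer, and every class and function name in it is hypothetical. It records the backlog value in effect when the server calls listen(), which happens inside __init__, so an attribute set on the instance afterwards comes too late.

```python
import socketserver


class RecordingServer(socketserver.TCPServer):
    """Remembers which backlog value was in effect when listen() was called."""

    listened_backlog = None

    def server_activate(self):
        # TCPServer.__init__ calls server_activate(), which in turn calls
        # socket.listen(self.request_queue_size).
        self.listened_backlog = self.request_queue_size
        super().server_activate()


class BigBacklogServer(RecordingServer):
    # Class attribute: already visible while __init__ runs.
    request_queue_size = 300


class NullHandler(socketserver.BaseRequestHandler):
    def handle(self):
        pass


def backlog_used(server_cls, late_value=None):
    """Build a server on an ephemeral loopback port, report its backlog."""
    server = server_cls(("127.0.0.1", 0), NullHandler)
    try:
        if late_value is not None:
            # Mirrors the old code: set on the instance after construction.
            server.request_queue_size = late_value
        return server.listened_backlog
    finally:
        server.server_close()


if __name__ == "__main__":
    print("set after construction:", backlog_used(RecordingServer, 300))
    print("set as class attribute:", backlog_used(BigBacklogServer))
```

Even with the class attribute in place, the kernel may still cap the effective backlog, as item 5 notes.
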
> >
> > From d82f20b10c755e5ce40d67ca7ea4a6dba9e37d34 Mon Sep 17 00:00:00 2001
> > From: Matt Domsch <Matt_Domsch at dell.com>
> > Date: Mon, 10 May 2010 23:56:09 -0500
> > Subject: [PATCH 2/2] mirrorlist_server: don't ignore SIGCHLD
> >
> > Amazing that this ever worked in the first place.  Ignoring SIGCHLD
> > causes the parent's active_children list to grow without bound.  This
> > is also probably the cause of our long-term memory size growth.  The
> > parent really needs to catch SIGCHLD in order to do its reaping.
> > ---
> >  mirrorlist-server/mirrorlist_server.py |    1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> >
> > diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
> > index 2ade357..0de7132 100755
> > --- a/mirrorlist-server/mirrorlist_server.py
> > +++ b/mirrorlist-server/mirrorlist_server.py
> > @@ -814,7 +814,6 @@ def main():
> >      open_geoip_databases()
> >      read_caches()
> >      signal.signal(signal.SIGHUP, sighup_handler)
> > -    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
> >      ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
> >      ss.serve_forever()
> >
> +1 to implementation
>
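
The effect described in patch 2/2 can be seen with a small POSIX-only experiment. This is a sketch under assumptions, not the ForkingMixIn code itself, and the function name is hypothetical. On Linux and other POSIX systems, when SIGCHLD is set to SIG_IGN the kernel discards exited children on its own, so the parent's waitpid() fails with ECHILD and the parent never learns which child exited. A parent that only drops a PID from its bookkeeping after a successful waitpid() would then keep every PID forever.

```python
import os
import signal


def reap_with_sigchld(disposition):
    """Fork one child and report whether the parent could reap it.

    Returns True if waitpid() returned the child's PID, False if the
    kernel had already discarded the child (ECHILD).
    Call this from the main thread; signal.signal() requires it.
    """
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately without running any cleanup handlers.
            os._exit(0)
        try:
            reaped_pid, _status = os.waitpid(pid, 0)
            return reaped_pid == pid
        except ChildProcessError:
            # ECHILD: nothing left to wait for.
            return False
    finally:
        # Restore whatever disposition was in place before the experiment.
        signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    print("default disposition:", reap_with_sigchld(signal.SIG_DFL))
    print("SIGCHLD ignored:    ", reap_with_sigchld(signal.SIG_IGN))
```

Note that the default disposition is already enough for waitpid() to work here; no explicit SIGCHLD handler is installed, which matches the patch simply deleting the SIG_IGN line.
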

+1 to all of these.

	-Mike