[PATCH] haproxy & mirrorlist processes

Tue May 11 21:13:09 UTC 2010

On Tue, May 11, 2010 at 02:48:23PM -0500, Matt Domsch wrote:
> The mirrorlists are falling over - haproxy keeps marking app servers
> as down, and some requests are getting HTTP 503 Server Temporarily
> Unavailable responses.  This happens every 10 minutes, for 2-3
> minutes, as several thousand EC2 instances request the mirrorlist
> again.
> 
> For reference, we're seeing a spike of over 2000 simultaneous requests
> across our 6 proxy and 4 app servers, occurring every 10 minutes,
> dropping back down to under 20 simultaneous requests in between.
> 
> Trying out several things.
> 
> 1) increase number of mirrorlist WSGI processes on each app server
>    from 45 to 100.  This is the maximum number of simultaneous
>    mirrorlist requests that each server can serve.  I've tried this
>    value on app01, and running this many still keeps the
>    mirrorlist_server back end (which fork()s on each connection)
>    humming right along.  I think this is safe.  Increasing much beyond
>    this though, the app servers will start to swap, which we must
>    avoid.  We can watch the swapping, and if it starts, lower this
>    value somewhat.  The value was 6 just a few days ago, which wasn't
>    working either.
> 
>    This gives us 400 slots to work with on the app servers.
> 
This seems okay as a temporary measure, but we won't want it as a permanent
fix unless we can get more RAM for the app servers or separate app servers
for the mirrorlist processes.

The reason is that running close to swap leaves us no room to grow the
other services if they need it, to increase the mirrorlist processes if we
need even more slots, or to add new services.

+1

> 2) try limiting the number of connections from each proxy server to
>    each app server, to 25 per.  Right now we're seeing a max of
>    between 60 and 135 simultaneous requests from each proxy server to
>    each app server.  All those over 25 will get queued by haproxy and
>    then served as app server instances become available.  I did this
>    on proxy03, and it really helped out the app servers and kept them
>    humming.  There were still some longish response times (some >30
>    seconds).
> 
>    We're still oversubscribing app server slots here though, but
>    oddly, not by as much as you'd think, as proxy03 is taking 40% of
>    the incoming requests itself for some reason.
> 
This does seem like a good thing to try and then decide if we want it
permanently.

+1
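
To put rough numbers on the oversubscription mentioned in (2), here is a
small back-of-the-envelope sketch. It uses only the figures quoted above (6
proxies, 4 app servers, maxconn 25, 100 WSGI processes, a spike of about
2000 requests) and assumes load is spread evenly, which we already know is
not quite true since proxy03 takes 40% of the requests. The function and
variable names are made up for illustration.

```python
def capacity_summary(proxies=6, app_servers=4, maxconn=25,
                     wsgi_processes=100, peak_requests=2000):
    """Rough capacity arithmetic for the mirrorlist spike (even load assumed)."""
    # Each proxy may open up to `maxconn` connections to each app server.
    max_conns_per_app = proxies * maxconn
    # Total connections haproxy will let through to all app servers at once.
    max_conns_total = max_conns_per_app * app_servers
    # Total WSGI slots actually available to serve them.
    slots_total = wsgi_processes * app_servers
    return {
        "max_conns_per_app": max_conns_per_app,
        "slots_total": slots_total,
        "oversubscription": max_conns_per_app / wsgi_processes,
        # Requests beyond what haproxy lets through wait in its queues.
        "queued_in_haproxy_at_peak": max(0, peak_requests - max_conns_total),
    }


if __name__ == "__main__":
    for key, value in capacity_summary().items():
        print(key, value)
```

So even with maxconn 25, each app server can be handed up to 150
connections against 100 WSGI processes; the remainder would have to wait in
the mirrorlist_server listen backlog or time out.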

> 3) bump the haproxy timeout up to 60 seconds.  5 seconds (the global
>    default) is way too low when we get the spikes.  This was causing
>    haproxy to think app servers were down, and start sending load to
>    the other app servers, which would then overload, and then start
>    sending to the first backup server, ...  Let's be nicer.  If during
>    a spike it takes 60 seconds to get an answer, or be told HTTP 503,
>    so be it.
>
60 seconds seems a bit long for the case where something does happen to a
single server and it should be taken out of rotation for a bit.  We aren't
likely to purposefully do things that take down an app server during the
change freeze, but it's probably not a good idea to be quite this high in
the long run.  Something to do for now but tweak after the release?

+1

> 4) have haproxy use all the backup servers when all the app servers
>    are marked down.  Right now it sends all the requests to a single
>    backup server, and if that's down, all to the next backup server,
>    etc.  We know one server can't handle the load (even 4 aren't
>    really), so don't overload a single backup either.
> 
+1
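
For anyone following along, the behaviour change in (4) can be modelled
roughly like this. It is only a toy model of the server selection described
above, not haproxy's actual code; the Server tuple and eligible_servers()
are names invented for the illustration.

```python
from typing import List, NamedTuple


class Server(NamedTuple):
    name: str
    backup: bool
    up: bool


def eligible_servers(servers: List[Server], allbackups: bool) -> List[str]:
    """Return the names of the servers that would receive traffic."""
    # Normal (non-backup) servers are always preferred while any is up.
    active = [s.name for s in servers if s.up and not s.backup]
    if active:
        return active
    backups = [s.name for s in servers if s.up and s.backup]
    # Without allbackups only the first usable backup takes the load;
    # with it, the load is balanced across every usable backup.
    return backups if allbackups else backups[:1]


if __name__ == "__main__":
    pool = [
        Server("app1", backup=False, up=False),
        Server("app2", backup=False, up=False),
        Server("app5", backup=True, up=True),
        Server("app6", backup=True, up=True),
        Server("bapp1", backup=True, up=True),
    ]
    print(eligible_servers(pool, allbackups=False))
    print(eligible_servers(pool, allbackups=True))
```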

> 5) the default mirrorlist_server listen backlog is only 5, meaning
>    that at most 5 WSGI clients get queued up if all the children are
>    busy.  To handle spikes, bump that to 300 (though it's limited by
>    the kernel to 128 by default).  This was the intent, but the code was buggy.
> 
+1
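
The bug in (5) is easy to reproduce outside of mirrorlist_server. The
sketch below uses Python 3's socketserver and a TCP socket on loopback
rather than the Python 2 SocketServer and unix socket that
mirrorlist_server uses, so treat it as an analogy; RecordingServer,
EarlyServer and backlog_used() are names made up for the demonstration. It
records the backlog value in effect when the server calls listen() during
__init__.

```python
import socketserver


class RecordingServer(socketserver.TCPServer):
    """TCPServer that remembers which backlog was in effect at listen() time."""

    def server_activate(self):
        # server_activate() runs inside __init__ and calls
        # socket.listen(self.request_queue_size).
        self.backlog_used = self.request_queue_size
        super().server_activate()


class EarlyServer(RecordingServer):
    # Overriding the class attribute takes effect before __init__ runs.
    request_queue_size = 300


def backlog_used(server_cls, late_value=None):
    """Create a server, optionally set the attribute afterwards, report backlog."""
    server = server_cls(("127.0.0.1", 0), socketserver.BaseRequestHandler)
    try:
        if late_value is not None:
            # Too late: listen() has already been called by __init__.
            server.request_queue_size = late_value
        return server.backlog_used
    finally:
        server.server_close()


if __name__ == "__main__":
    print("set after construction:", backlog_used(RecordingServer, 300))
    print("set as class attribute:", backlog_used(EarlyServer))
```

Setting the attribute on the instance after construction leaves the
listening socket with the default backlog of 5, which matches the
behaviour the patch describes.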

> 6) bug fix to mirrorlist_server to not ignore SIGCHLD.  Amazing this
>    ever worked in the first place.  This should resolve the problem
>    where mirrorlist_server slows down and memory grows over time.
> 
+1
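
A quick way to see why ignoring SIGCHLD interferes with reaping: on Linux,
when SIGCHLD is set to SIG_IGN the kernel discards exited children on its
own, so a later waitpid() on that child fails with ECHILD instead of
returning its pid. A parent that only drops a child from its bookkeeping
when waitpid() returns that pid would then never drop it, which is
consistent with the active_children growth described in the patch. The
sketch below is a standalone demonstration (POSIX only, and the function
name is invented), not code from mirrorlist_server.

```python
import os
import signal


def child_is_waitable(ignore_sigchld):
    """Fork a child that exits at once; report whether waitpid() can reap it."""
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately without cleanup handlers
        try:
            # Blocks until the child is gone.
            os.waitpid(pid, 0)
            return True
        except ChildProcessError:
            # ECHILD: the kernel already discarded the child for us.
            return False
    finally:
        # signal.signal() returns None if the old handler was not set from Python.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print("default SIGCHLD:", child_is_waitable(ignore_sigchld=False))
    print("ignored SIGCHLD:", child_is_waitable(ignore_sigchld=True))
```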

> 
> diff --git a/modules/haproxy/files/haproxy.cfg b/modules/haproxy/files/haproxy.cfg
> index 6e538ed..5a6fda0 100644
> --- a/modules/haproxy/files/haproxy.cfg
> +++ b/modules/haproxy/files/haproxy.cfg
> @@ -43,15 +43,17 @@ listen  fp-wiki 0.0.0.0:10001
>  
>  listen  mirror-lists 0.0.0.0:10002
>      balance hdr(appserver)
> -    server  app1 app1:80 check inter 5s rise 2 fall 3
> -    server  app2 app2:80 check inter 5s rise 2 fall 3
> -    server  app3 app3:80 check inter 5s rise 2 fall 3
> -    server  app4 app4:80 check inter 5s rise 2 fall 3
> -    server  app5 app5:80 backup check inter 10s rise 2 fall 3
> -    server  app6 app6:80 backup check inter 10s rise 2 fall 3
> -    server  app7 app7:80 check inter 5s rise 2 fall 3
> -    server  bapp1 bapp1:80 backup check inter 5s rise 2 fall 3
> +    timeout connect 60s
> +    server  app1 app1:80 check inter 5s rise 2 fall 3 maxconn 25
> +    server  app2 app2:80 check inter 5s rise 2 fall 3 maxconn 25
> +    server  app3 app3:80 check inter 5s rise 2 fall 3 maxconn 25
> +    server  app4 app4:80 check inter 5s rise 2 fall 3 maxconn 25
> +    server  app5 app5:80 backup check inter 10s rise 2 fall 3 maxconn 25
> +    server  app6 app6:80 backup check inter 10s rise 2 fall 3 maxconn 25
> +    server  app7 app7:80 check inter 5s rise 2 fall 3 maxconn 25
> +    server  bapp1 bapp1:80 backup check inter 5s rise 2 fall 3 maxconn 25
>      option  httpchk GET /mirrorlist
> +    option  allbackups
>   
>  listen  pkgdb 0.0.0.0:10003
>      balance hdr(appserver)
> diff --git a/modules/mirrormanager/files/mirrorlist-server.conf b/modules/mirrormanager/files/mirrorlist-server.conf
> index fd7cf98..482f7af 100644
> --- a/modules/mirrormanager/files/mirrorlist-server.conf
> +++ b/modules/mirrormanager/files/mirrorlist-server.conf
> @@ -7,7 +7,7 @@ Alias /publiclist /var/lib/mirrormanager/mirrorlists/publiclist/
>          ExpiresDefault "modification plus 1 hour"
>  </Directory>
>  
> -WSGIDaemonProcess mirrorlist user=apache processes=45 threads=1 display-name=mirrorlist maximum-requests=1000
> +WSGIDaemonProcess mirrorlist user=apache processes=100 threads=1 display-name=mirrorlist maximum-requests=1000
>  
>  WSGIScriptAlias /metalink /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
>  WSGIScriptAlias /mirrorlist /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
> 
> 
> From 45d401446bfecba768fdf4f26409bf291172f7bc Mon Sep 17 00:00:00 2001
> From: Matt Domsch <Matt_Domsch at dell.com>
> Date: Mon, 10 May 2010 15:23:57 -0500
> Subject: [PATCH 1/2] mirrorlist_server: set request_queue_size earlier
> 
> While the docs say that request_queue_size can be a per-instance
> value, in reality it's used during ForkingUnixStreamServer __init__,
> meaning it needs to override the default class attribute instead.
> 
> Moving this up means that connections aren't blocking after about 5
> are already running (default), and mirrorlist_client can now connect
> in ~200us like one would expect, rather than seconds or tens of
> seconds like we were seeing when lots (say, 40+) clients were
> connecting simultaneously.
> ---
>  mirrorlist-server/mirrorlist_server.py |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
> index 8825a1a..2ade357 100755
> --- a/mirrorlist-server/mirrorlist_server.py
> +++ b/mirrorlist-server/mirrorlist_server.py
> @@ -725,6 +725,7 @@ def sighup_handler(signum, frame):
>      signal.signal(signal.SIGHUP, sighup_handler)
>  
>  class ForkingUnixStreamServer(ForkingMixIn, UnixStreamServer):
> +    request_queue_size = 300
>      def finish_request(self, request, client_address):
>          signal.signal(signal.SIGHUP, signal.SIG_IGN)
>          BaseServer.finish_request(self, request, client_address)
> @@ -815,7 +816,6 @@ def main():
>      signal.signal(signal.SIGHUP, sighup_handler)
>      signal.signal(signal.SIGCHLD, signal.SIG_IGN)
>      ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
> -    ss.request_queue_size = 300
>      ss.serve_forever()
>  
>      try:
> -- 
> 1.7.0.1
> 
> 
> From d82f20b10c755e5ce40d67ca7ea4a6dba9e37d34 Mon Sep 17 00:00:00 2001
> From: Matt Domsch <Matt_Domsch at dell.com>
> Date: Mon, 10 May 2010 23:56:09 -0500
> Subject: [PATCH 2/2] mirrorlist_server: don't ignore SIGCHLD
> 
> Amazing that this ever worked in the first place.  Ignoring SIGCHLD
> causes the parent's active_children list to grow without bound.  This
> is also probably the cause of our long-term memory size growth.  The
> parent really needs to catch SIGCHLD in order to do its reaping.
> ---
>  mirrorlist-server/mirrorlist_server.py |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
> index 2ade357..0de7132 100755
> --- a/mirrorlist-server/mirrorlist_server.py
> +++ b/mirrorlist-server/mirrorlist_server.py
> @@ -814,7 +814,6 @@ def main():
>      open_geoip_databases()
>      read_caches()
>      signal.signal(signal.SIGHUP, sighup_handler)
> -    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
>      ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
>      ss.serve_forever()
>  
+1 to implementation

-Toshio