awstats runs
by Stephen John Smoogen
Looking over the configurations of awstats on log01, we copy the data
over once per day from each of the servers and then run awstats on the
server. However, we also seem to run the program against that once-a-day
data every hour. Am I seeing this correctly, or is there something I'm
missing? I am just trying to figure out why the box is always running
bzcat and awstats :)
--
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines
change request, create maps for ppc64
by Dennis Gilmore
I want to apply the following patch to make ppc64 a valid arch.
I noticed that there were no el6 maps for ppc64. For el-6 we are shipping
ppc64, i386, and x86_64; this is because RHEL switched the base userland
from 32-bit to 64-bit.
Dennis
diff --git a/modules/maps/files/parse9.pl b/modules/maps/files/parse9.pl
index 5d35e01..2cfd65d 100755
--- a/modules/maps/files/parse9.pl
+++ b/modules/maps/files/parse9.pl
@@ -47,6 +47,8 @@ sub valid_arch {
return 1;
} elsif("$arch" eq "ppc") {
return 1;
+ } elsif("$arch" eq "ppc64") {
+ return 1;
} elsif("$arch" eq "ia64") {
return 1;
} elsif("$arch" eq "sparc") {
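The elsif chain the patch extends is effectively a whitelist membership
test. As a minimal sketch of the same idea in Python (the arch names are
just the ones visible in the hunk; the full list in parse9.pl has more):

```python
# Hypothetical sketch: the valid_arch() elsif chain as a set lookup.
VALID_ARCHES = {"ppc", "ppc64", "ia64", "sparc"}

def valid_arch(arch):
    """Return True if `arch` is a recognized architecture."""
    return arch in VALID_ARCHES
```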
Change Request: Docs sync removal
by Mike McGrath
There's a cron job running on the app servers that shouldn't be, and it's
eating up lots of I/O time. The docsSync script should be running only on
bapp1; it looks like it made it onto the rest of the app servers at some
point and was removed in puppet but not removed from the servers.
Can I get 2 +1's to remove this cron job?
-Mike
[PATCH] avoid \xZZ in mirrorlist urls
by Matt Domsch
We are getting some mirrorlist requests with escape characters in them
such as \xe2 . While I've taken steps to deal with these in the
mirrorlist code, at least one client makes such a request hourly, and
they are causing the mirrorlist WSGI process to spin. I can't
recreate the failure, even using the same request URI, and the fixes
I've tried haven't avoided them all.
I'd like to block such requests at the proxy, to prevent them from
making it all the way to MM. It's a hack, but I'm at a loss for
another solution right now.
diff --git a/modules/mirrormanager/templates/mirrormanager-mirrorlist.conf.erb b/modules/mirrormanager/templates/mirrormanager-mirrorlist.conf.erb
index e52c926..95792fe 100644
--- a/modules/mirrormanager/templates/mirrormanager-mirrorlist.conf.erb
+++ b/modules/mirrormanager/templates/mirrormanager-mirrorlist.conf.erb
@@ -17,6 +17,10 @@ RewriteEngine On
RewriteCond %{QUERY_STRING} repo=epel-5&arch=\$basea\$
RewriteRule ^/mirrorlist - [F]
# END hack
+# BEGIN hack for escaped chars
+RewriteCond %{QUERY_STRING} \\x
+RewriteRule ^/(mirrorlist|metalink) - [F]
+# END hack
RewriteRule ^/publiclist(.*) <%= proxyurl %>/publiclist/$1 [P,L]
RewriteRule ^/mirrorlist(.*) <%= proxyurl %>/mirrorlist$1 [P,L]
RewriteRule ^/metalink(.*) <%= proxyurl %>/metalink$1 [P,L]
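The RewriteCond above matches any query string containing a literal
backslash-x sequence and forbids the request. A minimal sketch of the same
check in Python (the helper name is illustrative, not part of
MirrorManager):

```python
import re

# Matches a literal backslash followed by 'x', as in the malformed
# requests containing sequences like \xe2.
_ESCAPED = re.compile(r"\\x")

def has_escaped_chars(query_string):
    """Return True if the query string contains a literal \\x sequence."""
    return _ESCAPED.search(query_string) is not None
```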
--
Matt Domsch
Technology Strategist
Dell | Office of the CTO
Change request - remove email2trac from hosted
by Jesse Keating
Currently the only consumers of the email2trac setup are pungi and rel-eng.
At one time I had somehow disabled email2trac for rel-eng due to the
spam, although I can't remember how I disabled it. Unfortunately
something changed in the past week or two that re-enabled it, and a bunch
more spam came through. I'd like to just remove email2trac from our
hosted environment altogether.
--
Jesse Keating
Fedora -- Freedom² is a feature!
identi.ca: http://identi.ca/jkeating
[PATCH] mirrorlist_client.wsgi use select() waiting for server
by Matt Domsch
From 7cd05b296ab426c386e99c3ff6f7143fbf6ed052 Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch(a)dell.com>
Date: Wed, 12 May 2010 08:59:16 -0500
Subject: [PATCH] mirrorlist_client: use select() waiting on the response from mirrorlist_server
The client was spinning, waiting for read() to complete, while the
server was doing its thinking. Instead, use select() to sleep
until the server has data to return. This should reduce CPU time
spent in the client considerably.
---
mirrorlist-server/mirrorlist_client.wsgi | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/mirrorlist-server/mirrorlist_client.wsgi b/mirrorlist-server/mirrorlist_client.wsgi
index cc4416c..15b3a15 100755
--- a/mirrorlist-server/mirrorlist_client.wsgi
+++ b/mirrorlist-server/mirrorlist_client.wsgi
@@ -4,7 +4,7 @@
# by Matt Domsch <Matt_Domsch(a)dell.com>
# Licensed under the MIT/X11 license
-import socket
+import socket, select
import cPickle as pickle
from string import zfill, atoi, strip, replace
from paste.wsgiwrappers import *
@@ -32,24 +32,29 @@ def get_mirrorlist(d):
s.shutdown(socket.SHUT_WR)
del p
+ # wait for other end to start writing
+ expiry = datetime.utcnow() + timedelta(seconds=request_timeout)
+ rlist, wlist, xlist = select.select([s],[],[],request_timeout)
+ if len(rlist) == 0:
+ s.shutdown(socket.SHUT_RD)
+ raise socket.timeout
+
readlen = 0
resultsize = ''
while readlen < 10:
resultsize += s.recv(10 - readlen)
readlen = len(resultsize)
resultsize = atoi(resultsize)
-
- expiry = datetime.utcnow() + timedelta(seconds=request_timeout)
+
readlen = 0
p = ''
while readlen < resultsize and datetime.utcnow() < expiry:
p += s.recv(resultsize - readlen)
readlen = len(p)
-
- s.shutdown(socket.SHUT_RD)
results = pickle.loads(p)
del p
+ s.shutdown(socket.SHUT_RD)
return results
def real_client_ip(xforwardedfor):
--
1.7.0.1
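The core of the change — sleeping in select() until the peer has data,
instead of spinning on recv() — can be shown standalone with a socket pair
(an illustrative sketch, not the MirrorManager code):

```python
import select
import socket

# A connected pair stands in for the client/server Unix socket.
client, server = socket.socketpair()

# Nothing written yet: select() sleeps up to the timeout (no CPU burned
# in a recv() loop) and returns an empty ready list.
rlist, _, _ = select.select([client], [], [], 0.1)
assert rlist == []

# Once the server writes, select() returns with the socket ready to read.
server.sendall(b"0000000042")
rlist, _, _ = select.select([client], [], [], 5)
client.close()
server.close()
```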
--
Matt Domsch
Technology Strategist
Dell | Office of the CTO
Self-Introduction: ruigo
by Rui Gouveia
Hi,
my name is Rui Gouveia and I live in Porto/Portugal.
My Fedora Account System (FAS) username is ruigo, and my IRC nick is ruigo.
I'm one of the co-coordinators of the Fedora translation team for pt_PT
since Fedora 10.
Professionally, I have been a sysadmin since 2002, skills that I hope to
put to use for the benefit of the Fedora Project.
A couple of goals I have for the Fedora Project are to increase the user
base in this city and to recruit help for Fedora projects.
For now, I'll be glad just to sit back and learn how you guys work.
Please help me get started!
Thanks for your time.
--
Rui Gouveia
--
Free Software is not just software. It is also a
philosophy of life. Learn more about this subject at
http://www.gnu.org/philosophy/ and free yourself.
http://www.google.com/reader/shared/05174907382601741850
[PATCH] mirrorlist_client timeouts
by Matt Domsch
From 1aa19dfed950c209ad5a2ddf48e2b828b50c07ee Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch(a)dell.com>
Date: Wed, 12 May 2010 13:52:51 -0500
Subject: [PATCH] mirrorlist_client: a better way to handle socket timeouts
Blocking sockets calling recv() may block forever if the server end
doesn't send anything for some reason. Don't let that happen. Python
has a socket.settimeout() capability. We'll use that to let any
individual operation (except the select()) take up to 5 seconds (they
should all be in the microsecond range, so this is very generous), and
let select() continue to wait for 60 seconds for the server to respond
at all.
If a timeout happens, an exception is raised, which is caught by the
caller, and an HTTP 503 is returned to the web client.
---
mirrorlist-server/mirrorlist_client.wsgi | 17 ++++++++---------
1 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/mirrorlist-server/mirrorlist_client.wsgi b/mirrorlist-server/mirrorlist_client.wsgi
index 15b3a15..3508f19 100755
--- a/mirrorlist-server/mirrorlist_client.wsgi
+++ b/mirrorlist-server/mirrorlist_client.wsgi
@@ -13,14 +13,14 @@ import cStringIO
from datetime import datetime, timedelta
socketfile = '/var/run/mirrormanager/mirrorlist_server.sock'
-request_timeout = 60 # seconds
+select_timeout = 60 # seconds
+timeout = 5 # seconds
def get_mirrorlist(d):
- try:
- s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
- s.connect(socketfile)
- except:
- raise
+ # any exceptions or timeouts raised here get handled by the caller
+ s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+ s.settimeout(timeout)
+ s.connect(socketfile)
p = pickle.dumps(d)
del d
@@ -33,8 +33,7 @@ def get_mirrorlist(d):
del p
# wait for other end to start writing
- expiry = datetime.utcnow() + timedelta(seconds=request_timeout)
- rlist, wlist, xlist = select.select([s],[],[],request_timeout)
+ rlist, wlist, xlist = select.select([s],[],[],select_timeout)
if len(rlist) == 0:
s.shutdown(socket.SHUT_RD)
raise socket.timeout
@@ -48,7 +47,7 @@ def get_mirrorlist(d):
readlen = 0
p = ''
- while readlen < resultsize and datetime.utcnow() < expiry:
+ while readlen < resultsize:
p += s.recv(resultsize - readlen)
readlen = len(p)
results = pickle.loads(p)
--
1.7.0.1
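The settimeout() behavior the patch relies on can be demonstrated in
isolation (a sketch with a socket pair, not the actual client):

```python
import socket

client, server = socket.socketpair()
client.settimeout(0.1)  # each blocking operation may take at most 0.1s here

# The server never writes, so recv() raises socket.timeout instead of
# blocking forever — which the WSGI caller can turn into an HTTP 503.
try:
    client.recv(10)
    timed_out = False
except socket.timeout:
    timed_out = True

client.close()
server.close()
```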
--
Matt Domsch
Technology Strategist
Dell | Office of the CTO
[PATCH] haproxy & mirrorlist processes
by Matt Domsch
The mirrorlists are falling over - haproxy keeps marking app servers
as down, and some requests are getting HTTP 503 Server Temporarily
Unavailable responses. This happens every 10 minutes, for 2-3
minutes, as several thousand EC2 instances request the mirrorlist
again.
For reference, we're seeing a spike of over 2000 simultaneous requests
across our 6 proxy and 4 app servers, occurring every 10 minutes,
dropping back down to under 20 simultaneous requests in between.
Trying out several things.
1) increase number of mirrorlist WSGI processes on each app server
from 45 to 100. This is the maximum number of simultaneous
mirrorlist requests that each server can serve. I've tried this
value on app01, and running this many still keeps the
mirrorlist_server back end (which fork()s on each connection)
humming right along. I think this is safe. Increasing much beyond
this though, the app servers will start to swap, which we must
avoid. We can watch the swapping, and if it starts, lower this
value somewhat. The value was 6 just a few days ago, which wasn't
working either.
This gives us 400 slots to work with on the app servers.
2) try limiting the number of connections from each proxy server to
each app server, to 25 per. Right now we're seeing a max of
between 60 and 135 simultaneous requests from each proxy server to
each app server. All those over 25 will get queued by haproxy and
then served as app server instances become available. I did this
on proxy03, and it really helped out the app servers and kept them
humming. There were still some longish response times (some >30
seconds).
We're still oversubscribing app server slots here, though oddly
not by as much as you'd think, as proxy03 is itself taking 40% of
the incoming requests for some reason.
3) bump the haproxy timeout up to 60 seconds. 5 seconds (the global
default) is way too low when we get the spikes. This was causing
haproxy to think app servers were down, and start sending load to
the other app servers, which would then overload, and then start
sending to the first backup server, ... Let's be nicer. If during
a spike it takes 60 seconds to get an answer, or be told HTTP 503,
so be it.
4) have haproxy use all the backup servers when all the app servers
are marked down. Right now it sends all the requests to a single
backup server, and if that's down, all to the next backup server,
etc. We know one server can't handle the load (even 4 aren't
really), so don't overload a single backup either.
5) the default mirrorlist_server listen backlog is only 5, meaning
that at most 5 WSGI clients get queued up if all the children are
busy. To handle spikes, bump that to 300 (though it's limited by
the kernel to 128 by default). This was the intent, but the code was buggy.
6) bug fix to mirrorlist_server to not ignore SIGCHLD. Amazing that this
ever worked in the first place. This should resolve the problem
where mirrorlist_server slows down and memory grows over time.
diff --git a/modules/haproxy/files/haproxy.cfg b/modules/haproxy/files/haproxy.cfg
index 6e538ed..5a6fda0 100644
--- a/modules/haproxy/files/haproxy.cfg
+++ b/modules/haproxy/files/haproxy.cfg
@@ -43,15 +43,17 @@ listen fp-wiki 0.0.0.0:10001
listen mirror-lists 0.0.0.0:10002
balance hdr(appserver)
- server app1 app1:80 check inter 5s rise 2 fall 3
- server app2 app2:80 check inter 5s rise 2 fall 3
- server app3 app3:80 check inter 5s rise 2 fall 3
- server app4 app4:80 check inter 5s rise 2 fall 3
- server app5 app5:80 backup check inter 10s rise 2 fall 3
- server app6 app6:80 backup check inter 10s rise 2 fall 3
- server app7 app7:80 check inter 5s rise 2 fall 3
- server bapp1 bapp1:80 backup check inter 5s rise 2 fall 3
+ timeout connect 60s
+ server app1 app1:80 check inter 5s rise 2 fall 3 maxconn 25
+ server app2 app2:80 check inter 5s rise 2 fall 3 maxconn 25
+ server app3 app3:80 check inter 5s rise 2 fall 3 maxconn 25
+ server app4 app4:80 check inter 5s rise 2 fall 3 maxconn 25
+ server app5 app5:80 backup check inter 10s rise 2 fall 3 maxconn 25
+ server app6 app6:80 backup check inter 10s rise 2 fall 3 maxconn 25
+ server app7 app7:80 check inter 5s rise 2 fall 3 maxconn 25
+ server bapp1 bapp1:80 backup check inter 5s rise 2 fall 3 maxconn 25
option httpchk GET /mirrorlist
+ option allbackups
listen pkgdb 0.0.0.0:10003
balance hdr(appserver)
diff --git a/modules/mirrormanager/files/mirrorlist-server.conf b/modules/mirrormanager/files/mirrorlist-server.conf
index fd7cf98..482f7af 100644
--- a/modules/mirrormanager/files/mirrorlist-server.conf
+++ b/modules/mirrormanager/files/mirrorlist-server.conf
@@ -7,7 +7,7 @@ Alias /publiclist /var/lib/mirrormanager/mirrorlists/publiclist/
ExpiresDefault "modification plus 1 hour"
</Directory>
-WSGIDaemonProcess mirrorlist user=apache processes=45 threads=1 display-name=mirrorlist maximum-requests=1000
+WSGIDaemonProcess mirrorlist user=apache processes=100 threads=1 display-name=mirrorlist maximum-requests=1000
WSGIScriptAlias /metalink /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
WSGIScriptAlias /mirrorlist /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
From 45d401446bfecba768fdf4f26409bf291172f7bc Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch(a)dell.com>
Date: Mon, 10 May 2010 15:23:57 -0500
Subject: [PATCH 1/2] mirrorlist_server: set request_queue_size earlier
While the docs say that request_queue_size can be a per-instance
value, in reality it's used during ForkingUnixStreamServer __init__,
meaning it needs to override the default class attribute instead.
Moving this up means that connections aren't blocking after about 5
are already running (default), and mirrorlist_client can now connect
in ~200us like one would expect, rather than seconds or tens of
seconds like we were seeing when lots (say, 40+) clients were
connecting simultaneously.
---
mirrorlist-server/mirrorlist_server.py | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
index 8825a1a..2ade357 100755
--- a/mirrorlist-server/mirrorlist_server.py
+++ b/mirrorlist-server/mirrorlist_server.py
@@ -725,6 +725,7 @@ def sighup_handler(signum, frame):
signal.signal(signal.SIGHUP, sighup_handler)
class ForkingUnixStreamServer(ForkingMixIn, UnixStreamServer):
+ request_queue_size = 300
def finish_request(self, request, client_address):
signal.signal(signal.SIGHUP, signal.SIG_IGN)
BaseServer.finish_request(self, request, client_address)
@@ -815,7 +816,6 @@ def main():
signal.signal(signal.SIGHUP, sighup_handler)
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
- ss.request_queue_size = 300
ss.serve_forever()
try:
--
1.7.0.1
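The ordering issue this patch fixes — an attribute consumed inside
__init__ can only be overridden at the class level, not on the instance
afterwards — can be sketched generically (toy classes, not the actual
socketserver code):

```python
class BaseServer:
    request_queue_size = 5          # class default, read during __init__

    def __init__(self):
        # The listen() backlog is fixed here, before any
        # post-construction instance assignment can run.
        self.backlog = self.request_queue_size

class TunedServer(BaseServer):
    request_queue_size = 300        # class attribute: visible in __init__

# Overriding on the instance is too late: __init__ already used the default.
late = BaseServer()
late.request_queue_size = 300
assert late.backlog == 5

# Overriding on the class takes effect where it matters.
assert TunedServer().backlog == 300
```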
From d82f20b10c755e5ce40d67ca7ea4a6dba9e37d34 Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch(a)dell.com>
Date: Mon, 10 May 2010 23:56:09 -0500
Subject: [PATCH 2/2] mirrorlist_server: don't ignore SIGCHLD
Amazing that this ever worked in the first place. Ignoring SIGCHLD
causes the parent's active_children list to grow without bound. This
is also probably the cause of our long-term memory size growth. The
parent really needs to catch SIGCHLD in order to do its reaping.
---
mirrorlist-server/mirrorlist_server.py | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
index 2ade357..0de7132 100755
--- a/mirrorlist-server/mirrorlist_server.py
+++ b/mirrorlist-server/mirrorlist_server.py
@@ -814,7 +814,6 @@ def main():
open_geoip_databases()
read_caches()
signal.signal(signal.SIGHUP, sighup_handler)
- signal.signal(signal.SIGCHLD, signal.SIG_IGN)
ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
ss.serve_forever()
--
1.7.0.1
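The interaction the patch depends on — with SIGCHLD set to SIG_IGN the
kernel auto-reaps exited children, so the parent's own waitpid()-based
bookkeeping (what ForkingMixIn uses to trim active_children) fails — can
be shown with a small fork demo (a POSIX-only sketch):

```python
import errno
import os
import signal

# With SIGCHLD ignored, exited children are reaped by the kernel
# automatically, so the parent's explicit waitpid() finds nothing to
# collect and fails with ECHILD — and active_children never shrinks.
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
pid = os.fork()
if pid == 0:
    os._exit(0)                     # child exits immediately
try:
    os.waitpid(pid, 0)              # blocks until child exits, then fails
    reaped_by_parent = True
except OSError as e:
    assert e.errno == errno.ECHILD
    reaped_by_parent = False
```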
--
Matt Domsch
Technology Strategist
Dell | Office of the CTO