On Tue, 20 Mar 2012 21:38:13 -0500
Dennis Gilmore <dennis(a)ausil.us> wrote:
...snip...
probably we would be adding 100-300 systems. not only do we need to
consider overloading of puppet, but also logging and monitoring. I
guess it's more a question of how we scale our infrastructure from,
at a guess, ~100 nodes today to 3 to 4 times that.
Yeah.
...snip...
I'm ok with that, I'm pretty sure fas will scale to the extra boxes. do
we drop monitoring of the builders? what about collectd etc.?
There are a few things we could do on fas load:
a) add more fas servers.
b) reduce the number of runs. How often do we change someone in
sysadmin-noc, sysadmin-main, sysadmin-build?
c) move to a system where we only re-run fasClient when there is a
change.
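
As a rough sketch of what c) could look like: compare a digest of the
current group membership against the one recorded at the last sync, and
only trigger a run when they differ. The function names, the state file,
and the shape of the membership data are all hypothetical here; this is
not fasClient's actual interface.

```python
import hashlib
import json


def membership_digest(groups):
    """Return a stable SHA-256 digest of a {group: [members]} mapping."""
    # Sort members and keys so ordering differences do not change the digest.
    normalized = {name: sorted(members) for name, members in groups.items()}
    payload = json.dumps(normalized, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def needs_sync(groups, state_path):
    """True if membership differs from what was recorded at the last sync."""
    try:
        with open(state_path) as fh:
            previous = fh.read().strip()
    except FileNotFoundError:
        previous = None  # never synced on this host
    return membership_digest(groups) != previous


def record_sync(groups, state_path):
    """Remember the membership that was just synced (hypothetical state file)."""
    with open(state_path, "w") as fh:
        fh.write(membership_digest(groups))
```

A caller would check needs_sync() first, run the real sync only when it
returns True, and then call record_sync(). That still leaves the
question of how each host learns the current membership cheaply.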
I'd agree on turning collectd off, probably. Or at least using a
separate one if we needed to monitor them.
main issue is that today we are not 100% sure of how we will install
arm boxes. how do we deal with all the non-puppet-related systems?
also need to look into how we can better scale koji itself. when we
go from 20 to 200+ builders we need to make sure that load doesn't
cause koji to fall over.
yeah.
all the arm boxes will have management consoles. but today I'm not 100%
sure how access to those would work. we would also need to deploy fedora
for any arm based systems. another thing we need to reconsider is
networking. today the storage network and the builder networks
are /24's so we could use 253 nodes. i suspect we will go over that
on the build network. we could leave the storage network off the arm
builders. it is really only needed for createrepo. but we may need to
look at expanding kojipkgs to more nodes. or increase its network
throughput with multiple bonded gig network ports. think mass rebuild
and 100 or 200 buildroots initialising at once. it will stress our
resources on all levels. but the flexibility of so many nodes could
allow us to deploy solid solutions to scale and show that fedora is
still the leader in open infrastructure and sets industry best
practices.
Yeah, we could hopefully have another network that's larger than /24 for
the arm builders.
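
For a rough sense of the sizes involved, here is a small sketch using
Python's ipaddress module. The 10.0.0.0 prefixes are placeholders, not
our real networks, and the helper names are made up for illustration.

```python
import ipaddress


def usable_hosts(cidr):
    """Usable IPv4 host addresses in a subnet (network and broadcast excluded)."""
    net = ipaddress.ip_network(cidr)
    return net.num_addresses - 2


def smallest_prefix_for(nodes):
    """Longest IPv4 prefix (smallest subnet) with at least `nodes` usable hosts."""
    for prefix in range(30, 0, -1):
        if 2 ** (32 - prefix) - 2 >= nodes:
            return prefix
    raise ValueError("no IPv4 subnet is large enough")


for cidr in ("10.0.0.0/24", "10.0.0.0/23", "10.0.0.0/22"):
    print(cidr, usable_hosts(cidr))
```

A /24 has 254 usable addresses (one fewer once a gateway takes one,
which would match the 253 figure), a /23 has 510 and a /22 has 1022, so
by this arithmetic a /23 would cover 300 builders and a /22 would leave
more headroom.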
I'm sure some of this will be a process of 'oh no, what we have now
doesn't scale, let's fix it'. Of course some of it we can get ready for
up front too.
Overall I like the idea of the automated builder re-install and think
it will get us more ready for things like a large arm cluster.
kevin