builders of the future!!!!!

Dennis Gilmore dennis at ausil.us
Wed Mar 21 02:38:13 UTC 2012


On Tue, 20 Mar 2012 14:44:07 -0400
seth vidal <skvidal at fedoraproject.org> wrote:
> The discussion on devel list about ARM and my work last week on
> reinstalling builders quickly and commonly has raised a number of
> issues with how we manage our builders and how we should manage them
> in the future.
> 
> It is apparent that if we add arm builders they will be lots of
> physical systems (probably in a very small space) but physical,
> none-the-less. So we need a sensible way to manage and reinstall these
> hosts commonly and quickly. 

Today there is no way to do an anaconda install on any ARM system,
though hopefully we will have that in time for deployment.

> Additionally, we need to consider what the introduction of a largish
> number of arm builders (and other arm infrastructure) would do to our
> existing puppet setup. Specifically overloading it pretty badly and
> making it not-very-manageable.

We would probably be adding 100-300 systems. Not only do we need to
consider overloading puppet, but also logging and monitoring. I guess
it's more a question of how we scale our infrastructure from, at a
guess, ~100 nodes today to 3 to 4 times that.
 
> I'm making certain assumptions here and I'd like to be clear about
> what those are:
> 
> 1. the builders need to be kept pristine
> 2. that currently our builders are not freshly installed frequently
> enough.
> 3. that the builders are relatively static in their
> configuration and most changes are done with pkg additions
> 4. that builder setups require at least two manual-ish steps of a koji
> admin who can disable/enable/register the builder with the kojihub.
> 5. that the builders are fairly different networking and setup-wise to
> the rest of our systems.
> 
> So I am proposing that we consider the following as a general process
> for maintaining our builders:
> 
> 1. disable the builder in koji
> 2. make sure all jobs are finished
> 3. add installer entries into grub (or run the undefine, reinstall
> process if the builder is virt-based)
> 4. reinstall the system
> 5. monitor for ssh to return
> 6. connect in and force our post-install configuration:
> identification, network, mount-point setup, ssl certs/keys for koji,
> etc
> 7. reboot
> 8. re-enable host in koji
> 
> We would do this with frequency and regularity. Perhaps even having
> some percentage of our builders doing this at all times. Ie: 1/10th of
> the boxes reinstalling at any given moment so in a certain time
> frame*10 all of them are reinstalled. 

Honestly, we could do this instead of the monthly updates and just
rebuild them.
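
A rough sketch of how the rotation and the eight steps could hang
together, assuming the real work for each step is supplied by whoever
automates it. The step names, the round-robin split, and the host names
are illustrative only; nothing here is an existing koji or installer
interface.

```python
from typing import Callable, Dict, List

# The eight steps of the proposed process, in order. These are
# illustrative labels, not real koji or installer commands.
STEPS = [
    "disable_in_koji",
    "wait_for_jobs",
    "queue_installer",
    "reinstall",
    "wait_for_ssh",
    "post_install_config",
    "reboot",
    "enable_in_koji",
]


def split_batches(builders: List[str], n_batches: int = 10) -> List[List[str]]:
    """Spread builders round-robin over n_batches reinstall windows."""
    batches: List[List[str]] = [[] for _ in range(n_batches)]
    for i, host in enumerate(sorted(builders)):
        batches[i % n_batches].append(host)
    return batches


def reinstall(host: str, actions: Dict[str, Callable[[str], None]]) -> List[str]:
    """Run every step in order for one host.

    Any exception raised by a step propagates, so a failed builder is
    never re-enabled in koji by accident.
    """
    done = []
    for step in STEPS:
        actions[step](host)
        done.append(step)
    return done
```

With ten batches, one batch per time window gets every builder
reinstalled after ten windows, which is the "time frame*10" idea from
the proposal.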

> 
> Additionally, this would mean these  systems would NOT have a puppet
> management piece at all. Package updates would still be handled
> by pushes as we do now, if things were security critical, but barring
> the need for significant changes we could rely on the boxes simply
> being refreshed frequently enough that it wouldn't need to be pushed.

I'm OK with that; I'm pretty sure FAS will scale to the extra boxes. Do
we drop monitoring of the builders? What about collectd etc.?
 
> What do folks think about this idea? It would dramatically reduce the
> node entries in our puppet config, it would drop the number of hosts
> connecting to puppet, too. It will mean more systems being reinstalled
> and more often. It will also require some work to make the steps I
> mention above be automated. I think I can achieve that without too
> much difficulty, actually. I think, in general, it will increase our
> ability to scale up to more and more builders.

The main issue is that today we are not 100% sure how we will install
ARM boxes. How do we deal with all the systems that are not in puppet?
We also need to look into how we can better scale koji itself. When we
go from 20 to 200+ builders we need to make sure the load doesn't cause
koji to fall over.


All the ARM boxes will have management consoles, but today I'm not 100%
sure how access to them would work. We would also need to deploy Fedora
for any ARM-based systems. We also need to reconsider networking: today
the storage network and the builder networks are /24s, so we could use
253 nodes. I suspect we will go over that on the build network. We
could leave the storage network off the ARM builders; it is really only
needed for createrepo. But we may need to look at expanding kojipkgs to
more nodes, or increasing its network throughput with multiple bonded
gigabit network ports. Think of a mass rebuild with 100 or 200
buildroots initialising at once. It will stress our resources on all
levels, but the flexibility of so many nodes could allow us to deploy
solid solutions to scale and show that Fedora is still the leader in
open infrastructure and sets industry best practices.
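
Two back-of-envelope checks for the numbers above, as a sketch only. It
assumes one address per subnet is reserved (for example, a gateway),
which is how a /24 comes out at 253 usable nodes, and it treats bonded
ports as scaling linearly, which real bonding rarely does. The
10.0.0.0 base address and the per-buildroot size are placeholders, not
measurements from our setup.

```python
import ipaddress


def usable_nodes(prefix_len: int, reserved: int = 1) -> int:
    """Hosts available in a subnet after network, broadcast and reserved addresses."""
    net = ipaddress.ip_network(f"10.0.0.0/{prefix_len}")  # placeholder base address
    return net.num_addresses - 2 - reserved


def smallest_prefix(nodes: int, reserved: int = 1) -> int:
    """Longest prefix (smallest subnet) that still fits the given node count."""
    for prefix_len in range(30, 7, -1):
        if usable_nodes(prefix_len, reserved) >= nodes:
            return prefix_len
    raise ValueError("node count does not fit in a /8")


def transfer_seconds(buildroots: int, mb_each: float, gig_ports: int = 1) -> float:
    """Ideal time to serve every buildroot at once over bonded 1 Gbit/s ports.

    Uses 1 MB = 8 Mbit and 1000 Mbit/s per port, and ignores protocol
    overhead, disk limits and imperfect bonding.
    """
    total_mbit = buildroots * mb_each * 8
    return total_mbit / (gig_ports * 1000)


print(usable_nodes(24))      # capacity of today's /24
print(smallest_prefix(300))  # prefix needed for 300 builders
```

Calling transfer_seconds(200, mb_each, ports) with a measured buildroot
size would give a floor on how long a mass-rebuild start would keep
kojipkgs saturated.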

Dennis

