builders of the future!!!!!

Tue Mar 20 18:44:07 UTC 2012

The discussion on the devel list about ARM, and my work last week on
reinstalling builders quickly and frequently, have raised a number of
issues with how we manage our builders and how we should manage them in
the future.

It is apparent that if we add arm builders, there will be lots of them
and they will be physical systems (probably packed into a very small
space, but physical nonetheless). So we need a sensible way to manage
and reinstall these hosts frequently and quickly.

Additionally, we need to consider what the introduction of a largish
number of arm builders (and other arm infrastructure) would do to our
existing puppet setup: specifically, it would overload it pretty badly
and make it not very manageable.

I'm making certain assumptions here and I'd like to be clear about what
those are:

1. that the builders need to be kept pristine
2. that currently our builders are not freshly installed frequently
enough
3. that the builders are relatively static in their configuration and
most changes are done with pkg additions
4. that builder setup requires at least two manual-ish steps by a koji
admin who can disable/enable/register the builder with the kojihub
5. that the builders are fairly different networking- and setup-wise
from the rest of our systems

So I am proposing that we consider the following as a general process
for maintaining our builders:

1. disable the builder in koji
2. make sure all jobs are finished
3. add installer entries into grub (or run the undefine, reinstall
process if the builder is virt-based)
4. reinstall the system
5. monitor for ssh to return
6. connect in and force our post-install configuration: identification,
network, mount-point setup, ssl certs/keys for koji, etc
7. reboot
8. re-enable host in koji
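
The eight steps above could be scripted roughly as follows. This is only
a sketch: `koji disable-host` and `koji enable-host` are the existing
koji admin commands, but `refresh_builder`, `wait_for`, and every entry
in the `hooks` dict are hypothetical names standing in for site-specific
pieces (checking for running jobs, kicking off the reinstall, pushing the
post-install configuration, rebooting) that would still need to be
written.

```python
import socket
import subprocess
import time


def ssh_is_up(host, port=22, timeout=5.0):
    """Return True if a TCP connection to the host's ssh port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(check, attempts=120, delay=30.0, sleep=time.sleep):
    """Poll check() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if check():
            return True
        sleep(delay)
    return False


def refresh_builder(host, hooks, run=subprocess.check_call, sleep=time.sleep):
    """Walk one builder through the eight steps.

    hooks is a dict of site-specific callables (all hypothetical):
      jobs_running(host) -> bool, reinstall(host), ssh_up(host) -> bool,
      configure(host), reboot(host)
    """
    run(["koji", "disable-host", host])                            # step 1
    if not wait_for(lambda: not hooks["jobs_running"](host), sleep=sleep):
        raise RuntimeError("jobs never finished on %s" % host)     # step 2
    hooks["reinstall"](host)                                       # steps 3-4
    if not wait_for(lambda: hooks["ssh_up"](host), sleep=sleep):   # step 5
        raise RuntimeError("ssh never came back on %s" % host)
    hooks["configure"](host)                                       # step 6
    hooks["reboot"](host)                                          # step 7
    run(["koji", "enable-host", host])                             # step 8
```

If any step raises, the builder is simply left disabled in koji, which
seems like the safe failure mode: a half-configured box never takes jobs.
`ssh_is_up` can be passed as the `ssh_up` hook; a real version would
probably also want to confirm the old system has gone down before it
starts polling for ssh.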

We would do this frequently and regularly, perhaps even having some
percentage of our builders doing this at all times. I.e., 1/10th of the
boxes would be reinstalling at any given moment, so that after 10 times
whatever time frame one round takes, all of them have been reinstalled.
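
To make the arithmetic concrete, here is a small sketch of how the
rotation could be computed. The function names, the `one_in` parameter,
and the two-hour figure in the usage note are all made up for
illustration; nothing here reflects measured reinstall times.

```python
def rolling_batches(hosts, one_in=10):
    """Split hosts into batches so roughly 1/one_in are out at a time."""
    size = max(1, -(-len(hosts) // one_in))  # ceiling division, at least 1
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


def full_cycle_hours(hosts, one_in, hours_per_round):
    """Hours until every host has been reinstalled once."""
    return len(rolling_batches(hosts, one_in)) * hours_per_round
```

With 30 builders and `one_in=10`, this gives 10 batches of 3; if one
round of reinstalls took 2 hours, the whole fleet would be refreshed
every 20 hours, with 27 builders available at any given moment.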

Additionally, this would mean these systems would NOT have a puppet
management piece at all. Security-critical package updates would still
be handled by pushes as we do now, but barring the need for significant
changes we could rely on the boxes simply being refreshed frequently
enough that other updates wouldn't need to be pushed.

What do folks think about this idea? It would dramatically reduce the
node entries in our puppet config, and it would drop the number of hosts
connecting to puppet, too. It will mean more systems being reinstalled,
and more often. It will also require some work to automate the steps I
mention above. I think I can achieve that without too much difficulty,
actually. I think, in general, it will increase our ability to scale up
to more and more builders.

I'd like input; constructive, please.

Thanks,
-sv