Dear list,
I've been working on removing Beaker's dependence on Cobbler for provisioning systems. This mail is to describe the approach I am proposing, and to seek feedback on it.
At present, Beaker requires cobblerd to be running on each lab controller. When a new distro comes along, it must be imported into Cobbler. From there, a script is run to register new Cobbler distros with the Beaker server. (Bill has been doing some work on this side of things separately.)
When it comes time to reboot or power off a system -- either because a user has manually requested it, or the scheduler is starting/stopping a job -- the Beaker server makes a series of XML-RPC calls to cobblerd on the lab controller, requesting that it execute the appropriate power control script. The parameters for the power script are based on the system's power settings stored in Beaker.
Similarly, provisioning a distro on a system means making a series of XML-RPC calls to cobblerd to configure netbooting for the system and then rebooting it.
As a first step towards removing Cobbler, I am tackling the power command side of things. Since version 0.6.14 Beaker already has a per-system queue of power commands, to handle the fact that some systems take a very long time to power on and off. Right now a dedicated thread runs in beakerd (on the Beaker server) processing new power commands, and checking the status of running ones. In each case it has to make an XML-RPC call back to cobblerd on the lab controller.
We can flip this relationship on its head, by making the lab controller poll the Beaker server for new power commands. When it sees one, it executes the power script and reports the result back. This is similar to the way the existing beaker-watchdog daemon works: it polls the server periodically for new and expired watchdog records and acts on them.
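To make the shape of this concrete, here is a minimal sketch of such a polling loop. The server method names (get_queued_commands and so on) are hypothetical stand-ins for whatever RPC interface the patch actually defines:

```python
import time

POLL_INTERVAL = 20  # seconds between polls; an assumed value

def process_queued_commands(server, run_script):
    """Fetch queued power commands from the server, run each one, and
    report the outcome back. Returns (completed, failed) counts.

    `server` stands in for an XML-RPC proxy to the Beaker server; the
    method names here are illustrative, not the real Beaker API.
    """
    completed = failed = 0
    for cmd in server.get_queued_commands():
        try:
            run_script(cmd)   # e.g. exec the configured power script
        except Exception as exc:
            server.mark_command_failed(cmd['id'], str(exc))
            failed += 1
        else:
            server.mark_command_completed(cmd['id'])
            completed += 1
    return completed, failed

def poll_forever(server, run_script):
    """The daemon's main loop: poll, act, report, sleep, repeat."""
    while True:
        process_queued_commands(server, run_script)
        time.sleep(POLL_INTERVAL)
```

The key property is that every connection is initiated by the lab controller; the server never has to reach into the lab.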
The main advantage to this approach is that we no longer need bi-directional communication between lab controllers and the Beaker server. Instead, all requests come from the lab controllers to the server. If a lab controller goes down, or becomes unreachable, errors do not pile up on the Beaker server. The queue for systems in that lab will simply not progress until the lab controller comes back online, at which point it should be able to recover gracefully. It will also allow labs to be behind a NAT. Plus it is more efficient to have the lab controller report back when a power script is finished, rather than having the Beaker server polling cobblerd to check the status of the command.
I have a proof-of-concept patch which implements power command handling. You can view and comment on the patch here:
http://gerrit.beaker-project.org/912
In this patch I have used gevent, which is a library for event-driven asynchronous programming using "greenlets". I wanted to avoid using threads for supervising the power scripts: in a large lab with hundreds of test systems and many power commands running concurrently, having a thread per command (actually at least two, one to read from stdout and one to read from stderr) would waste a lot of memory. I chose gevent over Twisted because a lot of existing code can be used as-is, without porting all of it to Twisted. You can read more about gevent here:
http://www.gevent.org/
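For illustration, here is a rough sketch of the greenlet-per-command idea, assuming a gevent version that ships the cooperative gevent.subprocess module. The script invocations are placeholders (they just print), not real power scripts:

```python
import sys

import gevent
from gevent.subprocess import PIPE, Popen  # cooperative subprocess support

def run_power_command(argv):
    """Supervise one power script inside a greenlet.

    communicate() blocks only this greenlet, not an OS thread, so
    hundreds of concurrent commands cost kilobytes each rather than
    two full thread stacks per command.
    """
    proc = Popen(argv, stdout=PIPE, stderr=PIPE)
    out, _err = proc.communicate()
    return proc.returncode, out.decode()

# Spawn one lightweight greenlet per queued command.
cmds = [[sys.executable, '-c', 'print("powered on %d")' % i]
        for i in range(3)]
greenlets = [gevent.spawn(run_power_command, c) for c in cmds]
gevent.joinall(greenlets)
results = [g.value for g in greenlets]
```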
Still on my todo list for this patch:
* implement timeouts for the power scripts, so that they can't run forever and never return
* add optional support for receiving power commands over AMQP (as an optimisation for polling), like the beaker-watchdog daemon currently has
* make the daemon shut down cleanly: if there are any power scripts running, they should be allowed to complete and report their result before being killed
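For the timeout item, the stdlib's subprocess timeout support illustrates the intended behaviour (the patch itself would presumably use gevent's equivalent, such as gevent.Timeout). The five-minute cap here is an assumed value, not a decided one:

```python
import subprocess
import sys

KILL_TIMEOUT = 300  # assumed cap of five minutes per power script

def run_with_timeout(argv, timeout=KILL_TIMEOUT):
    """Run a power script, killing it if it exceeds the timeout so it
    cannot run forever and never return."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        out, _err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed child
        raise RuntimeError('power script exceeded %ds timeout' % timeout)
    return proc.returncode, out.decode()
```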
As a next step towards removing Cobbler, we can expand the command queue to include provisioning commands and have the new beaker-provision daemon process those also. I will be working on this next.
On 01/31/2012 07:06 AM, Dan Callaghan wrote:
> Dear list,
>
> I've been working on removing Beaker's dependence on Cobbler for provisioning systems. This mail is to describe the approach I am proposing, and to seek feedback on it.
>
> At present, Beaker requires cobblerd to be running on each lab controller. When a new distro comes along, it must be imported into Cobbler. From there, a script is run to register new Cobbler distros with the Beaker server. (Bill has been doing some work on this side of things separately.)
>
> When it comes time to reboot or power off a system -- either because a user has manually requested it, or the scheduler is starting/stopping a job -- the Beaker server makes a series of XML-RPC calls to cobblerd on the lab controller, requesting that it execute the appropriate power control script. The parameters for the power script are based on the system's power settings stored in Beaker.
>
> Similarly, provisioning a distro on a system means making a series of XML-RPC calls to cobblerd to configure netbooting for the system and then rebooting it.
>
> As a first step towards removing Cobbler, I am tackling the power command side of things. Since version 0.6.14 Beaker already has a per-system queue of power commands, to handle the fact that some systems take a very long time to power on and off.
Could we squeeze the queued calls (except the pending one) into one?
- Reboot+Reboot -> Reboot
- Reboot+Power-off -> Power-off
- Reboot+Power-on -> Reboot
- Power-off+Power-on -> Reboot
- ...
In some cases we can include the pending call in the game as well:
- [Reboot] + Reboot -> [Reboot]
- [Reboot] + Power-on -> [Reboot]
- ...
Looks like a simple filter would do...
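A sketch of what such a filter might look like, with hypothetical short command names ('on', 'off', 'reboot') standing in for Beaker's real command actions:

```python
# Coalescing rules following the table above: when a new command is
# enqueued behind an existing queued (not yet running) one, the pair
# collapses to a single equivalent command. The last three pairs are
# assumed extensions by the same logic, not from the table.
COALESCE = {
    ('reboot', 'reboot'): 'reboot',
    ('reboot', 'off'): 'off',
    ('reboot', 'on'): 'reboot',
    ('off', 'on'): 'reboot',
    ('on', 'off'): 'off',
    ('off', 'off'): 'off',
    ('on', 'on'): 'on',
}

def enqueue(queue, new_cmd):
    """Add new_cmd to the queue, merging it with the last queued
    command when the pair has a single-command equivalent."""
    if queue:
        merged = COALESCE.get((queue[-1], new_cmd))
        if merged is not None:
            queue[-1] = merged
            return queue
    queue.append(new_cmd)
    return queue
```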
> Right now a dedicated thread runs in beakerd (on the Beaker server) processing new power commands, and checking the status of running ones. In each case it has to make an XML-RPC call back to cobblerd on the lab controller.
>
> We can flip this relationship on its head, by making the lab controller poll the Beaker server for new power commands. When it sees one, it executes the power script and reports the result back. This is similar to the way the existing beaker-watchdog daemon works: it polls the server periodically for new and expired watchdog records and acts on them.
From my side just one thing: is Provisioning in the same queue as Power-commands?
This affects the answers to the following two questions:

- When does Beaker see the machine as free?
- When is PXE boot on the LC set?
Both should happen only after previously queued power commands complete and the one issued by the caller succeeds.
If not, imagine the following scenario:

[Impatient user here]
Reboot
[Watch, nothing's going on]
Reboot
[Watch for a bit longer]
Return the machine to Beaker

Beaker schedules a recipe, sets PXE boot on Cobbler, and now the sequence of reboots would take place instead of the job...
-- Marian
> The main advantage to this approach is that we no longer need bi-directional communication between lab controllers and the Beaker server. Instead, all requests come from the lab controllers to the server. If a lab controller goes down, or becomes unreachable, errors do not pile up on the Beaker server. The queue for systems in that lab will simply not progress until the lab controller comes back online, at which point it should be able to recover gracefully. It will also allow labs to be behind a NAT. Plus it is more efficient to have the lab controller report back when a power script is finished, rather than having the Beaker server polling cobblerd to check the status of the command.
>
> I have a proof-of-concept patch which implements power command handling. You can view and comment on the patch here:
>
> http://gerrit.beaker-project.org/912
>
> In this patch I have used gevent, which is a library for event-driven asynchronous programming using "greenlets". I wanted to avoid using threads for supervising the power scripts, because in a large lab with hundreds of test systems and many power commands running concurrently, having a thread per command (actually at least two, one to read from stdout and one to read from stderr) would waste a lot of memory. I chose gevent over Twisted because a lot of existing code can be used as-is, without porting all of it to Twisted. You can read more about gevent here:
>
> http://www.gevent.org/
>
> Still on my todo list for this patch:
>
> * implement timeouts for the power scripts, so that they can't run forever and never return
> * add optional support for receiving power commands over AMQP (as an optimisation for polling), like the beaker-watchdog daemon currently has
> * make the daemon shut down cleanly: if there are any power scripts running, they should be allowed to complete and report their result before being killed
>
> As a next step towards removing Cobbler, we can expand the command queue to include provisioning commands and have the new beaker-provision daemon process those also. I will be working on this next.
Beaker-devel mailing list
Beaker-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/beaker-devel
Excerpts from Marian Csontos's message of Tue Jan 31 17:40:45 +1000 2012:
> Could we squeeze the queued calls (except the pending one) into one?
>
> - Reboot+Reboot -> Reboot
> - Reboot+Power-off -> Power-off
> - Reboot+Power-on -> Reboot
> - Power-off+Power-on -> Reboot
> - ...
>
> In some cases we can include the pending call in the game as well:
>
> - [Reboot] + Reboot -> [Reboot]
> - [Reboot] + Power-on -> [Reboot]
> - ...
>
> Looks like a simple filter would do...
We could add some logic on the server side to optimise the queue when new commands are added. I don't think the beaker-provision daemon needs to worry about this. However, if we are removing or altering commands on the server side while they are queued, we have to be careful not to introduce a race between the time beaker-provision fetches a command and the time it starts running it.
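One way to avoid that race is for beaker-provision to claim each command with an atomic state transition, so the server only ever rewrites or coalesces commands still in the queued state. A minimal sketch, using SQLite for illustration and hypothetical table/column names (not Beaker's real schema):

```python
import sqlite3

def claim_command(conn, command_id):
    """Atomically move a command from 'queued' to 'running'.

    The conditional UPDATE succeeds for exactly one caller, so the
    daemon that wins the race runs the command and the server knows
    not to touch it any more. Table and column names are hypothetical.
    """
    cur = conn.execute(
        "UPDATE command_queue SET status = 'running' "
        "WHERE id = ? AND status = 'queued'",
        (command_id,))
    conn.commit()
    return cur.rowcount == 1
```

The same conditional-update pattern works against Beaker's real database through whatever ORM layer the server uses.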
I don't think the scheduler will ever enqueue consecutive power commands, except if a system is powered off at the end of a recipe and then immediately rebooted for the next recipe. That becomes the same as power off + power on, which for most power types is equivalent to a reboot. So I don't think there's much to be gained for the scheduler.
However a user could manually queue up lots of power commands using the web UI -- in that case, maybe we should honour them even if they make no sense (such as power on + power off, or reboot five times, or similar)? Is there any reason why someone would want to enqueue multiple power commands, apart from "I am impatient so I will mash this button"? A better approach would be to add some sanity checking in the web UI, or else just assume the user knows what they are doing and honour it.
> From my side just one thing: is Provisioning in the same queue as Power-commands?
>
> This affects the answers to the following two questions:
>
> - When does Beaker see the machine as free?
> - When is PXE boot on the LC set?
>
> Both should happen only after previously queued power commands complete and the one issued by the caller succeeds.
>
> If not, imagine the following scenario:
>
> [Impatient user here]
> Reboot
> [Watch, nothing's going on]
> Reboot
> [Watch for a bit longer]
> Return the machine to Beaker
>
> Beaker schedules a recipe, sets PXE boot on Cobbler, and now the sequence of reboots would take place instead of the job...
Right now configuring netboot is done synchronously, but power commands (including reboot for provisioning) are done asynchronously through the command queue. So the scenario you describe is possible right now.
We've already agreed that this is not ideal, and that provisioning should be done through the command queue to avoid those kinds of problems. That will be happening for the Cobbler removal effort (or even sooner).