This was brought up a little while ago and we decided to put off the discussion for a bit, but I'd like to restart the conversation before we get much further with disposable clients.
My plan for how our hosts would be set up once we deploy support for disposable clients is this:
  - virthosts would have N buildslave processes running on them
  - each buildslave would launch VMs for disposable clients as needed
  - each virthost would have access to a shared filesystem used to store at least VM images, maybe logs and other data
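To make the layout above a bit more concrete, here is a rough Python sketch of how the pieces relate. It is only a model for discussion, not how buildbot or Taskotron actually implement any of this: the class names, `make_virthost`, the `images` subdirectory and the example paths are all made up for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Buildslave:
    """One of the N buildslave processes on a virthost (hypothetical model)."""
    name: str
    image_dir: Path  # directory on the shared filesystem holding VM images
    active_vms: list = field(default_factory=list)

    def launch_disposable_vm(self, image_name, task_id):
        # A real buildslave would boot a VM here; the sketch only records it.
        vm = {"task": task_id, "image": self.image_dir / image_name}
        self.active_vms.append(vm)
        return vm

    def destroy_vm(self, vm):
        # Disposable clients are thrown away once the task has finished.
        self.active_vms.remove(vm)


@dataclass
class Virthost:
    """A bare-metal host running several buildslaves (hypothetical model)."""
    hostname: str
    shared_fs: Path
    slaves: list = field(default_factory=list)


def make_virthost(hostname, n_slaves, shared_fs):
    """Build a virthost whose buildslaves all use the same shared image store."""
    shared = Path(shared_fs)
    host = Virthost(hostname=hostname, shared_fs=shared)
    for i in range(n_slaves):
        host.slaves.append(
            Buildslave(name=f"{hostname}-slave{i}", image_dir=shared / "images")
        )
    return host
```

The point of the sketch is just that VM images live in one shared place while each buildslave tracks only its own short-lived VMs, so nothing on the virthost itself needs to persist between tasks.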
While virt-in-virt is possible, I'd prefer to avoid the extra complexity and performance penalty, and I figure that running the virthosts on bare metal makes more sense. If we disable local task execution, there should be little risk of one task disrupting the rest of that virthost in a way that can't be easily reverted.
All of our infrastructure outside of the database host is running Fedora (F21 at the moment), which is fine, but I'm wondering about migrating to RHEL for everything that doesn't have to run Fedora.
For observers of this discussion - I'm not in any way asserting that Fedora isn't capable of running our production services well. I'm simply acknowledging that it will likely be more expensive (in terms of human resources) to run Fedora on our production systems and I'm wondering whether it's wise to be spending that much of our rather limited resources to do it.
In my mind, the big pros/cons for the two approaches are:
--------------------------
Moving to RHEL
--------------------------

Pros:
  * Less frequent updates, less downtime required for kernel updates etc.
  * Using the same bits that infra is already using and there should be less maintenance work on our part
Cons:
  * Will make python and library compatibility more of an issue (if we're going to support tasks on el7 anyway, this point is kind of moot since we'll need to deal with it regardless)
  * No buildbot packages for el7 - I'm not willing to take on epel7 maintenance for buildbot 0.8.x and I'd rather not see packages for it in epel7, either. We'll be migrating to buildbot 0.9 before long and I'd prefer to avoid compat packages.
  * Migration and some code changes may be required
  * May need to take on maintenance of EPEL packages and/or deal with folks not testing updates before making version changes in EPEL (I've gotten burned by that before with blockerbugs)
--------------------------
Staying With Fedora
--------------------------

Pros:
  * Dogfooding
  * Production environment is closer to dev environments
  * Don't have to worry about as many compat packages
  * No EPEL
  * More testing of Fedora - we are QA after all
Cons:
  * We'll probably hit problems before infra does since we'd be upgrading everything more often
  * More frequent migrations since each Fedora release is only supported for roughly a year
  * More frequent downtime for updates
  * Will need to be more diligent about keeping dev/stg on updates-testing so that we don't get any nasty surprises in production
I think that infra would like to see us migrate to RHEL so that the Taskotron systems are more like everything else that supports Fedora. That said, I don't think they'd object to us running Fedora as long as we accept responsibility for keeping everything working and actually follow through on it. If we do decide that we'd prefer to keep Fedora, we'll discuss it with them, but I wanted to start the discussion here before bothering the infra folks with it.
I suspect that our virthosts for Taskotron will be slightly different from infra's either way - we still need to figure out a shared filesystem for the images (gluster is the first thing that comes to mind but there are other options) and none of infra's virthosts have that. They do use gluster for a few things, so I suspect that it'd be less work to set that up on RHEL than on Fedora.
I'm a bit torn on this - as much as I'd like to see us use Fedora, I think it's important to provide as much value to Fedora as we can and I'm not sure that our time is best spent admin-ing and upgrading systems. Migrating to el7 wouldn't be trivial, but once we get over the initial hurdle of packaging and deployment I think it will require less maintenance.
What does everyone else think? Keep in mind that you'll be the folks helping with all of this :)
Tim