This was brought up a little while ago and we decided to put off the discussion a little bit but I'd like to re-start the conversation before we get too much farther with disposable clients.
My plan for how our hosts would be set up once we deploy support for disposable clients is this:
- virthosts would have N buildslave processes running on them
- each buildslave would launch VMs for disposable clients as needed
- each virthost would have access to a shared filesystem used to store at least VM images, maybe logs and other data
While virt-in-virt is possible, I'd prefer to avoid the extra complexity and performance penalty and figure that running on bare metal makes more sense. If we disable local task execution, there should be little risk of one task disrupting other stuff on that virthost that can't be easily reverted.
All of our infrastructure outside of the database host is running Fedora (F21 at the moment), which is fine, but I'm wondering about migrating to RHEL for everything that doesn't have to run Fedora.
For observers of this discussion - I'm not in any way asserting that Fedora isn't capable of running our production services well. I'm simply acknowledging that it will likely be more expensive (in terms of human resources) to run Fedora on our production systems and I'm wondering whether it's wise to be spending that much of our rather limited resources to do it.
In my mind, the big pros/cons for the two approaches are:
-------------------------- Moving to RHEL --------------------------
Pros:
* Less frequent updates, less downtime required for kernel updates, etc.
* Using the same bits that infra is already using, so there should be less maintenance work on our part
Cons:
* Will make python and library compatibility more of an issue (if we're going to support tasks on el7, this point is kind of moot since we'll need to deal with it anyways)
* No buildbot packages for el7 - I'm not willing to take on epel7 maintenance for buildbot 0.8.x and I'd rather not see packages for it in epel7, either. We'll be migrating to buildbot 0.9 before long and I'd prefer to avoid compat packages.
* Migration and some code changes may be required
* May need to take on maintenance of EPEL packages and/or deal with folks not testing updates before making version changes in EPEL (I've gotten burned by that before with blockerbugs)
-------------------------- Staying With Fedora --------------------------
Pros:
* Dogfooding
* Production environment is closer to dev environments
* Don't have to worry about as many compat packages
* No EPEL
* More testing of Fedora - we are QA, after all
Cons:
* We'll probably hit problems before infra does since we'd be upgrading everything more often
* More frequent migrations since Fedora releases are supported for 1 year
* More frequent downtime for updates
* Will need to be more diligent about keeping dev/stg on updates-testing so that we don't get any nasty surprises in production
I think that infra would like to see us migrate to RHEL so that the Taskotron systems are more like everything else that supports Fedora but I don't think that they'd object to us running Fedora as long as we accept responsibility for keeping everything working and actually do it. If we do decide that we'd prefer to keep Fedora, we'll discuss it with them but I wanted to start the discussion here before bothering the infra folks with it.
I suspect that our virthosts for Taskotron will be slightly different from infra's either way - we still need to figure out a shared filesystem for the images (gluster is the first thing that comes to mind but there are other options) and none of infra's virthosts have that. They do use gluster for a few things, so I suspect that it'd be less work to set that up on rhel than fedora.
I'm a bit torn on this - as much as I'd like to see us use Fedora, I think it's important to provide as much value to Fedora as we can and I'm not sure that our time is best spent admin-ing and upgrading systems. That being said, migrating to el7 wouldn't be trivial, but once we get over the initial hurdle of packaging and deployment I think it will require less maintenance.
What does everyone else think? Keep in mind that you'll be the folks helping with all of this :)
Tim
On Sat, 9 May 2015 12:05:04 -0600 Tim Flink tflink@redhat.com wrote:
This was brought up a little while ago and we decided to put off the discussion a little bit but I'd like to re-start the conversation before we get too much farther with disposable clients.
My plan for how our hosts would be set up once we deploy support for disposable clients is this:
- virthosts would have N buildslave processes running on them
- each buildslave would launch VMs for disposable clients as needed
- each virthost would have access to a shared filesystem used to store at least VM images, maybe logs and other data
That's each virthost, not all virthosts having the same storage, right?
...snip good description of pros and cons...
I think that infra would like to see us migrate to RHEL so that the Taskotron systems are more like everything else that supports Fedora but I don't think that they'd object to us running Fedora as long as we accept responsibility for keeping everything working and actually do it. If we do decide that we'd prefer to keep Fedora, we'll discuss it with them but I wanted to start the discussion here before bothering the infra folks with it.
Yeah, we run Fedora in places where it makes sense to do so (builders are all fedora, for example), but they are somewhat more work.
I suspect that our virthosts for Taskotron will be slightly different from infra's either way - we still need to figure out a shared filesystem for the images (gluster is the first thing that comes to mind but there are other options) and none of infra's virthosts have that. They do use gluster for a few things, so I suspect that it'd be less work to set that up on rhel than fedora.
gluster might actually be easier on fedora. RHEL ships all of gluster except the server in the base repos; the server is in some storage channel, etc.
I'm happy to provide any info or support needed.
kevin
On Mon, 11 May 2015 13:09:33 -0600 Kevin Fenzi kevin@scrye.com wrote:
On Sat, 9 May 2015 12:05:04 -0600 Tim Flink tflink@redhat.com wrote:
This was brought up a little while ago and we decided to put off the discussion a little bit but I'd like to re-start the conversation before we get too much farther with disposable clients.
My plan for how our hosts would be set up once we deploy support for disposable clients is this:
- virthosts would have N buildslave processes running on them
- each buildslave would launch VMs for disposable clients as needed
- each virthost would have access to a shared filesystem used to store at least VM images, maybe logs and other data
That's each virthost, not all virthosts having the same storage, right?
All of the virthosts (at least in each group of dev/stg/prod) would have a chunk of shared storage used to store the canonical VM images that we use to boot the disposable clients (the disk changes would be done locally to the virthosts). This way we only have to build them once instead of once per virthost.
In the back of my head, I'm thinking that it may make sense to store logs and artifacts on a chunk of shared storage instead of transferring everything to the taskotron master using buildbot. I figure that may make sense if the shared storage is already set up but this hasn't gotten past the "thinking about it" stage yet :).
...snip good description of pros and cons...
I think that infra would like to see us migrate to RHEL so that the Taskotron systems are more like everything else that supports Fedora but I don't think that they'd object to us running Fedora as long as we accept responsibility for keeping everything working and actually do it. If we do decide that we'd prefer to keep Fedora, we'll discuss it with them but I wanted to start the discussion here before bothering the infra folks with it.
Yeah, we run Fedora in places where it makes sense to do so (builders are all fedora, for example), but they are somewhat more work.
I suspect that our virthosts for Taskotron will be slightly different from infra's either way - we still need to figure out a shared filesystem for the images (gluster is the first thing that comes to mind but there are other options) and none of infra's virthosts have that. They do use gluster for a few things, so I suspect that it'd be less work to set that up on rhel than fedora.
gluster might actually be easier on fedora. RHEL ships all of gluster except the server in the base repos; the server is in some storage channel, etc.
Hrm, I would have expected it to be the other way around. I also figured that gluster would be one of those things that was the most likely to cause trouble on upgrades but that's just instinct, no experience to back it up.
I'm happy to provide any info or support needed.
Thanks for the info, it's much appreciated.
Tim
On Mon, 11 May 2015 17:16:09 -0600 Tim Flink tflink@redhat.com wrote:
On Mon, 11 May 2015 13:09:33 -0600 Kevin Fenzi kevin@scrye.com wrote:
On Sat, 9 May 2015 12:05:04 -0600 Tim Flink tflink@redhat.com wrote:
This was brought up a little while ago and we decided to put off the discussion a little bit but I'd like to re-start the conversation before we get too much farther with disposable clients.
My plan for how our hosts would be set up once we deploy support for disposable clients is this:
- virthosts would have N buildslave processes running on them
- each buildslave would launch VMs for disposable clients as needed
- each virthost would have access to a shared filesystem used to store at least VM images, maybe logs and other data
Thats each virthost, not all virthosts having the same storage right?
All of the virthosts (at least in each group of dev/stg/prod) would have a chunk of shared storage used to store the canonical VM images that we use to boot the disposable clients (the disk changes would be done locally to the virthosts). This way we only have to build them once instead of once per virthost.
Would it be feasible to just build them once and rsync them between hosts? Or would you prefer shared storage?
In the back of my head, I'm thinking that it may make sense to store logs and artifacts on a chunk of shared storage instead of transferring everything to the taskotron master using buildbot. I figure that may make sense if the shared storage is already set up but this hasn't gotten past the "thinking about it" stage yet :).
Yeah. We may be able to do a netapp nfs volume, we will have to see what all we can do once we move to our new c-mode filer.
kevin
- Will need to be more diligent about keeping dev/stg on updates-testing so that we don't get any nasty surprises in production
I don't have much advice about the other points, but this one caught my attention. Do we really need to use updates-testing for dev/stg? That might be quite problematic, because anyone can submit anything, no matter how broken, into updates-testing. Wouldn't it be a safer approach to update dev daily (and stg e.g. every other day) from stable updates? And production would be updated weekly or bi-weekly (or however often we need it), with the exception of security updates. Security updates would be applied to dev/stg immediately and, after a few jobs were successfully executed, applied to production. Would this approach work?
I guess the approach with security updates would be the same, no matter whether it's Fedora or RHEL. So the only difference is in the volume and speed of standard updates.
This doesn't mean I'm in favor of running Fedora, I think you have much more experienced view on this. I'm just thinking aloud about some of the details.
----- Original Message -----
From: "Tim Flink" tflink@redhat.com
To: qa-devel@lists.fedoraproject.org
Sent: Saturday, May 9, 2015 8:05:04 PM
Subject: To RHEL or Not to RHEL?
...snip...
While virt-in-virt is possible, I'd prefer to avoid the extra complexity and performance penalty and figure that running on bare metal makes more sense. If we disable local task execution, there should be little risk of one task disrupting other stuff on that virthost that can't be easily reverted.
Does it make sense not to disable local execution on one or more buildslaves? I wonder if some tasks could benefit from not running in a vm. Or would it be a waste of resources to run tasks like rpmlint on a disposable client?
Martin
On Tue, 12 May 2015 17:34:24 -0600 Kevin Fenzi kevin@scrye.com wrote:
On Mon, 11 May 2015 17:16:09 -0600 Tim Flink tflink@redhat.com wrote:
On Mon, 11 May 2015 13:09:33 -0600 Kevin Fenzi kevin@scrye.com wrote:
On Sat, 9 May 2015 12:05:04 -0600 Tim Flink tflink@redhat.com wrote:
This was brought up a little while ago and we decided to put off the discussion a little bit but I'd like to re-start the conversation before we get too much farther with disposable clients.
My plan for how our hosts would be set up once we deploy support for disposable clients is this:
- virthosts would have N buildslave processes running on them
- each buildslave would launch VMs for disposable clients as needed
- each virthost would have access to a shared filesystem used to store at least VM images, maybe logs and other data
Thats each virthost, not all virthosts having the same storage right?
All of the virthosts (at least in each group of dev/stg/prod) would have a chunk of shared storage used to store the canonical VM images that we use to boot the disposable clients (the disk changes would be done locally to the virthosts). This way we only have to build them once instead of once per virthost.
Would it be feasible to just build them once and rsync them between hosts? Or would you prefer shared storage?
Honestly, that hadn't even occurred to me. I think it may end up depending on how we kick off the image builds but that sounds much easier and more reliable than a shared filesystem.
In the back of my head, I'm thinking that it may make sense to store logs and artifacts on a chunk of shared storage instead of transferring everything to the taskotron master using buildbot. I figure that may make sense if the shared storage is already set up but this hasn't gotten past the "thinking about it" stage yet :).
Yeah. We may be able to do a netapp nfs volume, we will have to see what all we can do once we move to our new c-mode filer.
I think it may be a little while before we're ready to look at doing something different for the logs and shared storage - still at step 1 "make it work" in this whole process but will keep it in mind.
Thanks,
Tim
On Wed, 13 May 2015 08:45:08 -0400 (EDT) Kamil Paral kparal@redhat.com wrote:
- Will need to be more diligent about keeping dev/stg on updates-testing so that we don't get any nasty surprises in production
I don't have much advice about the other points, but this one caught my attention. Do we really need to use updates-testing for dev/stg? That might be quite problematic, because anyone can submit anything, no matter how broken, into updates-testing. Wouldn't it be a safer approach to update dev daily (and stg e.g. every other day) from stable updates? And production would be updated weekly or bi-weekly (or however often we need it), with the exception of security updates. Security updates would be applied to dev/stg immediately and, after a few jobs were successfully executed, applied to production. Would this approach work?
Yeah, I'm not dead set on using updates-testing in that scenario - it was just the easiest way to express "test updates on dev/stg before they make it to production". I probably could have been more specific.
Something like that could work as long as we were careful about only applying non-security updates on prod that had been sufficiently tested on dev/stg. At the moment, security updates are applied automatically on our fedora machines via a cron job.
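For reference, the automatic security-update cron job could look something like the entry below. This is a sketch, not the actual job on our machines - the schedule, log path, and exact flags are assumed (yum's `--security` filtering needs the security plugin available):

```
# Hypothetical /etc/crontab entry: apply only security updates nightly.
0 4 * * * root yum -y --security update >> /var/log/security-updates.log 2>&1
```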
The one thing I'd like to improve on if we continue to use fedora is regular updates. At the moment, I try to apply updates to everything every couple of weeks but it's not a set schedule and I'd like to improve that. I'm open to suggestions on what that schedule should be and how to implement it (reminders to folks with access, cron-ish, etc.) if we go that route.
I guess the approach with security updates would be the same, no matter whether it's Fedora or RHEL. So the only difference in the volume and speed of standard updates.
Yeah, that leads into one of the disadvantages of running Fedora - more frequent updates and especially more frequent kernel updates that require a reboot.
This doesn't mean I'm in favor of running Fedora, I think you have much more experienced view on this. I'm just thinking aloud about some of the details.
Yeah, it's a good point. Thanks for pointing this out.
On Wed, 13 May 2015 09:31:48 -0400 (EDT) Martin Krizek mkrizek@redhat.com wrote:
----- Original Message -----
From: "Tim Flink" tflink@redhat.com
To: qa-devel@lists.fedoraproject.org
Sent: Saturday, May 9, 2015 8:05:04 PM
Subject: To RHEL or Not to RHEL?
...snip...
While virt-in-virt is possible, I'd prefer to avoid the extra complexity and performance penalty and figure that running on bare metal makes more sense. If we disable local task execution, there should be little risk of one task disrupting other stuff on that virthost that can't be easily reverted.
Does it make sense not to disable local execution on one or more buildslaves? I wonder if some tasks could benefit from not running in a vm. Or would it be a waste of resources to run tasks like rpmlint on a disposable client?
Yeah, that had occurred to me but hadn't gotten much farther with it than that.
It's something that we should probably look into. I suspect that you're right that it'd be more efficient to run some, if not all, of our regular tasks on non-disposable clients. It makes triggering a bit more complicated but I don't think it would be too terrible to have a new "non-disposable" builder and trigger certain tasks on that instead of the regular builder.
Another option would be to maintain some of the vm buildslaves that we're currently using instead of running tasks on bare metal. I've filed a task for investigating this once we have a minimal system working:
https://phab.qadevel.cloud.fedoraproject.org/T480
Thanks,
Tim
On Wed, 13 May 2015 08:56:54 -0600 Tim Flink tflink@redhat.com wrote:
<snip>
All of the virthosts (at least in each group of dev/stg/prod) would have a chunk of shared storage used to store the canonical VM images that we use to boot the disposable clients (the disk changes would be done locally to the virthosts). This way we only have to build them once instead of once per virthost.
Would it be feasible to just build them once and rsync them between hosts? Or would you prefer shared storage?
Honestly, that hadn't even occurred to me. I think it may end up depending on how we kick off the image builds but that sounds much easier and more reliable than a shared filesystem.
When I was working on the disposable clients today, I remembered another reason that I was thinking "shared filesystem" and I figured that I'd add it here.
One of the use cases that we're designing for is cloud image testing. For this use case, the task would be given details about an image, download that compose product, and run some task on it. With a shared filesystem, we could procure these images once and they would be available across all the virthosts, instead of being downloaded once per virthost that was assigned a task using a given compose product.
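The download-once behavior could be sketched roughly like this, with a local directory standing in for the shared mount and a fake "download" in place of fetching a real compose product (the image name, path, and helper function are all made up for illustration):

```shell
#!/bin/sh
set -eu

shared_dir=$(mktemp -d)              # stand-in for the shared filesystem mount
image="Fedora-Cloud-Base-22.qcow2"   # hypothetical compose product name

fetch_image() {
    # Only "download" (faked here with echo) if no virthost has fetched it yet;
    # otherwise reuse the copy already sitting on the shared filesystem.
    if [ ! -f "$shared_dir/$image" ]; then
        echo "fake cloud image" > "$shared_dir/$image"
        echo "downloaded $image"
    else
        echo "reusing cached $image"
    fi
}

fetch_image   # first virthost pays the download cost
fetch_image   # every later virthost reuses the cached copy
```

With per-host storage the second call would download again on each virthost; that duplicated transfer is the cost being weighed against the complexity of a shared filesystem.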
I'm still not sure that the shared filesystem would be worth the trouble but there was another reason besides "I didn't think about it" that we were thinking about going that route :)
Tim