Defining a harness API
by Dan Callaghan
An important first step towards supporting alternative harnesses and/or
the mythical Beaker Simple Harness is coming up with a stable,
documented API for the harness to interact with Beaker. So I'm starting
this thread now to get the ball rolling.
My first thoughts:
* It should have the smallest possible surface area -- just enough to
expose all of Beaker's functionality and nothing more.
* It should be defined from the point of view of lab controller <-> test
system, not scheduler <-> test system. Corollary: we might need to
start treating the lab controller as a first-class citizen instead of
just a dumb proxy to the scheduler.
* Just because we use XML-RPC now doesn't mean it's the best choice,
nor that we need to keep using it.
* In particular, the HTTP protocol probably supports everything we need
to build the API (in other words, it can be a "RESTful API" even
though I hate that phrase). For example logs could be uploaded using
HTTP PUT (with Content-Range), which means if we wanted we could
potentially use Apache mod_dav_fs to efficiently write these directly
to the filesystem without any intervening Beaker code.
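To make the Content-Range idea concrete, here is a rough sketch in Python of what a chunked log upload from the harness side could look like. The endpoint path is hypothetical (no harness API endpoints exist yet); only the header mechanics are the point:

```python
# Sketch only: PUT one chunk of a log file at a byte offset, using
# Content-Range so the server (or mod_dav_fs) knows where it belongs.
# The host/path are hypothetical placeholders, not real Beaker URLs.
import http.client

def content_range_header(offset, length, total):
    """Build a Content-Range value for a chunk of `length` bytes
    starting at `offset` within a file of `total` bytes."""
    return "bytes %d-%d/%d" % (offset, offset + length - 1, total)

def upload_log_chunk(host, path, data, offset, total_size):
    """Upload one chunk of a log via HTTP PUT; returns the status code."""
    conn = http.client.HTTPConnection(host)
    headers = {
        "Content-Range": content_range_header(offset, len(data), total_size),
        "Content-Type": "text/plain",
    }
    conn.request("PUT", path, body=data, headers=headers)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status
```

The attraction is that the server needs no per-upload state: each chunk says where it belongs.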
There are four areas the API needs to cover (that I can think of). The
harness needs to be able to:
* find out what to run
* report results and upload logs
* extend the watchdog time
* synchronize with other recipes in the recipe set
This last area is new: it's currently handled entirely in beah, and
there is no API on the Beaker side for it. But the dynamic FQDNs in
Beaker 0.10 mean it is now needed, even if it's as simple as having
a call to wait for all other harnesses in the recipe set to check in.
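The four areas above could be sketched as a minimal client interface. All of the names here are hypothetical illustrations of the surface area, not a proposed wire format:

```python
# Sketch only: the smallest plausible surface area for a harness API,
# one method per area listed above. Method names and signatures are
# invented for illustration.
class HarnessAPI:
    """What a harness needs from Beaker, and nothing more."""

    def next_task(self, recipe_id):
        """Find out what to run next (None when the recipe is done)."""
        raise NotImplementedError

    def report_result(self, task_id, result, score=None, log_paths=()):
        """Report a result and upload any associated logs."""
        raise NotImplementedError

    def extend_watchdog(self, recipe_id, seconds):
        """Push the external watchdog out by `seconds`."""
        raise NotImplementedError

    def wait_for_recipeset(self, recipe_set_id, timeout=None):
        """Block until all other harnesses in the recipe set check in
        (the new synchronization area needed by dynamic FQDNs)."""
        raise NotImplementedError
```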
Thoughts?
--
Dan Callaghan <dcallagh(a)redhat.com>
Software Engineer, Infrastructure Engineering and Development
Red Hat, Inc.
Scheduling recipe sets rather than recipes
by Nick Coghlan
The current scheduler works almost purely at the recipe level. The
extent to which it pays attention to recipesets pretty much amounts to
ensuring all recipes in a recipe set are scheduled on the same lab
controller.
This creates some interesting problems with multi-host testing:
- a recipe set with strict host requirements for only some systems may
hold on to common systems for a long time while waiting for rare ones
(the addition of dynamic virt support opens the door for a recipe set to
hold on to dynamic virt resources while waiting for physical hardware
for other recipes)
- recipe sets scheduled for unique systems may deadlock if a high
priority job is competing with a previously queued low priority job
which has already claimed some resources
To better explain the latter problem, consider a lab with only 2
systems, A and B, containing a particular piece of hardware, and a
multi-host recipe set that needs both of them. Queue a low priority
version of that job (Job 1) while a test is running on system A: Job 1
will immediately claim system B for one recipe, while the other remains
in the queue. If a high priority copy of the job (Job 2) is added before
the test running on system A completes, then system A will be claimed by
Job 2. This leaves the two jobs in a classic ABBA deadlock: Job 1 holds
System B and is waiting for System A, while Job 2 holds System A and is
waiting for System B.
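The scenario can be reproduced with a toy model of the current per-recipe greedy assignment. This is only a sketch; the function and data shapes are invented for illustration:

```python
# Sketch only: a toy version of per-recipe greedy scheduling,
# enough to reproduce the ABBA deadlock described above.
def assign_per_recipe(queue, free_systems):
    """queue: list of (job, systems-the-job-needs), sorted by priority.
    Each job claims whatever free systems it can, one recipe at a time,
    even if it can't get them all. Returns {system: job} of new claims."""
    claims = {}
    for job, needed in queue:
        for system in needed:
            if system in free_systems and system not in claims:
                claims[system] = job
    return claims

# Lab with systems A and B; A is busy running another test.
# Job1 (low priority) is queued and needs both systems.
claims = assign_per_recipe([("Job1", ["A", "B"])], free_systems={"B"})
# Job1 grabs B immediately; its other recipe stays queued.

# A frees up, but by then high-priority Job2 (same requirements)
# is at the front of the queue.
claims2 = assign_per_recipe(
    [("Job2", ["A", "B"]), ("Job1", ["A", "B"])], free_systems={"A"})
# Job2 grabs A: Job1 holds B waiting for A, Job2 holds A waiting
# for B -- the classic ABBA deadlock.
```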
Some of the metrics support being added in 0.11 is actually about
measuring the overall impact of the first problem (by seeing what
proportion of their time systems spend in the Scheduled state).
For other reasons to do with being able to effectively partition the
scheduling task between multiple schedulers each handling the systems
managed by a particular lab controller, I've been considering proposing
the inclusion of a "Claimed" state in the recipe lifecycle. The
"Claimed" state would fit between "Queued" and "Scheduled", and indicate
that the recipe had been assigned to a specific lab controller, but not
yet assigned to a specific system (at the moment, this state change is
handled implicitly through setting "recipe.recipeset.lab_controller"
when the first recipe in the recipeset is scheduled).
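A rough sketch of where the explicit state could sit. The transition rule below is this proposal, not current Beaker behaviour, and the surrounding state names are an approximation of the existing lifecycle:

```python
# Sketch only: making the implicit lab-controller binding an explicit
# "Claimed" state between Queued and Scheduled. Plain dicts stand in
# for the SQLAlchemy model.
RECIPE_STATES = ["New", "Processed", "Queued", "Claimed",
                 "Scheduled", "Running", "Completed"]

def claim(recipe, lab_controller):
    """Queued -> Claimed: bind the recipe to a specific lab controller
    without yet assigning a specific system (today this happens
    implicitly via recipe.recipeset.lab_controller)."""
    assert recipe["state"] == "Queued"
    recipe["state"] = "Claimed"
    recipe["lab_controller"] = lab_controller
    return recipe
```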
Furthermore, the scheduler would be updated to work on a *cached* copy
of the System status data. This is needed to avoid the current problem
where there's a race condition with system status changes occurring
during a scheduling pass leading to recipes jumping the queue (I'm
interested in hearing about relatively clean ways to do this with
SQLAlchemy, though:
http://stackoverflow.com/questions/13983067/cached-reads-immediate-writes...)
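In outline, a snapshot-based pass could look something like this (a sketch only, with plain dicts standing in for the SQLAlchemy model):

```python
# Sketch only: take a cached copy of system statuses at the start of a
# scheduling pass, make every decision against that frozen snapshot,
# and only apply the decisions afterwards -- so a status change mid-pass
# can't let a later recipe jump the queue.
def scheduling_pass(live_status, queue):
    """live_status: {system: status}; queue: ordered (recipe, system)
    candidate pairs. Returns the assignments decided this pass."""
    snapshot = dict(live_status)  # cached copy, frozen for this pass
    decisions = []
    for recipe, system in queue:
        if snapshot.get(system) == "Idle":
            snapshot[system] = "Scheduled"  # decide against the cache...
            decisions.append((recipe, system))
    return decisions  # ...then write back to the real model in one step
```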
In combination, these two would allow Claimed recipes to be given
priority over Queued recipes on subsequent passes, preventing the
deadlock problem and theoretically also improving system utilization.
One social challenge with addressing this is that we don't want to
enable/encourage queue jumping for rare systems by scheduling them in a
recipe set with a job that will be scheduled quickly, but I'm not sure
we can solve that at the technical level.
Cheers,
Nick.
--
Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane
Python Applications Team Lead
Beaker Development Lead (http://beaker-project.org/)
GlobalSync Development Lead (http://pulpdist.readthedocs.org)
Design sketch: RPM task for running beah/beakerlib tests from Git repos
by Nick Coghlan
I was pondering the question of running tests from Git repos recently,
and Dan's recent efforts in resurrecting patchbot (to sanity check
patches on Gerrit), which kinda does exactly that for our own dogfood
tests, prompted me to post my ideas for people to poke holes in :)
Goal:
Allow developers to run tests based on the existing results reporting
infrastructure directly from Git, without the need to build a test RPM
Benefits:
- eliminates a step in the test development workflow (bkr task-add)
- avoids versioning issues when updating tests
- potentially helps with VM-image-library-based testing (since the
runtest.sh in a "run from Git" task gives us another location for
harness code execution, independent of kickstart %post snippets)
(Deliberate) Limitations:
- retains the dependency on beah/beakerlib for setting up the
environment and reporting results
- thus doesn't help with the cross-platform testing problem (and, to be
frank, I don't think we *should* ever try to solve that problem directly
- instead, we eventually need to figure out how to integrate STAF,
autotest or both as alternatives to beah for the components that run on
the system under test. That's a much harder problem than simply allowing
ordinary beah/beakerlib tests to be executed from Git, though, since the
differences in execution and reporting models would need to be aligned
somehow)
Design Details:
The proposal is fairly simple:
- create a new standard task (maintained in the main beaker repo) that
accepts parameters defining:
- a Git URL to clone
- a test execution command to be run from the base directory of the
clone
- the task's runtest.sh would take care of any setup-and-teardown needed
to clone the repo, run the test and then delete the repo again
- the exact Git changeset id for the checked out repo would be reported
as part of the test details (for cases where the submitted URL doesn't
specify a particular tag or revision)
- (What else would we need in the task parameters to make up for the
lack of per-test-case tasks? Probably everything that would otherwise be
set in runtest.sh)
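The clone-run-cleanup cycle the task's runtest.sh would perform can be sketched as follows. This is written in Python purely for illustration; the parameter names are the ones proposed above, and result reporting is left out since it would go through beah/beakerlib:

```python
# Sketch only: what the proposed "run from Git" task would do -- clone
# the repo, record the exact changeset id, run the test command, then
# delete the repo again.
import shutil
import subprocess
import tempfile

def clone_command(git_url, workdir):
    """The git invocation used to fetch the repo under test."""
    return ["git", "clone", git_url, workdir]

def run_git_task(git_url, test_command):
    workdir = tempfile.mkdtemp(prefix="git-task-")
    try:
        subprocess.check_call(clone_command(git_url, workdir))
        # Report the exact changeset id, since the submitted URL may
        # not pin a particular tag or revision.
        changeset = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], cwd=workdir).strip()
        status = subprocess.call(test_command, shell=True, cwd=workdir)
        return changeset, status
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # teardown
```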
Possible additions:
- while the standard version of the task would permit arbitrary Git
URLs, it likely wouldn't be hard to create a modified version that only
allowed URLs from a defined subset of hosts.
Actually implementing this isn't high on my priority list at the moment,
if the above idea seems workable, then it should be a lot easier to make
happen than an approach that requires server side changes.
Cheers,
Nick.
--
Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane
Python Applications Team Lead
Beaker Development Lead (http://beaker-project.org/)
GlobalSync Development Lead (http://pulpdist.readthedocs.org)
Support for extracting metrics data via raw SQL
by Nick Coghlan
Something we're focusing on at the moment is improving the ability to
extract metrics data from a Beaker installation without relying on the
main server (either via the web UI or the XML-RPC interface). The
problem with relying on either of those is that some of the more
interesting queries can be quite resource intensive, and end up
interfering with the operation of the job scheduler.
For the more volatile metrics (like current system utilisation and the
state of the recipe queue), Beaker 0.11 will be sending several
additional signals to Graphite. We likely won't have a nice dashboard
for those in this release (instead relying on a few direct links to
appropriately designed graphs in Graphite web UI), but creating a
"Beaker Dashboard" for an installation is definitely on the cards for
the subsequent release.
For the more resource intensive queries though, I'd like to be able to
rely on data aggregation systems like Teiid and business reporting tools
like Jasper Reports. That means:
1. Identifying the Beaker metrics which we think are interesting (and
aren't covered by the Graphite data)
2. Figuring out how to extract those from the database schema
3. Figuring out how to publish them to Beaker users in a way that allows
them to be used in a reporting system like Jasper, but won't be a
nightmare for us to maintain
Amit's patch at http://gerrit.beaker-project.org/#/c/1546 is a decent
attempt at 1 and 2, but ultimately fails 3.
Amit, Dan and I spent some time discussing this on IRC, so here's what
we're currently thinking:
1. Create a new location in the Beaker source repo for metrics .sql files
2. Have a section in the admin guide on metrics extraction (as Amit's
patch does), but:
- drop the "Job Congestion Measurement" use case (already covered by
the Graphite metrics)
- drop the "Hardware Utilization and Coverage" use case (this part
of the schema is seriously complicated, so it's better to use the Search
UI on the main server. Perhaps mention using that UI to craft a query
and then adding "&tg_format=atom&list_tgp_limit=0" to get the results as
a machine readable list of links as per
http://beaker-project.org/server-api/http.html#system-inventory-information)
- modify the User Accounting section to focus on specific
architectures, rather than distro versions
- adjust the remaining sections to reference the appropriate .sql
file instead of including the SQL in line
- add a caveat noting that the details of these queries may change
between releases, but this should always be mentioned in the release notes
3. For each .sql file, have a test case which runs that SQL and checks
it gives the same answer as the SQLAlchemy model
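Point 3 could look roughly like this. The schema and metric below are an invented miniature (sqlite3 standing in for the real database, a plain function standing in for the SQLAlchemy model query); only the shape of the test is the point:

```python
# Sketch only: a test that runs one of the proposed .sql files directly
# and checks it matches what the model-level query reports, so schema
# changes that break a published query fail in CI.
import sqlite3

# In the real layout this would be read from the metrics .sql file.
METRIC_SQL = "SELECT user, COUNT(*) FROM recipe GROUP BY user ORDER BY user"

def model_metric(rows):
    """The same metric computed the 'model' way, from (user, arch) rows."""
    counts = {}
    for user, _arch in rows:
        counts[user] = counts.get(user, 0) + 1
    return sorted(counts.items())

def test_metric_sql_matches_model():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE recipe (user TEXT, arch TEXT)")
    rows = [("alice", "x86_64"), ("bob", "ppc64"), ("alice", "s390x")]
    db.executemany("INSERT INTO recipe VALUES (?, ?)", rows)
    assert db.execute(METRIC_SQL).fetchall() == model_metric(rows)
```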
Cheers,
Nick.
--
Nick Coghlan
Red Hat Infrastructure Engineering & Development, Brisbane
Python Applications Team Lead
Beaker Development Lead (http://beaker-project.org/)
GlobalSync Development Lead (http://pulpdist.readthedocs.org)