> I'll go slightly off-topic here, but consider this:
> Having scenarios will help us identify unique results. But it will not help us decide
> whether "all" results have been submitted (maybe some are still
> running, maybe some have crashed hard and not sent any results). So
> there's a good chance that rel-eng tools will need to have deep
> knowledge of the testcases anyway. Not just a list of all monitored
> testcases, but also that this particular test case needs to be
> performed for bios and uefi (or, that this particular testcase needs
> to be present with these two scenarios).
> So while having scenarios seems definitely helpful in certain cases,
> it might not help us in avoiding having too much knowledge in the
> gating consumers. Just food for thought.
I have actually been thinking about that problem as well :) And it is
indeed a tricky one. Bodhi already basically ducks the question; it
just shows whatever results are there at the time. We don't have a good
answer for cases like 'can we ship this compose?' or check-compose,
cases where the tool needs to know that testing is complete.
My best idea so far is that we should implement a 'testing complete'
fedmsg with a consistent name for each testing system - so
whatever.test.system.prefix.is.testing.complete or something like that.
I'm skeptical about this. The implementation is easier for openQA, because it's a
standalone system and you know that testing is "complete" once all the scheduled test
cases have finished. But for generic Taskotron tasks (depcheck, rpmgrill, etc.), we schedule them
in the trigger based on fedmsg contents and then we don't track them anymore. We have
no idea if and when all of them completed. We could make a complex system to track that,
of course, but it seems to me that it's fundamentally wrong anyway. It moves the test
plan knowledge from the consumer to the producer. So even though rel-eng gating script is
the consumer that should decide whether we're good to go or not, we would move this
logic into OpenQA or Taskotron just to be able to trigger "testing complete"
message. That doesn't seem worth it. And it will break once we have multiple consumers
with different requirements (what happens when one of the consumers needs depcheck and the
other doesn't, when is the testing "complete"?).
I'd rather have a definition file in releng's pagure and send PRs against that,
than emulate the logic in all our testing systems.
So yes, this is a tricky problem indeed. If anyone has a magic solution, I'm all ears.
Then at least the consumer only has to know which test systems it cares
about, and it can quite trivially trigger whenever it has
'testing.complete' messages from each one for the relevant item. I
really dunno if we can do anything better than this. I suppose if we
also required all systems to send a 'testing.started' message, we could
have some sort of meta-consumer check when it's seen a
'testing.complete' message for each 'testing.started' message and send
out an 'all testing for compose X is complete' message, but even if we
*do* that, it means that any single system can prevent the 'all testing
complete' message going out if it's slow or broken, even if it's one a
given consumer doesn't actually care about. So I don't think consumers
would use such a message even if we managed to build it...
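For the record, here's roughly what I mean by such a meta-consumer - purely a
sketch, the topic names and message fields are made up, not any existing fedmsg
schema:

    # Hypothetical meta-consumer: track testing.started / testing.complete
    # per compose and per testing system, and announce when every system
    # that started testing has also finished.
    from collections import defaultdict

    state = defaultdict(dict)   # compose id -> {system name: 'started'/'complete'}

    def on_message(topic, msg):
        compose, system = msg['compose'], msg['system']
        if topic.endswith('.testing.started'):
            state[compose][system] = 'started'
        elif topic.endswith('.testing.complete'):
            state[compose][system] = 'complete'
            if all(v == 'complete' for v in state[compose].values()):
                # emit the 'all testing for compose X is complete' message
                print("all testing for compose %s is complete" % compose)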
Note that I've already implemented something like this for openQA,
though it doesn't work exactly the way I described at present. Each
openQA 'test complete' message contains a count of currently running or
scheduled tests for the same build, and things like check-compose
trigger when they see a 'job done' message with that count at 0.
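And the consumer side of that is about as simple as it sounds - roughly this,
though the field name for the count is illustrative, not the exact openQA fedmsg
schema:

    # Trigger check-compose-style tooling once the count of still running
    # or scheduled openQA jobs for the same build drops to zero.
    def on_openqa_job_done(msg):
        if msg['remaining'] == 0:      # 'remaining' is an illustrative field name
            run_check_compose(msg['build'])

    def run_check_compose(build):
        print("all openQA jobs for %s are done, running check-compose" % build)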
So as suggested above, this will work well only until you start adding tests which are
interesting to run, but not necessary for the consumer. Or until we have 2 different
> To be fair, I can think only of a few minor points:
> * Data duplication, makes the field longer and harder to read.
This seems pretty minor - honestly there's almost no case for a human
to be manually reading the field ever, except for debugging purposes. I
never actually go and manually inspect what the openQA scenario values
are, there's just no use case for that - it's just a concept that the
tools use to do a job.
I said all those points were minor :)
> * A possibility for errors. Since tasks are allowed to add arbitrary
> suffixes to their testcase names, a task can create tens of "sub-
> testcases" (e.g. look at dist.rpmgrill*). The scenario field is not
> automatically generated but is fully under the task's control. There might
> be copy&paste errors or logical errors in the code, and the scenario
> might not correctly reflect the used testcase name.
Hum, I guess, though it doesn't actually *matter* if the scenario still
does its job of being unique within the results for a single item, and
being attached to 'the same' test across multiple items. We're
definitely not going to want consumers to get into the business of
parsing things out of the scenario value; if they actually want one of
the values that makes up the scenario value, they should request it
directly. But sure, it's a possibility.
I can see this easily going wrong like this:
1. You have dist.rpmlint that includes "dist.rpmlint" in scenario.
2. You decide to create dist.rpmguard check that is very similar, so you copy dist.rpmlint
code and modify it. You forget to adjust the scenario value, so it still contains
"dist.rpmlint" in it.
3. Now when you query for the item+scenario combination, as suggested by you, you get invalid
results (you'll get both dist.rpmlint and dist.rpmguard results mixed up, even though you
requested just dist.rpmlint results. Time ordering will decide whether you get the
correct one or the incorrect one as the latest result).
Another realization is that the namespace would definitely need to be present in the scenario
value (otherwise you get all rpmlint results, even from other people, even from scratch, etc.).
That again makes it prone to errors when you switch namespaces (going from scratch to
dist, for example).
> * Consistency (even though that can be arguable). I'd like to be
> clear that testcase+item combination is always the default way to
> consume results. For more complex tests, it might also require
> looking at scenario to distinguish unique results. If I write
> instructions that say that results are identified by item+scenario if
> scenario exists, otherwise as testcase+item, it seems more complex
> and less obvious to me. (But this is maybe just about phrasing).
I...dunno if this is going to be a tenable place to stand, to be honest.
Funnily enough I went through the same process in a somewhat
different context. When thinking about relval-ng in my head, I
initially had the idea that we could kill the Wikitcms 'environment'
columns - which, if you think about it, are basically this 'scenario' concept -
Yep, I realized scenario is the same as environment, or more precisely a combination of
environments. I even wanted to propose renaming it, but scenario is probably a better name anyway.
by associating all results with a specific image - which is,
if you think about it, the 'item' concept.
NOTE IF YOU'VE NO IDEA WHAT THIS IS ABOUT: we're talking about the wiki
pages Fedora QA uses to store release validation test results, like
Note each row is for one 'test case', and most rows have multiple
columns for results in different 'environments' (i.e. scenarios) for
that test case. 'relval-ng' is the working name of a system we're
proposing to build to replace the wiki system.
So I was effectively thinking the same as you: we can always just
identify a result as the combination of "a test case" and "a tested item".
But then I thought, nah, it's still not really that simple. In relval-
ng as in openQA, the easiest example is BIOS vs. UEFI: we have some
tests, e.g. Anaconda_User_Interface_Basic_Video_Driver , where we want
to test on both BIOS and UEFI. This involves the same test case and the
same tested 'item' - an x86_64 installer image - but a different
'scenario'. Unless you start stuffing scenario items into the test case
name (basicvideo.uefi, basicvideo.bios?) or item name (foobar.iso
BIOS, foobar.iso UEFI?) - an idea we rejected back at the start of
this thread - I fundamentally don't see a way around this.
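Put in terms of result data, the two results differ only in the scenario value -
something like this (the item and scenario strings are only illustrative):

    # Same test case, same tested item - only the scenario keeps the two
    # results from colliding.
    bios = {
        'testcase': 'Anaconda_User_Interface_Basic_Video_Driver',
        'item':     'Fedora-Everything-netinst-x86_64-Rawhide.iso',
        'scenario': 'x86_64.bios',
    }
    uefi = dict(bios, scenario='x86_64.uefi')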
There are other examples, though - take the 'Default boot and install'
table, where we consider installing with the same image to a VM and to
bare metal as being different scenarios, and also installing with the
same image written to an optical disc and written to a USB stick.
There's just fundamentally no way around that without invoking the
'scenario' concept, or something very much like it but phrased differently.
As we've also already noted, the existing package tests have also run
into this problem: "test name + item" is not sufficient to really
define the result, as 'item' is a source package but the same test is
run for all binary package arches and may have different results on
different arches (IIUC).
At present, we report rpmlint, rpmgrill, abicheck etc. as a single result against the source
NVR. We did not see a good reason to separate the results per arch, especially when we
can do the check for all arches in a single pass. You can see the separated results in the
log, of course. The only difference is depcheck, which is executed in several runs, and
it's easier to report that separately.
Basically, my contention is that this 'scenario' concept is going to
just keep on turning up, all over the place, and it's going to be more
realistic to expect to be dealing with a 'scenario' most of the time,
than to expect to be dealing with just 'test case plus item' most of
the time and consider 'scenario' to be a kind of "advanced" thing.
We'll see how often it'll be used, but I agree it will not be rare. For basic
tests like rpmlint it will not be needed, but for anything heavier, it probably will.
So I think I still slightly prefer the idea of including the test name
in the scenario value, but I really don't have a strong preference
either way. Let's just pick one approach and go with it.
I dislike the fact that you'll need to maintain the same values in several places
(checkname and namespace in both the task formula and your code generating the results YAML). I
also don't see much benefit in having it there. So my preference is to avoid putting
namespace+checkname into the scenario. I'm not adamant here, and if I got anything wrong,
please try to convince me otherwise. Or maybe we can have someone else voice an opinion
here... Josef? :-)
Note if I'm being honest I have a practical reason for this, as openQA
defines the test case name as one of the 'scenario keys', so if we go
with 'test name not in scenario value', in the openQA reporter we'll
have to take the list of 'scenario keys' and then remove the test name
from it. Obviously this is a trivial point, but just to be honest, it's
the real reason why I initially assumed we'd put the test name in the
scenario value: just cos that's how openQA is currently set up to do
it, if you do it the easiest way :)
Ah, now we know! :-) Hmm, maybe we can take a different path here then. We don't
really need to standardize how the scenario value looks. We just need to say:
"the purpose of scenario key is to identify unique runs of your test, when
testcase+item combination is not enough. You should include all extradata field values
that make this test run unique (e.g. 'scenario=x86_64.uefi' for
'arch=x86_64' and 'firmware=uefi' extradata)."
So the recommendation is clear. If you include the test name in openqa results, it does
not negatively affect the uniqueness resolution. We just need to be clear in our
documentation that in order to recognize unique results, you always need to consider
testcase + item + scenario (if present).
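In code the recommendation boils down to roughly this (just a sketch; which
extradata fields go into the value is up to each task):

    # Build the scenario from whatever extradata makes this run unique, and
    # treat (testcase, item, scenario) as the uniqueness key when consuming.
    extradata = {'arch': 'x86_64', 'firmware': 'uefi'}   # example task extradata
    scenario = '.'.join(extradata[k] for k in ('arch', 'firmware'))  # -> 'x86_64.uefi'

    def result_key(result):
        # scenario may be absent for simple tasks; fall back to testcase + item
        return (result['testcase'], result['item'], result.get('scenario'))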
Of course if someone takes inspiration from openQA results, he/she might also include
non-important strings (testcase name) in it, but if we're clear in our documentation,
that should not be a problem (i.e. hopefully the consumer authors will read it). Would
that work for you?
> Since you're clearly powered by Duracell batteries, go
> ahead and implement this. If we decide to implement this later in
> resultsdb directly, we can always go and simplify the consumer code.
Right. And I've merged the version where the test name is included in
the scenario value, but again, we can change this later if we want to.
So long as we write the consumer code to query for 'scenario plus test
name', even if it happens to hit an older result where the test name
was included in the scenario, no harm is done, the correct result will
So yeah, openQA results now include a 'scenario' value.
For openQA the
scenario value is constructed by joining the values for all keys openQA
considers 'scenario keys' with periods, but we don't actually have to
make this consistent between different systems - the only requirement
is that the value correctly identify the scenario.
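i.e. roughly this, on the reporter side (the key list shown is only illustrative
of the idea, not necessarily openQA's exact set):

    # Join the values of the job settings openQA treats as 'scenario keys'
    # with periods to form the scenario string.
    SCENARIO_KEYS = ('DISTRI', 'VERSION', 'FLAVOR', 'ARCH', 'TEST')

    def scenario_for(settings):
        return '.'.join(str(settings[key]) for key in SCENARIO_KEYS)

    # e.g. -> 'fedora.25.Everything-boot-iso.x86_64.install_default'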
I can try and send a
patch for Taskotron to do this as well, if you like, or would you
rather do it?
I'll do it this week. Unless you want it within 24 hours - in that case, feel free to submit a patch yourself.
> Btw, if you end up submitting patches to Bodhi and using
> 'type=bodhi_update' queries, please also use the 'since=' argument
> and set it to the update's 'date_modified' timestamp. That
> doesn't really give us the higher reliability of such results (as
> we discussed elsewhere), but it doesn't stress resultsdb so much, which
> is always good (we can't do the same for type=koji_build easily, but
> we can for type=bodhi_update).
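For reference, I'm assuming that means a query along these lines - the endpoint
path, parameter handling and response shape here are from my memory of the
ResultsDB v2 API, and the update ID is just a placeholder:

    # Fetch bodhi_update results for one update, limited to results newer
    # than the update's last modification time.
    import requests

    RESULTSDB = 'https://taskotron.fedoraproject.org/resultsdb_api/api/v2.0'

    def update_results(update_id, date_modified):
        params = {
            'type': 'bodhi_update',
            'item': update_id,          # e.g. 'FEDORA-2016-XXXX' (placeholder)
            'since': date_modified,     # the update's 'date_modified' timestamp
        }
        resp = requests.get(RESULTSDB + '/results', params=params)
        resp.raise_for_status()
        return resp.json().get('data', [])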
I'm planning to do this, but I did want to talk over with you guys what
would be appropriate for the Taskotron results. Bodhi is definitely
going to have to query for 'type=bodhi_update' results where the item
is the update ID in order to find the openQA results. But I don't know
if we should just make it find the Taskotron results in the same way,
or if it's best to have it query for both koji_build and bodhi_update
results and somehow deduplicate them (so it doesn't show Taskotron
results which were reported against both the Koji build and the Bodhi
update twice). But that's probably a separate thread.
Yep, let's talk about that separately.