Hi folks! So rather than send a welcome mail I figured let's get right
into something real ;) I actually got the idea for this list because I
wrote up some thoughts on something, and realized there was nowhere
good to send it. So I invented this mailing list. And now I can send
it! Here it is.
I've been thinking some more about convenient consumption of ResultsDB
results - specifically, about the problem of generic handling of
duplicated / repeated tests.
Say Alice from release engineering wants to decide whether a given
deliverable is of release quality. With ResultsDB, Alice can query all
test results for that deliverable, regardless of where they came from.
Great. However, it's not uncommon for a test to have been repeated. Say
the test failed, the failure was investigated and determined to be a
bug in the test, and the test was repeated and passed. Both results
wind up in ResultsDB; we have a fail, then a pass.
How does Alice conveniently and *without special knowledge of the
system that submitted the result* identify the second result as being
for 'the same test' as the first, and thus know she can consider only
the second (most recent) result, and not worry about the failure? (I'm
expecting that to usually be the desired behaviour).
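To illustrate, here's a minimal sketch of the behaviour Alice probably
wants - keep only the most recent result per test case. The field names
('testcase', 'submit_time') are my assumptions for illustration, not a
confirmed ResultsDB schema:

    def latest_per_testcase(results):
        """Keep only the newest result for each testcase name."""
        latest = {}
        for result in sorted(results, key=lambda r: r["submit_time"]):
            # a later submission overwrites an earlier one for the same test
            latest[result["testcase"]] = result
        return list(latest.values())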
Let's complicate it even more. Sometimes it's the same result repeated (e.g.
new depcheck result with updated repo state), but sometimes it's a different tool
submitting the result for the "same" test case. For example, I assume this is
the reason why you chose "compose.base_selinux" testcase name instead of
"compose.openqa.base_selinux". The idea is that several tools can submit the
result for the same test case, so openqa, autocloud or even a manual tester can do it.
I'm not currently sold on this idea (sharing the testcase name instead of having
"compose.openqa.base_selinux", "compose.autocloud.base_selinux" and
"compose.manual.base_selinux"), but that seems to be the current state. So with
this, it's even harder to recognize whether we've received two results from openqa
(the latter superseding the former), or two results from two different tools
(and therefore we should consider both).
There are also
other situations in which it's useful to be able to identify 'the same
test' for different executions; for instance, `check-compose` needs to
do this when it does its 'system information comparison' checks from
compose to compose.
I guess it's worth noting that this is somewhat related to the similar
question for test 'items' (the 'thing' being tested, in ResultsDB
parlance) - the question of uniquely identifying 'the same' item within
and across composes. At least for productmd 'images', lsedlar and I
are currently discussing that in
https://pagure.io/pungi/issue/525 .
We discussed this with Josef a while back on qa-devel. We seemed to agree that item
should identify the thing under test well, even uniquely if possible, but stay simple. We
want to avoid having too many pieces of information concatenated into a single string,
just for the purpose of unique identification. Extra data should be used for that
(structured data, no string parsing). The tradeoff is that searching is a bit more
difficult (we'd need to allow users to also search by extra data in the frontend, and
they'd have to know what to search for).
For example, for git commits, we don't really like items like
"pagure#namespace/project#githash". Perhaps we could have just githash as item
(because it's an almost unique identifier even across many projects) and the rest as
extradata. This way we keep the item simple and easy to search for, both manually
and automatically (no string parsing).
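To make that concrete, a rough sketch of how such a result might be
shaped, with the bare githash as the item and everything else as
structured extradata - the key names besides 'item' and 'type' are
illustrative, not an agreed convention:

    result = {
        "testcase": "rpmlint",        # example testcase name
        "outcome": "PASSED",
        "data": {
            "item": "7b6c1f0e9d...",  # just the githash (truncated here)
            "type": "git_commit",
            "namespace": "namespace", # the rest as extradata
            "project": "project",
        },
    }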
Obviously it's more or less a
solved problem for RPMs.
Almost. For upgradepath, yes, NVR uniquely identifies the result. For depcheck, the
unique identifier is NVR + arch (with arch coming from extradata).
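So the key that makes a result unique differs per check; roughly (the
testcase names and fields here are illustrative):

    def dedup_key(result):
        data = result["data"]
        if result["testcase"] == "upgradepath":   # NVR alone is unique
            return (data["item"],)
        if result["testcase"] == "depcheck":      # NVR + arch is unique
            return (data["item"], data["arch"])
        return (data["item"],)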
I can think of two possible ways to handle this: via the extradata, or
via the test case name.
openQA has a useful concept here. It specifies which combination of
metadata defines a unique test like this, and calls it...well,
that - a 'scenario'. There's a constant definition called
SCENARIO_KEYS in openQA that you can use to discover the appropriate
keys. So I'm going to use the term 'scenario' for this from now on.
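As a rough illustration of the idea (the authoritative list is openQA's
SCENARIO_KEYS constant; the keys below are my approximation of what it
contains):

    SCENARIO_KEYS = ("distri", "version", "flavor", "arch", "test")

    def scenario(job_settings):
        """Join the scenario-defining settings into one identifier."""
        return "-".join(job_settings[key] for key in SCENARIO_KEYS)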
There are kinda two levels of scenario, now that I think about it, depending
on whether you include 'item' identification in the scenario definition
or not. For identifying duplicates within the results for a single
item, you don't need to, but it doesn't hurt; for identifying the same
scenario across multiple composes, you do need to.
I don't follow here. In order to identify another execution of the same scenario, you
need at least testcase name and item to be exactly the same (and possibly also some
metadata). Do you have any examples showing otherwise?
I suppose someone
may have a case for identifying 'the same' test against different
items; for that purpose, you'd need the lighter 'scenario' definition
(not including the item identifier).
I don't understand this at all, it seems to go against the intended meaning of
"item".
One thing we could do is make it a convention that each test case (and/or
test case name?)
What's the difference between the two?
indicates a test 'scenario' - such that all
results for the same test case for the same item should be considered
'duplicates' in this sense, and consumers can confidently count all
results for the same test case as results for the same test 'scenario'.
That was our naive idea originally, until we got things that are not easily uniquely
identifiable via item only (if you don't want to make item a horrible compound of all
needed information).
This seems to me like the simplest possibility, but I do see two
potential issues.
First, it may result in rather long and unwieldy test case names in
some situations. If we take the more complete 'scenario' definition and
include sufficient information to uniquely identify the item under
test, the test case name for an openQA test may look something like:
`fedora.25.server-dvd-iso.x86_64.install_default.uefi` (and that's with
a fairly short test name).
Abomination! Let's not do that.
The purpose of testcase is to identify the steps that were taken to perform the test, or
the tool, or both. In the case of rpmlint, the tool and what gets done are the same
("the steps that rpmlint performs"), so the testcase name (excluding namespace)
is just "rpmlint". For openqa, it's a tool that performs many different test
cases, so it should create a separate testcase for each of them. For your example, it
should be "install_default", or better "openqa.install_default". I
wouldn't mind adding ".uefi" to the end, even though it's not clear
whether it's a sub-step result or part of the testcase (so maybe rather
"install_default_uefi"). But it would also make complete sense to make
"firmware_type" as part of the extradata and have just
"install_default".
We didn't want to impose any restrictions on what the tool produces (so everything
under "compose.openqa" is your playground, do whatever you wish), but of course
once the tool is important enough that we want to use it for gating or other important
tasks, we need to recommend some approaches to make it report similarly to the other
important tools, so that querying those results is not overly difficult. That could be
part of the resultsdb_conventions, I guess.
Second, it makes it difficult to handle the two different kinds of
'scenario' - i.e. it's not obvious how to split off the bits that
identify the 'item' from the bits that identify the 'test scenario'
proper. In this case the 'test scenario' is `install_default.uefi` and
the 'item identifier' is `fedora.25.server-dvd-iso.x86_64`, but there's
no real way to *know* that from the outside, unless we get into
defining separators, which always seems to be a losing game.
Yes, splitting strings is a bad idea.
Another possibility would be to make it a convention to include some
kind of indication of the test 'scenarios' in the extradata for each
result: a 'scenario' key, or something along those lines. This would
make it much easier to include the 'item identifier' and 'test
scenario' proper separately, and you could simply combine them when you
needed the 'complete' scenario.
I'm not sure what you mean exactly, but having a "scenario" key that
would list all the other keys which are necessary to understand what makes this scenario
unique looks like a reasonable idea. For example:
scenario = ["firmware_type", "arch"]  # testcase name and item are implied
The downside is that task authors are required to provide this, and therefore it's
error-prone. I'm not sure how to do it better, though. We could set some reasonable
defaults for each type - so e.g. for koji_build type, we know we compare testcase name +
item (required to be nvr) + arch (if present). For bodhi_update type, it would be
testcase_name + item (required to be Bodhi ID) + last_updated_timestamp. Etc. Anything
beyond those defaults would need to be in "scenario".
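A consumer could then build the de-duplication key generically; a sketch
of that, under the defaults proposed above (this is a proposal, not any
implemented API):

    DEFAULT_SCENARIO_KEYS = {
        "koji_build": ["arch"],                      # plus testcase + item (NVR)
        "bodhi_update": ["last_updated_timestamp"],  # plus testcase + item (Bodhi ID)
    }

    def scenario_key(result):
        data = result["data"]
        # per-type defaults, plus anything the task declared in "scenario"
        extra = DEFAULT_SCENARIO_KEYS.get(data.get("type"), []) + data.get("scenario", [])
        # testcase name and item are always implied
        return (result["testcase"], data["item"]) + tuple(data.get(k) for k in extra)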
I'm trying to avoid consumers of ResultsDB data having to start
learning about the details of individual test 'sources' in order to be
able to perform this kind of de-duplication. It'd suck if releng had to
learn the openQA 'scenario keys' concept directly, for instance, then
learn corresponding smarts for any other system that submitted results.
Yes, that's definitely an important goal. Initially, I guess, we didn't think
about this much. My idea was that each task would document its results structure (it would
become an API basically), and the tools would learn how to consume that. I didn't
expect to have too many important tasks that we use for gating etc. But having common
conventions is definitely easier.