De-duplicating test results: 'scenarios'

Friday, 17 February 2017

Hi folks! So rather than send a welcome mail I figured let's get right
into something real ;) I actually got the idea for this list because I
wrote up some thoughts on something, and realized there was nowhere
good to send it. So I invented this mailing list. And now I can send
it! Here it is.

I've been thinking some more about convenient consumption of ResultsDB
results. I'm thinking about the problem of generic handling of
duplicated / repeated tests.

Say Alice from release engineering wants to decide whether a given
deliverable is of release quality. With ResultsDB, Alice can query all
test results for that deliverable, regardless of where they came from.
Great. However, it's not uncommon for a test to have been repeated. Say
the test failed, the failure was investigated and determined to be a
bug in the test, and the test was repeated and passed. Both results
wind up in ResultsDB; we have a fail, then a pass.

How does Alice conveniently and *without special knowledge of the
system that submitted the result* identify the second result as being
for 'the same test' as the first, and thus know she can consider only
the second (most recent) result, and not worry about the failure? (I'm
expecting that to usually be the desired behaviour). There are also
other situations in which it's useful to be able to identify 'the same
test' for different executions; for instance, `check-compose` needs to
do this when it does its 'system information comparison' checks from
compose to compose.

I guess it's worth noting that this is somewhat related to the similar
question for test 'items' (the 'thing' being tested, in ResultsDB
parlance) - the question of uniquely identifying 'the same' item within
and across composes. At least for productmd 'images', lsedlar and I are
currently discussing that in https://pagure.io/pungi/issue/525 .
Obviously it's more or less a solved problem for RPMs.

I can think of two possible ways to handle this: via the extradata, or
via the test case name.

openQA has a useful concept here. It defines what combination of
metadata defines a unique test scenario like this, and calls it...well,
that - the 'scenario'. There's a constant definition called
SCENARIO_KEYS in openQA that you can use to discover the appropriate
keys. So I'm going to use the term 'scenario' for this from now on.

There are kinda two levels of scenario, now I think about it, depending
on whether you include 'item' identification in the scenario definition
or not. For identifying duplicates within the results for a single
item, you don't need to, but it doesn't hurt; for identifying the same
scenario across multiple composes, you do need to. I suppose someone
may have a case for identifying 'the same' test against different
items; for that purpose, you'd need the lighter 'scenario' definition
(not including the item identifier).
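
To make the two levels a bit more concrete, here's a rough Python
sketch. The key names (`distri`, `version`, `flavor`, `arch`, `test`,
`firmware`) and the `scenario()` helper are made up for illustration;
they are not the real openQA SCENARIO_KEYS, and ResultsDB doesn't
define anything like them.

```python
# Hypothetical key names, for illustration only. These are NOT the real
# openQA SCENARIO_KEYS and are not defined by ResultsDB.
ITEM_KEYS = ("distri", "version", "flavor", "arch")
TEST_KEYS = ("test", "firmware")


def scenario(metadata, include_item=False):
    """Return a tuple identifying the scenario a result belongs to.

    include_item=False gives the lighter definition (test only);
    include_item=True gives the complete one (item plus test).
    """
    keys = (ITEM_KEYS if include_item else ()) + TEST_KEYS
    return tuple(metadata[key] for key in keys)


metadata = {
    "distri": "fedora",
    "version": "25",
    "flavor": "server-dvd-iso",
    "arch": "x86_64",
    "test": "install_default",
    "firmware": "uefi",
}

light = scenario(metadata)
complete = scenario(metadata, include_item=True)
print(light)
print(complete)
```

Returning a tuple rather than a joined string means nobody has to guess
where the item part ends and the test part begins.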

One thing we could do is make it a convention that each test case (and
/ or test case name?) indicates a test 'scenario' - such that all
results for the same test case for the same item should be considered
'duplicates' in this sense, and consumers can confidently count all
results for the same test case as results for the same test 'scenario'.
This seems to me like the simplest possibility, but I do see two
potential issues.

First, it may result in rather long and unwieldy test case names in some
situations. If we take the more complete 'scenario' definition, the test
case name for an openQA test has to include sufficient information to
uniquely identify the item under test, so it may look something like:
`fedora.25.server-dvd-iso.x86_64.install_default.uefi` (and that's with
a fairly short test name).

Second, it makes it difficult to handle the two different kinds of
'scenario' - i.e. it's not obvious how to split off the bits that
identify the 'item' from the bits that identify the 'test scenario'
proper. In this case the 'test scenario' is `install_default.uefi` and
the 'item identifier' is `fedora.25.server-dvd-iso.x86_64`, but there's
no real way to *know* that from the outside, unless we get into
defining separators, which always seems to be a losing game.
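
For what it's worth, if we did adopt the test case name convention,
the consumer side could be about as simple as the sketch below. It
assumes each result is a plain dict with `item`, `testcase`, `outcome`
and `submitted` fields; those names and the sample records are
simplified stand-ins I made up, not the actual ResultsDB schema.

```python
from datetime import datetime


def latest_per_test_case(results):
    """Keep only the most recent result for each (item, test case) pair."""
    latest = {}
    for result in results:
        key = (result["item"], result["testcase"])
        current = latest.get(key)
        if current is None or result["submitted"] > current["submitted"]:
            latest[key] = result
    return list(latest.values())


# Made-up records: the test failed, was re-run later, and passed.
results = [
    {"item": "example-item", "testcase": "install_default.uefi",
     "outcome": "FAILED", "submitted": datetime(2017, 2, 17, 9, 0)},
    {"item": "example-item", "testcase": "install_default.uefi",
     "outcome": "PASSED", "submitted": datetime(2017, 2, 17, 14, 0)},
]

deduplicated = latest_per_test_case(results)
print([r["outcome"] for r in deduplicated])
```

Note that this needs no knowledge of which system submitted the
results, which is the whole point. It just can't tell you which part of
the name is the item and which is the test.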

Another possibility would be to make it a convention to include some
kind of indication of the test 'scenario' in the extradata for each
result: a 'scenario' key, or something along those lines. This would
make it much easier to include the 'item identifier' and 'test
scenario' proper separately, and you could simply combine them when you
needed the 'complete' scenario.
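
Here's a rough sketch of what consuming that could look like. It
assumes each result carries its extradata as a `data` dict with a
`scenario` key and a separate `item_id` key; both key names, the helper
functions and the sample values are hypothetical, just to show the
shape of the idea.

```python
from collections import defaultdict


def scenario_key(result, complete=False):
    """Build a grouping key from the (hypothetical) extradata keys."""
    data = result["data"]
    if complete:
        # 'Complete' scenario: item identifier plus test scenario.
        return (data["item_id"], data["scenario"])
    # Lighter scenario: the test scenario alone.
    return (data["scenario"],)


def group_by_scenario(results, complete=False):
    """Group results that count as 'the same test' under the chosen key."""
    groups = defaultdict(list)
    for result in results:
        groups[scenario_key(result, complete)].append(result)
    return dict(groups)


# Made-up results: two runs against one item, one run against another.
results = [
    {"outcome": "FAILED", "data": {
        "item_id": "fedora.25.server-dvd-iso.x86_64",
        "scenario": "install_default.uefi"}},
    {"outcome": "PASSED", "data": {
        "item_id": "fedora.25.server-dvd-iso.x86_64",
        "scenario": "install_default.uefi"}},
    {"outcome": "PASSED", "data": {
        "item_id": "fedora.25.workstation-live-iso.x86_64",
        "scenario": "install_default.uefi"}},
]

by_test = group_by_scenario(results)
by_complete = group_by_scenario(results, complete=True)
print(len(by_test), len(by_complete))
```

With the two parts stored separately, the 'same test against different
items' case and the 'same test against the same item' case both fall
out of one helper, with no separator parsing.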

I'm trying to avoid consumers of ResultsDB data having to start
learning about the details of individual test 'sources' in order to be
able to perform this kind of de-duplication. It'd suck if releng had to
learn the openQA 'scenario keys' concept directly, for instance, then
learn corresponding smarts for any other system that submitted results.

Any thoughts on this? Any better ideas? Any existing work? Thanks!
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
