> Let's even complicate it more. Sometimes it's the same
result
> repeated again (e.g. new depcheck result with updated repo state),
> but sometimes it's a different tool submitting the result for the
> "same" test case. For example, I assume this is the reason why you
> chose "compose.base_selinux" testcase name instead of
> "compose.openqa.base_selinux". The idea is that several tools can
> submit the result for the same test case, so openqa, autocloud or
> even a manual tester can do it. I'm not currently sold on this idea
> (sharing the testcase name instead of having
> "compose.openqa.base_selinux", "compose.autocloud.base_selinux" and
> "compose.manual.base_selinux"), but that seems to be the current
> state. So with this, it's even harder to recognize whether we've
> received two results from openqa (the latter superseding the former),
> or whether we received two results from two different tools (and
> therefore we should consider both).
Well, I wasn't really thinking about having multiple test systems
running 'the same' test concurrently. I was only thinking about the
possibility of moving tests between systems. base_selinux is a fairly
good example, as it's a trivial test that isn't terribly tied to
openQA: 'boot a freshly installed system and check SELinux is enabled'.
We certainly might, at some point, move that test from openQA to some
other system. I was envisaging that if we can say with a high degree of
confidence that the new system is truly performing 'the same test', we
could just 'transfer' the name rather than naming it differently.
It seems to me that you're envisioning a system where we all work together, we have a
shared definition of test cases (so that base_selinux means the same thing, whatever
tool it is implemented in), and we all know what we're doing (using the conventions,
staying up to date with the latest test structure and extradata fields, etc). I'm
envisioning total chaos and a mess :) , with *some* groups sharing the same conventions
and approaches, and other groups or people using the system in their own way.
So let's say there's a person who has a script that checks composes in some
interesting way, and later draws some conclusions or stats from it. As long as it's a
personal hobby, they'll probably use the user.<fasname> namespace. If other people
find those results interesting, we might make it more official and move it to the
`compose.<toolname>` namespace. The important thing is that their results don't
clash with our results, be it testcase names, or extradata, or whether they use
conventions or not. Only once we decide to use their results for gating might we need
to enforce some common conventions. Until then, they're free to use whatever result
structure they wish, and the consumers need to accommodate. A free-for-all space.
That's why I'm not completely happy about "polluting" the top-level
`compose` namespace, because it means it's now completely reserved for openqa (or other
tools tightly collaborating with it, sharing the test cases, etc). I'd rather create
`compose.openqa`, or maybe `compose.relval` or `compose.qa` (to have a generic name for
release validation performed by us), and everything in there would be the playground of
openqa and related tools. With this approach, we could create a `compose.othertool`
namespace and not worry about naming clashes.
I don't really object to sharing testcase names, if it makes sense to you. So
`compose.qa.base_selinux` could be either submitted using openqa, or autocloud, or
manually, or all of them. An alternative approach is to have
`compose.openqa.base_selinux`, `compose.autocloud.base_selinux` and
`compose.manual.base_selinux`, each of which has different advantages and disadvantages.
I don't care that much; the important part for me is access separation. If I decide to
write a new task, I don't need to care what testcase names other tools use, I don't
need to sync up with anyone, and I don't need to worry that I'll overwrite something
important or screw something up. I have a space that is exclusive to me.
> >
> > I can think of two possible ways to handle this: via the extradata, or
> > via the test case name.
> >
> > openQA has a useful concept here. It defines what combination of
> > metadata defines a unique test scenario like this, and calls it...well,
> > that - the 'scenario'. There's a constant definition called
> > SCENARIO_KEYS in openQA that you can use to discover the appropriate
> > keys. So I'm going to use the term 'scenario' for this from now on.
> >
> > There's kinda two levels of scenario, now I think about it, depending
> > on whether you include 'item' identification in the scenario
> > definition
> > or not. For identifying duplicates within the results for a single
> > item, you don't need to, but it doesn't hurt; for identifying the same
> > scenario across multiple composes, you do need to.
>
> I don't follow here. In order to identify another execution of the
> same scenario, you need at least testcase name and item to be exactly
> the same (and possibly also some metadata). Do you have some examples
> to show it otherwise?
Well, this is all about that "possibly also some metadata". *What*
metadata? How do you, some random releng (or whatever) person trying to
consume arbitrary ResultsDB data, know *what* "possibly also some"
metadata you need to look at to identify 'duplicate' results?
As of right now you have to come ask someone and we say "well, for
depcheck tests do foo, for upgradepath tests do bar, for openQA tests
do moo..."
I'm trying to fix that.
Yes, that all sounds reasonable. I didn't get your last paragraph, because in my view
you always need 'item' to identify duplicates. But after reading it again, it
seems there's no difference, just a choice of words. If you say "For identifying
duplicates within the results for a single item, you don't need to", you actually
just used 'item', so it matches my claim that you always need it.
So concrete examples, okay. Here's two openQA test results:
https://taskotron.fedoraproject.org/resultsdb/results/12461058
https://taskotron.fedoraproject.org/resultsdb/results/12460886
they are both results for testcase 'compose.install_ext3' on item
'Fedora-Server-dvd-x86_64-Rawhide-20170228.n.0.iso' . In this case,
they even have the same arch - 'x86_64'. Does this mean they're dupes
(i.e. the test got restarted for some reason)? No. They're actually
different tests; one was run on a BIOS VM, one on a UEFI VM. In other
words, the 'machine' (in openQA terms) is part of the 'scenario' for
openQA tests. But this is hardly a universal rule; I can't just throw
openQA's 'machine' setting into the results and tell everyone trying to
de-duplicate ResultsDB results to look for the 'machine' value.
You can, but it requires heavy logic on the consumer side, and it's a pain to maintain.
We could supply a library to do that (here's a new thought). But I agree it would be
nice to make it simpler out of the box, if we can.
The idea is just to be able to say, not "you have to look for same test
case and item and possibly some metadata", but "you have to look for
same test case and item and 'scenario' metadata item".
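To make that concrete, here's a rough sketch of what consumer-side de-duplication
could look like once every result carries a 'scenario' value. The field names and
result layout are illustrative, not the real ResultsDB schema:

```python
# Hypothetical sketch: de-duplicate ResultsDB-style results by
# (testcase, item, scenario), keeping only the newest submission.
# Field names are illustrative, not the actual ResultsDB schema.

def deduplicate(results):
    latest = {}
    for result in results:
        key = (
            result["testcase"],
            result["data"].get("item"),
            result["data"].get("scenario"),
        )
        # A later submission for the same key supersedes the earlier one.
        if key not in latest or result["submit_time"] > latest[key]["submit_time"]:
            latest[key] = result
    return list(latest.values())

results = [
    {"testcase": "compose.install_ext3",
     "data": {"item": "Fedora-Server-dvd-x86_64-Rawhide-20170228.n.0.iso",
              "scenario": "bios"},
     "submit_time": 1, "outcome": "FAILED"},
    {"testcase": "compose.install_ext3",
     "data": {"item": "Fedora-Server-dvd-x86_64-Rawhide-20170228.n.0.iso",
              "scenario": "bios"},
     "submit_time": 2, "outcome": "PASSED"},  # restart, supersedes the first
    {"testcase": "compose.install_ext3",
     "data": {"item": "Fedora-Server-dvd-x86_64-Rawhide-20170228.n.0.iso",
              "scenario": "uefi"},
     "submit_time": 1, "outcome": "PASSED"},  # different scenario, kept separately
]

unique = deduplicate(results)
```

The point is that the consumer only needs to know about one well-known key,
not a per-tool list of "possibly also some metadata".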
> > I suppose someone
> > may have a case for identifying 'the same' test against different
> > items; for that purpose, you'd need the lighter 'scenario'
definition
> > (not including the item identifier).
>
> I don't understand this at all, it seems to go against the intended
> meaning of "item".
I just mean, say you want to look at all the results for 'the same'
test but for different tested items; I want to look at the last three
weeks worth of all x86_64 BIOS compose.install_ext3 tests, or something
like that. In that case, your 'scenario' does not include the 'item'.
Do I need scenario in this case at all? If I know exactly what I want, why wouldn't I
query testcase=compose.install_ext3&arch=x86_64&firmware=bios ?
I understand you want to introduce scenario for cases where the consumer doesn't want
particular knowledge of that system. But if the consumer has deep knowledge of it,
what is scenario good for?
> >
> > One thing we could do is make it a convention that each test case (and
> > / or test case name?)
>
> What's the difference between the two?
I, uh, honestly don't remember what distinction I was trying to draw
there :/. I think it was about the fact that a 'test case' to ResultsDB
is a more complex item than just a name - it has a URL and stuff - so
the 'scenario' properties could possibly be included in something other
than just the test case name.
> > Another possibility would be to make it a convention to include some
> > kind of indication of the test 'scenarios' in the extradata for each
> > result: a 'scenario' key, or something along those lines. This would
> > make it much easier to include the 'item identifier' and 'test
> > scenario' proper separately, and you could simply combine them when you
> > needed the 'complete' scenario.
>
> I'm not sure what do you mean exactly, but having a "scenario" key
> that would list all the other keys which are necessary to understand
> what makes this scenario unique looks like a reasonable idea. For
> example:
> scenario = [firmware_type, arch] # testcase name and item are implied
Well, it's a simpler idea than that: just include a key that has all
the necessary *values*. The indirection of having a key that tells you
what other keys to go look up just seems unnecessarily complex. The
idea was simply that there'd be an item like this in the metadata:
scenario:
  fedora.Rawhide.Server-dvd-iso.x86_64.server_realmd_join_kickstart.64bit
That's an actual openQA scenario: DISTRI.VERSION.FLAVOR.ARCH.TESTCASE.MACHINE
Then you can look up 'all results for same item, same scenario' (for
de-duplication) or 'all results for same scenario' (to compare results
for "the same test" across different composes).
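For illustration, a consumer that does want to pick the scenario apart could unpack
that dotted string along the DISTRI.VERSION.FLAVOR.ARCH.TESTCASE.MACHINE convention.
This is just a sketch and assumes the individual fields contain no dots themselves:

```python
# Sketch: unpack an openQA-style scenario string of the form
# DISTRI.VERSION.FLAVOR.ARCH.TESTCASE.MACHINE. Assumes the individual
# fields themselves contain no dots.

SCENARIO_KEYS = ("distri", "version", "flavor", "arch", "test", "machine")

def parse_scenario(scenario):
    parts = scenario.split(".")
    if len(parts) != len(SCENARIO_KEYS):
        raise ValueError("unexpected scenario format: %s" % scenario)
    return dict(zip(SCENARIO_KEYS, parts))

info = parse_scenario(
    "fedora.Rawhide.Server-dvd-iso.x86_64.server_realmd_join_kickstart.64bit")
```

Normally, though, the consumer would treat the scenario as an opaque value and only
compare it for equality, which is the whole point of having it.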
I guess I need query examples here. I can imagine three options:
a) query
`results?testcases=compose.*&item=Fedora-Server-dvd-x86_64-Rawhide-20170228.n.0.iso`
and then go through all the results, make them unique by eliminating everything that has
the same 'scenario', and work with that
b) query
`results?testcases=compose.*&item=Fedora-Server-dvd-x86_64-Rawhide-20170228.n.0.iso&unique_key=scenario`
and have the duplication filtering implemented inside resultsdb (this way it seems it can
be implemented in a generic fashion), work with that
c) query
`results?testcases=compose.*&item=Fedora-Server-dvd-x86_64-Rawhide-20170228.n.0.iso&scenario=fedora.Rawhide.Server-dvd-iso.x86_64.server_realmd_join_kickstart.64bit`
and work with that. But in this case the consumer would again have to have a deep
knowledge of how the scenario is constructed, which would miss the whole point, I think.
So please specify how exactly you imagine this working from the consumer POV.
I have a diff in right now that would add this to the openQA reporter:
https://phab.qa.fedoraproject.org/D1155
since we kinda need it right now for the update stuff (so Bodhi can
find the correct results to display).
> The downside is that task authors are required to provide this, and
> therefore it's error-prone. I'm not sure how to do it better, though.
> We could set some reasonable defaults for each type - so e.g. for
> koji_build type, we know we compare testcase name + item (required to
> be nvr) + arch (if present). For bodhi_update type, it would be
> testcase_name + item (required to be Bodhi ID) +
> last_updated_timestamp. Etc. Anything above those defaults would need
> to be in "scenario".
Eh, I dunno about having defaults by type. Though I was thinking
that 'testcase name' is kinda the implicit default; if a result doesn't
have a 'scenario' item at all, then just assume you should use
'testcase name'. For any case where it's more complex than that, the
system that submits the results should provide the scenario info (so
Taskotron should add a 'scenario' key like 'TESTCASE_NAME.ARCH' for
Koji results).
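That implicit-default rule is tiny to implement on the consumer side; a sketch,
again with made-up field names rather than the real ResultsDB schema:

```python
# Sketch of the implicit-default rule: if a result carries no explicit
# 'scenario' key in its extradata, fall back to its testcase name.
# Field names are illustrative, not the actual ResultsDB schema.

def scenario_of(result):
    return result["data"].get("scenario", result["testcase"])

openqa_result = {
    "testcase": "compose.install_ext3",
    "data": {"scenario":
             "fedora.Rawhide.Server-dvd-iso.x86_64.install_ext3.64bit"},
}
simple_result = {"testcase": "dist.rpmlint", "data": {}}
```

Simple submitters then don't have to do anything at all, and only tools with a
richer notion of "the same test" (like openQA) need to fill the key in.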
Note I don't think we should include the 'item' in the scenario for all
the reasons discussed above; it should just be understood that for
different purposes you might want to query on "scenario + item" or just
"scenario".
I don't understand why you would ever want to put the testcase name into the
scenario field. Both the item and the testcase name are defined externally in
well-known fields that are mandatory to fill in. There's no guessing like with
'arch' or 'firmware' (whether it exists, what the allowed values are). So why wouldn't
you specify only the extra args that make the execution unique in the scenario field? If
you ever need to search through multiple testcases with the same scenario value (I
don't know why you would need that, but let's assume you do), you can always do
that either for a particular testcase like this:
results?testcases=compose.base_selinux&scenario=fedora.Rawhide.Server-dvd-iso.x86_64.64bit
or across many testcases by using a list or a wildcard:
results?testcases=compose.*&scenario=fedora.Rawhide.Server-dvd-iso.x86_64.64bit
I don't think you even want to do this query:
results?scenario=foobar
because you can *bet* that you'll eventually receive some results from
scratch.tmp.testthis.fake namespace or similar. You should always specify at least a
wildcarded testcase (compose.openqa.*). I'd like our system to allow in-development,
proof-of-concept and experimental tools to run and submit results. There will be errors,
mistakes, copycats, everything. But they will be constrained to their namespaces. And so
you should also limit your queries to those namespaces that are known and maintained
at the level of quality you need for your purposes. You can imagine the concept to be
the same as e.g. GitHub - you always work in your namespace, and there's no global
namespace, because that's just asking for trouble.
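That namespace filtering is easy to express on the consumer side with shell-style
wildcards; here's a sketch (the trusted patterns are made up for the example):

```python
# Sketch: keep only results whose testcase names fall inside namespaces the
# consumer explicitly trusts. The pattern list is invented for this example.
import fnmatch

TRUSTED_NAMESPACES = ["compose.openqa.*", "compose.autocloud.*"]

def in_trusted_namespace(testcase):
    return any(fnmatch.fnmatch(testcase, pattern)
               for pattern in TRUSTED_NAMESPACES)
```

The same pattern list could feed the wildcarded `testcases=` query parameter
directly, so the filtering happens server-side instead.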
Of course, those are all just my visions of how the project should work. Josef often
disagrees with me. If you think some of that should work differently, please disagree with
me as well :)