On Wed, May 3, 2017 at 1:15 AM, Adam Williamson <adamwill(a)fedoraproject.org>
wrote:
> On Fri, 2017-03-03 at 15:12 -0800, Adam Williamson wrote:
> > If we implement a 'scenario' key, we can then just teach Bodhi to look
> > for that and combine it with the item, then maybe if it can't find one,
> > treat 'testname-arch' as the scenario (as a fallback for old results,
> > results from Taskotron until we teach it to write 'scenario' items into
> > its results, and a general fallback for cases where 'scenario' is
> > missing for some reason). That could be the general convention, I
> > guess: first look for a 'scenario' key, otherwise try and construct one
> > from very commonly available keys.
>
> So I come back, a whole two months older and wiser, with a report from
> the field on this ;)
And I again forgot all the details in the meantime :) So take my replies
with a grain of salt.
> In the end we went ahead with more or less my initial plan for the
> 'scenario' handling: openQA reports a single 'scenario' value in the
> extradata for each report it submits, containing a string
> representation of the 'scenario' (it does this by smooshing the values
> of all the openQA 'scenario keys' together using periods, for the
> record).
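For readers following along, that "smooshing" amounts to a period-join over
the scenario key values. A minimal sketch (the key names and dict shape are
my reconstruction from the sample strings later in this mail, not the actual
openQA code):

```python
# Hypothetical reconstruction of the scenario-string assembly: join the
# values of the openQA scenario keys with periods.
SCENARIO_KEYS = ("distri", "version", "flavor", "arch", "test", "machine")

def build_scenario(extradata):
    """Smoosh the scenario key values together using periods."""
    return ".".join(str(extradata[key]) for key in SCENARIO_KEYS)

extradata = {
    "distri": "fedora",
    "version": "26",
    "flavor": "updates-workstation",
    "arch": "x86_64",
    "test": "base_selinux",
    "machine": "64bit",
}
print(build_scenario(extradata))
# fedora.26.updates-workstation.x86_64.base_selinux.64bit
```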
> Then I got to teach Bodhi to handle openQA results. Happily, this
> *does* involve dealing with the 'scenario problem' (though the initial
> implementation we did in Bodhi ignores it, and winds up omitting some
> openQA results). For each update, we run four of the tests twice, once
> on a Server image, once on a Workstation image. In openQA's terms, this
> is the same test run on a different 'flavor', and 'flavor' is one of
> openQA's "scenario keys".
> So for reference, here are a couple of sample results:
> https://taskotron.fedoraproject.org/resultsdb/results/13573636
> https://taskotron.fedoraproject.org/resultsdb/results/13573633
> Same 'testcase', same 'item', same 'arch' even...but different
> 'scenario':
> fedora.26.updates-workstation.x86_64.base_selinux.64bit
> fedora.26.updates-server.x86_64.base_selinux.64bit
> So, first the good news: fundamentally, the concept works. I can tell
> Bodhi to query 'all bodhi_update results for the item FEDORA-2017-
> 4e95959f0d' and then display (only) the latest result for each
> 'scenario' that appears in the whole set of results.
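That latest-per-scenario filtering is simple enough to sketch. The field
names below ('data', 'submit_time') are simplified placeholders, not the
exact ResultsDB/Bodhi schema, and the testcase+arch fallback is the
convention quoted at the top of this mail:

```python
from datetime import datetime

def latest_by_scenario(results):
    """Keep only the newest result for each distinct 'scenario' value.
    Results lacking a 'scenario' fall back to a testcase+arch key."""
    latest = {}
    for res in results:
        key = res["data"].get("scenario") or (res["testcase"],
                                              res["data"].get("arch"))
        when = datetime.strptime(res["submit_time"], "%Y-%m-%dT%H:%M:%S")
        if key not in latest or when > latest[key][0]:
            latest[key] = (when, res)
    return [res for _, res in latest.values()]

results = [
    {"testcase": "base_selinux", "submit_time": "2017-05-01T10:00:00",
     "data": {"scenario": "fedora.26.updates-server.x86_64.base_selinux.64bit"}},
    {"testcase": "base_selinux", "submit_time": "2017-05-02T10:00:00",
     "data": {"scenario": "fedora.26.updates-server.x86_64.base_selinux.64bit"}},
    {"testcase": "base_selinux", "submit_time": "2017-05-01T09:00:00",
     "data": {"scenario": "fedora.26.updates-workstation.x86_64.base_selinux.64bit"}},
]
kept = latest_by_scenario(results)
# two scenarios survive: the newer server result plus the workstation one
```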
> However, there are two pieces of bad news.
> Bad News One: As kparal pointed out in the original thread, the
> `/latest` endpoint that was added to ResultsDB 2.0 and intended to help
> optimize queries like this doesn't handle 'scenario' cases at all: it
> only gives you the single 'latest' result for each test case in your
> query, there is no option to ask for 'latest by scenario' rather than
> 'latest by test case'. As we discussed further in the thread,
> implementing this in RDB would be difficult (though not impossible) and
> not as efficient as latest-by-testcase, due to details of how RDB
> actually stores data. So Bodhi has to either:
> 1. use `/latest` but then have special knowledge of the cases where it
> needs to go and run supplemental queries to handle 'scenario' cases
> (which is what it currently does for depcheck, and what my pull request
> does for openQA for now)
We already talked about this in
https://github.com/fedora-infra/bodhi/issues/1471 and you didn't like my
proposed solution. But I still think it's a good way forward. We should
continue the discussion here.
My goal is to be able to programmatically determine whether additional
queries are needed or not after receiving the /latest results. The simplest
approach is to insist that tasks only fill in the 'scenario' key when they
need it - then we know we need to pull the whole history. I understand this
might require non-trivial changes in the openQA code. But is it really that
insurmountable?
I see only one potential problem here - if you were not able to determine
whether a scenario is needed, either during coding or at runtime. I can't
imagine such a case right now. Do you have an example of this? My current
opinion is that we know well what we're going to test when
this? My current opinion is that we know well what we're going to test when
we write the task. For example initially you write an openqa test for
booting using bios only. No need for scenario. But later you extend it with
ability to test uefi as well. At that moment you add 'scenario: bios/uefi'
to both. Similarly let's say we have a taskotron task to check shell
variable expansion. First we target only bash, no scenario. Later we start
testing zsh as well, so we add 'scenario: bash/zsh'.
Actually, I just realized one such "don't know in advance" case - arch.
The set of supported arches to run on is likely to be a server-side
configuration value, and the task can't know whether it will be a single or
a multi value. So arch-based tasks will always need to put arch in the
scenario. That's unfortunate.
To answer your concern from github, what if the newly added results (with
scenario) would be for the same item, therefore clashing with the previous
results with no scenario? No problem, proper results will be available for
the next "item". Of course assuming you added scenario key even to the
original test case. If you want the new results for the current item, just
reschedule the task. /latest will yield results with a scenario, the
consumer will pull the history, and will need to be able to handle the case
when some historical results don't have a scenario and the new ones do.
> 2. just not use /latest and get *all* the results for the items, then
> do the latest-by-scenario work itself. This works, and results in
> cleaner Bodhi code, but is slower and makes RDB work harder
/latest has two uses - it's efficient and convenient. It's based on the
premise that most test cases will not need to use scenario and therefore
deduplication is trivial. If a substantial portion of test cases use
scenarios, we might get rid of the /latest "extra step" and pull all
results for all test cases, as we used to. But let's just be aware that
this is extremely slow for both the client and the server, as it requires
you to do hundreds/thousands of calls (query all test cases one by one).
It's a performance killer. I think this approach is not viable long-term.
If we don't do it with /latest, we'll have to do some optimizations
elsewhere - in the worst case by manually keeping a list of test cases and
their supported extra args values. Or by changing the database design
completely.
> Bad News Two: The 'just provide a scenario string' approach fails to
> solve a problem I didn't think about until I came to actually work from
> the consumer end. Say we get all the openQA results for a given update,
> then do the 'latest-by-scenario' filtering using the 'scenario' values.
> That works, and now we have all the results we need to display. Great.
> But...how do we (Bodhi) *display* them?
> Currently Bodhi displays each result as a table row, with four columns
> - an icon indicating the outcome; the test case name; the result
> 'note', if there is one; and the time the result was submitted.
> But if we just do that, then you really can't tell the difference
> between the results for the 'same test, different flavor' cases. And
> the key point is, *RDB doesn't really help Bodhi here*. The ugly,
> smooshed-together scenario string does the job I meant it to do - it
> lets Bodhi identify which results it should display - but it doesn't do
> a job it ought to do, one I didn't think about: it doesn't tell Bodhi
> how it can cleanly display the results with the same test case name so
> that a reader actually knows what the difference is. Bodhi *could* just
> stuff the scenario name in the table somewhere (or use it instead of
> the test case name), but it wasn't actually designed to be read and
> looks pretty ugly.
The solution seems pretty easy here. Just as test case names should be
quite human readable, so that the reader can easily understand what's going
on, let's keep the scenario string short and readable too. There's no
reason (apart from technical ones on the openQA side) to have these ugly
scenarios:
fedora.26.updates-workstation.x86_64.base_selinux.64bit
fedora.26.updates-server.x86_64.base_selinux.64bit
Really, they should be just this instead:
workstation
server
Everything else in there is junk (duplicating info from elsewhere). So
let's just fix the scenario strings in openqa?
Of course, we can't forbid people using long and ugly scenarios. But UIs
have a way to deal with that. Just display 10 characters with an ellipsis
in the middle (and expand on click or mouse over). I believe we'll need to
have some level of quality for results shown in Bodhi anyway (in the
future), so we won't show every random task that somebody has coded up for
his personal pleasure. And for those high quality, useful results, we and
package maintainers will push their developers to keep things readable.
Honestly, people will not care about the results unless they fail. And for
the few failing results, even a longer scenario (as openqa currently
produces) wouldn't be that much of a problem, I feel.
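The ellipsis trick is trivial to implement, for what it's worth. Something
like this (illustration only, with a hypothetical helper name; the UI would
reveal the full string on click or mouseover):

```python
def shorten_middle(text, limit=10):
    """Show at most `limit` characters, eliding the middle with an
    ellipsis. Assumes limit >= 3."""
    if len(text) <= limit:
        return text
    head = limit // 2
    tail = limit - head - 1  # reserve one slot for the ellipsis itself
    return text[:head] + "…" + text[-tail:]

print(shorten_middle("fedora.26.updates-workstation.x86_64.base_selinux.64bit"))
# fedor…4bit
```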
> So I wound up just having to do ugly special-case stuff again, and
> effectively 'teach' Bodhi that the 'scenario' element that can actually
> vary between openQA results at present is the 'flavor', and teach it to
> parse the flavor out of the scenario value (stupidly, we don't actually
> report a 'flavor' key in the extradata...I should fix that) and display
> it as a label next to the test case name. (There is also pre-existing
> similar 'special casing' to have it display the arch as a label, for
> the dist.depcheck case we've talked about before.)
Let me use your own quote against you:
I dunno why you always want to make the scenario concept so
*complicated*.
-- adamw
And now you're parsing it! That is a road to hell :-)
(Even if you didn't parse it and had it as an extra field, that still
presents a lot of added complexity if you want to make it generic, and not
hardcoded to openqa results).
> Addressing Bad News One is a bit difficult, as mentioned. But perhaps
> we can address Bad News Two with a second try at implementing the
> 'scenario' concept in results. I think, somewhere back in the thread,
> kparal suggested that each result could include each scenario-relevant
> value in its extradata, and also indicate which keys were the 'scenario
> keys' for the result. I thought that was over-complicated at the time,
> but it'd actually help address Bad News Two. Bodhi would then have the
> opportunity to, say, compare all the scenario values for the results
> for the item, ignore ones that are the same for *every* result, and for
> the values that actually do vary across some of the results, display
> those values in the result table. That wouldn't require it to have
> special knowledge.
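Just to make the suggestion concrete before I argue against it: the
comparison could look roughly like this (made-up result shape, not any
existing code):

```python
def varying_scenario_keys(results):
    """Each result carries its scenario-relevant values in extradata and
    names its 'scenario keys'. Return only the keys whose values actually
    differ across the result set - those are the ones worth displaying."""
    seen = {}
    for res in results:
        for key in res["scenario_keys"]:
            seen.setdefault(key, set()).add(res["data"].get(key))
    return {key for key, values in seen.items() if len(values) > 1}

results = [
    {"scenario_keys": ["flavor", "arch"],
     "data": {"flavor": "updates-server", "arch": "x86_64"}},
    {"scenario_keys": ["flavor", "arch"],
     "data": {"flavor": "updates-workstation", "arch": "x86_64"}},
]
# only 'flavor' varies here, so Bodhi would label rows with the flavor alone
```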
To me, this seems like a lot of work and complexity for something that
isn't even a problem. Keeping the scenario string short and readable is the
simple solution here. And even in the worst case, with long scenario
strings, I don't think it hurts that much.
Now, going back to Bad News One. I'm going back and forth on approaches to
that one. We have this triangle:
simple
/ \
fast -- generic
And it seems we can only get two vertices out of three, as usual. We have
the simple generic solution right now. The fast generic solution would be
the one with "just latest results" middleware, or alternatively lots of
latest/scenario optimization, possibly combined with keeping lists of test
cases and their available extra-arg values for consumers to use. Lots of
complexity for everyone. The simple and fast solution is making the
database schema directly tailored to our Fedora use cases, making scenario
searches fast.
Now, regarding sacrificing generality. I think the fact that RHEL folks are
interested in ResultsDB is partly due to the fact that we decided to keep
the project generic in the first place. If we tied it heavily to Fedora,
maybe they wouldn't be able to use it internally. That being said, there
needs to be a reasonable limit on this. If we don't expect ResultsDB to be
used in other distributions any time soon, and we don't plan to use it for
other projects outside of release validation, maybe we can start tailoring
it towards the common denominator of Fedora+RHEL+CentOS use cases? This
would mean making scenario a mandatory, indexed field for fast searches.
Maybe having arch as a first-class field as well. And maybe some other
optimizations.
Another idea would be to revise what we actually expect from ResultsDB. Do
we really need it to hold every submitted result for all the history? I
don't think we really do, in Fedora. And we definitely don't need 16000
depcheck results for an NVR, that's bat-shit crazy. So we could make
ResultsDB prune the outdated results regularly (and here I mean pruning the
database itself, not standing up a middleware for this). The actual policy
can be very flexible, the pruning could happen on each new result, or
regularly once a day during quiet hours, and we could keep only the latest
result, or the last X results, or all the last results for the past X days
before the latest result, etc. This pruning would be done by a different
process, with its own logic (which would implement how to recognize unique
results), so the database itself would still be generic, unaware of the
pruning process and logic. If we did this, it would probably solve most of
our performance issues, and we would probably be able to use the simplest
approach "give me all of history" and process that in the consumer. But
here we might clash with the RHEL use cases - pruning outdated results
might be a no-go for them. And while it's true that they're able to throw
more hardware at it, to counteract lack of pruning, it might not scale good
enough. A good topic to discuss, if we decide to investigate this approach.
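For concreteness, the "keep only the latest result" variant of such a
pruning policy could be as simple as this (simplified result shape,
illustration only; a real implementation would issue DELETEs against the
database from a separate process):

```python
def prune(results, keep_last=1):
    """Keep only the newest `keep_last` results per
    (testcase, item, scenario) triple and drop the rest."""
    buckets = {}
    for res in sorted(results, key=lambda r: r["submit_time"], reverse=True):
        key = (res["testcase"], res["data"].get("item"),
               res["data"].get("scenario"))
        buckets.setdefault(key, []).append(res)
    return [res for group in buckets.values() for res in group[:keep_last]]

history = [
    {"testcase": "dist.depcheck", "submit_time": "2017-05-01T0%d:00:00" % i,
     "data": {"item": "foo-1.0-1.fc26", "scenario": None}}
    for i in range(1, 4)
]
survivors = prune(history)
# only the newest of the three depcheck results survives
```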