On Wed, May 3, 2017 at 1:15 AM, Adam Williamson <adamwill@fedoraproject.org> wrote:
On Fri, 2017-03-03 at 15:12 -0800, Adam Williamson wrote:
> If we implement a 'scenario' key, we can then just teach Bodhi to look
> for that and combine it with the item, then maybe if it can't find one,
>  treat 'testname-arch' as the scenario (as a fallback for old results,
> results from Taskotron until we teach it to write 'scenario' items into
> its results, and a general fallback for cases where 'scenario' is
> missing for some reason). That could be the general convention, I
> guess: first look for a 'scenario' key, otherwise try and construct one
> from very commonly available keys.

So I come back, a whole two months older and wiser, with a report from
the field on this ;)

And I again forgot all the details in the meantime :) So take my replies with a grain of salt.
 

In the end we went ahead with more or less my initial plan for the
'scenario' handling: openQA reports a single 'scenario' value in the
extradata for each report it submits, containing a string
representation of the 'scenario' (it does this by smooshing the values
of all the openQA 'scenario keys' together using periods, for the
record).
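(The smooshing boils down to something like this - a sketch, not openQA's actual reporting code; the key names follow openQA's usual scenario keys:)

```python
# Illustrative sketch (not actual openQA code) of building the 'scenario'
# string by joining the scenario key values with periods.
SCENARIO_KEYS = ("distri", "version", "flavor", "arch", "test", "machine")

def build_scenario(extradata):
    """Smoosh the scenario key values together using periods."""
    return ".".join(str(extradata[key]) for key in SCENARIO_KEYS)

extradata = {
    "distri": "fedora", "version": "26", "flavor": "updates-server",
    "arch": "x86_64", "test": "base_selinux", "machine": "64bit",
}
print(build_scenario(extradata))
# fedora.26.updates-server.x86_64.base_selinux.64bit
```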

Then I got to teach Bodhi to handle openQA results. Happily, this
*does* involve dealing with the 'scenario problem' (though the initial
implementation we did in Bodhi ignores it, and winds up omitting some
openQA results). For each update, we run four of the tests twice, once
on a Server image, once on a Workstation image. In openQA's terms, this
is the same test run on a different 'flavor', and 'flavor' is one of
openQA's "scenario keys".

So for reference, here are a couple of sample results:

https://taskotron.fedoraproject.org/resultsdb/results/13573636
https://taskotron.fedoraproject.org/resultsdb/results/13573633

Same 'testcase', same 'item', same 'arch' even...but different
'scenario':

fedora.26.updates-workstation.x86_64.base_selinux.64bit
fedora.26.updates-server.x86_64.base_selinux.64bit

So, first the good news: fundamentally, the concept works. I can tell
Bodhi to query 'all bodhi_update results for the item FEDORA-2017-
4e95959f0d' and then display (only) the latest result for each
'scenario' that appears in the whole set of results.
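(The latest-per-scenario filtering amounts to something like this - a sketch, not the actual Bodhi code, and the field names are illustrative:)

```python
# Sketch: keep only the newest result per scenario, falling back to
# 'testcase-arch' as the scenario key when no 'scenario' was reported
# (old results, Taskotron results, etc.).
def latest_by_scenario(results):
    latest = {}
    for res in results:
        key = res.get("scenario") or "%s-%s" % (res["testcase"], res["arch"])
        if key not in latest or res["submit_time"] > latest[key]["submit_time"]:
            latest[key] = res
    return list(latest.values())

results = [
    {"testcase": "base_selinux", "arch": "x86_64", "submit_time": 2,
     "scenario": "fedora.26.updates-server.x86_64.base_selinux.64bit"},
    {"testcase": "base_selinux", "arch": "x86_64", "submit_time": 1,
     "scenario": "fedora.26.updates-server.x86_64.base_selinux.64bit"},
    {"testcase": "base_selinux", "arch": "x86_64", "submit_time": 3,
     "scenario": "fedora.26.updates-workstation.x86_64.base_selinux.64bit"},
]
print(len(latest_by_scenario(results)))  # 2 - one per scenario
```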

However, there are two pieces of bad news.

Bad News One: As kparal pointed out in the original thread, the
`/latest` endpoint that was added to ResultsDB 2.0 and intended to help
optimize queries like this doesn't handle 'scenario' cases at all: it
only gives you the single 'latest' result for each test case in your
query, there is no option to ask for 'latest by scenario' rather than
'latest by test case'. As we discussed further in the thread,
implementing this in RDB would be difficult (though not impossible) and
not as efficient as latest-by-testcase, due to details of how RDB
actually stores data. So Bodhi has to either:

1. use `/latest` but then have special knowledge of the cases where it
needs to go and run supplemental queries to handle 'scenario' cases
(which is what it currently does for depcheck, and what my pull request
does for openQA for now)


We already talked about this https://github.com/fedora-infra/bodhi/issues/1471 and you didn't like my proposed solution. But I still think it's a good way forward. We should continue the discussion here.

My goal is to be able to determine programmatically, after receiving the /latest results, whether additional queries are needed. The simplest approach is to insist that tasks only fill in the 'scenario' key when they need it - then we know we need to pull the whole history. I understand this might require non-trivial changes in openqa code. But is it really that insurmountable?

I see only one potential problem here - a case where you can't determine whether a scenario is needed, either when writing the task or at runtime. I can't imagine such a case right now. Do you have an example of this? My current opinion is that we know well what we're going to test when we write the task. For example, initially you write an openqa test for booting using bios only. No need for a scenario. But later you extend it with the ability to test uefi as well. At that moment you add 'scenario: bios/uefi' to both. Similarly, let's say we have a taskotron task to check shell variable expansion. First we target only bash, no scenario. Later we start testing zsh as well, so we add 'scenario: bash/zsh'.

Actually, I just realized one such "don't know in advance" scenario - arch. The set of arches to run on is likely to be a server-side configuration value, and the task can't know whether it will be a single value or multiple values. So arch-based tasks will always need to put arch in the scenario. That's unfortunate.

To answer your concern from github: what if the newly added results (with a scenario) were for the same item, therefore clashing with the previous results that have no scenario? No problem, proper results will be available for the next "item". Of course, that assumes you added the scenario key even to the original test case. If you want the new results for the current item, just reschedule the task. /latest will yield results with a scenario, the consumer will pull the history, and will need to be able to handle the case when some historical results don't have a scenario and the new ones do.

 

2. just not use /latest and get *all* the results for the items, then
do the latest-by-scenario work itself. This works, and results in
cleaner Bodhi code, but is slower and makes RDB work harder

/latest has two uses - it's efficient and convenient. It's based on the premise that most test cases will not need to use scenarios, and therefore deduplication is trivial. If a substantial share of test cases use scenarios, we might drop the /latest "extra step" and pull all results for all test cases, as we used to. But let's be aware that this is extremely slow for both the client and the server, as it requires hundreds/thousands of calls (querying all test cases one by one). It's a performance killer. I think this approach is not viable long-term. If we don't do it with /latest, we'll have to do some optimizations elsewhere - in the worst case by manually keeping a list of test cases and their supported extra args values. Or by changing the database design completely.
 

Bad News Two: The 'just provide a scenario string' approach fails to
solve a problem I didn't think about until I came to actually work from
the consumer end. Say we get all the openQA results for a given update,
then do the 'latest-by-scenario' filtering using the 'scenario' values.
That works, and now we have all the results we need to display. Great.
But...how do we (Bodhi) *display* them?

Currently Bodhi displays each result as a table row, with four columns
- an icon indicating the outcome; the test case name; the result
'note', if there is one; and the time the result was submitted.

But if we just do that, then you really can't tell the difference
between the results for the 'same test, different flavor' cases. And
the key point is, *RDB doesn't really help Bodhi here*. The ugly,
smooshed-together scenario string does the job I meant it to do - lets
Bodhi identify which results it should display - but it doesn't do a
job it ought to do, but which I didn't think about: it doesn't tell
Bodhi how it can cleanly display the results with the same test case
name so that a reader actually knows what the difference is. Bodhi
*could* just stuff the scenario name in the table somewhere (or use it
instead of the test case name), but it wasn't actually designed to be
read and looks pretty ugly.

The solution seems pretty easy here. Just as test case names should be human-readable, so that the reader can easily understand what's going on, let's keep the scenario string short and readable too. There's no reason (apart from technical ones on the openqa side) to have ugly scenarios like these:

fedora.26.updates-workstation.x86_64.base_selinux.64bit
fedora.26.updates-server.x86_64.base_selinux.64bit

Really, they should be just this instead:

workstation
server

Everything else in there is junk (duplicating info from elsewhere). So let's just fix the scenario strings in openqa?

Of course, we can't forbid people from using long and ugly scenarios. But UIs have ways to deal with that. Just display 10 characters with an ellipsis in the middle (and expand on click or mouse-over). I believe we'll need some level of quality control for results shown in Bodhi anyway (in the future), so we won't show every random task that somebody has coded up for their personal pleasure. And for those high-quality, useful results, we and the package maintainers will push their developers to keep things readable.
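(Something along these lines - a sketch of the trimming only; the expand-on-click part is obviously up to the UI:)

```python
# Sketch: trim long scenario strings to a fixed width with an ellipsis
# in the middle; short ones are left alone.
def ellipsize(text, limit=10):
    if len(text) <= limit:
        return text
    head = (limit - 1) // 2
    tail = limit - 1 - head
    return text[:head] + "…" + text[-tail:]

print(ellipsize("server"))       # server
print(ellipsize("workstation"))  # work…ation
print(ellipsize("fedora.26.updates-server.x86_64.base_selinux.64bit"))
# fedo…64bit
```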

Honestly, people will not care about the results unless they fail. And for the few failing results, even a longer scenario (as openqa currently produces) wouldn't be that much of a problem, I feel.
 

So I wound up just having to do ugly special-case stuff again, and
effectively 'teach' Bodhi that the 'scenario' element that can actually
vary between openQA results at present is the 'flavor', and teach it to
parse the flavor out of the scenario value (stupidly, we don't actually
report a 'flavor' key in the extradata...I should fix that) and display
it as a label next to the test case name. (There is also pre-existing
similar 'special casing' to have it display the arch as a label, for
the dist.depcheck case we've talked about before.)

Let me use your own quote against you:
> I dunno why you always want to make the scenario concept so complicated. -- adamw
And now you're parsing it! That is a road to hell :-)

(Even if you didn't parse it and had it as an extra field, that still presents a lot of added complexity if you want to make it generic, and not hardcoded to openqa results).
 

Addressing Bad News One is a bit difficult, as mentioned. But perhaps
we can address Bad News Two with a second try at implementing the
'scenario' concept in results. I think, somewhere back in the thread,
kparal suggested that each result could include each scenario-relevant
value in its extradata, and also indicate which keys were the 'scenario
keys' for the result. I thought that was over-complicated at the time,
but it'd actually help address Bad News Two. Bodhi would then have the
opportunity to, say, compare all the scenario values for the results
for the item, ignore ones that are the same for *every* result, and for
the values that actually do vary across some of the results, display
those values in the result table. That wouldn't require it to have
special knowledge.
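(The comparison described above could be sketched like this - illustrative code, not from Bodhi:)

```python
# Sketch: given each result's scenario key/value pairs, find the keys
# whose values actually differ across the result set - only those need
# to be shown in the results table.
def varying_scenario_keys(scenario_values):
    keys = set().union(*scenario_values) if scenario_values else set()
    return {k for k in keys
            if len({sv.get(k) for sv in scenario_values}) > 1}

scenario_values = [
    {"flavor": "updates-server", "arch": "x86_64"},
    {"flavor": "updates-workstation", "arch": "x86_64"},
]
# 'arch' is the same everywhere, so only 'flavor' would be displayed.
print(varying_scenario_keys(scenario_values))  # {'flavor'}
```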

To me, this seems like a lot of work and complexity for something that isn't even a problem. Keeping the scenario string short and readable is the simple solution here. And even in the worst case, with long scenario strings, I don't think it hurts that much.


Now, going back to Bad News One. I'm going back and forth on approaches to that one. We have this triangle:

   simple
   /    \
fast -- generic

And it seems we can only get two vertices out of three, as usual. We have the simple, generic solution right now. The fast, generic solution would be the one with the "just latest results" middleware, or alternatively lots of latest/scenario optimization, possibly combined with keeping lists of test cases and their available extra args for consumers to use. Lots of complexity for everyone. The simple and fast solution is making the database schema directly tailored to our Fedora use cases, making scenario searches fast.

Now, regarding sacrificing generality. I think the fact that RHEL folks are interested in ResultsDB is partly due to the fact that we decided to keep the project generic in the first place. If we tied it heavily to Fedora, maybe they wouldn't be able to use it internally. That being said, there needs to be a reasonable limit on this. If we don't expect ResultsDB to be used in other distributions any time soon, and we don't plan to use it for other projects outside of release validation, maybe we can start tailoring it towards the common denominator of Fedora+RHEL+CentOS use cases? This would mean making scenario a mandatory, indexed field for fast searches. Maybe having arch as a first-class field as well. And maybe some other optimizations.

Another idea would be to revise what we actually expect from ResultsDB. Do we really need it to hold every submitted result for all of history? I don't think we really do, in Fedora. And we definitely don't need 16000 depcheck results for an NVR, that's bat-shit crazy. So we could make ResultsDB prune the outdated results regularly (and here I mean pruning the database itself, not standing up a middleware for this).

The actual policy can be very flexible - the pruning could happen on each new result, or regularly once a day during quiet hours, and we could keep only the latest result, or the last X results, or all results for the past X days before the latest result, etc. This pruning would be done by a different process, with its own logic (which would implement how to recognize unique results), so the database itself would stay generic, unaware of the pruning process and logic. If we did this, it would probably solve most of our performance issues, and we would probably be able to use the simplest approach - "give me all of the history" - and process that in the consumer.

But here we might clash with the RHEL use cases - pruning outdated results might be a no-go for them. And while it's true that they're able to throw more hardware at it to counteract the lack of pruning, it might not scale well enough. A good topic to discuss, if we decide to investigate this approach.
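(A sketch of one such pruning policy - illustrative only; the grouping key and the "keep the last X" knob are exactly the parts the separate pruning process would implement:)

```python
from collections import defaultdict

# Sketch: keep only the 'keep' newest results per unique
# (testcase, item, scenario) group and drop the rest.
def prune(results, keep=1):
    groups = defaultdict(list)
    for res in results:
        groups[(res["testcase"], res["item"], res.get("scenario"))].append(res)
    kept = []
    for group in groups.values():
        group.sort(key=lambda r: r["submit_time"], reverse=True)
        kept.extend(group[:keep])
    return kept

history = [
    {"testcase": "dist.depcheck", "item": "foo-1.fc26", "submit_time": t}
    for t in range(5)
]
print(len(prune(history)))                  # 1
print(prune(history)[0]["submit_time"])     # 4 - only the newest survives
```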