On Thu, Sep 15, 2016 at 4:20 PM, Tim Flink <tflink(a)redhat.com> wrote:
On Mon, 15 Aug 2016 22:48:38 +0200
Josef Skladanka <jskladan(a)redhat.com> wrote:
> Hey gang,
>
> I spent most of today working on the new API docs for ResultsDB,
> making use of the even better Apiary.io tool.
>
> Before I put even more hours into it, please let me know, whether you
> think it's fine at all - I'm yet to find a better tool for describing
> APIs, so I'm definitely biased, but since it's the Documentation, it
> needs to also be useful.
>
>
> http://docs.resultsdb20.apiary.io/
>
> I am also trying to put more work towards documenting the attributes
> and the "usual" queries, so please try and think about this aspect of
> the docs too.
After the conversation about resultsdb yesterday, I have a proposal for
a change to ResultsDB and clarification about how we'd be using it in
Taskotron to answer some of the questions I asked earlier.
1. Add a null-able column to result to indicate the job it came from.
This could be a URI or just UUID so long as the final URI could be
computed from whatever is stored in this new column.
If we go this way, I'd rather add the whole URL, instead of just UUID - the
UUID thing is quite taskotron specific IMHO, but something like "exec_url"
(? I wish I had a better name) could be useful in a more general sense.
2. Change "group" to "tag" and plan on it being used for the grouping
of results by/for humans. This isn't something that we'd be making
use of right away but it seems like a logical feature to add given
where things are going.
After thinking about it today, I'd rather keep the groups as they are now
- this is mostly about semantics, and on the practical level, I'd expect
that "tag" would be unique, and identified by name (the same as testcase).
The groups, on the other hand, are identified by UUID, which is not a nice
UID for a tag, from a semantic point of view, IMO.
I could, of course, do the changes, and make the Group (Tag)
name-identified, but it is not a minor change, and would take considerable
effort to do.
The groups, as they are now, can have a description set (it might be a good
idea to change it to 'name' though, to better express what it's supposed to
be), and thus we can effectively do the same as we would with tags.
I also feel that for uses other than Taskotron (OpenQA, Testdays) it's
easier, and more spot-on, to have the groups as they are now. Grouping by
tag would be possible, but coming up with unique names that are also
meaningful is tough to do programmatically, and unnecessarily complicated.
Generating a UUID and setting a reasonable name, on the other hand, is not.
What do you guys think?
This would mean that we can find the job that every result came from
without having to worry about grouping them at submission time. I can
think of use cases where there would either be no need for a job UUID/URI
or one would not exist, hence the suggestion that the column could be
empty.
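As a rough illustration of proposal 1, here is a minimal sketch of a result
table with a null-able 'exec_url' column, using an in-memory SQLite database.
The table layout, the other column names, the helper functions, and the
example URLs are all assumptions made for illustration; this is not the
actual ResultsDB schema.

```python
import sqlite3

# In-memory database standing in for ResultsDB (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE result (
        id INTEGER PRIMARY KEY,
        testcase_name TEXT NOT NULL,
        outcome TEXT NOT NULL,
        exec_url TEXT  -- null-able: some results have no job behind them
    )
    """
)


def add_result(testcase_name, outcome, exec_url=None):
    """Store a result; exec_url is optional and defaults to NULL."""
    cur = conn.execute(
        "INSERT INTO result (testcase_name, outcome, exec_url) VALUES (?, ?, ?)",
        (testcase_name, outcome, exec_url),
    )
    return cur.lastrowid


def results_for_execution(exec_url):
    """Return the ids of all results that point at the given execution."""
    rows = conn.execute(
        "SELECT id FROM result WHERE exec_url = ? ORDER BY id", (exec_url,)
    )
    return [row[0] for row in rows]


# Two results from one (made-up) job, and one result with no job at all.
job_url = "https://execdb.example.invalid/jobs/1234"
add_result("dist.rpmlint", "PASSED", job_url)
add_result("dist.abicheck", "FAILED", job_url)
add_result("testday.manual", "PASSED")
```

With this layout, no grouping is needed at submission time: the results of
one execution are simply the rows sharing the same 'exec_url'.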
If grouping at submission is the concern here, then it would be more than
easy to do - the idea here (maybe I did not communicate it properly) was to
use the ExecDB generated UUID as the identifier, the same way we do in the
whole stack.
Since the Groups can be created "on the fly" (meaning that if you submit a
result with a group-uuid that is not yet in the database, it is created
for you), we would not need to worry about it at all.
If we wanted to be a bit more descriptive, we could create the Group, and
set the Name/Description during trigger time (probably as a part of
creating the execdb job).
This would, of course, lead to having the 'exec_url' set in the Group's
'ref_url' and thus having the back-reference "by convention", as we have it
now, with Job. I don't think that either of the options (exec_url in
Result, or using group "by convention") is necessarily better than the
other, it's mostly about what semantics we want to have.
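The "group by convention" option described above could be sketched roughly
like this. It is a toy in-memory model, not the ResultsDB implementation:
the dictionaries, the function names, the field names other than 'ref_url',
and the example URL are assumptions made for illustration.

```python
import uuid

groups = {}   # uuid -> group dict; stands in for the Group table
results = []  # stands in for the Result table


def get_or_create_group(group_uuid, description=None, ref_url=None):
    """Return the group with this UUID, creating it on the fly if missing."""
    group = groups.get(group_uuid)
    if group is None:
        group = {"uuid": group_uuid, "description": None, "ref_url": None}
        groups[group_uuid] = group
    # Only set metadata that was actually supplied; never wipe existing values.
    if description is not None:
        group["description"] = description
    if ref_url is not None:
        group["ref_url"] = ref_url
    return group


def submit_result(testcase, outcome, group_uuids=()):
    """Store a result; unknown group UUIDs are created as a side effect."""
    for group_uuid in group_uuids:
        get_or_create_group(group_uuid)
    result = {"testcase": testcase, "outcome": outcome, "groups": list(group_uuids)}
    results.append(result)
    return result


# Trigger time: create the group up front, with a name and a back-reference.
job_uuid = str(uuid.uuid4())
get_or_create_group(
    job_uuid,
    description="Taskotron Execution: rpmlint",
    ref_url="https://execdb.example.invalid/jobs/" + job_uuid,
)

# Submission time: the task only needs to know the UUID.
submit_result("dist.rpmlint", "PASSED", [job_uuid])
```

The back-reference here lives on the group rather than on each result, so
finding the job for a result means going through its group.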
The underlying reason I brought this up is that some of our tests create
"unnecessary" Jobs/Groups (i.e. 1 job to 1 result) at the moment (rpmlint,
abicheck, dockerautotest), and I wondered whether we should handle that
differently. But I think that with what is coming, we'll be adding more of
the 1 job X results stuff (distgit tasks, basically), so it is not that big
of a deal.
The last question to ask is whether the "execution grouping" is even
something useful - what do we (would we) use the information that "these X
results come from the same execution" for? Is it even something we care
about? I use the Job overview to have a better idea of which tasks were run
(e.g. "when did scratch.dockerautotest run lately"), but that is what ExecDB
is for, ultimately. It's just that it's handily available in resultsdb, so
I'm not looking into execdb at all, but I would like to believe that when
we have the dashboard, I'll just use the dashboard and forget about
both execdb and resultsdb.
So, even though I made it unnecessarily complicated (and I diverged from
the original problem, of course), the question I see is:
What is the actual motivation to switch from the "current" schema (have a
group in resultsdb per execution "unit", use that to have exec_url "by
convention") to "have an 'exec_url' column in the Results verbatim"?
I don't think that having a "This was executed together" group is that big
of a deal. The problem I see (and I'm not sure it's really a problem) is
that if we have a Result in multiple Groups, then linking from ResultsDB to
ExecDB cannot be done programmatically (which group is the "this was
executed together" one, and which are "something else"?) without relying on
convention (like beginning the name of the "exec" group with "Taskotron
Execution").
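To make the "relying on convention" point concrete, here is a small sketch
of what the lookup from a result to its execution might look like when a
result sits in several groups. The "Taskotron Execution" prefix is the
convention mentioned above; the function name, the dict layout, and the
example values are hypothetical.

```python
EXEC_GROUP_PREFIX = "Taskotron Execution"


def find_exec_url(result_groups, prefix=EXEC_GROUP_PREFIX):
    """Pick the execution back-reference out of a result's groups.

    Returns the ref_url of the single group whose description starts with
    the agreed prefix, or None if there is no such group or more than one.
    """
    matches = [
        group
        for group in result_groups
        if (group.get("description") or "").startswith(prefix)
    ]
    if len(matches) != 1:
        # Zero or several candidates: the convention cannot decide.
        return None
    return matches[0].get("ref_url")


# A result that belongs to an execution group and to an unrelated one.
example_groups = [
    {"description": "Taskotron Execution: rpmlint", "ref_url": "https://execdb.example.invalid/jobs/1"},
    {"description": "Fedora 25 Test Day", "ref_url": "https://example.invalid/testday"},
]
exec_url = find_exec_url(example_groups)
```

The lookup only works as long as every submitter follows the naming
convention, which is exactly the fragility in question.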
Linking from ExecDB to ResultsDB would be fine, though, as ExecDB has the
"execution-unique" UUID.
Thoughts? :)