On Thu, 15 Sep 2016 19:10:56 +0200
Josef Skladanka <jskladan(a)redhat.com> wrote:
On Thu, Sep 15, 2016 at 4:20 PM, Tim Flink <tflink(a)redhat.com> wrote:
> On Mon, 15 Aug 2016 22:48:38 +0200
> Josef Skladanka <jskladan(a)redhat.com> wrote:
> > Hey gang,
> > I spent most of today working on the new API docs for ResultsDB,
> > making use of the even better Apiary.io tool.
> > Before I put even more hours into it, please let me know, whether
> > you think it's fine at all - I'm yet to find a better tool for
> > describing APIs, so I'm definitely biased, but since it's the
> > Documentation, it needs to also be useful.
> > http://docs.resultsdb20.apiary.io/
> > I am also trying to put more work towards documenting the
> > attributes and the "usual" queries, so please try and think about
> > this aspect of the docs too.
> After the conversation about resultsdb yesterday, I have a proposal
> for a change to ResultsDB and clarification about how we'd be using
> it in Taskotron to answer some of the questions I asked earlier.
> 1. Add a null-able column to result to indicate the job it came
> from. This could be a URI or just UUID so long as the final URI
> could be computed from whatever is stored in this new column.
If we go this way, I'd rather add the whole URL, instead of just UUID
- the UUID thing is quite taskotron specific IMHO, but something like
"exec_url" (? I wish I had a better name) could be useful in a more
Either one is fine with me.
As far as names go, what about ref_url to point at execdb and something
like log_url, output_url or artifact_url for what's currently called
ref_url and used to be log_url?
> 2. Change "group" to "tag" and plan on it being used for the
> grouping of results by/for humans. This isn't something that we'd
> be making use of right away but it seems like a logical feature to
> add given where things are going.
After thinking about it today, I'd rather keep the groups as they are
- this is mostly about semantics, and on the practical level, I'd
expect that "tag" would be unique, and identified by name (the same
as testcase). The groups, on the other hand, are identified by UUID,
which is not a nice UID for a tag, from the semantic point of view,
IMO. I could, of course, do the changes, and make the Group (Tag)
name-identified, but it is not a minor change, and would take
considerable effort to do.
The groups, as they are now, can have a description set (it might be
a good idea to change it to 'name' though, to express what it's
supposed to be in a better way), and thus we can effectively do the
same as we would with tags.
I also feel, that for other uses than Taskotron (OpenQA, Testdays) -
it's easier, and more spot-on to have the groups as they are now -
grouping by tag would be possible, but coming up with unique names,
that also are meaningful, programmatically is tough, and unnecessarily
complicated. Generating UUID, and setting a reasonable name is not,
on the other hand.
What do you guys think?
You have a point about the uniqueness issue with names vs UIDs. I'm
fine with keeping the groups as they are.
> This would mean that we can find the job that every result came from
> without having to worry about grouping them at submission time. I
> can think of use cases where there would either be no need for a job
> UUID/URI or one would not exist, hence the suggestion that the
> column could be empty.
If grouping at submission is the concern here, then it would be more
than easy to do - the idea here (maybe I did not communicate it
properly) was to use the ExecDB generated UUID as the identifier, the
same way we do in the whole stack.
Since the Groups can be created "on the fly" (meaning, that if you
submit a result, with a group-uuid that is not yet in the database,
it is created for you), we would not need to worry about it at all.
If we wanted to be a bit more descriptive, we could create the Group,
and set the Name/Description during trigger time (probably as a part
of creating the execdb job).
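To make the "created on the fly" behaviour concrete, here is a minimal sketch in Python. It is not ResultsDB code: the in-memory dict and the function names (get_or_create_group, submit_result) are hypothetical, and the uuid4 value only stands in for the UUID that ExecDB would generate.

```python
import uuid

# Hypothetical in-memory stand-in for the ResultsDB group table.
groups = {}


def get_or_create_group(group_uuid, description=None):
    """Return the group for group_uuid, creating it if it is not known yet."""
    group = groups.get(group_uuid)
    if group is None:
        group = {"uuid": group_uuid, "description": description, "results": []}
        groups[group_uuid] = group
    elif description is not None and group["description"] is None:
        # A later call (e.g. at trigger time) may fill in the description.
        group["description"] = description
    return group


def submit_result(testcase, outcome, group_uuids=()):
    """Store a result and attach it to every listed group, creating groups as needed."""
    result = {"testcase": testcase, "outcome": outcome, "groups": list(group_uuids)}
    for group_uuid in group_uuids:
        get_or_create_group(group_uuid)["results"].append(result)
    return result


# Assumption: this value stands in for the ExecDB-generated UUID.
job_uuid = str(uuid.uuid4())
submit_result("dist.rpmlint", "PASSED", [job_uuid])
submit_result("dist.abicheck", "FAILED", [job_uuid])
```

The submitter never creates the group explicitly here; the first result that mentions the UUID brings it into existence, and a description can be attached separately.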
This would, of course, lead to having the 'exec_url' set in the
Group's 'ref_url' and thus having the back-reference "by convention",
as we have it now, with Job. I don't think that either of the options
(exec_url in Result, or using group "by convention") is necessarily
better than the other, it's mostly about what semantics we want to
Storing the exec_url with the result seems simpler to me which is the
main reason why I suggested it.
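For illustration, storing exec_url with the result could look roughly like the sketch below. It uses Python's sqlite3 module with an in-memory database; the table layout, the column names other than exec_url, and the example.org URL are all assumptions made for the example, not the actual ResultsDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE result (
        id INTEGER PRIMARY KEY,
        testcase TEXT NOT NULL,
        outcome TEXT NOT NULL,
        exec_url TEXT  -- nullable: some results have no job behind them
    )
    """
)

# Placeholder URL, not a real ExecDB address.
EXEC_URL = "https://execdb.example.org/jobs/1234"

conn.executemany(
    "INSERT INTO result (testcase, outcome, exec_url) VALUES (?, ?, ?)",
    [
        ("dist.rpmlint", "PASSED", EXEC_URL),
        ("dist.abicheck", "FAILED", EXEC_URL),
        ("testday.manual", "PASSED", None),  # no execution to point at
    ],
)


def results_from_execution(conn, exec_url):
    """Return (testcase, outcome) for every result that shares exec_url."""
    cur = conn.execute(
        "SELECT testcase, outcome FROM result WHERE exec_url = ? ORDER BY id",
        (exec_url,),
    )
    return cur.fetchall()


siblings = results_from_execution(conn, EXEC_URL)
```

With the URL stored on each row, finding the other results from the same execution is a plain equality filter, and results without a job simply leave the column empty.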
The underlying reason I brought this up is that some of our tests
create "unnecessary" Jobs/Groups (i.e. 1 job to 1 result) at the
moment (rpmlint, abicheck, dockerautotest), and whether we think we
should handle it differently. But I think that with what is coming,
we'll be adding more of the 1 job X results stuff (distgit tasks,
basically), so it is not that big of a deal.
The last question to ask is whether the "execution grouping" is even
something useful - what do we (would we) use the information that
"these X results come from the same execution" for? Is it even something
we care about? I use the Job overview to have a better idea of which
tasks were run (e.g. "when did scratch.dockerautotest run lately"),
but that is what ExecDB is for, ultimately. It's just that it's
handily available in the resultsdb, so I'm not looking into execdb at
all, but I would like to believe that when we have the dashboard,
I'll just use the dashboard, and will forget about both execdb, and
The primary use case I can think of for "execution grouping" is for
triage. If I see something odd in a result, one of my first questions
is whether other results from this execution show the same issue or if
it's isolated to this single result.
In the currently deployed system, the link from resultsdb to execdb is
done by linking to the job, looking it up by the UUID from execdb.
There is no other data in execdb which points to resultsdb.
One of the use cases for execdb is for the folks maintaining Taskotron
to have a single place to track a job from initial trigger to final
result and execution status. I'd like to see that link maintained but
if it's not stored in resultsdb anymore, where do we put it?
I suppose this brings up the question of whether it's wise to continue
keeping resultsdb_frontend as the thing users interface with instead of
putting a similar (or better) frontend in front of execdb but that's
way out of scope for this discussion.
So, even though I made it unnecessarily complicated (and I diverged
from the original problem, of course), the question I see is:
What is the actual motivation to switch from the "current" schema
(have a group in resultsdb per execution "unit", use that to have
exec_url "by convention"), to "have an 'exec_url' column in the
The biggest one for me is along the lines of "Explicit is better than
implicit" but to be honest, I'm more interested in the functionality
than the exact implementation.
I don't think that having a "This was executed together" group is that
big of a deal, the problem I see (and I'm not sure it's really a
problem) is that if we have Result in multiple Groups, then linking
from ResultsDB to ExecDB cannot be done programmatically (which group
is the "this was executed together", and which are "something else")
without relying on convention (like beginning the name of the "exec"
group with "Taskotron Execution"). Linking from ExecDB to ResultsDB
would be fine, though, as ExecDB has the "execution-unique" UUID.
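A rough sketch of what relying on the naming convention would mean in practice, in Python. The dict layout, the find_exec_group helper and the sample UUIDs are made up for the example; only the "Taskotron Execution" prefix comes from the paragraph above.

```python
# The naming convention mentioned above; everything else here is hypothetical.
EXEC_PREFIX = "Taskotron Execution"


def find_exec_group(result_groups, prefix=EXEC_PREFIX):
    """Return the one group whose name starts with prefix, or None.

    None is returned both when no group matches and when several do,
    because in either case the execution cannot be identified reliably.
    """
    matches = [g for g in result_groups if (g.get("name") or "").startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    return None


# A result that belongs to two groups: one for the execution, one for humans.
result_groups = [
    {"uuid": "aaaa-1111", "name": "Taskotron Execution: dist.rpmlint"},
    {"uuid": "bbbb-2222", "name": "Fedora 25 Testday"},
]
exec_group = find_exec_group(result_groups)
```

The lookup only works as long as every submitter sticks to the prefix and nobody else uses it, which is the fragility being pointed out.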
Like I said for both of the things above, I'm not all that picky about
how things are implemented in this case as long as they work the way we
need them to work. I would like to at least have a link from execdb to
the results that came out of that execution, being able to find all the
results in one grouped execution from resultsdb is less important to me
because if all else fails, we can find the global UUID in the artifacts