On Thu, 15 Sep 2016 19:10:56 +0200 Josef Skladanka jskladan@redhat.com wrote:
On Thu, Sep 15, 2016 at 4:20 PM, Tim Flink tflink@redhat.com wrote:
On Mon, 15 Aug 2016 22:48:38 +0200 Josef Skladanka jskladan@redhat.com wrote:
Hey gang,
I spent most of today working on the new API docs for ResultsDB, making use of the even better Apiary.io tool.
Before I put even more hours into it, please let me know whether you think it's fine at all. I have yet to find a better tool for describing APIs, so I'm definitely biased, but since it's the documentation, it also needs to be useful.
http://docs.resultsdb20.apiary.io/
I am also trying to put more work towards documenting the attributes and the "usual" queries, so please try and think about this aspect of the docs too.
After the conversation about resultsdb yesterday, I have a proposal for a change to ResultsDB and clarification about how we'd be using it in Taskotron to answer some of the questions I asked earlier.
- Add a nullable column to result to indicate the job it came from. This could be a URI or just a UUID, so long as the final URI could be computed from whatever is stored in this new column.
If we go this way, I'd rather add the whole URL instead of just the UUID - the UUID thing is quite Taskotron-specific IMHO, but something like "exec_url" (? I wish I had a better name) could be useful in a more general sense.
Either one is fine with me.
As far as names go, what about ref_url to point at execdb and something like log_url, output_url or artifact_url for what's currently called ref_url and used to be log_url?
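To make the nullable-column idea concrete, here is a minimal sketch in Python. It is not ResultsDB code: the `Result` class, the `exec_ref` field name, and the `EXECDB_BASE` URL are hypothetical stand-ins. It only illustrates that either a full URL or a bare UUID could be stored, as long as the final URL can be computed from it, and that the column may be empty.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical base URL of an ExecDB instance; not taken from this thread.
EXECDB_BASE = "https://execdb.example.org/jobs/"


@dataclass
class Result:
    """Toy result record; field names are assumptions for illustration."""
    testcase: str
    outcome: str
    exec_ref: Optional[str] = None  # nullable: full URL, bare UUID, or nothing


def exec_url(result: Result) -> Optional[str]:
    """Return the job URL for a result, or None when no job is recorded."""
    if result.exec_ref is None:
        return None
    if result.exec_ref.startswith(("http://", "https://")):
        # A full URL was stored, so use it as-is.
        return result.exec_ref
    # Otherwise treat the stored value as a UUID and compute the URL.
    return EXECDB_BASE + result.exec_ref
```

Storing the whole URL keeps the column useful for non-Taskotron submitters, which have no shared base URL to compute from.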
- Change "group" to "tag" and plan on it being used for the grouping of results by/for humans. This isn't something that we'd be making use of right away but it seems like a logical feature to add given where things are going.
After thinking about it today, I'd rather keep the groups as they are now - this is mostly about semantics, and on the practical level, I'd expect that a "tag" would be unique and identified by name (the same as a testcase). The groups, on the other hand, are identified by UUID, which is not a nice UID for a tag from the semantic point of view, IMO. I could, of course, make the changes and make the Group (Tag) name-identified, but it is not a minor change and would take considerable effort. The groups, as they are now, can have a description set (it might be a good idea to change it to 'name' though, to better express what it's supposed to be), and thus we can effectively do the same as we would with tags. I also feel that for uses other than Taskotron (OpenQA, Testdays) it's easier and more spot-on to have the groups as they are now - grouping by tag would be possible, but programmatically coming up with unique names that are also meaningful is tough and unnecessarily complicated. Generating a UUID and setting a reasonable name, on the other hand, is not.
What do you guys think?
You have a point about the uniqueness issue with names vs UIDs. I'm fine with keeping the groups as they are.
This would mean that we can find the job that every result came from without having to worry about grouping them at submission time. I can think of use cases where there would either be no need for a job UUID/URI or one would not exist, hence the suggestion that the column could be empty.
If grouping at submission is the concern here, then it would be more than easy to do - the idea here (maybe I did not communicate it properly) was to use the ExecDB generated UUID as the identifier, the same way we do in the whole stack. Since the Groups can be created "on the fly" (meaning, that if you submit a result, with a group-uuid that is not yet in the database, it is created for you), we would not need to worry about it at all. If we wanted to be a bit more descriptive, we could create the Group, and set the Name/Description during trigger time (probably as a part of creating the execdb job).
This would, of course, lead to having the 'exec_url' set in the Group's 'ref_url' and thus having the back-reference "by convention", as we have it now, with Job. I don't think that either of the options (exec_url in Result, or using group "by convention") is necessarily better than the other, it's mostly about what semantics we want to have.
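The "group by convention" variant described above can be sketched as follows. This is a toy in-memory model, not the ResultsDB implementation: the class, method names, field names, and the example URL are all hypothetical. It shows the two behaviours mentioned: a group is created on the fly when a result arrives with an unknown group UUID, and the group can optionally be named and given a `ref_url` at trigger time.

```python
import uuid


class InMemoryResultsDB:
    """Toy stand-in for ResultsDB; all names here are hypothetical."""

    def __init__(self):
        self.groups = {}  # group UUID -> group record

    def create_group(self, group_uuid, description=None, ref_url=None):
        # Get-or-create: reuse an existing group, otherwise make an empty one.
        group = self.groups.setdefault(
            group_uuid, {"description": None, "ref_url": None, "results": []}
        )
        if description is not None:
            group["description"] = description
        if ref_url is not None:
            group["ref_url"] = ref_url
        return group

    def submit_result(self, testcase, outcome, group_uuids=()):
        result = {"testcase": testcase, "outcome": outcome}
        for group_uuid in group_uuids:
            # Unknown group UUIDs are created on the fly at submission time.
            self.create_group(group_uuid)["results"].append(result)
        return result


db = InMemoryResultsDB()
job_uuid = str(uuid.uuid4())  # stands in for the ExecDB-generated UUID

# Optional trigger-time step: name the group and point it back at the job.
db.create_group(
    job_uuid,
    description="Taskotron Execution (example)",
    ref_url="https://execdb.example.org/jobs/" + job_uuid,  # hypothetical URL
)
db.submit_result("rpmlint", "PASSED", [job_uuid])
db.submit_result("abicheck", "FAILED", [job_uuid])

# A result submitted with a UUID nobody registered still gets a group.
other_uuid = str(uuid.uuid4())
db.submit_result("dockerautotest", "PASSED", [other_uuid])
```

In this model the back-reference to the job lives on the group rather than on each result, so finding it from a result means going through the result's groups.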
Storing the exec_url with the result seems simpler to me which is the main reason why I suggested it.
The underlying reason I brought this up is that some of our tests create "unnecessary" Jobs/Groups (i.e. 1 job to 1 result) at the moment (rpmlint, abicheck, dockerautotest), and I wondered whether we should handle that differently. But I think that with what is coming, we'll be adding more of the 1 job X results stuff (distgit tasks, basically), so it is not that big of a deal.
The last question to ask is whether the "execution grouping" is even something useful - what do we (would we) use the information that "these X results come from the same execution" for? Is it even something we care about? I use the Job overview to get a better idea of which tasks were run (e.g. "when did scratch.dockerautotest run lately"), but that is what ExecDB is for, ultimately. It's just that it's handily available in resultsdb, so I'm not looking into execdb at all, but I would like to believe that when we have the dashboard, I'll just use the dashboard and forget about both execdb and resultsdb.
The primary use case I can think of for "execution grouping" is for triage. If I see something odd in a result, one of my first questions is whether other results from this execution show the same issue or if it's isolated to this single result.
In the currently deployed system, the link from resultsdb to execdb is done by linking to the job, looking it up by the UUID from execdb. There is no other data in execdb which points to resultsdb.
One of the use cases for execdb is for the folks maintaining Taskotron to have a single place to track a job from initial trigger to final result and execution status. I'd like to see that link maintained but if it's not stored in resultsdb anymore, where do we put it?
I suppose this brings up the question of whether it's wise to continue keeping resultsdb_frontend as the thing users interface with instead of putting a similar (or better) frontend in front of execdb, but that's way out of scope for this discussion.
So, even though I made it unnecessarily complicated (and I diverged from the original problem, of course), the question I see is:
What is the actual motivation to switch from the "current" schema (have a group in resultsdb per execution "unit", use that to have exec_url "by convention") to "have an 'exec_url' column in the Results verbatim"?
The biggest one for me is along the lines of "Explicit is better than implicit" but to be honest, I'm more interested in the functionality than the exact implementation.
I don't think that having a "This was executed together" group is that big of a deal. The problem I see (and I'm not sure it's really a problem) is that if we have a Result in multiple Groups, then linking from ResultsDB to ExecDB cannot be done programmatically (which group is the "this was executed together" one, and which are "something else"?) without relying on convention (like beginning the name of the "exec" group with "Taskotron Execution"). Linking from ExecDB to ResultsDB would be fine, though, as ExecDB has the "execution-unique" UUID.
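As a rough illustration of that concern, here is what resolving the execution group by naming convention might look like. The dict keys and the helper are hypothetical, and only the "Taskotron Execution" prefix comes from the paragraph above. The point is that the lookup works only while every client follows the convention and exactly one group matches.

```python
EXEC_PREFIX = "Taskotron Execution"  # the naming convention floated above


def find_exec_group(groups):
    """Return the one group that looks like the execution group, else None.

    `groups` is a list of dicts with hypothetical 'uuid' and 'description' keys.
    """
    matches = [
        g for g in groups
        if (g.get("description") or "").startswith(EXEC_PREFIX)
    ]
    # Zero or several matches means the convention cannot resolve the link.
    return matches[0] if len(matches) == 1 else None


# A result that belongs to an execution group and to an unrelated group.
result_groups = [
    {"uuid": "aaaa", "description": "Taskotron Execution of rpmlint"},
    {"uuid": "bbbb", "description": "Testday F25 graphics"},
]
exec_group = find_exec_group(result_groups)
```

An explicit column on the result would avoid this lookup entirely, which is the "explicit is better than implicit" argument in code form.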
Like I said for both of the things above, I'm not all that picky about how things are implemented in this case, as long as they work the way we need them to work. I would like to at least have a link from execdb to the results that came out of that execution. Being able to find all the results in one grouped execution from resultsdb is less important to me because, if all else fails, we can find the global UUID in the artifacts path.
Tim