Resultsdb v2.0 - API docs - qa-devel

Josef Skladanka

Monday, 15 August 2016

3:48 p.m.

Hey gang, I spent most of today working on the new API docs for ResultsDB, making use of the even better Apiary.io tool. Before I put even more hours into it, please let me know, whether you think it's fine at all - I'm yet to find a better tool for describing APIs, so I'm definitely biased, but since it's the Documentation, it needs to also be useful. http://docs.resultsdb20.apiary.io/ I am also trying to put more work towards documenting the attributes and the "usual" queries, so please try and think about this aspect of the docs too. Thanks, Joza

Josef Skladanka

Thursday, 18 August

7:28 a.m.

So, I have completed the first draft of the ResultsDB 2.0 API. The documentation lives here: http://docs.resultsdb20.apiary.io/# and I'd be glad if you could have a look at it.

The overall idea is still not changed - ResultsDB should be a "dumb" results store that knows next to nothing (if not nothing at all) about the semantics/meaning of the data stored; the semantics should be applied in the consumer. This is why, for example, no result override is planned: although it might make sense to override a known fail to pass for some use case (like gating), it might not be the right thing to do for some other tool in the pipeline, thus the override needs to happen at the consumer side.

What's not covered in detail is the auth model - I only reflected it by acknowledging the probable future presence of some kind of auth in the POST queries (reserved _auth parameter), but the actual implementation is not a problem to solve today.

On top of that I'd also want to know (and this is probably mostly a question for Ralph) whether it makes sense to try and keep both the old and new API up for some time. It should not be that complicated to do, I'd just rather not spend too much time on it, as changing the consumers (bodhi, as far as I know) is most probably much less time consuming than keeping the old API running. At the moment, I will probably make it happen, but if we agree it's not worth the time...

Feel free to post comments/feature requests/whatever - I'd love for this to be stable (or at least a base for non-breaking changes) for at least the next few years (lol I know, right...), so let's do it right :)

joza

On Mon, Aug 15, 2016 at 10:48 PM, Josef Skladanka <jskladan(a)redhat.com> wrote:

...

Tim Flink

Monday, 12 September

4:39 p.m.

On Mon, 15 Aug 2016 22:48:38 +0200 Josef Skladanka <jskladan(a)redhat.com> wrote:

...

I think we talked about this in person earlier but I didn't write any notes about it and I don't recall the details. How exactly are we going to be using Groups? The first thing that comes to mind is to group results by execution so that there would be a group of results which were all produced from the same run of the same task. That's kinda what we're using Job for in resultsdb 1.0 right now, anyways. I realize that the docs for resultsdb are supposed to be not-specific-to-taskotron but was there anything else we thought the Group might be useful for? Also, what do we want to do about a link to execdb? If we're planning to have a group for each execution's results, that could be the group's ref_url but that relies on convention which could change if Group is used for more than just grouping results by execution. I assume that the new API will also help fix some of the slowness we've been seeing? IIRC, there were some schema changes which would probably help with query time. Tim

Josef Skladanka

Wednesday, 14 September

4:27 a.m.

On Mon, Sep 12, 2016 at 11:39 PM, Tim Flink <tflink(a)redhat.com> wrote:

...

I see two (maybe three) options. Either we'll be using the groups in the same way we used Jobs - to group results per execution as we do now - and use the group's ref_url to point to execdb. If we ever need to use the groups for more than that, then we could just have the result in more than one group, and set meaningful descriptions.

The other way would be to not use Groups at all, and just store the execdb's UUID in the key-value store. Those would then be rendered to a URL in the frontend.

The last option would be a combination of both - we'd be using Groups as we do now to group by execution, but instead of the "default" resultsdb_frontend, we'd use something tailored for taskotron - we could show links to execdb in the results "view", and either disregard the existence of groups altogether, or just have a special description (like "ExecDB related Group - ....") that would get filtered out in the default "group" view.

I don't really see one being directly better than the others; it's just a matter of what we want to do. I did not put much thought into it, as I just expected us to keep it basically the same. Do you have any ideas?

...

I assume that the new API will also help fix some of the slowness we've been seeing? IIRC, there were some schema changes which would probably help with query time. Tim

Yep, most of what was really slow should be solved now - or at least it seemed so from my tests. The only thing we still have trouble with is the really sparse results - the issue which we thought could get solved by the new Postgres, but wasn't. On the other hand, it is a non-issue as long as the query is limited by a datetime range. If you only care about results that are "newer than X", the amount of data really gets cut down, and the queries are fast, even for the sparse results, since the DB does not need to crawl the whole dataset to be sure there are only LIMIT-n results. This is how Bodhi queries the ResultsDB now - they use the 'submitted' timestamp as a constraint. If we communicate this behaviour, I think we'll be fine.

I would almost go as far as setting a default time-constraint to (and I'm just thinking out loud here, no reasons for the number whatsoever) three months, and be done with it - if you ever want older results, just set the time-constraint yourself, and be aware that it probably will take time. I don't see a reason we (as in FedoraQA and the related processes) would need to regularly access results older than that anyway.

J.
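To make the "default time-constraint" idea concrete, here is a minimal sketch of a client-side helper that always attaches a lower time bound to a results query, falling back to roughly three months (90 days) when the caller does not give one. The base URL and the query parameter names (`testcases`, `since`, `limit`) are assumptions for illustration only, not taken from the API docs discussed in this thread.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Assumed default window: "three months", approximated as 90 days.
DEFAULT_WINDOW = timedelta(days=90)


def results_query_url(base_url, testcase, since=None, now=None, limit=20):
    """Build a results query URL that is always bounded by a start time.

    `since`, `testcases` and `limit` are hypothetical parameter names.
    """
    now = now or datetime.now(timezone.utc)
    # Fall back to the default window when no explicit bound is given.
    since = since or (now - DEFAULT_WINDOW)
    params = {
        "testcases": testcase,
        "since": since.strftime("%Y-%m-%dT%H:%M:%S"),
        "limit": limit,
    }
    return f"{base_url}/results?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host; prints a URL constrained to the last 90 days.
    print(results_query_url("http://resultsdb.example/api/v2.0", "dist.rpmlint"))
```

A consumer that really needs older data would pass its own `since` value and accept that the query may be slow.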

Randy Barlow

Tuesday, 13 September

1:19 p.m.

Will the api/v1.0/ endpoint continue to function as-is for a while, to give integrators time to adjust to the new API? That would be ideal for Bodhi, so we can adjust our code to work with v2.0 after it is already in production. If not, we will need to coordinate bodhi and resultsdb releases at the same time.

Josef Skladanka

Wednesday, 14 September

3:57 a.m.

On Tue, Sep 13, 2016 at 8:19 PM, Randy Barlow <bowlofeggs(a)fedoraproject.org> wrote:

...

Hey! The plan is for the v1.0 endpoint to keep working, although with somewhat limited features; from what I remember about Bodhi, that will not affect it at all.

Randy Barlow

Thursday, 15 September

3:38 p.m.

That sounds great Josef, thanks!

Tim Flink

9:20 a.m.

On Mon, 15 Aug 2016 22:48:38 +0200 Josef Skladanka <jskladan(a)redhat.com> wrote:

...

After the conversation about resultsdb yesterday, I have a proposal for a change to ResultsDB and clarification about how we'd be using it in Taskotron to answer some of the questions I asked earlier.

1. Add a null-able column to result to indicate the job it came from. This could be a URI or just UUID so long as the final URI could be computed from whatever is stored in this new column.

2. Change "group" to "tag" and plan on it being used for the grouping of results by/for humans. This isn't something that we'd be making use of right away but it seems like a logical feature to add given where things are going.

This would mean that we can find the job that every result came from without having to worry about grouping them at submission time. I can think of use cases where there would either be no need for a job UUID/URI or one would not exist, hence the suggestion that the column could be empty.

How does this sound? Any suggestions, concerns or comments?

Tim

Josef Skladanka

12:10 p.m.

On Thu, Sep 15, 2016 at 4:20 PM, Tim Flink <tflink(a)redhat.com> wrote:

...

1. Add a null-able column to result to indicate the job it came from. This could be a URI or just UUID so long as the final URI could be computed from whatever is stored in this new column.

If we go this way, I'd rather add the whole URL, instead of just UUID - the UUID thing is quite taskotron specific IMHO, but something like "exec_url" (? I wish I had a better name) could be useful in a more general sense.

...

2. Change "group" to "tag" and plan on it being used for the grouping of results by/for humans. This isn't something that we'd be making use of right away but it seems like a logical feature to add given where things are going.

After thinking about it today, I'd rather keep the groups as they are now - this is mostly about semantics, and on the practical level, I'd expect that a "tag" would be unique, and identified by name (the same as testcase). The groups, on the other hand, are identified by UUID, which is not a nice UID for a tag from the semantic point of view, IMO. I could, of course, do the changes, and make the Group (Tag) name-identified, but it is not a minor change, and would take considerable effort to do. The groups, as they are now, can have a description set (it might be a good idea to change it to 'name' though, to better express what it's supposed to be), and thus we can effectively do the same as we would with tags. I also feel that for uses other than Taskotron (OpenQA, Testdays) it's easier, and more spot-on, to have the groups as they are now - grouping by tag would be possible, but programmatically coming up with unique names that are also meaningful is tough, and unnecessarily complicated. Generating a UUID and setting a reasonable name is not, on the other hand. What do you guys think?

This would mean that we can find the job that every result came from without having to worry about grouping them at submission time. I can think of use cases where there would either be no need for a job UUID/URI or one would not exist, hence the suggestion that the column could be empty.

If grouping at submission is the concern here, then it would be more than easy to do - the idea here (maybe I did not communicate it properly) was to use the ExecDB generated UUID as the identifier, the same way we do in the whole stack. Since the Groups can be created "on the fly" (meaning that if you submit a result with a group-uuid that is not yet in the database, it is created for you), we would not need to worry about it at all. If we wanted to be a bit more descriptive, we could create the Group, and set the Name/Description during trigger time (probably as a part of creating the execdb job).

This would, of course, lead to having the 'exec_url' set in the Group's 'ref_url' and thus having the back-reference "by convention", as we have it now, with Job. I don't think that either of the options (exec_url in Result, or using group "by convention") is necessarily better than the other, it's mostly about what semantics we want to have.

The underlying reason I brought this up is that some of our tests create "unnecessary" Jobs/Groups (i.e. 1 job to 1 result) at the moment (rpmlint, abicheck, dockerautotest), and whether we think we should handle it differently. But I think that with what is coming, we'll be adding more of the 1 job X results stuff (distgit tasks, basically), so it is not that big of a deal.

The last question to ask is whether the "execution grouping" is even something useful - what do we (would we) use the information that "these X results come from the same execution" for? Is it even something we care about? I use the Job overview to have a better idea of which tasks were run (e.g. "when did scratch.dockerautotest run lately"), but that is what ExecDB is for, ultimately. It's just that it's handily available in the resultsdb, so I'm not looking into execdb at all, but I would like to believe that when we have the dashboard, I'll just use the dashboard, and will forget about both execdb and resultsdb.

So, even though I made it unnecessarily complicated (and I diverged from the original problem, of course), the question I see is: what is the actual motivation to switch from the "current" schema (have a group in resultsdb per execution "unit", use that to have exec_url "by convention") to "have an 'exec_url' column in the Results verbatim"? I don't think that having a "This was executed together" group is that big of a deal. The problem I see (and I'm not sure it's really a problem) is that if we have a Result in multiple Groups, then linking from ResultsDB to ExecDB cannot be done programmatically (which group is the "this was executed together" one, and which are "something else") without relying on convention (like beginning the name of the "exec" group with "Taskotron Execution"). Linking from ExecDB to ResultsDB would be fine, though, as ExecDB has the "execution-unique" UUID.

Thoughts? :)
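The two mechanisms described above (groups created "on the fly" at submission, and finding the execution group "by convention" via a name prefix) can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class and method names are hypothetical, the store is a plain dict rather than the real database, and the "Taskotron Execution" prefix is just the example convention mentioned in the message.

```python
from dataclasses import dataclass, field

# Example convention from the discussion; not an agreed-upon value.
EXEC_PREFIX = "Taskotron Execution"


@dataclass
class Group:
    uuid: str
    description: str = ""
    ref_url: str = ""


@dataclass
class ResultStore:
    """Hypothetical in-memory stand-in for the results store."""
    groups: dict = field(default_factory=dict)
    results: list = field(default_factory=list)

    def submit_result(self, testcase, outcome, group_uuids=()):
        # Unknown group UUIDs are created on the fly, with empty metadata.
        for gid in group_uuids:
            self.groups.setdefault(gid, Group(uuid=gid))
        result = {"testcase": testcase, "outcome": outcome,
                  "groups": list(group_uuids)}
        self.results.append(result)
        return result

    def exec_url(self, result):
        # By convention: the execution group is the one whose description
        # starts with EXEC_PREFIX; its ref_url points back to execdb.
        for gid in result["groups"]:
            group = self.groups[gid]
            if group.description.startswith(EXEC_PREFIX):
                return group.ref_url
        return None
```

With a result in several groups, only the prefix tells the lookup which group is the execution one, which is exactly the reliance on convention raised above; an explicit 'exec_url' column on the result would make the lookup a plain attribute access instead.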

Tim Flink

4:51 p.m.

On Thu, 15 Sep 2016 19:10:56 +0200 Josef Skladanka <jskladan(a)redhat.com> wrote:

...

If we go this way, I'd rather add the whole URL, instead of just UUID - the UUID thing is quite taskotron specific IMHO, but something like "exec_url" (? I wish I had a better name) could be useful in a more general sense.

Either one is fine with me. As far as names go, what about ref_url to point at execdb and something like log_url, output_url or artifact_url for what's currently called ref_url and used to be log_url?

...

2. Change "group" to "tag" and plan on it being used for the grouping of results by/for humans. This isn't something that we'd be making use of right away but it seems like a logical feature to add given where things are going.

After thinking about it today, I'd rather keep the groups as they are now - this is mostly about semantics, and on the practical level, I'd expect that a "tag" would be unique, and identified by name (the same as testcase). The groups, on the other hand, are identified by UUID, which is not a nice UID for a tag from the semantic point of view, IMO. I could, of course, do the changes, and make the Group (Tag) name-identified, but it is not a minor change, and would take considerable effort to do. The groups, as they are now, can have a description set (it might be a good idea to change it to 'name' though, to better express what it's supposed to be), and thus we can effectively do the same as we would with tags. I also feel that for uses other than Taskotron (OpenQA, Testdays) it's easier, and more spot-on, to have the groups as they are now - grouping by tag would be possible, but programmatically coming up with unique names that are also meaningful is tough, and unnecessarily complicated. Generating a UUID and setting a reasonable name is not, on the other hand. What do you guys think?

You have a point about the uniqueness issue with names vs UIDs. I'm fine with keeping the groups as they are.

...

This would mean that we can find the job that every result came from without having to worry about grouping them at submission time. I can think of use cases where there would either be no need for a job UUID/URI or one would not exist, hence the suggestion that the column could be empty.

If grouping at submission is the concern here, then it would be more than easy to do - the idea here (maybe I did not communicate it properly) was to use the ExecDB generated UUID as the identifier, the same way we do in the whole stack. Since the Groups can be created "on the fly" (meaning that if you submit a result with a group-uuid that is not yet in the database, it is created for you), we would not need to worry about it at all. If we wanted to be a bit more descriptive, we could create the Group, and set the Name/Description during trigger time (probably as a part of creating the execdb job). This would, of course, lead to having the 'exec_url' set in the Group's 'ref_url' and thus having the back-reference "by convention", as we have it now, with Job. I don't think that either of the options (exec_url in Result, or using group "by convention") is necessarily better than the other, it's mostly about what semantics we want to have.

Storing the exec_url with the result seems simpler to me which is the main reason why I suggested it.

...

The underlying reason I brought this up is that some of our tests create "unnecessary" Jobs/Groups (i.e. 1 job to 1 result) at the moment (rpmlint, abicheck, dockerautotest), and whether we think we should handle it differently. But I think that with what is coming, we'll be adding more of the 1 job X results stuff (distgit tasks, basically), so it is not that big of a deal. The last question to ask is whether the "execution grouping" is even something useful - what do we (would we) use the information that "these X results come from the same execution" for? Is it even something we care about? I use the Job overview to have a better idea of which tasks were run (e.g. "when did scratch.dockerautotest run lately"), but that is what ExecDB is for, ultimately. It's just that it's handily available in the resultsdb, so I'm not looking into execdb at all, but I would like to believe that when we have the dashboard, I'll just use the dashboard, and will forget about both execdb and resultsdb.

The primary use case I can think of for "execution grouping" is for triage. If I see something odd in a result, one of my first questions is whether other results from this execution show the same issue or if it's isolated to this single result.

In the currently deployed system, the link from resultsdb to execdb is done by linking to the job, looking it up by the UUID from execdb. There is no other data in execdb which points to resultsdb. One of the use cases for execdb is for the folks maintaining Taskotron to have a single place to track a job from initial trigger to final result and execution status. I'd like to see that link maintained but if it's not stored in resultsdb anymore, where do we put it?

I suppose this brings up the question of whether it's wise to continue keeping resultsdb_frontend as the thing users interface with instead of putting a similar (or better) frontend in front of execdb but that's way out of scope for this discussion.

...

So, even though I made it unnecessarily complicated (and I diverged from the original problem, of course), the question I see is: what is the actual motivation to switch from the "current" schema (have a group in resultsdb per execution "unit", use that to have exec_url "by convention") to "have an 'exec_url' column in the Results verbatim"?

The biggest one for me is along the lines of "Explicit is better than implicit" but to be honest, I'm more interested in the functionality than the exact implementation.

...

I don't think that having a "This was executed together" group is that big of a deal. The problem I see (and I'm not sure it's really a problem) is that if we have a Result in multiple Groups, then linking from ResultsDB to ExecDB cannot be done programmatically (which group is the "this was executed together" one, and which are "something else") without relying on convention (like beginning the name of the "exec" group with "Taskotron Execution"). Linking from ExecDB to ResultsDB would be fine, though, as ExecDB has the "execution-unique" UUID.

Like I said for both of the things above, I'm not all that picky about how things are implemented in this case as long as they work the way we need them to work. I would like to at least have a link from execdb to the results that came out of that execution, being able to find all the results in one grouped execution from resultsdb is less important to me because if all else fails, we can find the global UUID in the artifacts path. Tim

Kamil Paral

Tuesday, 27 September

11:06 a.m.

...

This would, of course, lead to having the 'exec_url' set in the Group's 'ref_url' and thus having the back-reference "by convention", as we have it now, with Job. I don't think that either of the options (exec_url in Result, or using group "by convention") is necessarily better than the other, it's mostly about what semantics we want to have.

Having the link to execdb in a group (as we do it right now) seems like a fine approach to me.

...

The last question to ask is whether the "execution grouping" is even something useful - what do we (would we) use the information that "these X results come from the same execution" for? Is it even something we care about?

Currently it is the only way to display dist.rpmgrill* (including subchecks) results for a NVR. Also, it is useful when e.g. looking at what a single upgradepath run reported - I can quickly and easily see everything that was reported during that single task execution.

...

Using conventions is fine in my POV. Either name prefix, or arbitrary tag on the group, etc.

Tim wrote:

...

This would mean that we can find the job that every result came from without having to worry about grouping them at submission time. I can think of use cases where there would either be no need for a job UUID/URI or one would not exist, hence the suggestion that the column could be empty.

What are the use cases? I can think of one - yesterday Adam mentioned he would like to save manual test results into resultsdb (using a frontend). That would have no ExecDB entry (no UUID). Is that a problem in the current design? This also means we would probably not create a group for this result - is that also OK?

Josef Skladanka

Thursday, 29 September

12:31 a.m.

On Tue, Sep 27, 2016 at 6:06 PM, Kamil Paral <kparal(a)redhat.com> wrote:

...

... What are the use cases? I can think of one - yesterday Adam mentioned he would like to save manual test results into resultsdb (using a frontend). That would have no ExecDB entry (no UUID). Is that a problem in the current design? This also means we would probably not create a group for this result - is that also OK?

Having no ExecDB entry is not a problem. Although ExecDB provides the global UUID for our execution, that UUID is not necessary at all for ResultsDB (or the manual-testing-frontend). The point of ExecDB's UUID is to be able to tie together the whole automated run from the point of Trigger to the ResultsDB. But ResultsDB can (and does, if used that way) create Group UUIDs on its own. So we could still create groups for the manual tests - e.g. per build - if we wanted to; the groups are made to be more usable (and easier to use) than the old jobs. But we definitely could do without them, just selecting the right results would (IMHO) be a bit more complicated without the groups.

The thing here (which I guess is not that obvious) is that there are different kinds of UUIDs, and that you can generate "non-random" ones, based on namespace and name - this is what we're going to use in OpenQA, for example, where we struggled with the "old" design of ResultsDB (you needed to create the Job during trigger time, and then propagate the id, so it's available in the end, at report time). We are going to use something like `uuid.uuid3("OpenQA in Fedora", "Build Fedora-Rawhide-20160928.n.0")` (pseudocode to some extent), to create the same group UUID for the same build. This approach can be easily replicated anywhere, to provide canonical UUIDs, if needed.

Hope that I was at least a bit on topic :)

j.
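For reference, a runnable take on the `uuid.uuid3(...)` pseudocode above: Python's `uuid.uuid3` takes a `uuid.UUID` namespace rather than a plain string, so this sketch first derives a namespace UUID from the "OpenQA in Fedora" label. How the namespace is derived (here, via `uuid.NAMESPACE_DNS`) is an assumption made for illustration, and `group_uuid_for_build` is a hypothetical helper name.

```python
import uuid

# Assumption: derive a fixed namespace UUID from the label used in the
# pseudocode. Any stable UUID would work, as long as every reporter
# uses the same one.
OPENQA_NAMESPACE = uuid.uuid3(uuid.NAMESPACE_DNS, "OpenQA in Fedora")


def group_uuid_for_build(build: str) -> uuid.UUID:
    """Return the same (name-based, version 3) UUID for the same build."""
    return uuid.uuid3(OPENQA_NAMESPACE, f"Build {build}")


if __name__ == "__main__":
    first = group_uuid_for_build("Fedora-Rawhide-20160928.n.0")
    second = group_uuid_for_build("Fedora-Rawhide-20160928.n.0")
    # Both calls yield the identical UUID, so nothing has to be
    # propagated from trigger time to report time.
    print(first, first == second)
```

Because the UUID is a pure function of the namespace and the build name, any tool that knows both can compute the group identifier independently.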

Josef Skladanka

Monday, 3 October

3:35 a.m.

So, what's the decision? I know I can "guesstimate", but I'd like to see a group consensus before I actually start coding.

On Thu, Sep 29, 2016 at 7:31 AM, Josef Skladanka <jskladan(a)redhat.com> wrote:

...

Kamil Paral

Tuesday, 4 October

9:58 a.m.

...

On Tue, Sep 27, 2016 at 6:06 PM, Kamil Paral <kparal(a)redhat.com> wrote:

...

What are the use cases? I can think of one - yesterday Adam mentioned he would like to save manual test results into resultsdb (using a frontend). That would have no ExecDB entry (no UUID). Is that a problem in the current design? This also means we would probably not create a group for this result - is that also OK?

...

The thing here (which I guess is not that obvious) is that there are different kinds of UUIDs, and that you can generate "non-random" ones, based on namespace and name - this is what we're going to use in OpenQA, for example, where we struggled with the "old" design of ResultsDB (you needed to create the Job during trigger time, and then propagate the id, so it's available in the end, at report time). We are going to use something like `uuid.uuid3("OpenQA in Fedora", "Build Fedora-Rawhide-20160928.n.0")` (pseudocode to some extent), to create the same group UUID for the same build. This approach can be easily replicated anywhere, to provide canonical UUIDs, if needed.

...

Hope that I was at least a bit on topic :)

Very much. Thanks for an exhaustive answer.

...

So, what's the decision? I know I can "guesstimate", but I'd like to see a group consensus before I actually start coding.

I'll just summarize here that we discussed this during Monday's qa-devel meeting and reached consensus that keeping ref_url in the group (as it used to be) is the current way forward.
