On Mon, Sep 12, 2016 at 11:39 PM, Tim Flink tflink@redhat.com wrote:
I think we talked about this in person earlier but I didn't write any notes about it and I don't recall the details.
How exactly are we going to be using Groups? The first thing that comes to mind is to group results by execution so that there would be a group of results which were all produced from the same run of the same task. That's kinda what we're using Job for in resultsdb 1.0 right now, anyways.
I realize that the docs for resultsdb are supposed to be not-specific-to-taskotron but was there anything else we thought the Group might be useful for?
Also, what do we want to do about a link to execdb? If we're planning to have a group for each execution's results, that could be the group's ref_url but that relies on convention which could change if Group is used for more than just grouping results by execution.
I see two (maybe three) options. The first is to use the groups the same way we used Jobs - to group results per execution, as we do now - and use the group's ref_url to point to execdb. If we ever need groups for more than that, we could just put the result in more than one group and set meaningful descriptions.
The other way would be to not use Groups at all, and just store the execdb UUID in the key-value store. That would then be rendered as a URL in the frontend.
The last option would be a combination of both - we'd keep using Groups as we do now, to group by execution, but instead of the "default" resultsdb_frontend we'd use something tailored for taskotron. We could show links to execdb in the results "view", and either disregard the existence of groups altogether, or give those groups a special description (like "ExecDB related Group - ....") that would get filtered out in the default "group" view.
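To make the options a bit more concrete, here is a rough sketch of what the payloads might look like. All field names here (`uuid`, `ref_url`, `description`, the `execdb_uuid` key) and the URLs are illustrative assumptions, not the actual ResultsDB 2.0 schema:

```python
# Option 1: one Group per execution, with ref_url pointing at ExecDB.
# Field names and URLs are hypothetical, for illustration only.
group_per_execution = {
    "uuid": "c25237a4-b6b4-4b3a-9a3b-1f2d3e4f5a6b",  # made-up execution id
    "ref_url": "https://taskotron.example/execdb/jobs/c25237a4-b6b4-4b3a-9a3b-1f2d3e4f5a6b",
    "description": "all results from one execution of one task",
}

# Option 2: no Groups; store the ExecDB UUID in the result's key-value
# data, and let the frontend render it as a link.
result_with_execdb_key = {
    "testcase": "dist.rpmlint",
    "outcome": "PASSED",
    "data": {"execdb_uuid": "c25237a4-b6b4-4b3a-9a3b-1f2d3e4f5a6b"},
}
```

The third option would look like option 1 on the storage side; the difference is purely in how the frontend presents (or hides) those groups.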
I don't really see one option being directly better than the others; it's just a matter of what we want to do. I haven't put much thought into it, as I just expected us to keep things basically the same.
Do you have any ideas?
I assume that the new API will also help fix some of the slowness we've been seeing? IIRC, there were some schema changes which would probably help with query time.
Tim
Yep, most of what was really slow should be solved now - or at least it seemed so from my tests. The only thing we still have trouble with is the really sparse results - the issue we thought would be solved by the new Postgres, but wasn't.
On the other hand, it is a non-issue as long as the query is limited by a datetime range. If you only care about results that are "newer than X", the amount of data really gets cut down, and the queries are fast even for the sparse results, since the DB does not need to crawl the whole dataset to be sure there are no more results beyond the LIMIT. This is how Bodhi queries ResultsDB now - they use the 'submitted' timestamp as a constraint.
If we communicate this behaviour, I think we'll be fine. I would almost go as far as setting a default time constraint of (and I'm just thinking out loud here, no particular reason for the number) three months, and be done with it - if you ever want older results, just set the time constraint yourself, and be aware that it will probably take a while. I don't see a reason we (as in FedoraQA and the related processes) would need to regularly access results older than that anyway.
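For illustration, a client-side query with that kind of default cutoff could be built like this. The `since` parameter name, the `/api/v2.0/results` path, and the hostname are assumptions for the sketch - check them against the actual ResultsDB 2.0 API before relying on them:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

def results_query_url(base, testcase, newer_than_days=90):
    """Build a time-constrained ResultsDB query URL.

    Defaults to roughly three months of results; the 'since' parameter
    name and endpoint path are assumptions, not confirmed API details.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=newer_than_days)
    params = {
        "testcases": testcase,
        "since": cutoff.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    return "{}/api/v2.0/results?{}".format(base, urlencode(params))

# A hypothetical query for recent rpmlint results:
url = results_query_url("https://resultsdb.example", "dist.rpmlint")
```

The point is that callers get the fast, time-bounded behaviour by default, and only pay the full-dataset cost when they explicitly widen the window.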
J.