On Thu, Mar 30, 2017 at 5:03 PM, Adam Williamson <adamwill@fedoraproject.org> wrote:

On Wed, 2017-03-22 at 09:07 -0400, Kamil Paral wrote:

> I know this has been a long email, but, feedback needed! :-)

So I just read through this again.

I'm wondering if we can do something simpler and within resultsdb,
though it still *is* adding complexity: basically, add de-duplication
by specified fields as a feature to resultsdb

Sure, it can be done. It is not "simple", though: it still requires traversing all the data in the database, and it puts more CPU and memory load on the web server (since the server would be doing the de-duplication). All of that means slower response times than a dedicated middleware could offer. I'm not saying it's not doable, just that if we decide to go this way, reasonable HW becomes more of a hard requirement.
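To illustrate why server-side de-duplication by arbitrary fields is costly, here is a minimal Python sketch (not actual ResultsDB code). It assumes a hypothetical result shape of a dict with `id`, `submit_time` and a free-form `data` dict, and the function name `deduplicate` is made up for illustration. Because the fields are chosen per request, nothing can be pre-indexed: every result has to be visited, and one entry per distinct key is held in memory.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical result shape (assumption): {"id": int, "submit_time": datetime, "data": {...}}
Result = Dict[str, Any]


def deduplicate(results: Iterable[Result], fields: List[str]) -> List[Result]:
    """Keep only the newest result for each combination of `fields` values.

    Every result is visited once (linear in the number of stored results),
    and the `latest` dict keeps one entry per distinct key in memory.
    """
    latest: Dict[Tuple[Any, ...], Result] = {}
    for result in results:
        # Missing fields become None, so results lacking them are grouped together.
        key = tuple(result["data"].get(f) for f in fields)
        current = latest.get(key)
        if current is None or result["submit_time"] > current["submit_time"]:
            latest[key] = result
    return list(latest.values())


results = [
    {"id": 1, "submit_time": datetime(2017, 3, 1), "data": {"item": "foo-1.0", "scenario": "x86_64"}},
    {"id": 2, "submit_time": datetime(2017, 3, 2), "data": {"item": "foo-1.0", "scenario": "x86_64"}},
    {"id": 3, "submit_time": datetime(2017, 3, 1), "data": {"item": "foo-1.0", "scenario": "i386"}},
]
deduped = deduplicate(results, ["item", "scenario"])
```

Here results 1 and 2 share the same `item`/`scenario` pair, so only the newer one (id 2) survives, next to id 3. Doing this inside the web server for every query, over the whole database, is where the extra CPU and memory load comes from.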

The key differences between OpenQA and ResultsDB here are:

1) OpenQA operates on a few orders of magnitude less data.

2) OpenQA can be optimized for that one special use case: if the de-duplication is always done on one and the same key, you can easily add a DB optimization for exactly that. Since ResultsDB is supposed to be data-agnostic, we can't really do that (well, technically we can, but I am strongly against putting code that solves one special use case into the codebase, and optimizing de-duplication around one particular, non-required piece of data is exactly that kind of code). This is what the middleware can do: have a DB structure and code optimized for that one special use case. It can also have different data-retention policies than ResultsDB, and so on (for example, you could easily prune data that you know nobody needs to care about once it is older than 6 months).
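As a rough sketch of what "a DB structure optimized for that one use case" could look like in such a middleware, here is a small Python/sqlite3 example. Everything in it is an assumption for illustration: the table `latest_result`, the class `LatestResultStore`, the choice of `item` as the single de-duplication key, Unix-second timestamps, and the 183-day approximation of "6 months". Because the key is fixed, it can simply be the primary key, so only the newest result per item is ever stored and old data can be pruned with one query.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical middleware table: the one de-duplication key is the primary key,
# so lookups and updates are index hits instead of scans over all results.
SCHEMA = """
CREATE TABLE IF NOT EXISTS latest_result (
    item        TEXT PRIMARY KEY,
    outcome     TEXT NOT NULL,
    submit_time INTEGER NOT NULL  -- Unix timestamp in seconds
)
"""

SIX_MONTHS = 183 * 24 * 3600  # rough retention window in seconds (assumption)


class LatestResultStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def record(self, item: str, outcome: str, submit_time: int) -> None:
        """Store a result, replacing the existing one only if this one is newer."""
        with self.conn:  # both statements are committed together
            # Inserts only if there is no row for this item yet.
            self.conn.execute(
                "INSERT OR IGNORE INTO latest_result VALUES (?, ?, ?)",
                (item, outcome, submit_time),
            )
            # Overwrites an existing row only when it is older.
            self.conn.execute(
                "UPDATE latest_result SET outcome = ?, submit_time = ? "
                "WHERE item = ? AND submit_time < ?",
                (outcome, submit_time, item, submit_time),
            )

    def latest(self, item: str) -> Optional[Tuple[str, int]]:
        """Return (outcome, submit_time) for the item, or None if unknown."""
        return self.conn.execute(
            "SELECT outcome, submit_time FROM latest_result WHERE item = ?",
            (item,),
        ).fetchone()

    def prune(self, now: int, max_age: int = SIX_MONTHS) -> int:
        """Delete results older than the retention window; return rows removed."""
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM latest_result WHERE submit_time < ?",
                (now - max_age,),
            )
        return cur.rowcount
```

The point is the design trade-off rather than the specific SQL: ResultsDB cannot hard-code `item` (or any other field) like this without becoming less data-agnostic, while a separate middleware is free to.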

Once again, I'm not saying this necessarily needs to be done one way or the other. Just for the record: I really do believe that in most cases, doing it from Bodhi's frontend, as Kamil described and as you initially wanted to (IIRC), is a sane way. At least for now :)