On Thu, Mar 30, 2017 at 5:03 PM, Adam Williamson <adamwill(a)fedoraproject.org>
wrote:
> On Wed, 2017-03-22 at 09:07 -0400, Kamil Paral wrote:
> > I know this has been a long email, but, feedback needed! :-)
>
> So I just read through this again.
>
> I'm wondering if we can do something simpler and within resultsdb,
> though it still *is* adding complexity: basically, add de-duplication
> by specified fields as a feature to resultsdb

Sure, it can be done. It is not "simple", though: it still requires traversing
all the data in the database, and it means more CPU and memory load on the web
server (which would be doing the de-duplication). All of this means slower
response times than the middleware could offer. I'm not saying it's not
doable, just that having reasonable HW becomes more of a hard requirement
if we decide to go this way.
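
To illustrate why the web server ends up doing the work, here is a minimal
sketch of de-duplication by caller-specified fields. It is not ResultsDB
code: the result dictionaries, the field names ("testcase", "item",
"submit_time") and the deduplicate() helper are all assumptions made up for
this example. The point is only that, with arbitrary fields, every matching
result has to be visited before the newest one per group is known.

```python
from typing import Any, Dict, Iterable, List, Tuple


def deduplicate(results: Iterable[Dict[str, Any]],
                fields: List[str]) -> List[Dict[str, Any]]:
    """Keep only the newest result for each distinct combination of `fields`.

    Hypothetical helper: assumes every result carries a comparable
    "submit_time" value.
    """
    latest: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for result in results:  # every result is visited once
        key = tuple(result.get(field) for field in fields)
        current = latest.get(key)
        if current is None or result["submit_time"] > current["submit_time"]:
            latest[key] = result
    return list(latest.values())


# Made-up sample data, only to show the call.
results = [
    {"testcase": "rpmlint", "item": "foo-1.0-1", "outcome": "FAILED", "submit_time": 1},
    {"testcase": "rpmlint", "item": "foo-1.0-1", "outcome": "PASSED", "submit_time": 2},
    {"testcase": "depcheck", "item": "foo-1.0-1", "outcome": "PASSED", "submit_time": 1},
]
deduped = deduplicate(results, ["testcase", "item"])
```

Because the fields are chosen per request, the grouping cannot be
precomputed in general, which is where the extra CPU and memory load comes
from.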
The key differences between OpenQA and ResultsDB here are:
1) OpenQA operates on a few orders of magnitude less data
2) OpenQA can be optimized for that one special use case: if the
de-duplication is done on one key, and always on that one key, you can
easily make a DB optimization to cater for that use case. Since ResultsDB
is supposed to be data-agnostic, we can't really do that (well, we can,
actually, but I am strongly against having special-use-case-solving code
[which is basically what optimizing for de-duplication based on one special
non-required piece of data is] in the codebase, so...). This is what the
middleware can do: have a DB structure + code optimized for that one special
use case, plus it can have different data-retention policies than
ResultsDB, and so on (for example, you could easily prune data from a store
where you know that results older than 6 months no longer matter).
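
As a rough sketch of what such a middleware could look like, the example
below keeps one row per de-duplication key and prunes old rows. Everything
in it is an assumption for illustration: the "latest" table, the
"dedup_key" format, the record() and prune() helpers, and the use of SQLite
with integer timestamps. It is not taken from ResultsDB, OpenQA or any
existing middleware.

```python
import sqlite3

DAY = 86400  # seconds


def create_schema(conn: sqlite3.Connection) -> None:
    # The primary key on dedup_key makes "latest result for this key" a
    # single indexed lookup instead of a scan over all results.
    conn.execute(
        "CREATE TABLE latest ("
        " dedup_key TEXT PRIMARY KEY,"
        " outcome TEXT NOT NULL,"
        " submitted INTEGER NOT NULL)"
    )


def record(conn: sqlite3.Connection, dedup_key: str,
           outcome: str, submitted: int) -> None:
    """Store a result only if it is newer than the one already kept."""
    row = conn.execute(
        "SELECT submitted FROM latest WHERE dedup_key = ?", (dedup_key,)
    ).fetchone()
    if row is None or submitted > row[0]:
        conn.execute(
            "INSERT OR REPLACE INTO latest (dedup_key, outcome, submitted)"
            " VALUES (?, ?, ?)",
            (dedup_key, outcome, submitted),
        )


def prune(conn: sqlite3.Connection, now: int, max_age: int) -> None:
    """Retention policy: drop everything older than max_age seconds."""
    conn.execute("DELETE FROM latest WHERE submitted < ?", (now - max_age,))


# Made-up sample data, only to show the calls.
conn = sqlite3.connect(":memory:")
create_schema(conn)
now = 400 * DAY
record(conn, "foo-1.0-1:rpmlint", "FAILED", now - 2 * DAY)
record(conn, "foo-1.0-1:rpmlint", "PASSED", now - 1 * DAY)
record(conn, "bar-2.0-1:rpmlint", "PASSED", now - 200 * DAY)
prune(conn, now, max_age=183 * DAY)  # roughly 6 months
rows = conn.execute(
    "SELECT dedup_key, outcome FROM latest ORDER BY dedup_key"
).fetchall()
```

The trade-off is the one described above: this structure is fast precisely
because it is built around one key and may throw data away, which is what a
data-agnostic store cannot assume.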
Once again - I'm not saying this necessarily needs to be done one way or
another. Just for the record - I really do believe that for most cases,
doing it from Bodhi's frontend, like Kamil described and as you initially
wanted to (IIRC), is a sane way. At least for now :)
J.