On Thu, May 11, 2017 at 10:13 AM, Josef Skladanka <jskladan@redhat.com> wrote:

Bah, do you also hate it when you hit ctrl+enter instead of shift+enter and the email gets sent?

Nope, I configured my email client (gmail) to be able to undo sending emails in such a case (with a 20-second delay in my case). (I don't even know what shift+enter is for; it doesn't seem to do anything different from a standard enter.)
 
... Following up with that last sentence:

I sure agree that having thousands of Depcheck results is nonsense. And in this special case, I really agree that pruning the data is a good idea (provided we have a very well defined way of deciding what counts as "just a duplicate", that is). I also agree that we don't need to have the whole history of all the results in one place with fast read access. But it still makes sense (to me at least) to have an archiving policy, rather than a deletion policy, as the first step.

I didn't realize that, but that's conceptually no different from what I proposed, right? It can still be an external process that prunes the last day's worth of data during the night; instead of throwing those results away completely, it saves them to a secondary database. So it can still be implemented outside of resultsdb. Correct?

This could also be OK for the RHEL folks: they could have different policies for archiving, and the old results would still be available, just in a different database (and with slower access).
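Just to make the idea concrete, here is a minimal sketch of what such a nightly archiver could look like. The table name, columns, and function are made up for illustration - this is not resultsdb's actual schema, just the "copy old rows to a secondary database, then delete them from the primary" shape of the process:

```python
import sqlite3

# Hypothetical minimal schema; the real resultsdb tables look different.
DDL = ("CREATE TABLE IF NOT EXISTS results ("
       "id INTEGER PRIMARY KEY, testcase TEXT, item TEXT, "
       "outcome TEXT, submitted REAL)")

def archive_old_results(primary, archive, cutoff):
    """Copy results submitted before `cutoff` into the archive database,
    then delete them from the primary database. Returns rows moved."""
    primary.execute(DDL)
    archive.execute(DDL)
    # Select everything older than the retention cutoff...
    rows = primary.execute(
        "SELECT id, testcase, item, outcome, submitted "
        "FROM results WHERE submitted < ?", (cutoff,)).fetchall()
    # ...copy it to the (slower, secondary) archive database...
    archive.executemany(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?, ?)", rows)
    archive.commit()
    # ...and only then prune it from the primary database.
    primary.execute("DELETE FROM results WHERE submitted < ?", (cutoff,))
    primary.commit()
    return len(rows)
```

The cutoff (and thus the retention policy) is just a parameter, which is what would let RHEL run the same process with a different policy.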

 
And to be honest, I don't think that the policy for data retention has anything to do with result de-duplication.

Nope.
 
Sure, it makes the "stupid" undefined approach easier - we could keep saying "just download it all and decide for yourself" a bit longer, and effectively stall the "NASA-level" decision making.

Yes, that's how it was meant. If not for the depcheck/upgradepath style of reporting, we might not have had this discussion at all, because all the other tasks report a very reasonable number of results (single digits) for each item.
 
And once again - I'm not saying that it is necessarily a wrong step to take now. But we must be very conscious of what we are doing, and why. Because having less data in the database does not really help with deciding "what is the latest result in this use case".

More brainpower would be welcome on this topic :)  If we can't make it auto-resolving, we can still fall back to defining a huge list of all the testcases in the database and their available extra-args values, so that consumers can send very specific queries.
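For what it's worth, the mechanical part of "latest result" is easy once you know which fields form the key - it's exactly the choice of key fields (testcase alone? testcase plus item? plus which extra args?) that is the hard, undefined part. A sketch, assuming result dicts with a `submitted` timestamp and assuming (that's the assumption!) that the key fields are known up front:

```python
def latest_results(results, key_fields=("testcase", "item")):
    """Keep only the most recent result per key.

    `results` is an iterable of dicts, each with a numeric/comparable
    'submitted' field plus whatever fields make up the key.  The default
    key (testcase, item) is a guess - picking the right key_fields is
    precisely the open question from the discussion above.
    """
    latest = {}
    for r in results:
        key = tuple(r[f] for f in key_fields)
        # A later submission for the same key replaces the earlier one.
        if key not in latest or r["submitted"] > latest[key]["submitted"]:
            latest[key] = r
    return list(latest.values())
```

If a task's outcome also depends on some extra args (arch, scenario, ...), those fields would simply have to be part of `key_fields` - which is why consumers would need that list of testcases and their extra-args values to build the right query.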
 

So, to sum it up a bit - I am the last person to go against making our stuff a bit (or a lot) more tailored to the use cases we have. But I need to have those use cases defined, and ideally have policies in place to support them, before I feel it is a good idea to do it.

J.