On Thu, May 11, 2017 at 9:39 AM, Kamil Paral <kparal@redhat.com> wrote:

On Wed, May 10, 2017 at 4:30 PM, Josef Skladanka <jskladan@redhat.com> wrote:

On Wed, May 10, 2017 at 4:23 PM, Kamil Paral <kparal@redhat.com> wrote:
tl;dr; but:
... The simple and fast solution is making the database schema directly tailored to our Fedora use cases, making scenario searches fast....

Not really simple, honestly, especially if by "scenario searches" you mean "retrieving a list of results deduplicated by scenario", since that is the same problem as that middleware thing.

I didn't really mean that when I wrote it. I meant it would be fast to ask for all scenarios available for $item, and then you could ask for the latest result for each scenario. But I admit I don't know much about databases, so...

Honestly, this is just a different angle on the same thing: if you want to store item->ANY(scenario) mappings, you can just as well store item->LATEST(scenario) and do what you wanted to do in the first place. I can see why this may seem different to you, but it really is not. As I have said many times, every time you want "all of something", you need either a very specific table structure or a traversal of the whole database.
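To make the "specific table structure vs. traversing the whole database" trade-off concrete, here is a minimal sketch under stated assumptions. The `Result` record, its fields, and the `LatestIndex` class are hypothetical illustrations, not the actual ResultsDB schema or API. Option 1 answers "latest result per scenario for $item" by walking every stored result; option 2 keeps an item->LATEST(scenario) table up to date on every write, which is essentially the same structure you would need for item->ANY(scenario).

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Result:
    # Hypothetical minimal result record; NOT the real ResultsDB schema.
    item: str
    scenario: str
    submit_time: int  # any comparable timestamp
    outcome: str


def latest_by_traversal(results: Iterable[Result], item: str) -> Dict[str, Result]:
    """Option 1: walk every stored result, keeping the newest one per scenario."""
    latest: Dict[str, Result] = {}
    for r in results:  # cost grows with the whole database, not with the answer
        if r.item != item:
            continue
        current = latest.get(r.scenario)
        if current is None or r.submit_time > current.submit_time:
            latest[r.scenario] = r
    return latest


class LatestIndex:
    """Option 2: a structure tailored to this one query, updated on every write."""

    def __init__(self) -> None:
        # item -> {scenario -> newest Result}; the set of keys per item is
        # exactly the item->ANY(scenario) mapping, so storing LATEST is no harder.
        self._latest: Dict[str, Dict[str, Result]] = {}

    def add(self, r: Result) -> None:
        per_item = self._latest.setdefault(r.item, {})
        current = per_item.get(r.scenario)
        if current is None or r.submit_time > current.submit_time:
            per_item[r.scenario] = r

    def latest_for(self, item: str) -> Dict[str, Result]:
        # A direct lookup: no traversal of unrelated results.
        return dict(self._latest.get(item, {}))
```

The catch is the one described above: option 2 is fast only for the exact question it was built for, and every new "all of something" question needs another such table (or a fallback to option 1).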

as long as there is no rigorous specification,

Hahahah. This is not NASA, Josef, this is open source! :o) Even if you received a rigorous specification, the requirements would change in 6 months :)

No, it is not. But this is software engineering: knowing _what_, and even more importantly _why_, you want to do something is an integral part of the development process. We can certainly have a steaming pile of quick hacks and workarounds, but if that is the desired/expected outcome, I don't really see the point of having the "design conversations".

But I would like to know your thoughts on my last paragraph in my previous email.

As for pruning: sure, why not. Even though I don't think it is the right way to tackle this, should we decide to do it, my preferred approach would be "move to archive" instead of "delete".
But we are getting back to those NASA-like specifications and policies here, so... You still need to define "what is unique", "what is latest", and "how long do we keep stuff", just like with any other solution. The only difference is that instead of adding a layer on top of a data store (ResultsDB), you decided to just prune the data in that data store.
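A minimal sketch of why pruning does not escape those definitions: even a simple "move to archive" pass has to be parameterized by all three policies. Everything here is hypothetical (the `Result` fields, the `prune_to_archive` helper, and expressing retention as "keep the N newest per unique key" rather than, say, an age cutoff); it is an illustration, not a proposed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple


@dataclass(frozen=True)
class Result:
    # Hypothetical record; field names are illustrative only.
    item: str
    scenario: str
    submit_time: int
    outcome: str


def prune_to_archive(
    results: List[Result],
    unique_key: Callable[[Result], Hashable],  # policy: "what is unique"
    order_key: Callable[[Result], int],        # policy: "what is latest"
    keep_per_key: int,                         # policy: "how long do we keep stuff"
) -> Tuple[List[Result], List[Result]]:
    """Split results into (kept, archived); nothing is deleted outright."""
    if keep_per_key < 0:
        raise ValueError("keep_per_key must be non-negative")
    groups: Dict[Hashable, List[Result]] = {}
    for r in results:
        groups.setdefault(unique_key(r), []).append(r)
    kept: List[Result] = []
    archived: List[Result] = []
    for group in groups.values():
        group.sort(key=order_key, reverse=True)  # newest first (stable on ties)
        kept.extend(group[:keep_per_key])
        archived.extend(group[keep_per_key:])
    return kept, archived
```

Changing only `unique_key` (for example grouping by item alone instead of by item and scenario) changes which results get archived, which is exactly the "what is unique" question that has to be answered first.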

I'm not trying to get personal or petty, but in almost three months we have not been able to agree even on what a "scenario" should be. I went from "this is nonsense" to "OK, if used systematically, this can be helpful"; Adam is on the path from "Why do you guys need to make everything so overcomplicated, this is but a string!" to "Well, some of what you said actually makes sense, now that I think about it a bit more"; but that's about it. We do "something" now, but we don't do it in a known, defined way, so while it works for some cases, it does not work for others. And more than that, we are not even really sure where it does and does not work.

Yes, this was slightly hyperbolic, but I hope you get the point. I might be a data freak, but for some reason I don't think it is a good idea to solve "we have too much data" by deleting it haphazardly.