[Resultsdb-users] Re: De-duplicating test results: 'scenarios'

Tuesday, 23 May 2017

Excerpts from Kamil Paral's message of 2017-05-11 10:35 +02:00:
...
 On Thu, May 11, 2017 at 10:13 AM, Josef Skladanka <jskladan(a)redhat.com> wrote:
 > ... Following up with that last sentence:
 >
 > I sure agree, that having thousands of Depcheck results is nonsense. And
 > in this special case, I really agree that pruning the data is a good idea
 > (if we have a very well defined way of devising the "this is just a
 > duplicate" thing, that is). I also agree, that we don't need to have the
 > whole history of all the results in one place, that is fast-read access.
 > But it still makes sense (to me at least) to just have an archiving policy,
 > rather than deletion policy, as the first step.
 >

 I didn't realize that, but that's conceptually no different from what I
 proposed, right? It can still be an external process that prunes the last
 day's worth of data during night, just instead of throwing those away
 completely, it saves them to a secondary database. So it can be still
 implemented outside of resultsdb. Correct?

 This could also be OK for RHEL folks: they could have different
 policies for archiving, and the old results would still be available, just
 in a different database (and with slower access).

Unfortunately this will be basically a non-starter. Imagine that you 
need to go back to an old Bodhi update that shipped three years ago for 
some kind of auditing purpose. Bodhi needs to be able to show the same 
results, waivers, and decisions today as it did three years ago.

Okay, maybe with Bodhi it is okay to just wing it and say "sorry the 
data is gone now" but that doesn't really fly if you imagine Errata Tool 
in place of Bodhi.

So I think relying on deleting/moving/archiving data out of the 
ResultsDB database just to make it perform well is not a viable option. 
There really has to be a proper solution to the "how do I find the 
latest relevant results" problem.

(Doing some one-off pruning to handle pathological cases like old 
depcheck results is a different story -- that would be more just about 
reducing the size of Fedora's ResultsDB a little, rather than being 
crucial to making ResultsDB perform well.)

-- 
Dan Callaghan <dcallagh(a)redhat.com>
Senior Software Engineer, Products & Technologies Operations
Red Hat
