Now that we've had a second flavor of this issue (running out of inodes on a buildmaster) hit us, it's probably time to address log data retention.
At the moment, we don't have a log data retention policy, which has led to disks filling up with logs. We need some policy for how long we're going to keep this data, but I don't want to just decide something without some form of discussion/documentation.
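For reference, both symptoms (running out of blocks and running out of inodes) can be checked from the same system call. The following is only a rough sketch, assuming a Unix host; the function names and the 10% threshold are made up for illustration and are not part of any existing tooling.

```python
import os


def fraction_free(available, total):
    """Return available/total, or None when the filesystem reports no total."""
    return available / total if total else None


def disk_report(path):
    """Summarize free space and free inodes for the filesystem holding path."""
    st = os.statvfs(path)  # Unix only
    return {
        "free_bytes": st.f_bavail * st.f_frsize,
        "blocks_free": fraction_free(st.f_bavail, st.f_blocks),
        # Some filesystems report 0 total inodes; that shows up as None here.
        "inodes_free": fraction_free(st.f_favail, st.f_files),
    }


def is_low(report, threshold=0.10):
    """True if either blocks or inodes have dropped below the threshold."""
    values = (report["blocks_free"], report["inodes_free"])
    return any(v is not None and v < threshold for v in values)


if __name__ == "__main__":
    report = disk_report(".")
    print(report, "LOW" if is_low(report) else "ok")
```

Watching inodes separately matters because a directory full of many tiny log files can exhaust inodes long before it exhausts bytes.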
When we had this problem with AutoQA, we implemented a cronjob that would delete logs older than 30 days but we also had a lot less disk to work with back then.
There are 2 forms of log data that this new policy would affect: the artifacts created by task execution and the build logs/data stored by the buildmaster. Both are relatively simple file-based data which can be removed with no consequences other than the data no longer being available.
The questions raised so far are:
1. How long is long enough to keep log and execution data?
2. Should we be cleaning up anything that references builds/artifacts (like links in resultsdb) before we delete them?
3. Do we want to put resources into figuring out whether the result was a PASS or FAIL before deleting it?
4. Should fesco be involved in this decision?
Thoughts or Suggestions? I really don't want to spend much time on this but that statement does seem to come out of me when we're about to spend too much time on a topic (at least some of which ends up being my fault) :)
Tim
On Mon, 28 Sep 2015 12:20:03 -0600 Tim Flink tflink@redhat.com wrote:
I wonder if this ends up being the email list equivalent of talking to myself :)
It just seemed like a cleaner way to separate my answers from the questions
<snip>
The questions raised so far are:
- How long is long enough to keep log and execution data?
6-12 months should be more than enough but it might be worth trying to keep a release-lifetime of logs (~18 months, including pre-release)
- Should we be cleaning up anything that references builds/artifacts
(like links in resultsdb) before we delete them?
Ideally, yes but I don't think it's worth more than a day's effort for one person if we have proper 404 processing on the machine hosting the artifacts.
If it does become an issue in the future, it wouldn't be difficult to go back and change dead links if we needed to.
- Do we want to put resources into figuring out whether the result
was a PASS or FAIL before deleting it?
No, it's not worth the effort - I'd rather just store more logs than put much dev time into deciding which logs to delete
- Should fesco be involved in this decision?
Either way - I suspect that they're not going to have much of an opinion and it adds bureaucracy to the process but I suppose that the decision would be a bit more "official" if we asked them.
Tim
On Mon, Sep 28, 2015 at 12:24:12PM -0600, Tim Flink wrote:
On Mon, 28 Sep 2015 12:20:03 -0600 Tim Flink tflink@redhat.com wrote:
I wonder if this ends up being the email list equivalent of talking to myself :)
It just seemed like a cleaner way to separate my answers from the questions
<snip>
The questions raised so far are:
- How long is long enough to keep log and execution data?
6-12 months should be more than enough but it might be worth trying to keep a release-lifetime of logs (~18 months, including pre-release)
I would think we keep log/execution data for current milestone-1, so if we're in Beta, we keep Beta and Alpha logs. It's not very common that something breaks in Final and we need to look all the way back to Alpha, is it? As for packages not in the frozen set, I'd think 60-90 days would be enough.
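To make the "current milestone-1" idea concrete, here is a tiny sketch of the selection rule. It assumes the milestone order Alpha, Beta, Final; the function name is hypothetical and nothing like it exists in our tooling as far as this thread says.

```python
# Assumed milestone order for a release cycle.
MILESTONES = ["Alpha", "Beta", "Final"]


def milestones_to_keep(current):
    """Return the current milestone and the one before it, if any."""
    idx = MILESTONES.index(current)  # raises ValueError for unknown names
    return MILESTONES[max(idx - 1, 0): idx + 1]


if __name__ == "__main__":
    for name in MILESTONES:
        print(name, "->", milestones_to_keep(name))
```

Logs for packages outside the frozen set would fall under the plain age-based limit instead (60-90 days in the suggestion above).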
- Should we be cleaning up anything that references builds/artifacts
(like links in resultsdb) before we delete them?
Ideally, yes but I don't think it's worth more than a day's effort for one person if we have proper 404 processing on the machine hosting the artifacts.
If it does become an issue in the future, it wouldn't be difficult to go back and change dead links if we needed to.
That makes sense to me.
- Do we want to put resources into figuring out whether the result
was a PASS or FAIL before deleting it?
No, it's not worth the effort - I'd rather just store more logs than put much dev time into deciding which logs to delete
- Should fesco be involved in this decision?
Either way - I suspect that they're not going to have much of an opinion and it adds bureaucracy to the process but I suppose that the decision would be a bit more "official" if we asked them.
I don't know that we need fesco for this decision, but it might be good to make sure the policy is documented somewhere should someone ask.
Tim
----- Original Message -----
From: "Tim Flink" tflink@redhat.com
To: qa-devel@lists.fedoraproject.org
Sent: Monday, September 28, 2015 8:24:12 PM
Subject: Re: Log Data Retention
On Mon, 28 Sep 2015 12:20:03 -0600 Tim Flink tflink@redhat.com wrote:
I wonder if this ends up being the email list equivalent of talking to myself :)
It just seemed like a cleaner way to separate my answers from the questions
<snip>
The questions raised so far are:
- How long is long enough to keep log and execution data?
6-12 months should be more than enough but it might be worth trying to keep a release-lifetime of logs (~18 months, including pre-release)
I guess that depends on how much storage we have, but it seems to me that 6 months is more than enough. What are the use cases for going back to task results that are more than 6 months old?
- Should we be cleaning up anything that references builds/artifacts
(like links in resultsdb) before we delete them?
Ideally, yes but I don't think it's worth more than a day's effort for one person if we have proper 404 processing on the machine hosting the artifacts.
If it does become an issue in the future, it wouldn't be difficult to go back and change dead links if we needed to.
Proper 404 is good enough for me.
- Do we want to put resources into figuring out whether the result
was a PASS or FAIL before deleting it?
No, it's not worth the effort - I'd rather just store more logs than put much dev time into deciding which logs to delete
Agreed, we have more important tasks we could use our limited resources for.
- Should fesco be involved in this decision?
Either way - I suspect that they're not going to have much of an opinion and it adds bureaucracy to the process but I suppose that the decision would be a bit more "official" if we asked them.
Not sure if we need to ask them but I suppose asking wouldn't hurt.
Thanks, Martin
- How long is long enough to keep log and execution data?
6-12 months should be more than enough but it might be worth trying to keep a release-lifetime of logs (~18 months, including pre-release)
Those are higher numbers than I expected to be realistic. For me, 3 months is the required minimum, since we sometimes need to go back and debug some issues. 6 months is great. Anything above doesn't hurt of course, but I wouldn't mind losing it.
This question regards "log and execution data". Where is a similar question regarding task artifacts? For those, I think we should try to keep at least 6 months of results.
- Should we be cleaning up anything that references builds/artifacts
(like links in resultsdb) before we delete them?
Ideally, yes but I don't think it's worth more than a day's effort for one person if we have proper 404 processing on the machine hosting the artifacts.
What's the benefit of removing links from the database? Does it free up storage space or speed the database up? Otherwise I see it the other way round: if the resultsdb page contains a link to an artifact and it goes to a 404 page saying "sorry, this is probably too old and already deleted, we usually keep files around for XYZ time", the user has learned everything important. If the resultsdb page does not contain any link, the user will wonder "where's the check log? why is it missing?", which is a more confusing scenario.
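As an illustration of the kind of 404 page I mean, here is a minimal sketch using only Python's standard library http.server. It is not how the artifacts are actually served; the handler name, the port, and the 120-day figure are assumptions for the example.

```python
from http import HTTPStatus
from http.server import HTTPServer, SimpleHTTPRequestHandler

RETENTION_DAYS = 120  # assumed retention period, not a decided value


def expired_message(retention_days):
    """Text shown to users who follow a link to a deleted artifact."""
    return (
        "Sorry, this file is probably too old and has already been deleted. "
        f"We usually keep files around for about {retention_days} days."
    )


class ArtifactHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory with an explanatory 404 page."""

    def send_error(self, code, message=None, explain=None):
        # Replace the generic explanation only for missing files.
        if code == HTTPStatus.NOT_FOUND:
            explain = expired_message(RETENTION_DAYS)
        super().send_error(code, message, explain)


if __name__ == "__main__":
    # Hypothetical local address, for trying the page out by hand.
    HTTPServer(("127.0.0.1", 8000), ArtifactHandler).serve_forever()
```

In a real deployment the same message would more likely live in the web server's own error-page configuration; the point is only that the dead link explains itself.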
- Do we want to put resources into figuring out whether the result
was a PASS or FAIL before deleting it?
No, it's not worth the effort - I'd rather just store more logs than put much dev time into deciding which logs to delete
Agreed.
- Should fesco be involved in this decision?
Either way - I suspect that they're not going to have much of an opinion and it adds bureaucracy to the process but I suppose that the decision would be a bit more "official" if we asked them.
I don't think it's important enough to bother them, and it's mainly determined by our storage capacity anyway, not by a "management decision".
On Mon, 28 Sep 2015 12:20:03 -0600 Tim Flink tflink@redhat.com wrote:
Now that we've had a second flavor of this issue (running out of inodes on a buildmaster) hit us, it's probably time to address log data retention.
We're almost out of disk space for artifacts on taskotron-dev again (~200M free at the moment) so it's time to get this figured out :)
The consensus here seems to be that 3-ish months should be sufficient. In the next couple of days, I'm planning to put a daily cron job on dev to delete all artifacts older than 4 months. If you have objections, speak up now.
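A rough sketch of what the daily job could run, assuming "4 months" is approximated as 120 days and that file modification time is a good enough proxy for artifact age. The function names and the artifacts path in the usage note are hypothetical, and this is not the script that will actually be deployed.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_expired(root, max_age_days, now=None):
    """Return files under root whose mtime is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.lstat().st_mtime < cutoff:
                    expired.append(path)
            except FileNotFoundError:
                continue  # file vanished while we were scanning
    return sorted(expired)


def purge(root, max_age_days, dry_run=True, now=None):
    """Delete expired files, then prune directories left empty."""
    expired = find_expired(root, max_age_days, now)
    if dry_run:
        return expired  # report only, delete nothing
    for path in expired:
        path.unlink()
    # Walk bottom-up so nested empty directories are removed too;
    # this frees their inodes as well as the files' inodes.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath != os.fspath(root) and not os.listdir(dirpath):
            os.rmdir(dirpath)
    return expired
```

A cron entry would then call something like `purge("/path/to/artifacts", 120, dry_run=False)` once a day; running it with the default `dry_run=True` first shows what would be removed.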
Tim