We have (almost) all instances and volumes properly tagged. Now let's check Snapshots.
OMG - there are A LOT of them. The list has 97k lines! Because of its size I will not attach it; here is a link to download it instead: https://k00.fr/8p59mvcw
If you help me to identify something, I can either delete or tag it for you.
A few things I spotted:
* snapshots of volumes that no longer exist. Can they be deleted?
* lots of snapshots like fedora-coreos-36.20221030.2.3-aarch64 - do we still need 36 and older?
* Fedora-Cloud-Base-29-20190729.0.x86_64-hvm-us-east-1-standard-0 - are these snapshots used to generate AMIs for getfedora.org? Do we still need them?
If you have snapshots that are important, please check that they have the tag FedoraGroup=*
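A minimal sketch of how the untagged snapshots could be picked out of such a list. It assumes each record is a dict shaped like an entry of the EC2 DescribeSnapshots response (SnapshotId, VolumeId, Tags as a list of Key/Value pairs); the sample records and the helper names are made up for illustration and are not taken from the real list.

```python
def tag_value(resource, key):
    """Return the value of tag `key` on a resource dict, or None if absent."""
    for tag in resource.get("Tags", []):
        if tag["Key"] == key:
            return tag["Value"]
    return None


def untagged(snapshots, key="FedoraGroup"):
    """Return the snapshots that carry no tag named `key`."""
    return [s for s in snapshots if tag_value(s, key) is None]


# Hypothetical sample records, for illustration only.
snapshots = [
    {"SnapshotId": "snap-aaa", "VolumeId": "vol-111",
     "Tags": [{"Key": "FedoraGroup", "Value": "copr"}]},
    {"SnapshotId": "snap-bbb", "VolumeId": "vol-222", "Tags": []},
    {"SnapshotId": "snap-ccc", "VolumeId": "vol-333"},  # no Tags key at all
]

untagged_ids = [s["SnapshotId"] for s in untagged(snapshots)]
print(untagged_ids)
```

Any value counts as tagged here, which matches the FedoraGroup=* wording above.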
On 11/2/23 05:32, Miroslav Suchý wrote:
We have (almost) all instances and volumes properly tagged. Now let's check Snapshots.
OMG - there are A LOT of them. The list has 97k lines! Because of its size I will not attach it; here is a link to download it instead: https://k00.fr/8p59mvcw
If you help me to identify something, I can either delete or tag it for you.
A few things I spotted:
- snapshots of volumes that no longer exist. Can they be deleted?
- lots of snapshots like fedora-coreos-36.20221030.2.3-aarch64 - do we still need 36 and older?
We'll clean up the ones we do not need soon. I'm sorry it is taking longer than we thought it would to get garbage collection implemented.
- Fedora-Cloud-Base-29-20190729.0.x86_64-hvm-us-east-1-standard-0 - are these snapshots used to generate AMIs for getfedora.org? Do we still need them?
If you have snapshots that are important, please check that they have the tag FedoraGroup=*
Hmm. Is that required? If so, is it new? When we (Fedora CoreOS Group) create AMIs we don't add tags like this to our AMIs.
Dusty
On 03. 11. 23 at 2:06, Dusty Mabe wrote:
We'll clean up the ones we do not need soon. I'm sorry it is taking longer than we thought it would to get garbage collection implemented.
Thank you.
- Fedora-Cloud-Base-29-20190729.0.x86_64-hvm-us-east-1-standard-0 - are these snapshots used to generate AMIs for getfedora.org? Do we still need them?
If you have snapshots that are important, please check that they have the tag FedoraGroup=*
Hmm. Is that required? If so, is it new?
It has been in our SOP for ages
https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/aws-access/#_ec2
but so far no one has checked or enforced it.
When we (Fedora CoreOS Group) create AMIs we don't add tags like this to our AMIs.
I am now focusing on Snapshots, but please add it to any resource.
When we started, we had few resources there and it was easy to clean up leftovers and identify owners. Now we have thousands of resources in a dozen regions, and it is hard to garbage-collect them and to identify who is responsible for consuming expensive resources. While this account is sponsored by AWS, that does not mean we can misuse it, leave garbage behind, and let Amazon pay for it.
Miroslav
On 11/3/23 03:00, Miroslav Suchý wrote:
On 03. 11. 23 at 2:06, Dusty Mabe wrote:
We'll clean up the ones we do not need soon. I'm sorry it is taking longer than we thought it would to get garbage collection implemented.
Thank you.
- Fedora-Cloud-Base-29-20190729.0.x86_64-hvm-us-east-1-standard-0 - are these snapshots used to generate AMIs for getfedora.org? Do we still need them?
If you have snapshots that are important, please check that they have the tag FedoraGroup=*
Hmm. Is that required? If so, is it new?
It has been in our SOP for ages
https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/aws-access/#_ec2
but so far no one has checked or enforced it.
Does that mean we should set ours to FedoraGroup=coreos ?
When we (Fedora CoreOS Group) create AMIs we don't add tags like this to our AMIs.
I am now focusing on Snapshots, but please add it to any resource.
When we started, we had few resources there and it was easy to clean up leftovers and identify owners. Now we have thousands of resources in a dozen regions, and it is hard to garbage-collect them and to identify who is responsible for consuming expensive resources. While this account is sponsored by AWS, that does not mean we can misuse it, leave garbage behind, and let Amazon pay for it.
Miroslav
-- Miroslav Suchy, RHCA Red Hat, Manager, Packit and CPT, #brno, #fedora-buildsys
infrastructure mailing list -- infrastructure@lists.fedoraproject.org To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro... Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
On 03. 11. 23 at 16:15, Dusty Mabe wrote:
Does that mean we should set ours to FedoraGroup=coreos ?
Yes. The values are not defined. I use the values to group the resources together. If you want you can even use two labels: coreos-dev, coreos-ci. Or just coreos. It is up to you.
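Since the tag values are free-form and only used for grouping, a per-group summary could look roughly like the sketch below. It assumes resource records are dicts with a Tags list of Key/Value pairs (the shape the EC2 Describe* calls return); the sample data, the "(untagged)" bucket name, and the function name are illustrative assumptions.

```python
from collections import Counter


def count_by_group(resources, key="FedoraGroup"):
    """Count resources per value of tag `key`; untagged ones get their own bucket."""
    counts = Counter()
    for resource in resources:
        tags = {t["Key"]: t["Value"] for t in resource.get("Tags", [])}
        counts[tags.get(key, "(untagged)")] += 1
    return counts


# Hypothetical sample records, for illustration only.
resources = [
    {"SnapshotId": "snap-1", "Tags": [{"Key": "FedoraGroup", "Value": "coreos"}]},
    {"SnapshotId": "snap-2", "Tags": [{"Key": "FedoraGroup", "Value": "coreos-ci"}]},
    {"SnapshotId": "snap-3", "Tags": [{"Key": "FedoraGroup", "Value": "coreos"}]},
    {"SnapshotId": "snap-4", "Tags": [{"Key": "Name", "Value": "scratch"}]},
]

counts = count_by_group(resources)
for group, number in sorted(counts.items()):
    print(group, number)
```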
On Thu, Nov 02, 2023 at 10:32:20AM +0100, Miroslav Suchý wrote:
We have (almost) all instances and volumes properly tagged. Now let's check Snapshots.
Thanks for continuing to drive this forward. ;)
OMG - there are A LOT of them. The list has 97k lines! Because of its size I will not attach it; here is a link to download it instead: https://k00.fr/8p59mvcw
If you help me to identify something, I can either delete or tag it for you.
A few things I spotted:
- snapshots of volumes that no longer exist. Can they be deleted?
- lots of snapshots like fedora-coreos-36.20221030.2.3-aarch64 - do we still need 36 and older?
- Fedora-Cloud-Base-29-20190729.0.x86_64-hvm-us-east-1-standard-0 - are these snapshots used to generate AMIs for getfedora.org? Do we still need them?
If you have snapshots that are important, please check that they have the tag FedoraGroup=*
So, if the non-coreos ones are mostly fedimg, it doesn't tag things. ;( It predates our tagging setup entirely...
I've not dug into it, but yeah, I think it uses snapshots to make the AMIs... but it's unclear to me whether it does, or should, clean those up after the AMI is made.
https://github.com/fedora-infra/fedimg/blob/develop/docs/services/ec2.md
I'm not sure how we can tell which of these are fedimg related and which aren't. Can we tell when something was created? I guess we could mount them on an instance and see what's in them, but that doesn't seem practical for 97k snapshots. ;)
Can we get what volume they are snapshots of? Perhaps the volume name would help us figure things out?
Open to ideas on how to clean it up.
kevin
On 06. 11. 23 at 20:45, Kevin Fenzi wrote:
Can we get what volume they are snapshots of? Perhaps the volume name would help us figure things out?
Most of the 6 GiB snapshots, like snap-098326d474a07f706, are snapshots of vol-ffffffff, which does not exist (this snapshot is from 2018).
Even if I take
snap-0fdf88e3527a6ca6e (fedora-coreos-39.20231101.1.0-x86_64), which was created
Fri Nov 03 2023 04:12:53 GMT+0100
with the description
Copied for DestinationAmi ami-0e62f1adedc546f4d from SourceAmi ami-0b9d8baf52b75e62c for SourceSnapshot snap-033116129e665e380. Task created on 1,698,981,171,355.
it is listed as a snapshot of vol-ffffffff, which does not exist.
Hmm, from https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/creating-an-ami-ebs.html :
During the AMI-creation process, Amazon EC2 creates snapshots of your instance's root volume and any other EBS volumes attached to your instance. You're charged for the snapshots until you deregister the AMI https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/deregister-ami.html and delete the snapshots. If any volumes attached to the instance are encrypted, the new AMI only launches successfully on instances that support Amazon EBS encryption https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html.
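The description quoted above is regular enough to be parsed mechanically, which could help map copied snapshots back to their AMIs in bulk. The sketch below is only that, a sketch: the regular expression is inferred from this single example, and reading the "Task created on" number as milliseconds since the Unix epoch is an assumption (it does line up with the creation time shown above).

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern inferred from the one description quoted in this thread (assumption).
DESCRIPTION_RE = re.compile(
    r"Copied for DestinationAmi (?P<dest_ami>ami-[0-9a-f]+) "
    r"from SourceAmi (?P<source_ami>ami-[0-9a-f]+) "
    r"for SourceSnapshot (?P<source_snapshot>snap-[0-9a-f]+)\. "
    r"Task created on (?P<task_ms>[\d,]+)\."
)


def parse_copy_description(description):
    """Extract the AMI/snapshot IDs and task time, or return None if no match."""
    match = DESCRIPTION_RE.search(description)
    if match is None:
        return None
    info = match.groupdict()
    # Assumed to be epoch milliseconds written with thousands separators.
    millis = int(info.pop("task_ms").replace(",", ""))
    info["task_created"] = (
        datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
        + timedelta(milliseconds=millis % 1000)
    )
    return info


description = (
    "Copied for DestinationAmi ami-0e62f1adedc546f4d from SourceAmi "
    "ami-0b9d8baf52b75e62c for SourceSnapshot snap-033116129e665e380. "
    "Task created on 1,698,981,171,355."
)
info = parse_copy_description(description)
print(info["dest_ami"], info["source_snapshot"], info["task_created"].isoformat())
```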
On Mon, Nov 06, 2023 at 10:22:57PM +0100, Miroslav Suchý wrote:
On 06. 11. 23 at 20:45, Kevin Fenzi wrote:
Can we get what volume they are snapshots of? Perhaps the volume name would help us figure things out?
Most of the 6 GiB snapshots, like snap-098326d474a07f706, are snapshots of vol-ffffffff, which does not exist (this snapshot is from 2018).
Even if I take
snap-0fdf88e3527a6ca6e (fedora-coreos-39.20231101.1.0-x86_64), which was created
Fri Nov 03 2023 04:12:53 GMT+0100
with the description
Copied for DestinationAmi ami-0e62f1adedc546f4d from SourceAmi ami-0b9d8baf52b75e62c for SourceSnapshot snap-033116129e665e380. Task created on 1,698,981,171,355.
it is listed as a snapshot of vol-ffffffff, which does not exist.
Hmm, from https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/creating-an-ami-ebs.html :
During the AMI-creation process, Amazon EC2 creates snapshots of your instance's root volume and any other EBS volumes attached to your instance. You're charged for the snapshots until you deregister the AMI https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/deregister-ami.html and delete the snapshots. If any volumes attached to the instance are encrypted, the new AMI only launches successfully on instances that support Amazon EBS encryption https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html.
yeah, I am not sure here. I guess we could check fedimg code, and/or ask any subject matter experts to chime in.
Well, actually, we should probably check in on the thing that's cleaning up the AMIs and confirm that it is deleting the snapshots?
I think that is this: roles/fedimg/templates/clean-amis.py in ansible.
And it does delete the snapshot... so perhaps all these ones with vol-ffffffff are indeed some mistake, or belong to some other AMIs?
kevin
On 09. 11. 23 at 20:39, Kevin Fenzi wrote:
Well, actually, we should probably check in on the thing that's cleaning up the AMIs and confirm that it is deleting the snapshots?
I think that is this: roles/fedimg/templates/clean-amis.py in ansible.
And it does delete the snapshot... so perhaps all these ones with vol-ffffffff are indeed some mistake, or belong to some other AMIs?
I had time to investigate it a bit:
I deleted one of the ancient snapshots (from 2018) and AWS did not object. So it is not the base image for any current AMI (otherwise AWS would refuse to delete it).
The snapshots have a Description like:
Copied for DestinationAmi ami-052b0ac13b1043c97 from SourceAmi ami-0d9943288750067d3 for SourceSnapshot snap-0a92565926bd815be. Task created on 1,700,729,192,462.
I **think** this description is generated when you copy a snapshot between regions.
I investigated one such snapshot from today:
https://ap-south-1.console.aws.amazon.com/ec2/home?region=ap-south-1#Snapsho...
(the description of this snapshot is the one cited above)
and the associated AMI exists. It is
https://ap-south-1.console.aws.amazon.com/ec2/home?region=ap-south-1#ImageDe...
with name
Fedora-Cloud-Base-Rawhide-20231123.n.0.aarch64-hvm-ap-south-1-gp3-0
So these really are leftovers from creating nightly AMIs.
I checked roles/fedimg/templates/clean-amis.py and I think it does not work at all, for two reasons:
1) We have active AMIs that have DeprecationTime set to 2022/08/11 and they are not deleted. So this is likely the date when deleting AMIs stopped working. But the snapshot deletion likely never worked.
2) The code queries AMIs with Filters=[{"Name": "tag-key", "Values": ["LaunchPermissionRevoked"]}], but as far as I can see this is not a tag but a different attribute. Either way, the snapshots were not deleted. There is likely a bug I do not see right now.
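To illustrate the first point, here is a rough sketch of how AMIs past their DeprecationTime could be listed. It assumes image records are dicts shaped like entries of the EC2 DescribeImages response, with DeprecationTime as an ISO 8601 UTC string ending in "Z"; the sample IDs and dates are invented, and this is not what clean-amis.py does.

```python
from datetime import datetime, timezone


def parse_aws_time(value):
    """Parse an ISO 8601 UTC timestamp such as '2022-08-11T00:00:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def past_deprecation(images, now):
    """Return IDs of images whose DeprecationTime is earlier than `now`."""
    expired = []
    for image in images:
        value = image.get("DeprecationTime")  # absent when never deprecated
        if value and parse_aws_time(value) < now:
            expired.append(image["ImageId"])
    return expired


# Hypothetical sample records, for illustration only.
images = [
    {"ImageId": "ami-old", "DeprecationTime": "2022-08-11T00:00:00.000Z"},
    {"ImageId": "ami-future", "DeprecationTime": "2030-01-01T00:00:00.000Z"},
    {"ImageId": "ami-never"},
]

now = datetime(2023, 11, 23, tzinfo=timezone.utc)
expired_ids = past_deprecation(images, now)
print(expired_ids)
```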
I tried to delete one of the old snapshots that is still used as the base for an active AMI (F27), and AWS refused with the message:
Failed to delete snapshot. snap-0b271f1b25a3f9b47: The snapshot snap-0b271f1b25a3f9b47 is currently in use by ami-4ba98e24
Based on these findings I propose:
1) Delete **all** snapshots without a FedoraGroup tag older than, let's say, 2021. This way we can actually review whether there are any snapshots, other than leftovers from clean-amis, that are worth preserving. Right now I am unable to review anything manually. If a snapshot is linked to a live AMI, AWS will refuse to delete it and I will ignore such errors. If there is no objection, I will top-post this as a separate heads-up email.
2) Open a ticket asking the owners of fedimg to fix the tooling so that it deletes the snapshots.
3) Open a ticket asking the owners of fedimg to clean up AMIs with a DeprecationTime earlier than today's date.
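The first step of the proposal could be sketched roughly as follows. This is an illustration under several assumptions: snapshot records are dicts with SnapshotId, StartTime (a timezone-aware datetime) and Tags; "older than 2021" is read as "created before 2021-01-01 UTC"; and `delete` is an injected callable so the logic can be tried without touching AWS. `SnapshotInUse` is a hypothetical stand-in for the refusal shown above; with boto3 the call would presumably be `delete_snapshot(SnapshotId=...)` and the refusal a `ClientError` with code `InvalidSnapshot.InUse`.

```python
from datetime import datetime, timezone


class SnapshotInUse(Exception):
    """Hypothetical stand-in for AWS refusing to delete an AMI-backed snapshot."""


def has_tag(snapshot, key):
    return any(tag["Key"] == key for tag in snapshot.get("Tags", []))


def delete_old_untagged(snapshots, cutoff, delete, key="FedoraGroup"):
    """Delete untagged snapshots created before `cutoff`; skip ones still in use."""
    deleted, skipped = [], []
    for snap in snapshots:
        if has_tag(snap, key) or snap["StartTime"] >= cutoff:
            continue  # tagged or recent: leave it alone
        try:
            delete(snap["SnapshotId"])
        except SnapshotInUse:
            skipped.append(snap["SnapshotId"])  # linked to a live AMI, ignore
        else:
            deleted.append(snap["SnapshotId"])
    return deleted, skipped


def fake_delete(snapshot_id):
    """Test double: pretends one snapshot is still used by an AMI."""
    if snapshot_id == "snap-old-ami":
        raise SnapshotInUse(snapshot_id)


utc = timezone.utc
# Hypothetical sample records, for illustration only.
snapshots = [
    {"SnapshotId": "snap-old-free", "StartTime": datetime(2018, 5, 1, tzinfo=utc)},
    {"SnapshotId": "snap-old-ami", "StartTime": datetime(2018, 6, 1, tzinfo=utc)},
    {"SnapshotId": "snap-old-tagged", "StartTime": datetime(2019, 1, 1, tzinfo=utc),
     "Tags": [{"Key": "FedoraGroup", "Value": "coreos"}]},
    {"SnapshotId": "snap-new", "StartTime": datetime(2023, 11, 3, tzinfo=utc)},
]

deleted, skipped = delete_old_untagged(
    snapshots, datetime(2021, 1, 1, tzinfo=utc), fake_delete
)
print("deleted:", deleted)
print("skipped:", skipped)
```

Injecting `delete` also makes a dry run easy: pass a function that only logs the ID.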
On Thu, Nov 23, 2023 at 04:38:01PM +0100, Miroslav Suchý wrote:
I had time to investigate it a bit:
Thanks for digging into it.
...snip...
Based on these findings I propose:
- Delete **all** snapshots without a FedoraGroup tag older than, let's say, 2021. This way we can actually review whether there are any snapshots, other than leftovers from clean-amis, that are worth preserving. Right now I am unable to review anything manually. If a snapshot is linked to a live AMI, AWS will refuse to delete it and I will ignore such errors. If there is no objection, I will top-post this as a separate heads-up email.
Sounds pretty reasonable to me.
- Open a ticket asking the owners of fedimg to fix the tooling so that it deletes the snapshots.
- Open a ticket asking the owners of fedimg to clean up AMIs with a DeprecationTime earlier than today's date.
"Owner of fedimg" is... us I guess? but as far as I know, no one is doing anything with it.
The plan was that the cloud-sig was going to look at a new, better tool to manage uploading. I am not sure what the status of that is.
kevin
On Mon, Nov 27, 2023 at 12:14 PM Kevin Fenzi kevin@scrye.com wrote:
On Thu, Nov 23, 2023 at 04:38:01PM +0100, Miroslav Suchý wrote:
I had time to investigate it a bit:
Thanks for digging into it.
...snip...
Based on these findings I propose:
- Delete **all** snapshots without a FedoraGroup tag older than, let's say, 2021. This way we can actually review whether there are any snapshots, other than leftovers from clean-amis, that are worth preserving. Right now I am unable to review anything manually. If a snapshot is linked to a live AMI, AWS will refuse to delete it and I will ignore such errors. If there is no objection, I will top-post this as a separate heads-up email.
Sounds pretty reasonable to me.
- Open a ticket asking the owners of fedimg to fix the tooling so that it deletes the snapshots.
- Open a ticket asking the owners of fedimg to clean up AMIs with a DeprecationTime earlier than today's date.
"Owner of fedimg" is... us I guess? but as far as I know, no one is doing anything with it.
The plan was that the cloud-sig was going to look at a new, better tool to manage uploading. I am not sure what the status of that is.
David Duncan (who I've CC'd to this) has been working on writing a new ansible-based uploader supporting all our cloud targets. He can provide an update on that.
On Thu, 2023-11-23 at 16:38 +0100, Miroslav Suchý wrote:
On 09. 11. 23 at 20:39, Kevin Fenzi wrote:
Well, actually, we should probably check in on the thing that's cleaning up the AMIs and confirm that it is deleting the snapshots?
I think that is this: roles/fedimg/templates/clean-amis.py in ansible.
And it does delete the snapshot... so perhaps all these ones with vol-ffffffff are indeed some mistake, or belong to some other AMIs?
I had time to investigate it a bit:
I deleted one of the ancient snapshots (from 2018) and AWS did not object. So it is not the base image for any current AMI (otherwise AWS would refuse to delete it).
The vol-ffffffff is just a placeholder created during the import-snapshot action. It's safe to ignore.
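In practice this means a check for "snapshots of volumes that no longer exist" should treat vol-ffffffff separately from a genuinely missing volume. A small sketch, assuming snapshot records are dicts with a VolumeId field and that the set of existing volume IDs has been collected beforehand; the category labels and sample data are made up for illustration.

```python
PLACEHOLDER_VOLUME = "vol-ffffffff"  # placeholder ID mentioned in this thread


def classify_source(snapshot, existing_volume_ids):
    """Say whether a snapshot's source volume is a placeholder, present, or gone."""
    volume_id = snapshot.get("VolumeId")
    if volume_id == PLACEHOLDER_VOLUME:
        return "placeholder"
    if volume_id in existing_volume_ids:
        return "volume exists"
    return "volume missing"


# Hypothetical sample data, for illustration only.
existing_volume_ids = {"vol-111"}
snapshots = [
    {"SnapshotId": "snap-a", "VolumeId": "vol-ffffffff"},
    {"SnapshotId": "snap-b", "VolumeId": "vol-111"},
    {"SnapshotId": "snap-c", "VolumeId": "vol-999"},
]

labels = {s["SnapshotId"]: classify_source(s, existing_volume_ids) for s in snapshots}
for snapshot_id, label in labels.items():
    print(snapshot_id, label)
```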