Greetings.
As some of you may know, our fedora_koji volume is hitting up against some limits (namely the netapp 100TB per volume limit). If it hits 100TB used, the netapp folks tell me it will go offline and we will need to do special things to free up any space and get it working again. Obviously, we wish to avoid that.
So, I think we can move /mnt/fedora_koji/koji/compose with minimal disruption and give us a bunch of room and actually make things faster.
Here's my tentative plan:
* create ~15-20TB volume on one of our ssd aggregates.
* rsync all of /mnt/fedora_koji/koji/compose/ to it.
* Schedule a changeover time/date.
* Make sure no composes or updates pushes are running.
  (This should be possible after branched/rawhide, but before updates and before we are making rc's)
* Do another sync of content so the new copy is up to date.
  (I am not sure how long a rsync will take, but we can figure it out)
* move the old directory to compose.old
* mount the new space on koji01/02, kojipkgs01/02, all compose channel builders, compose-x86-01. Nothing else should need it.
* Wait a short while
* delete compose.old
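The two rsync passes in the plan above might look roughly like the sketch below. It only builds the command lines and does not run anything. The destination mount point `/mnt/fedora_koji_compose/` is a made-up placeholder, and the flag choice (`-a`, `-H` to keep hardlinks inside the tree, `--delete` on the final pass only) is an assumption, not something decided in this thread.

```python
import shlex

SRC = "/mnt/fedora_koji/koji/compose/"  # source path from the plan
DST = "/mnt/fedora_koji_compose/"       # hypothetical mount point for the new volume


def rsync_command(src, dst, final_pass=False):
    """Build the rsync command line for the bulk copy or the final catch-up pass."""
    cmd = ["rsync", "-a", "-H"]  # archive mode, preserve hardlinks within the tree
    if final_pass:
        # Only on the last pass: drop files that vanished from the source meanwhile.
        cmd.append("--delete")
    cmd += [src, dst]
    return cmd


bulk = rsync_command(SRC, DST)
final = rsync_command(SRC, DST, final_pass=True)
print(shlex.join(bulk))
print(shlex.join(final))
```

Timing the first command on a subset of the tree would give a rough idea of how long the final pass needs during the changeover window.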
This should free up about 13TB or so on the main volume, reduce snapshot churn on it, make composes faster because they will be on ssd instead of sas drives, and all around be nicer.
I think this can be done during some day without really causing much outage. Because the koji space is so tight I would like to do it soon, and I think it best to do it before we are too close to release. So, later this week or early next week?
Thoughts? +1s? alternative ideas?
kevin
On Wed, 22 Feb 2023 at 12:51, Kevin Fenzi kevin@scrye.com wrote:
[...]
+1. Do you have snapshot administration rights to remove the giant snapshot that will result from deleting compose.old?
On 2/22/23 12:51, Kevin Fenzi wrote:
[...]
I just want to make sure our ostree use cases are considered here. I think we are already on our own separate volume, so maybe this has no impact, but I do know at least the mount paths include `compose` in them so I'll list out what we do and the desire for it to continue to work:
1. pungi composes - composing into compose/ostree/repo
2. coreos-ostree-importer - importing into /mnt/koji/compose/ostree/repo
   - https://github.com/coreos/fedora-coreos-releng-automation/blob/main/coreos-o...
3. fedora-ostree-pruner - pruning /mnt/koji/compose/ostree/repo
   - https://github.com/coreos/fedora-coreos-releng-automation/blob/main/fedora-o...
4. the `fedora-compose` ostree repo (accessible via clients for testing purposes)
   - https://src.fedoraproject.org/rpms/fedora-repos/blob/rawhide/f/fedora-compos...
Dusty
On Wed, 22 Feb 2023 at 15:37, Dusty Mabe dusty@dustymabe.com wrote:
I just want to make sure our ostree use cases are considered here. I think we are already on our own separate volume, so maybe this has no impact, but I do know at least the mount paths include `compose` in them so I'll list out what we do and the desire for it to continue to work:
- pungi composes - composing into compose/ostree/repo
- coreos-ostree-importer - importing into /mnt/koji/compose/ostree/repo
  https://github.com/coreos/fedora-coreos-releng-automation/blob/main/coreos-o...
- fedora-ostree-pruner - pruning /mnt/koji/compose/ostree/repo
  https://github.com/coreos/fedora-coreos-releng-automation/blob/main/fedora-o...
These three look to be on:

Filesystem                                                           Size  Used Avail Use% Mounted on
ntap-iad2-c02-fedora01-nfs01a:/fedora_ostree_content/compose/ostree  5.5T  5.3T  279G  96% /mnt/fedora_koji/koji/compose/ostree
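One quick way to confirm on a given host that the ostree directory really is its own mount is to compare device IDs with its parent directory. This is a minimal sketch using only the standard library; it assumes it is run on a machine that has the path mounted.

```python
import os


def on_separate_filesystem(path):
    """Return True if `path` lives on a different filesystem than its parent."""
    path = os.path.realpath(path)
    parent = os.path.dirname(path)
    # A mount point reports a different st_dev than the directory containing it.
    return os.stat(path).st_dev != os.stat(parent).st_dev


if __name__ == "__main__":
    ostree = "/mnt/fedora_koji/koji/compose/ostree"
    if os.path.isdir(ostree):
        print(ostree, "is a separate mount:", on_separate_filesystem(ostree))
```

If this prints `True` for the ostree path, moving the rest of compose/ should not move the ostree data itself, though the mount would still need to be re-established under the new location.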
- the `fedora-compose` ostree repo (accessible via clients for testing
purposes) - https://src.fedoraproject.org/rpms/fedora-repos/blob/rawhide/f/fedora-compos...
Dusty
_______________________________________________
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro...
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
On Wed, Feb 22, 2023 at 03:37:28PM -0500, Dusty Mabe wrote:
I just want to make sure our ostree use cases are considered here. I think we are already on our own separate volume, so maybe this has no impact, but I do
Correct. You are on other volumes and will only be impacted in that you shouldn't do any composes when we move things around.
know at least the mount paths include `compose` in them so I'll list out what we do and the desire for it to continue to work:
- pungi composes - composing into compose/ostree/repo
- coreos-ostree-importer - importing into /mnt/koji/compose/ostree/repo
- fedora-ostree-pruner - pruning /mnt/koji/compose/ostree/repo
- the `fedora-compose` ostree repo (accessible via clients for testing purposes)
Right, when we move the compose dir we will need to pause these things. But hopefully that's just a short cutover.
Thanks for the input!
kevin
On Wed, 22 Feb 2023 at 09:51 -0800, Kevin Fenzi wrote:
[...]
Hello,
Are hardlinks taken into account in the measurement?
In the current setup the composes are stored on the same volume that Koji is using, right? That allows pungi to hardlink RPMs instead of copying them around.
If the volumes are separate, it will have to copy the data over. This should work automatically, but may negate some of the benefits of having faster storage.
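The hardlink-or-copy behaviour described above can be illustrated with a small sketch. This is not pungi's actual code, just a generic illustration of the pattern: a hardlink across filesystems fails with `EXDEV`, at which point the data has to be copied.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hardlink src to dst, falling back to a full copy across filesystems.

    Returns "link" or "copy" so the caller can see which path was taken.
    """
    try:
        os.link(src, dst)
        return "link"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # some other failure (permissions, missing file, ...)
        # src and dst are on different filesystems: the data must be duplicated.
        shutil.copy2(src, dst)
        return "copy"
```

On a single volume every RPM takes the cheap "link" path; with compose on its own volume, each RPM pulled from the koji volume would take the "copy" path and consume both time and space.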
Lubomír
On Thu, Feb 23, 2023 at 08:33:41AM +0100, Lubomír Sedlář wrote:
Hello,
Are hardlinks taken into account in the measurement?
In the current setup the composes are stored on the same volume that Koji is using, right? That allows pungi to hardlink RPMs instead of copying them around.
If the volumes are separate, it will have to copy the data over. This should work automatically, but may negate some of the benefits of having faster storage.
Yeah, it would prevent the hardlinking there indeed. Of course branched (if available) and rawhide could hardlink on the new volume (and/or the netapp could de-duplicate identical blocks).
I suppose I could see about coming up with a test to see how much that might affect things.
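One possible shape for such a test, as a rough sketch: walk a compose tree and add up how many bytes belong to files with more than one link. Those are the files that currently share storage with some other path and would become real copies if the link partner stays on the old volume. The function name and the choice to count each inode once are assumptions for illustration.

```python
import os
import stat


def hardlink_stats(root):
    """Walk `root` and return (total_bytes, hardlinked_bytes).

    hardlinked_bytes is the size of regular files whose link count is above 1,
    i.e. data shared with another path (possibly outside `root`).
    Each inode is counted once even if it appears several times under `root`.
    """
    total = linked = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_size
            if st.st_nlink > 1:
                linked += st.st_size
    return total, linked
```

Running this against a single recent compose directory would give an upper bound on the extra data one compose would have to copy once hardlinking to the koji volume is no longer possible.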
kevin
The lack of hardlinking for composes could be pretty annoying.
An alternative to moving /compose off might be to move off:
/work (failed build logs from the past week/in progress builds)
/repos (all the buildroot repos)
/scratch (all scratch builds)
I think that would move a similar amount of space (~15TB).
However, we would definitely need an outage for that. No builds could be going when we move things. ;(
I've dropped an email to koji developers for advice...
So, stay tuned.
At least right now we have about ~11TB free, so hopefully we can last a while longer before we have to do anything.
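For a rough sense of the headroom either option would buy, here is the arithmetic with the approximate figures from this thread. It assumes the volume is sized right at the 100TB limit, which is an assumption on my part.

```python
LIMIT_TB = 100  # NetApp per-volume limit
FREE_TB = 11    # approximate free space right now


def free_after_move(moved_tb, limit_tb=LIMIT_TB, free_tb=FREE_TB):
    """Return the free space (TB) left on the volume after moving `moved_tb` off it."""
    used_now = limit_tb - free_tb
    used_after = used_now - moved_tb
    return limit_tb - used_after


print("moving compose (~13TB):", free_after_move(13), "TB free")
print("moving work/repos/scratch (~15TB):", free_after_move(15), "TB free")
```

Either way the volume would end up with roughly a quarter of the limit free, so the choice comes down to hardlinking and outage cost rather than space.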
kevin