Summary: I think we can fix the ELN side-tag rebuild problems and make the composes more reliable if we change the mechanism for kicking off rebuilds. I'm soliciting feedback to help identify potential issues with this proposed approach.
## Background Information ##
Currently in ELN, merging a side-tag into Rawhide results in all of the packages from that side-tag being rebuilt concurrently in ELN. This leads to two problems:
1. Side-tags containing large numbers of package builds will trigger many ELN builds at the same time, possibly overwhelming available resources on the ELN automation systems.
2. Many (most?) side-tags exist to ensure that packages are built in a particular order, i.e. after their dependencies. Launching all the rebuilds concurrently means that many of the builds may succeed *and still be wrong* (such as if they are built against an older soname).
## Proposed Solution ##
I had a discussion with Miro Hrončok this morning where we tackled this problem and may have come up with a workable solution for 99% of cases. Instead of treating side-tags as a special event and trying to sort the builds such that they are built in the same order, we can instead tag in the Rawhide packages first, then issue the rebuilds together. With the Rawhide packages available, we won't need to worry about the ordering, because the dependencies will already be present in a sufficiently-recent version. As a bonus, we'll reduce the likelihood of broken ELN composes, since if an ELN rebuild fails, the Rawhide version will still be present to satisfy dependencies.
In greater detail:
Whenever a build is tagged into the 'f35' tag (later, whatever tag matches Rawhide), ELN automation would take the following steps:
* Identify whether this package is on the list of packages that ELN rebuilds[1]
* Tag the Rawhide build into the 'eln' tag (so it is now tagged with both 'f35' and 'eln')
* Enqueue a Koji build against the 'eln' target from the same Git commit
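To make the three steps concrete, here is a minimal Python sketch of the tagging-event handler. It is an illustration under stated assumptions, not the ELN automation's real code: `KojiLike`, `RawhideBuild`, `handle_rawhide_tag` and `FakeKoji` are hypothetical names, the event is assumed to arrive with the package name, NVR and an SCM URL for the commit, and a real service would call the Koji API where the sketch calls `tag_build`.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Protocol


class KojiLike(Protocol):
    """Hypothetical minimal interface; a real service would wrap the Koji API."""

    def tag_build(self, tag: str, nvr: str) -> None: ...


@dataclass(frozen=True)
class RawhideBuild:
    package: str  # source package name
    nvr: str      # name-version-release of the Rawhide build
    scm_url: str  # dist-git URL pinned to the commit that produced the build


def handle_rawhide_tag(
    build: RawhideBuild,
    tag: str,
    eln_packages: set[str],
    opted_out: set[str],
    koji: KojiLike,
    build_queue: "Queue[str]",
    rawhide_tag: str = "f35",
) -> bool:
    """React to a tagging event; return True if an ELN rebuild was enqueued."""
    if tag != rawhide_tag:
        return False
    # Step 1: only packages ELN rebuilds automatically (the list minus opt-outs).
    if build.package not in eln_packages or build.package in opted_out:
        return False
    # Step 2: tag the Rawhide build into 'eln' so dependents can use it right away.
    koji.tag_build("eln", build.nvr)
    # Step 3: queue a rebuild of the same commit against the 'eln' target.
    build_queue.put(build.scm_url)
    return True


@dataclass
class FakeKoji:
    """In-memory stand-in that records tagging calls, for trying the handler out."""

    tagged: list[tuple[str, str]] = field(default_factory=list)

    def tag_build(self, tag: str, nvr: str) -> None:
        self.tagged.append((tag, nvr))
```

Keeping the tag call before the enqueue is what provides the fallback: by the time a rebuild is even queued, the Rawhide build already satisfies dependencies in 'eln'.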
The queue mentioned above should be maintained in a separate thread and used to submit tasks in batches to avoid overloading the infrastructure. If the Koji build against the 'eln' target fails, the Rawhide build will remain as the most-recently-tagged version of the package in ELN and become part of the compose until the ELN rebuild can be fixed.
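The separate queue thread could look roughly like the sketch below, which drains queued SCM URLs in batches so that a large side-tag merge does not submit everything at once. `batch_worker`, `start_worker` and the `submit_batch` callback are hypothetical; in practice the callback would submit the Koji builds and probably wait for them before returning, which is what actually throttles the load.

```python
import threading
from queue import Empty, Queue
from typing import Callable, Optional


def batch_worker(
    build_queue: "Queue[Optional[str]]",
    submit_batch: Callable[[list[str]], None],
    batch_size: int = 20,
) -> None:
    """Drain queued SCM URLs in batches of at most batch_size; None stops the worker."""
    while True:
        first = build_queue.get()  # block until there is work
        if first is None:
            return
        batch = [first]
        stop = False
        # Grab whatever else is already waiting, up to the batch limit.
        while len(batch) < batch_size:
            try:
                item = build_queue.get_nowait()
            except Empty:
                break
            if item is None:
                stop = True
                break
            batch.append(item)
        submit_batch(batch)  # hypothetical callback: submit builds, wait for them
        if stop:
            return


def start_worker(build_queue, submit_batch, batch_size: int = 20) -> threading.Thread:
    """Run batch_worker in its own daemon thread, separate from the event listener."""
    worker = threading.Thread(
        target=batch_worker, args=(build_queue, submit_batch, batch_size), daemon=True
    )
    worker.start()
    return worker
```

A `None` item acts as a shutdown sentinel. The default batch size of 20 is an arbitrary placeholder, not a figure from this thread.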
Note that this process would apply to ALL builds in Rawhide, not just those coming from side-tags. There would be no difference in behavior between standard direct builds and side-tag merged builds.
## Known potential issues ##
* Some packages may auto-detect features based on functionality made available by one of their dependencies. If the Rawhide and ELN versions of that dependency differ in visible functionality, then building an ELN package against the Rawhide version of its dependency could result in unexpected behavior. I believe this issue to be rare and generally best handled by the packager as the subject matter expert: they'd just need to bump the release number and rebuild the package in ELN. Alternatively, if this is known to be regularly problematic for a package, the maintainer can opt out of the automatic rebuild and work out a strategy with the ELN SIG for dealing with it.
[1] This will be the set of packages provided by https://tiny.distro.builders/view--view-eln.html minus any packages that have opted out of automatic rebuilds (they perform manual rebuilds for ELN).
On Mon, Jun 28, 2021 at 11:55:21AM -0400, Stephen Gallagher wrote:
Summary: I think we can fix the ELN side-tag rebuild problems and make the composes more reliable if we change the mechanism for kicking off rebuilds. I'm soliciting feedback to help identify potential issues with this proposed approach.
I wonder if it would be better/possible to take builds in the order in which they were built in the side tag with wait-repos between?
ie, chain build the builds from the side tag based on when they were tagged into it? Unless maintainers make some mistake they would do things in the order they need to build, no?
I guess the downside is that this would be linear, and that could take a long time on a large sidetag, but it should work without having to tag in the f34 build?
kevin
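Kevin's alternative, replaying the side-tag in tagging order with a wait-repo after each build, can be sketched as a simple plan generator. `SideTagBuild` and its `tagged_at` timestamp are hypothetical; the point is that every step depends on the previous one, so nothing can run in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SideTagBuild:
    package: str
    tagged_at: float  # hypothetical: UNIX time the build landed in the side-tag


def chain_plan(builds: list[SideTagBuild]) -> list[str]:
    """Replay side-tag builds in tagging order, waiting for a repo after each."""
    steps: list[str] = []
    for build in sorted(builds, key=lambda b: b.tagged_at):
        steps.append(f"build {build.package}")
        steps.append("wait-repo")  # next build must see this one in the buildroot
    return steps


print(chain_plan([SideTagBuild("libfoo-consumer", 20.0), SideTagBuild("libfoo", 10.0)]))
# ['build libfoo', 'wait-repo', 'build libfoo-consumer', 'wait-repo']
```

The plan grows linearly with the size of the side-tag, which is the downside Kevin points out.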
On Mon, Jun 28, 2021 at 3:00 PM Kevin Fenzi kevin@scrye.com wrote:
I wonder if it would be better/possible to take builds in the order in which they were built in the side tag with wait-repos between?
ie, chain build the builds from the side tag based on when they were tagged into it? Unless maintainers make some mistake they would do things in the order they need to build, no?
I guess the downside is that this would be linear, and that could take a long time on a large sidetag, but it should work without having to tag in the f34 build?
This was the original idea I was pursuing, but it has some significant drawbacks, not least of which is that it would take a very long time (and therefore be vulnerable to race-condition issues where other packages are built in ELN in the meantime).
Tagging in the F34 builds is actually desirable here, rather than a drawback; it means that even if the ELN build fails, the compose will maintain installability and dependency validity until the issue is corrected. This in turn means that Content Resolver will be able to continue functioning. Speaking of Content Resolver, this will also solve the issue we have today where adding a new dependency on a package can cause Content Resolver to fail due to the dependent package not yet being in the ELN compose. With this approach, we will still have the Rawhide version available.
On Tue, Jun 29, 2021 at 08:36:36AM -0400, Stephen Gallagher wrote:
Tagging in the F34 builds is actually desirable here, rather than a drawback; it means that even if the ELN build fails, the compose will maintain installability and dependency validity until the issue is corrected. This in turn means that Content Resolver will be able to continue functioning. Speaking of Content Resolver, this will also solve the issue we have today where adding a new dependency on a package can cause Content Resolver to fail due to the dependent package not yet being in the ELN compose. With this approach, we will still have the Rawhide version available.
Would these rawhide builds go out in ELN composes? Or would you block composes until you had only eln rpms in it?
kevin
On 29. 06. 21 18:59, Kevin Fenzi wrote:
Would these rawhide builds go out in ELN composes?
I suppose they would go out; see below for why I think that is necessary. The share of "true ELN" builds in the ELN compose would be the general "healthiness" factor. At 100 %: perfect. At 90 %: quite good. Less: not great, not terrible. And if the ELN compose is 99 % pure rawhide for years, it is a signal to maybe reconsider the effort.
Or would you block composes until you had only eln rpms in it?
Just like the rawhide composes (almost?) never finish complete, I don't think blocking the ELN compose on being 100 % pure ELN is reasonable. It would only make the compose harder to actually consume, because it would tend to be very old.
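Miro's "healthiness" factor is easy to compute from the NVRs that end up in a compose. This sketch assumes native ELN rebuilds can be recognised by an `.elnNNN` dist tag in the release field and Rawhide builds by `.fcNN`; `eln_purity` and the example NVRs are made up for illustration, not taken from a real compose.

```python
import re

# Assumption: native ELN builds carry an ".elnNNN" dist tag in their release.
ELN_DIST = re.compile(r"\.eln\d+")


def eln_purity(nvrs: list[str]) -> float:
    """Percentage of builds in a compose that are native ELN rebuilds."""
    if not nvrs:
        return 0.0
    # The release is everything after the last '-' in name-version-release.
    native = sum(1 for nvr in nvrs if ELN_DIST.search(nvr.rsplit("-", 1)[-1]))
    return 100.0 * native / len(nvrs)


compose = [
    "bash-5.1.8-2.eln111",
    "glibc-2.33.9000-1.fc35",
    "zlib-1.2.11-30.eln111",
    "curl-7.77.0-1.fc35",
]
print(eln_purity(compose))  # 50.0
```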
On Tue, Jun 29, 2021 at 12:59 PM Kevin Fenzi kevin@scrye.com wrote:
Would these rawhide builds go out in ELN composes? Or would you block composes until you had only eln rpms in it?
They would go out in the composes until they are successfully rebuilt. This is to ensure that Content Resolver continues to have a fresh compose to check dependencies.
Hi,
On Mon, Jun 28, 2021 at 11:55:21AM -0400, Stephen Gallagher wrote:
Summary: I think we can fix the ELN side-tag rebuild problems and make the composes more reliable if we change the mechanism for kicking off rebuilds. I'm soliciting feedback to help identify potential issues with this proposed approach.
did you consider mirroring the rawhide side-tag for eln directly from the beginning? That is, a build in the rawhide side-tag would trigger a rebuild in a matching eln side-tag, and the eln side-tag could then be merged at about the same time as the rawhide side-tag. This way the build order would be the same (except for wait-repo delays that are not visible to the eln automation) and the build load would be distributed.
Cheers Till
On Mon, Jun 28, 2021 at 3:14 PM Till Maas opensource@till.name wrote:
did you consider mirroring the rawhide side-tag for eln directly from the beginning, so that a build for the rawhide side-tag triggers a rebuild for the eln side-tag and then the eln side-tag can be merged at a similar time as the rawhide side-tag. This way the build order would be the same (except if there are wait-repo delays that are not visible for the eln automation) and the build load would be distributed.
Yes, and that was the prevailing implementation idea until we came up with the proposal above. As I noted in the original message, one of the major benefits is that we don't have to have special handling for side-tags; they'll behave the same way as non-side-tag builds.
The real issue there is that the wait-repo delays are impossible to know, so the only option is as Kevin Fenzi noted up-thread: we'd have to do a wait-repo between all builds. For side-tags with many packages, this could take days or more to complete.
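A rough back-of-the-envelope estimate shows why. If nothing can run in parallel, the wall-clock time for $N$ builds is roughly $N$ times the sum of the build time $t_{\mathrm{build}}$ and the repo regeneration wait $t_{\mathrm{repo}}$. The numbers below are hypothetical placeholders, not measurements:

```latex
T_{\mathrm{serial}} \approx N\,(t_{\mathrm{build}} + t_{\mathrm{repo}})
  = 100 \times (30 + 20)\ \mathrm{min}
  = 5000\ \mathrm{min} \approx 83\ \mathrm{h} \approx 3.5\ \mathrm{days}
```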
On Mon, 28 Jun 2021 11:55:21 -0400 Stephen Gallagher sgallagh@redhat.com wrote:
Summary: I think we can fix the ELN side-tag rebuild problems and make the composes more reliable if we change the mechanism for kicking off rebuilds. I'm soliciting feedback to help identify potential issues with this proposed approach.
have you considered re-using koji-shadow? It might already know everything you need ...
Dan
On Mon, Jun 28, 2021 at 3:33 PM Dan Horák dan@danny.cz wrote:
have you considered re-using koji-shadow? It might already know everything you need ...
But it requires another koji instance that needs maintenance.
devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
On Mon, 28 Jun 2021 15:43:48 -0400 Mohan Boddu mboddu@bhujji.com wrote:
But it requires another koji instance that needs maintenance.
That's the default use case from the past (same tag, different Koji instances), but with a little effort it could probably use different tags on the same Koji instance, which is what ELN needs.
But in any case, what will happen if a rebuild in ELN fails? For proper handling of side tags / soname bumps / bootstrapped packages, someone must decide what the right action is. Can I safely use an older build? Do I need to fix the failure first? Can I safely rebuild a newer build? This was the kind of baby-sitting we had to do to keep the shadowed arches up-to-date and as close as possible. Oh, the memories :-)
Dan
On Tue, Jun 29, 2021 at 4:09 AM Dan Horák dan@danny.cz wrote:
On Mon, 28 Jun 2021 15:43:48 -0400 Mohan Boddu mboddu@bhujji.com wrote:
On Mon, Jun 28, 2021 at 3:33 PM Dan Horák dan@danny.cz wrote:
have you considered re-using koji-shadow? It might already know everything you need ...
Yes, I enquired about koji-shadow and was told (rather vociferously) to avoid using it if at all possible.
But it requires another koji instance that needs maintenance.
We don't have the available resources (both physical and human) to support this.
that's the default use case from the past (same tag, different koji instances), but probably with a little effort it could use different tags, but same koji instance, which is what ELN needs.
We don't have the resources to rewrite it either.
But in any case, what will happen if a rebuild in ELN fails? For proper handling of side tags / soname bumps / bootstrapped packages someone must decide what is right action. Can I safely use an older build? Do I need to fix the failure first? Can I safely rebuild a newer build? This was the kind of baby-sitting we have to do to keep the shadowed arches up-to-date and as-close-possible. Oh, the memories :-)
Thank you for succinctly listing all the reasons why we consigned koji-shadow to the Void.
Dealing with the results if an ELN build fails is one of the strong points of the approach I proposed. If the ELN rebuild fails, we fall back to leaving the Rawhide version tagged into ELN. This keeps us from ending up with broken dependency chains, and keeps ELN from falling behind Rawhide in terms of functionality. In our current situation, a failed build (for example, of a rebase) sometimes goes unnoticed for some time, since not everyone is monitoring their packages for ELN.
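The fallback relies on Koji treating the most recently tagged build as the latest one in a tag (the proposal's "most-recently-tagged version"). A tiny model of that rule, with a hypothetical `latest_in_tag` helper and made-up event ids and NVRs, shows why a failed rebuild leaves the fresh Rawhide build in place rather than a stale ELN one:

```python
from typing import Optional


def latest_in_tag(events: list[tuple[int, str]]) -> Optional[str]:
    """Return the NVR from the newest tagging event; event ids grow over time."""
    if not events:
        return None
    return max(events, key=lambda event: event[0])[1]


history = [
    (10, "foo-1.0-1.eln111"),  # older native ELN build
    (20, "foo-2.0-1.fc35"),    # Rawhide build tagged into 'eln' before the rebuild
]
print(latest_in_tag(history))  # foo-2.0-1.fc35 (the ELN rebuild failed)
```

Once the ELN rebuild succeeds and is tagged at a later event, it becomes the latest build and replaces the Rawhide one.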
On 6/28/21 5:55 PM, Stephen Gallagher wrote:
Summary: I think we can fix the ELN side-tag rebuild problems and make the composes more reliable if we change the mechanism for kicking off rebuilds. I'm soliciting feedback to help identify potential issues with this proposed approach.
How do you handle packages that need bootstrapping? Several Go packages must be built in a certain order with bootstrapping on, on a virgin branch. It takes quite a lot of time.
Best regards,
Robert-André
On Tue, Jun 29, 2021 at 08:34:06PM +0200, Robert-André Mauchin wrote:
How do you handle packages that need bootstrapping? Several Go packages must be built in a certain order with bootstrapping on, on a virgin branch. It takes quite a lot of time.
After reading the proposal, I assume the following: after the bootstrap is done, you can rebuild any of the packages involved freely. So with the updated package merged into the buildroot from rawhide, you can just rebuild the packages in eln in any order.
Zbyszek
On Tue, Jun 29, 2021 at 2:34 PM Robert-André Mauchin zebob.m@gmail.com wrote:
How do you handle packages that need bootstrapping? Several Go packages must be built in a certain order with bootstrapping on, on a virgin branch. It takes quite a lot of time.
Would they not be able to build atop the Fedora versions that have already been bootstrapped? I'm not sure I understand the situation.
Most bootstrapping scenarios that I'm aware of are essentially:
PackageB needs an updated PackageA to build, but PackageA also needs an updated PackageB to build. PackageA can be bootstrapped (by building it in some special manner, such as from a prebuilt upstream binary), allowing PackageB to be built and then rebuilding PackageA with the updated PackageB.
In the scenario I'm discussing, we would take those final PackageA and PackageB from Fedora and have them in the buildroot for the ELN builds. That would mean that the bootstrap step wouldn't be needed. If there's a case you know of that this won't work for, I'd really like to hear it (preferably with real package names).
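The argument can be checked with a toy model: treat the buildroot as the set of packages already present at the version that is needed, and test every rebuild order. `order_works`, `all_orders_work` and the package names are hypothetical, and versions are deliberately reduced to "present or not".

```python
from itertools import permutations


def order_works(order, build_requires, buildroot):
    """True if each package's build deps are available when its turn comes."""
    available = set(buildroot)
    for pkg in order:
        if not build_requires[pkg] <= available:
            return False
        available.add(pkg)  # a successful build joins the buildroot
    return True


def all_orders_work(build_requires, buildroot):
    """Check every possible rebuild order against the given starting buildroot."""
    return all(order_works(o, build_requires, buildroot) for o in permutations(build_requires))


# The circular case: each package needs the updated version of the other.
deps = {"PackageA": {"PackageB"}, "PackageB": {"PackageA"}}
print(any(order_works(o, deps, set()) for o in permutations(deps)))  # False: needs bootstrap
print(all_orders_work(deps, {"PackageA", "PackageB"}))  # True: Fedora builds already present
```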
On 6/30/21 2:38 PM, Stephen Gallagher wrote:
In the scenario I'm discussing, we would take those final PackageA and PackageB from Fedora and have them in the buildroot for the ELN builds. That would mean that the bootstrap step wouldn't be needed. If there's a case you know of that this won't work for, I'd really like to hear it (preferably with real package names).
Ah ok, I didn't know that ELN already had the Fedora packages as a base. I thought everything was rebuilt from scratch.
Best regards,
Robert-André
On Mon, Jun 28, 2021 at 9:21 AM Stephen Gallagher sgallagh@redhat.com wrote:
Summary: I think we can fix the ELN side-tag rebuild problems and make the composes more reliable if we change the mechanism for kicking off rebuilds. I'm soliciting feedback to help identify potential issues with this proposed approach.
Two issues I see deal with failed builds and new dependencies.
1 - failed builds. Will there be an easy way for the ELN SIG (or whoever) to see what the failed builds are? Or are all of these builds fire and forget?
2 - new dependencies. Package foo (in ELN list) gets a new dependency bar (not in ELN list). bar will already be built when foo gets updated and built in rawhide and ELN. bar will eventually get put on the ELN list. But with your proposal, bar has the potential to not be built in ELN for 6 months. It would be nice if there was still something like ELN periodic that checked what packages haven't been built and attempts to rebuild them. I know we've had a problem in the past with it spamming due to retrying failed builds multiple times. But it is there for a reason.
Troy
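The periodic safety net Troy asks for could be as simple as the sketch below: list ELN packages whose latest build in 'eln' is not a native ELN build, and stop retrying a package after a few attempts so it gets human attention instead of repeated failure mail. `packages_to_retry`, the `.elnNNN` dist-tag check and the limit of three attempts are all assumptions for illustration.

```python
import re
from collections import Counter

# Assumption: native ELN builds carry an ".elnNNN" dist tag in their release.
ELN_DIST = re.compile(r"\.eln\d+")


def packages_to_retry(
    eln_list: set[str],
    latest_in_eln: dict[str, str],
    attempts: Counter,
    max_attempts: int = 3,
) -> list[str]:
    """ELN-list packages still lacking a native ELN build, minus over-retried ones."""
    todo = []
    for pkg in sorted(eln_list):
        nvr = latest_in_eln.get(pkg)
        release = nvr.rsplit("-", 1)[-1] if nvr else ""
        if ELN_DIST.search(release):
            continue  # already rebuilt natively in ELN
        if attempts[pkg] >= max_attempts:
            continue  # stop retrying; flag for a human instead of more spam
        todo.append(pkg)
    return todo
```

A package with no build in 'eln' at all (the new-dependency case) is picked up as well, since its release string is empty.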
On 29. 06. 21 21:10, Troy Dawson wrote:
Two issues I see deal with failed builds and new dependencies. 1 - failed builds. Will there be an easy way for the ELN SIG (or whoever) to see what the failed builds are? Or are all of these builds fire and forget?
Previously, a failed build resulted in outdated ELN content and an outdated ELN buildroot, and hence more failed (or "wrong") builds.
When nobody has time to deal with the failures, we eventually end up with a very old rawhide snapshot where nothing builds. Once the human operator gets to it, they need to manually re-bootstrap everything or essentially use the current rawhide to populate ELN with "fresh" content once again.
With this proposal, failed builds result in more rawhide content in the ELN buildroot, which does not degrade over time. In the worst case, where no ELN builds are possible for a long time, we'll end up with... 100 % rawhide content. Once the failure is fixed, the content gradually becomes more and more ELN over time.
2 - new dependencies. Package foo (in ELN list) gets a new dependency bar (not in ELN list). bar will already be built when foo gets updated and built in rawhide and ELN. bar will eventually get put on the ELN list. But with your proposal, bar has the potential to not be built in ELN for 6 months.
I don't understand how this is different from the status quo. The current ELN Koji buildroot already "sees" all Rawhide packages that aren't in ELN.
It would be nice if there was still something like ELN periodic that checked what packages haven't been built and attempts to rebuild them. I know we've had a problem in the past with it spamming due to retrying failed builds multiple times. But it is there for a reason.
That is still necessary with this proposal (although it does not need to be that aggressive).