On Mon, Jun 28, 2021 at 9:21 AM Stephen Gallagher <sgallagh@redhat.com> wrote:
Summary: I think we can fix the ELN side-tag rebuild problems and make
the composes more reliable if we change the mechanism for kicking off
rebuilds. I'm soliciting feedback to help identify potential issues
with this proposed approach.


## Background Information ##
Currently in ELN, merging a side-tag into Rawhide results in all of
the packages from that side-tag being rebuilt concurrently in ELN.
This leads to two problems:

 1. Side-tags containing large numbers of package builds will trigger
many ELN builds at the same time, possibly overwhelming available
resources on the ELN automation systems.
 2. Many (most?) side-tags exist to ensure that packages are built in
a particular order, so that each one is built after its dependencies.
Launching all the rebuilds concurrently means that many of the builds
may succeed *and still be wrong* (for example, if they are built
against an older soname).

## Proposed Solution ##
I had a discussion with Miro Hrončok this morning where we tackled
this problem and may have come up with a workable solution for 99% of
cases. Instead of treating side-tags as a special event and trying to
sort the builds such that they are built in the same order, we can
instead tag in the Rawhide packages first, then issue the rebuilds
together. With the Rawhide packages available, we won't need to worry
about the ordering, because the dependencies will already be present
in a sufficiently-recent version. As a bonus, we'll reduce the
likelihood of broken ELN composes, since if an ELN rebuild fails, the
Rawhide version will still be present to satisfy dependencies.

In greater detail:

Whenever a build is tagged into the 'f35' tag (later, whatever tag
matches Rawhide), ELN automation would take the following steps:

 * Identify whether this package is on the list of packages that ELN rebuilds[1]
 * Tag the Rawhide build into the 'eln' tag (so it is now tagged with
both 'f35' and 'eln')
 * Enqueue a Koji build against the 'eln' target from the same Git commit
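
The three steps above could be sketched roughly as follows. This is only
an illustration, not the actual ELN automation: `Build`,
`handle_rawhide_tag` and the `tag_build` callable are hypothetical names,
and the real Koji calls are deliberately left to the caller.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    package: str  # source package name
    nvr: str      # name-version-release of the Rawhide build
    commit: str   # Git commit the Rawhide build was made from


def handle_rawhide_tag(build, rebuild_list, tag_build, queue):
    """Apply the three steps to one build newly tagged into Rawhide.

    tag_build is a hypothetical callable taking (nvr, tag); it stands in
    for whatever actually tags a build in Koji.
    Returns True if an ELN rebuild was enqueued.
    """
    # Step 1: skip packages that ELN does not rebuild automatically.
    if build.package not in rebuild_list:
        return False
    # Step 2: tag the Rawhide build into 'eln' as a fallback.
    tag_build(build.nvr, "eln")
    # Step 3: enqueue a rebuild from the same Git commit.
    queue.append((build.package, build.commit))
    return True
```

The queue here is a plain `deque`; anything with an `append` method
would do for the sketch.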

The queue mentioned above should be maintained in a separate thread
and used to submit tasks in batches to avoid overloading the
infrastructure. If the Koji build against the 'eln' target fails, the
Rawhide build will remain as the most-recently-tagged version of the
package in ELN and become part of the compose until the ELN rebuild
can be fixed.
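
A minimal sketch of the batched submission, assuming a hypothetical
`submit` callable that returns True on success (the batch size of 10 is
an arbitrary placeholder). It also collects the failures, so the
Rawhide fallback cases can be reviewed later.

```python
from collections import deque


def drain_in_batches(queue, submit, batch_size=10):
    """Submit queued rebuilds batch by batch and return the failed items.

    submit is a hypothetical callable taking one queue item and
    returning True if the ELN build succeeded.
    """
    failed = []
    while queue:
        # Take at most batch_size items off the front of the queue.
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        for item in batch:
            if not submit(item):
                # The Rawhide build stays tagged in 'eln'; just record it.
                failed.append(item)
        # A real worker would wait here for the batch to finish before
        # starting the next one.
    return failed
```

In practice this loop would run in its own thread, as described above,
with a pause or completion check between batches.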

Note that this process would apply to ALL builds in Rawhide, not just
those coming from side-tags. There would be no difference in behavior
between standard direct builds and side-tag merged builds.


## Known potential issues ##

 * Some packages may auto-detect features based on what one of their
dependencies provides. If the Rawhide and ELN versions of that
dependency differ in visible functionality, then building an ELN
package against the Rawhide version of its dependency could result in
unexpected behavior. I believe this issue to be rare and generally
best handled by the packager as the subject matter expert.
They'd just need to bump the release number and rebuild the package in
ELN. Alternatively, if this is known to be regularly problematic for a
package, the maintainer can opt out of the automatic rebuild and work
out a strategy with the ELN SIG for dealing with it.



[1] This will be the set of packages provided by
https://tiny.distro.builders/view--view-eln.html minus any packages
that have opted out of automatic rebuilds (they perform manual
rebuilds for ELN).


Two issues I see deal with failed builds and new dependencies.
1 - failed builds.
Will there be an easy way for the ELN SIG (or whoever) to see what the failed builds are? Or are all of these builds fire-and-forget?

2 - new dependencies.
Package foo (on the ELN list) gets a new dependency bar (not on the ELN list). bar will already have been built by the time foo is updated and built in Rawhide and ELN. bar will eventually be put on the ELN list, but with your proposal nothing would trigger its ELN build until its next Rawhide build, so bar could go unbuilt in ELN for 6 months.
It would be nice if there were still something like ELN periodic that checks which packages haven't been built and attempts to rebuild them. I know we've had a problem in the past with it spamming due to retrying failed builds multiple times, but it is there for a reason.
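
As a rough sketch of what I mean (hypothetical names throughout, and the
retry cap of 3 is an arbitrary placeholder): pick the packages on the
ELN list that have no ELN build yet, and skip any that have already
failed too many times so the check doesn't spam.

```python
def packages_to_retry(eln_list, built_in_eln, attempts, max_attempts=3):
    """Return packages lacking an ELN build that are still under the retry cap.

    eln_list:     packages ELN is expected to rebuild
    built_in_eln: packages that already have a successful ELN build
    attempts:     dict mapping package name to failed attempts so far
    """
    return sorted(
        pkg
        for pkg in eln_list
        if pkg not in built_in_eln and attempts.get(pkg, 0) < max_attempts
    )
```

For example, with foo already built and baz at the cap, only bar would
be retried.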

Troy