On Tuesday, February 09, 2016 01:57:04 PM Josh Boyer wrote:
Hi All,
In a panel session at devconf this past weekend, Dennis mentioned some
possible plans to change how secondary architectures work. Primarily,
the builds for every arch included in a package's set would be
required to build successfully, even if the arch is a secondary arch.
However, the compose of that architecture could fail and the rest
would still be pushed. I hope I summarized that correctly.
Builds completing across all arches is fundamental to how koji works. Ensuring
the same NVRs across arches is really the only sane way to build and ship. We
have been talking about changing things for over a year now. It has come up
in my talks on what we are doing in release engineering land for at
least the last two devconfs and last year's Flock.
That leaves me a bit confused. The major distinction from a
developer's perspective today is that build failures on a secondary
architecture do not fail the build on primary. The compose of a
secondary architecture is even one step further removed from their
workflow. With the proposed change, there is very much no distinction
between primary and secondary architectures for a package maintainer.
In most cases it will not matter at all. For most of the core systems, the
people doing the work in Fedora also do it in RHEL, and it is easier for them
to do all arches at once. Jakub, for instance, does test builds of gcc across
all the kojis; in this world we have made his life simpler. The same goes for
all the toolchain people.
The assumption here seems to be that they can ExcludeArch a failing
architecture and then resubmit the build. That is certainly possible,
particularly with the proposed notification to the secondary
architecture maintainers helping. However, for packages which take a
significant amount of time to build in general, this is going to have
an impact. Even if we assume there is no difference between arches in
terms of build performance, waiting 3 or 4 hours for a build to fail
because a secondary arch fails is really irritating.
While it doesn't solve the overall irritation factor, I have a small
suggestion. Today, if an architecture fails to build the remainder of
the builds are immediately canceled. If this proposed change to koji
happens, I would like to suggest we not do that. Instead, I would
suggest letting all builds on the various architectures run to their
natural completion and if one fails, send a failure notification on a
per arch basis as soon as that task fails. This allows the maintainer
to verify which arches a package builds on and which it does not. If
they wish to cancel the build upon an arch failure notification, they
still can do so with koji cancel. The build as a whole could still be
failed, but only after all arch tasks are complete.
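To make the difference between the two behaviours concrete, here is a rough Python sketch. It does not use the koji API; `run_build`, `BuildResult`, and the per-arch task callables are hypothetical names made up for illustration, and the tasks run one after another here, whereas real arch tasks run in parallel on separate builders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BuildResult:
    """Outcome of one multi-arch build (hypothetical structure)."""
    succeeded: bool = True
    failed_arches: List[str] = field(default_factory=list)
    notifications: List[str] = field(default_factory=list)


def run_build(arch_tasks: Dict[str, Callable[[], bool]],
              cancel_on_first_failure: bool = False) -> BuildResult:
    """Run each per-arch task and collect failures.

    cancel_on_first_failure=True models the current behaviour described
    above: the first failing arch stops the remaining tasks.
    cancel_on_first_failure=False models the suggestion: every arch runs
    to completion, each failure is reported as it happens, and the build
    as a whole is marked failed only at the end.
    """
    result = BuildResult()
    for arch, task in arch_tasks.items():
        if task():
            continue
        # Per-arch notification, recorded as soon as that task fails.
        result.failed_arches.append(arch)
        result.notifications.append(f"{arch}: build task failed")
        if cancel_on_first_failure:
            break
    # The overall build fails if any arch failed.
    result.succeeded = not result.failed_arches
    return result


if __name__ == "__main__":
    # Assumed example: arm fails early, i686 fails later, the rest pass.
    tasks = {
        "armv7hl": lambda: False,
        "x86_64": lambda: True,
        "i686": lambda: False,
        "ppc64": lambda: True,
    }
    print(run_build(tasks, cancel_on_first_failure=True).failed_arches)
    print(run_build(tasks).failed_arches)
```

With early cancellation the maintainer learns about only the first failing arch per submission; with run-to-completion a single submission reports every failing arch, at the cost of builder time spent on a build that is already known to fail.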
I think this is a reasonable RFE for koji. Given that computing costs today
are lower than they were 10 years ago when koji was developed, I think it
brings more benefit than cost.
This might not seem like an issue to most packages, but I do know that
in the kernel we hit different failures on different arches at
different points in a single build quite often. E.g. a driver will
fail to build on arm and cancel the whole build early. Then perf will
fail to build on i686 but work on x86_64, which comes much later in
the build process. In a theoretical world of expanding architecture
support, I very much don't want to rinse and repeat a build any more
than is necessary. Allowing us to see how each arch fares
individually helps avoid that problem.
The kernel seems to be a special case, and I think that we should find ways
to make development and testing simpler. I also think we need to find ways to
get more people helping the kernel team.
As an aside, I'm not fully convinced this koji change is a great idea.
ExcludeArch is the hammer that will get used most to "fix" failures,
and that isn't helping resolve the underlying issues. For things like
the kernel, gcc, or glibc, it isn't even really an option. Yes, we
can use ExcludeArch but if we do so then there is no possibility of
doing a successful _useful_ compose anyway. However, maybe it won't
be so bad. I prefer to focus on my suggested idea above for now, so
just log this paragraph as a note of caution perhaps.
I think you are overestimating the number of times it will be used. And there
are some cases where it will not be possible. The kernel is one of them:
without kernel-headers we cannot build a single thing. But outside of the
kernel, everyone else in this boat needs to ensure it works for other reasons.
Knowing straight away means they still have the context of the changes in
their brain and do not need to context shift a week or two later if a
secondary has fallen behind for some reason.
Another benefit is that the 2-3 people running koji-shadow and mirroring
processes will be freed from those tasks over time, with significantly less
work immediately. We will still need to run koji-shadow and do updates on the
secondary kojis until the releases shipped from them have gone EOL. This will
free up a lot of cycles for people to help with fixing issues rather than
babysitting processes.
Dennis