Hi All,
In a panel session at devconf this past weekend, Dennis mentioned some possible plans to change how secondary architectures work. Primarily, the builds for every arch included in a package's set would be required to build successfully, even if the arch is a secondary arch. However, the compose of that architecture could fail and the rest would still be pushed. I hope I summarized that correctly.
That leaves me a bit confused. The major distinction from a developer's perspective today is that build failures on a secondary architecture do not fail the build on primary. The compose of a secondary architecture is even one step further removed from their workflow. With the proposed change, there is very much no distinction between primary and secondary architectures for a package maintainer.
The assumption here seems to be that they can ExcludeArch a failing architecture and then resubmit the build. That is certainly possible, particularly with the proposed notification to the secondary architecture maintainers helping. However, for packages which take a significant amount of time to build in general, this is going to have an impact. Even if we assume there is no difference between arches in terms of build performance, waiting 3 or 4 hours for a build to fail because a secondary arch fails is really irritating.
While it doesn't solve the overall irritation factor, I have a small suggestion. Today, if an architecture fails to build, the remainder of the builds are immediately canceled. If this proposed change to koji happens, I would like to suggest we not do that. Instead, I would suggest letting all builds on the various architectures run to their natural completion and, if one fails, sending a failure notification on a per-arch basis as soon as that task fails. This allows the maintainer to verify which arches a package builds on and which it does not. If they wish to cancel the build upon an arch failure notification, they still can do so with koji cancel. The build as a whole could still be failed, but only after all arch tasks are complete.
This might not seem like an issue to most packages, but I do know that in the kernel we hit different failures on different arches at different points in a single build quite often. E.g. a driver will fail to build on arm and cancel the whole build early. Then perf will fail to build on i686 but work on x86_64, which comes much later in the build process. In a theoretical world of expanding architecture support, I very much don't want to rinse and repeat a build any more than is necessary. Allowing us to see how each arch fares individually helps avoid that problem.
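To make the difference concrete, here is a minimal sketch of the two policies. It is only a toy simulation, not koji code: the function name run_build, the result strings, and the task list are all hypothetical, and the per-arch tasks are assumed to finish in the order listed.

```python
def run_build(tasks, fail_fast):
    """Simulate one multi-arch build (hypothetical model, not koji's API).

    tasks: list of (arch, ok) pairs, in the order the per-arch tasks finish.
    fail_fast: True models today's behavior (first failure cancels the rest);
               False models the suggestion (every arch runs to completion).
    Returns (build_ok, results, notifications).
    """
    results = {}
    notifications = []  # arches for which a failure notice was sent
    for i, (arch, ok) in enumerate(tasks):
        results[arch] = "ok" if ok else "failed"
        if not ok:
            # Notify per arch as soon as that task fails.
            notifications.append(arch)
            if fail_fast:
                # Current behavior: remaining arch tasks are canceled.
                for other, _ in tasks[i + 1:]:
                    results[other] = "canceled"
                break
    # Either way, the build as a whole only succeeds if every arch did.
    build_ok = all(state == "ok" for state in results.values())
    return build_ok, results, notifications


# Kernel-like scenario: arm fails early, i686 fails late, x86_64 works.
tasks = [("arm", False), ("x86_64", True), ("i686", False)]
print(run_build(tasks, fail_fast=True))
print(run_build(tasks, fail_fast=False))
```

With fail_fast=True the maintainer only learns about the arm failure and has to resubmit to discover the i686 one. With fail_fast=False both failures show up in a single build, and the build as a whole is still marked failed.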
As an aside, I'm not fully convinced this koji change is a great idea. ExcludeArch is the hammer that will get used most to "fix" failures, and that isn't helping resolve the underlying issues. For things like the kernel, gcc, or glibc, it isn't even really an option. Yes, we can use ExcludeArch but if we do so then there is no possibility of doing a successful _useful_ compose anyway. However, maybe it won't be so bad. I prefer to focus on my suggested idea above for now, so just log this paragraph as a note of caution perhaps.
josh
On Tuesday, February 09, 2016 01:57:04 PM Josh Boyer wrote:
> Hi All,
> In a panel session at devconf this past weekend, Dennis mentioned some possible plans to change how secondary architectures work. Primarily, the builds for every arch included in a package's set would be required to build successfully, even if the arch is a secondary arch. However, the compose of that architecture could fail and the rest would still be pushed. I hope I summarized that correctly.
Builds completing across all arches is fundamental to how koji works. Ensuring the same NVRs across arches is really the only sane way to build and ship. We have been talking about changing things for over a year now. It has been part of my talks on what we are doing in release engineering land for at least the last two devconfs and last year's Flock.
> That leaves me a bit confused. The major distinction from a developer's perspective today is that build failures on a secondary architecture do not fail the build on primary. The compose of a secondary architecture is even one step further removed from their workflow. With the proposed change, there is very much no distinction between primary and secondary architectures for a package maintainer.
In most cases it will not matter at all. For most of the core systems, the people doing the work in Fedora also do it in RHEL, and it is easier for them to do all arches at once. Jakub, for instance, does test builds of gcc across all the kojis; in this world we have made his life simpler. The same goes for all the toolchain people.
> The assumption here seems to be that they can ExcludeArch a failing architecture and then resubmit the build. That is certainly possible, particularly with the proposed notification to the secondary architecture maintainers helping. However, for packages which take a significant amount of time to build in general, this is going to have an impact. Even if we assume there is no difference between arches in terms of build performance, waiting 3 or 4 hours for a build to fail because a secondary arch fails is really irritating.
> While it doesn't solve the overall irritation factor, I have a small suggestion. Today, if an architecture fails to build, the remainder of the builds are immediately canceled. If this proposed change to koji happens, I would like to suggest we not do that. Instead, I would suggest letting all builds on the various architectures run to their natural completion and if one fails, send a failure notification on a per arch basis as soon as that task fails. This allows the maintainer to verify which arches a package builds on and which it does not. If they wish to cancel the build upon an arch failure notification, they still can do so with koji cancel. The build as a whole could still be failed, but only after all arch tasks are complete.
I think this is a reasonable RFE for koji. Given that computing costs today are lower than they were 10 years ago when koji was developed, I think it brings more benefit than cost.
> This might not seem like an issue to most packages, but I do know that in the kernel we hit different failures on different arches at different points in a single build quite often. E.g. a driver will fail to build on arm and cancel the whole build early. Then perf will fail to build on i686 but work on x86_64, which comes much later in the build process. In a theoretical world of expanding architecture support, I very much don't want to rinse and repeat a build any more than is necessary. Allowing us to see how each arch fares individually helps avoid that problem.
The kernel seems to be a special case, and I think that we should find ways to make development and testing simpler. I also think we need to find ways to get more people helping the kernel team.
> As an aside, I'm not fully convinced this koji change is a great idea. ExcludeArch is the hammer that will get used most to "fix" failures, and that isn't helping resolve the underlying issues. For things like the kernel, gcc, or glibc, it isn't even really an option. Yes, we can use ExcludeArch but if we do so then there is no possibility of doing a successful _useful_ compose anyway. However, maybe it won't be so bad. I prefer to focus on my suggested idea above for now, so just log this paragraph as a note of caution perhaps.
I think you are overestimating the number of times it will be used. And there are some cases where it will not be possible. The kernel is one of them: without kernel-headers we cannot build a single thing. But outside of the kernel, everyone else in this boat needs to ensure it works for other reasons. By knowing straight away, they still have the context of the changes in their brain and do not need to context shift a week or two later if a secondary has fallen behind for some reason.
Another benefit is that the 2-3 people running koji-shadow and mirroring processes will be freed from those tasks over time, with significantly less work immediately. We will still need to run koji-shadow and do updates on the secondary kojis until releases shipped from them have gone EOL. This will free up a lot of cycles for people to help with fixing issues rather than babysitting processes.
Dennis
rel-eng@lists.fedoraproject.org