== Summary ==
Fedora currently uses the original K8 micro-architecture (without 3DNow! and other AMD-specific parts) as the baseline for its <code>x86_64</code> architecture. This baseline dates back to 2003 and has not been updated since. As a result, performance of Fedora is not as good as it could be on current CPUs.
This change updates the micro-architecture level for the architecture to something more recent.
== Owner ==
* Name: [[User:fweimer| Florian Weimer]]
* Email: []
== Detailed Description ==
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Along with AVX2, it makes sense to enable certain other CPU features which are not strictly implied by AVX2, such as CMPXCHG16B, FMA, and earlier vector extensions such as SSE 4.2. Details are still being worked out.
A test rebuild of a distribution largely based on Fedora 28 showed that there is only a small number of build failures due to the baseline switch. Very few packages are confused about the availability of the CMPXCHG16B instruction, leading to linking failures related to <code>-latomic</code>, and there are some hard-coded floating point results that could change due to vectorization. (The latter is within bounds of the usual cross-architecture variation for such tests.)
== Benefit to Fedora ==
Fedora will use current CPUs more efficiently, increasing performance and reducing power consumption.
Moreover, when Fedora is advertised as a distribution by a compute service provider, users can be certain that their AVX2-optimized software will run in this environment.
== Scope ==
* Proposal owners: Update the <code>gcc</code> and <code>redhat-rpm-config</code> packages to implement the new compiler flags. It is expected that the new baseline will be implemented in a new GCC <code>-march=</code> option for convenience.
* Other developers: Other developers may have to adjust test suites which expect exact floating point results, and correct linking with <code>libatomic</code>. They will also have to upgrade their x86-64 machines to something that can execute AVX2 instructions.
* Release engineering: [ #8513]
** All Fedora builders need to be AVX2-capable.
** Infrastructure ticket: [ #7968]
* Policies and guidelines: No guidelines need to be changed.
* Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
Fedora installations on systems with CPUs which are not able to execute AVX2 instructions will not be able to upgrade.
== How To Test ==
General system testing will provide test coverage for this change.
== User Experience ==
Users should observe improved performance and, likely, battery life. Developers will benefit from the knowledge that code with AVX2 optimizations will run wherever Fedora runs.
== Dependencies ==
There are no direct dependencies on this change at this time.
== Contingency Plan ==
It is possible to not implement this change, or implement a smaller subset of it (adopting the CMPXCHG16B instruction only, for example).
* Contingency mechanism: Mass rebuild with different/previous compiler flags.
* Contingency deadline: Final mass rebuild.
* Blocks release? No.
* Blocks product? No.
== Documentation ==
The new micro-architecture baseline and the resulting requirements need to be documented.
== Release Notes ==
Release notes must mention how users can determine whether their system supports AVX2 prior to upgrading, for example by running <code>grep avx2 /proc/cpuinfo</code>.
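The <code>grep</code> check mentioned in the release notes could be made a little more specific. The following is a minimal sketch, not an official tool: it parses the <code>flags</code> line of <code>/proc/cpuinfo</code> text and lists which features of a candidate baseline are missing. The flag names <code>avx2</code>, <code>fma</code>, <code>cx16</code> (CMPXCHG16B) and <code>sse4_2</code> are the names the Linux kernel reports; the exact feature set is an assumption, since the proposal says the details are still being worked out, and the function names are hypothetical.

```python
"""Sketch: report which assumed baseline CPU features a machine lacks."""

# Assumed baseline; the proposal leaves the exact feature list open.
ASSUMED_BASELINE = {"avx2", "fma", "cx16", "sse4_2"}


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=ASSUMED_BASELINE):
    """Return the required flags that the CPU does not advertise, sorted."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))
```

On a Linux system this could be called as <code>missing_features(open("/proc/cpuinfo").read())</code>; an empty list would mean every assumed feature is present. Matching whole flag tokens avoids the substring matches a bare <code>grep</code> can produce.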
On 22/07/2019 19:51, Ben Cotton wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Going all the way to AVX2 seems like it would be, in the words of Sir Humphrey, a very bold idea.
I just checked the five machines I run Fedora 30 on at home and exactly one of them is AVX2 capable, namely my laptop.
My desktop is about six years old, and doesn't have it, though that is likely to be updated soon.
My PVR machine is a similar age and I had no particular plans to upgrade that currently.
My firewall is brand new, built a few months ago to replace a 32 bit machine because Fedora was deprecating that! Yet it is a low end Celeron CPU and has no AVX2 support.
The final one is a VM from a cloud provider and even updating to their latest hardware profile doesn't get me an AVX2 capable system.
I will need to check but I suspect there will be a fair few production systems at work that are missing support as well.
On 22/07/2019 20:20, Tom Hughes wrote:
My firewall is brand new, built a few months ago to replace a 32 bit machine because Fedora was deprecating that! Yet it is a low end Celeron CPU and has no AVX2 support.
The final one is a VM from a cloud provider and even updating to their latest hardware profile doesn't get me an AVX2 capable system.
By the way these two don't even have AVX never mind AVX2.
I will need to check but I suspect there will be a fair few production systems at work that are missing support as well.
Out of 31 machines running F29 or F30 at work there are only 9 with AVX2 support and only 18 with any AVX support.
On 22/07/2019 20:42, Tom Hughes wrote:
On 22/07/2019 20:20, Tom Hughes wrote:
I will need to check but I suspect there will be a fair few production systems at work that are missing support as well.
Out of 31 machines running F29 or F30 at work there are only 9 with AVX2 support and only 18 with any AVX support.
Out of interest I checked the OpenStreetMap servers - they are actually running Ubuntu not Fedora but I figured it's an interesting set of randomish production machines.
We have 85 servers enrolled in chef of which 34 have the basic AVX and only 19 have AVX2 support.
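Counts like the ones above can be produced from collected <code>flags</code> lines. This is only a sketch under assumptions: it presumes the flags string of each machine has already been gathered somehow (how is out of scope here), and the host names and function names are hypothetical.

```python
"""Sketch: tally AVX / AVX2 support across a set of machines."""
from collections import Counter


def classify(flags_line):
    """Classify one machine by the highest AVX level its flags advertise."""
    flags = set(flags_line.split())
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "none"


def tally(flags_by_host):
    """Count hosts per level; input maps hostname -> flags string."""
    return Counter(classify(flags) for flags in flags_by_host.values())


def summarize(counts):
    """Format the counts in the style used in this thread."""
    total = sum(counts.values())
    any_avx = counts["avx"] + counts["avx2"]
    return f"{total} machines: {any_avx} with any AVX, {counts['avx2']} with AVX2"
```

Flags are compared as whole tokens, so a flag such as <code>avx512f</code> is not mistaken for <code>avx</code>.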
"BC" == Ben Cotton writes:
BC> * Other developers: Other developers may have to adjust test suites
BC> which expect exact floating point results, and correct linking with
BC> <code>libatomic</code>. They will also have to upgrade their x86-64
BC> machines to something that can execute AVX2 instructions.
BC> == Upgrade/compatibility impact ==
BC> Fedora installations on systems with CPUs which are not able to
BC> execute AVX2 instructions will not be able to upgrade.
Wow. I understand progress, but I have to say that it's not really cool to toss this bomb out there without some more detailed breakdown of the impact.
For my part, I try to keep my equipment relatively up to date but I don't want to throw something away if it's still perfectly useful. And, let's see, I'd have to toss out five desktops (which isn't too bad, I guess) and probably forty perfectly functional servers, some of which aren't really even all that old. Heck, a dozen computational servers would be on the block. Even requiring avx would force me to toss a pretty big pile of stuff.
Basically, this would force me to use something other than Fedora. I'd have no choice, since it wouldn't work. I don't want to be that guy with the 20mhz 386 that still wants others to make sure his stuff works, but still, this seems like it's going more than just a bit too far.
- J<
"JLT" == Jason L Tibbitts writes:
JLT> And, let's see, I'd have to toss out five desktops (which isn't too
JLT> bad, I guess)
I was wrong. It would be 36 desktops. Being charitable requires me to assume this was proposed without adequate consideration of just how much hardware is involved here.
- J<
My entire involvement around Fedora is based on the fact that I was able to use machines that had been thrown away because they were deemed ‘too old’. I have several servers and multiple laptops that run Fedora perfectly and none of them would meet this requirement, effectively ending any chance of using Fedora going forward.
Perhaps as a compromise there could be a ‘regular’ 64-bit and a 64-bit-optimized-for-machines-made-after-2013 version?
On Mon, Jul 22, 2019 at 03:05:15PM -0500, Ron Olson wrote:
Perhaps as a compromise there could be a ‘regular’ 64-bit and a 64-bit-optimized-for-machines-made-after-2013 version?
It's not as simple as a "CPU newer than date X" cutoff -- Intel limits AVX support to their Xeon and Core brands only. Their current-gen lower-end stuff (sold as Pentium, Celeron and Atom) doesn't support AVX of any flavor. And they sell a _lot_ of those.
I think AMD's situation is a bit better, with all of their current processors supporting AVX, and only one (Family 16h) lacking AVX2.
- Solomon
Right, I was making a ha-ha-only-serious thought that perhaps there could be a spin that is specifically highly optimized for latest-n-greatest architectures, and if packagers want to maintain two different versions of x64, that’d be their choice, otherwise fall back to the ‘regular’ one. It certainly wouldn’t be the most popular, but for the folks who could stand to benefit from it, they’d know where to find this particular spin.
== Upgrade/compatibility impact ==
Fedora installations on systems with CPUs which are not able to execute AVX2 instructions will not be able to upgrade.
Time for me to switch to LinuxMint as I'm not going to be forced into hardware updates I can't afford.
On Mon, Jul 22, 2019 at 02:51:27PM -0400, Ben Cotton wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
While one can make a reasonable argument to bump the baseline to something newer than the very first K8 implementation, jumping all the way up to AVX2 is insane, because it will render Fedora useless on all but the most recent generations of CPUs.
Because _introduced_ is a long, long way from _deployed_.
Even today Intel continues to sell modern-but-non-AVX-capable CPUs under their Pentium and Celeron brands. Are we going to exclude all of those too?
As an example, of the what, dozen or so systems I have Fedora installed on today, only one supports AVX2. Three of them don't even support AVX1 (AMD 10h and 14h). This means that come Fedora 32, I'll be faced with the choice of replacing nearly every bit of kit I own.
But since anecdote != data, are there any sort of deployment numbers out there that show how many Fedora deployments are on AVX[2]-capable hardware?
Fedora will use current CPUs more efficiently, increasing performance and reducing power consumption.
I think we need to see some actual benchmarks demonstrating this. For the core kernel and system libraries rather than microbenchmarks or specific applications that already sport AVX[2] codepaths.
BTW, it wasn't until mid-late-2018 that AAA games started to require AVX1.
Moreover, when Fedora is advertised as a distribution by a compute service provider, users can be certain that their AVX2-optimized software will run in this environment.
So.. hosting providers and their users are now the sole Fedora audience?
- Other developers: Other developers may have to adjust test suites which expect exact floating point results, and correct linking with <code>libatomic</code>. They will also have to upgrade their x86-64 machines to something that can execute AVX2 instructions.
Yeah, "they just have to upgrade their systems".
- Solomon
On Tue, Jul 23, 2019 at 5:46 AM Solomon Peachy wrote:
On Mon, Jul 22, 2019 at 02:51:27PM -0400, Ben Cotton wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Can we just kill this and pretend it was never suggested.
From a desktop point of view this would be a really bad idea, I think you should expand your preliminary discussions to CPU vendors who aren't Intel server chips.
On 22.07.19 at 21:52, David Airlie wrote:
Can we just kill this and pretend it was never suggested.
Yes, please.
Besides low-end Intel machines there are also virtual machines which might not have AVX2.
I think the voiced opposition here should suffice to bury that proposal right now.
On Mon, Jul 22, 2019 at 3:45 PM Solomon Peachy wrote:
But since anecdote != data, are there any sort of deployment numbers out there that show how many Fedora deployments are on AVX[2]-capable hardware?
There are no stats available that could be considered defensible. At best, we could come up with some estimates based on the stats from other sources that we might assume have a similar profile as Fedora. I'm not sure if that data exists anywhere, though.
My main personal machine also lacks AVX2-capable hardware, so from a personal perspective, I'm not super keen on this change. I'm privileged enough to be able to upgrade my hardware if required, but I recognize that it's not a reasonable request for others.
Fedora will use current CPUs more efficiently, increasing performance and reducing power consumption.
I think we need to see some actual benchmarks demonstrating this. For the core kernel and system libraries rather than microbenchmarks or specific applications that already sport AVX[2] codepaths.
I agree. It would be good to see some more specifics about what the benefit will be. That's the only way we can decide if it's worth the cost.
On Tue, Jul 23, 2019 at 5:58 AM Ben Cotton wrote:
I agree. It would be good to see some more specifics about what the benefit will be. That's the only way we can decide if it's worth the cost.
I think we don't need to bother, there is way too much hardware still being sold by CPU vendors that don't meet this baseline.
We aren't Apple. If you want to add avx2 optimised binaries to the system work out how to do that, create fat binary support for Linux, add a second set of packages for cases that it might matter etc.
Just unilaterally removing a whole chunk of the x86 architecture support isn't a plan, benchmarks or stats won't help.
On Tue, 2019-07-23 at 06:01 +1000, David Airlie wrote:
I think we don't need to bother, there is way too much hardware still being sold by CPU vendors that don't meet this baseline.
+1. As I wrote on the talk page for this Change I see it as a complete non-starter. I don't think we need detailed data to just say that a change which means Fedora won't work on all CPUs made prior to 2013 (and apparently quite a lot made since then) is a complete non-starter.
Dropping i686 is a Change I can get behind, but this one is being proposed about a decade too early.
(For anyone collecting anecdata, though: of the 5 PCs running Fedora in this room, I think only one would be AVX2-capable).
On Mon, Jul 22, 2019 at 03:54:41PM -0400, Ben Cotton wrote:
There are no stats available that could be considered defensible. At best, we could come up with some estimates based on the stats from other sources that we might assume have a similar profile as Fedora. I'm not sure if that data exists anywhere, though.
But without that data, as you put it, it's hard to come up with a cost-benefit analysis.
One source of data is Steam's hardware survey. Unfortunately they don't include AVX2, but their most recent stats show that 88.6% of their overall userbase has a CPU supporting AVX1. Limiting that to Linux users the number drops to 87.2%.
My main personal machine also lacks AVX2-capable hardware, so from a personal perspective, I'm not super keen on this change. I'm privileged enough to be able to upgrade my hardware if required, but I recognize that it's not a reasonable request for others.
If it was just a matter of one machine, that would be manageable, but pretty much everyone who's spoken up here would be faced with having to replace multiple systems all at once, or stop using Fedora altogether.
I agree. It would be good to see some more specifics about what the benefit will be. That's the only way we can decide if it's worth the cost.
Honestly the proposal should have come with these benchmarks already, especially since they'd already rebuilt Fedora with the new flags...
- Solomon
Solomon Peachy wrote:
One source of data is Steam's hardware survey. Unfortunately they don't include AVX2, but their most recent stats show that 88.6% of their overall userbase has a CPU supporting AVX1. Limiting that to Linux users the number drops to 87.2%.
A survey among Steam's customers is presumably heavily biased towards beefy gaming machines with brand-new high-end processors. If as many as 11% lack even AVX1 in that dataset, then there must be a much higher percentage without the newer AVX2 among non-gaming desktops and laptops.
Björn Persson
On Monday, 22 July 2019 at 20:51, Ben Cotton wrote:
== Upgrade/compatibility impact ==
Fedora installations on systems with CPUs which are not able to execute AVX2 instructions will not be able to upgrade.
And that's a lot of hardware. Half of my machines don't support AVX2. If you dropped back to SSSE3 then I wouldn't complain as that would just scrap my 32-bit only machines, but requiring AVX2 is definitely going too far.
Anyone who wants to build a library with AVX can already do so even if the library doesn't support runtime detection. You just build twice, once with and once without and put the AVX-enabled version in %{_libdir}/haswell.
Regards, Dominik
On Mon, Jul 22, 2019 at 2:35 PM Dominik 'Rathann' Mierzejewski wrote:
Anyone who wants to build a library with AVX can already do so even if the library doesn't support runtime detection. You just build twice, once with and once without and put the AVX-enabled version in %{_libdir}/haswell.
This possibility has been mentioned before, but nothing in Fedora owns %{_libdir}/haswell. Should the filesystem package own it?
Dominik 'Rathann' Mierzejewski wrote:
And that's a lot of hardware. Half of my machines don't support AVX2. If you dropped back to SSSE3 then I wouldn't complain as that would just scrap my 32-bit only machines, but requiring AVX2 is definitely going too far.
Requiring SSSE3 would also work for me personally, though I am not convinced at all that the performance win over SSE2 would be worth dropping 2 generations of CPUs (SSE2, SSE3).
Requiring anything newer would exclude half (SSE4, AVX1) or all (AVX2) of my computers.
Anyone who wants to build a library with AVX can already do so even if the library doesn't support runtime detection. You just build twice, once with and once without and put the AVX-enabled version in %{_libdir}/haswell.
To be precise, Haswell is actually AVX2.
Kevin Kofler
On Mon, Jul 22, 2019 at 3:27 PM Ben Cotton wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Would it be possible to include some basic instructions or a script for people to run on their systems to see if they are AVX2 compliant? That would help them assess the impact.
On Tue, Jul 23, 2019 at 6:03 AM Josh Boyer wrote:
Would it be possible to include some basic instructions or a script for people to run on their systems to see if they are AVX2 compliant? That would help them assess the impact.
They did below, grep avx2 /proc/cpuinfo.
But don't bother, this will just get embarrassing and turn into a pile on. I think this should be retracted before it ends up being a phoronix article making the project look bad.
On Mon, Jul 22, 2019 at 4:43 PM David Airlie wrote:
On Tue, Jul 23, 2019 at 6:03 AM Josh Boyer wrote:
Would it be possible to include some basic instructions or a script for people to run on their systems to see if they are AVX2 compliant? That would help them assess the impact.
They did below, grep avx2 /proc/cpuinfo.
Ah, I completely glossed over that. My apologies.
But don't bother, this will just get embarrassing and turn into a pile on. I think this should be retracted before it ends up being a phoronix article making the project look bad.
Hm. I don't think it needs to turn into that. Perhaps it can lead to a productive conversation in another way.
...I think this should be retracted before it ends up being a phoronix article making the project look bad.
I 100% agree... but too late:
* Josh Boyer [22/07/2019 15:56] :
Would it be possible to include some basic instructions or a script for people to run on their systems to see if they are AVX2 compliant? That would help them assess the impact.
You can find your CPU model by running the command:
$ grep 'model name' /proc/cpuinfo
From there, it's a simple lookup on to see what features it has.
Fedora will use current CPUs more efficiently, increasing performance and reducing power consumption.
I hope the energy usage involved in having to buy new hardware (including manufacturing and shipping) is taken into account. This proposed change is incompatible with all 3 of my 64-bit machines (which are all refurbished, isn't that supposed to be good for the environment?), including one I bought just last year.
On Mon, Jul 22, 2019 at 2:52 PM Ben Cotton wrote:
== Summary ==
Fedora currently uses the original K8 micro-architecture (without 3DNow! and other AMD-specific parts) as the baseline for its <code>x86_64</code> architecture. This baseline dates back to 2003 and has not been updated since. As a result, performance of Fedora is not as good as it could be on current CPUs.
This change updates the micro-architecture level for the architecture to something more recent.
- Other developers: Other developers may have to adjust test suites which expect exact floating point results, and correct linking with <code>libatomic</code>. They will also have to upgrade their x86-64 machines to something that can execute AVX2 instructions.
I think it's clear just from the initial replies to this thread that even the most involved Fedorans (AKA the ones most likely to have access to newer hardware) would be unable to run a sizeable percentage of their systems if the minimum requirement became AVX2+.
With my FESCo hat on, I can't support this action as currently stated. I think I'd be more inclined to consider it if the Change was proposed as a new architecture bring-up. Effectively, this would be a whole new architecture that would just happen to be largely compatible with x86_64.
On Mon, Jul 22, 2019 at 04:11:32PM -0400, Stephen Gallagher wrote:
With my FESCo hat on, I can't support this action as currently stated. I think I'd be more inclined to consider it if the Change was proposed as a new architecture bring-up. Effectively, this would be a whole new architecture that would just happen to be largely compatible with x86_64.
Now that approach makes a lot more sense!
And we could easily do some apples-to-apples system benchmarks to see if there's any meaningful improvements to be had.
- Solomon
On Jul 22, 2019, at 1:21 PM, Solomon Peachy wrote:
On Mon, Jul 22, 2019 at 04:11:32PM -0400, Stephen Gallagher wrote: With my FESCo hat on, I can't support this action as currently stated. I think I'd be more inclined to consider it if the Change was proposed as a new architecture bring-up. Effectively, this would be a whole new architecture that would just happen to be largely compatible with x86_64.
Now that approach makes a lot more sense!
And we could easily do some apples-to-apples system benchmarks to see if there's any meaningful improvements to be had.
IMO this approach is wrong. There’s nothing special about AVX1. There are plenty of packages that can be built with various CPU extensions on or off but that don’t have runtime detection, and Fedora should have the ability to straightforwardly build multiple variants. Sure, this might involve some rpm and dnf fiddling, but that’s nothing compared to creating a whole new architecture.
I can imagine this working in multiple ways. There could be something akin to modules or maybe sub-architectures, where there are multiple non-parallel-installable versions of various packages along with tooling to say "optimize for this machine" or "optimize for the following baseline". There could also be good tooling to build dynamically-selected libraries (/lib/hw/avx2/). Fat binaries for actual executables, or some equivalent (multiple binaries with /usr/bin/foo choosing which one to exec), could work. Some combination could make sense.
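The "multiple binaries with /usr/bin/foo choosing which one to exec" idea could look roughly like the sketch below. It is only an illustration: the variant directory layout (`/usr/libexec/foo/<level>/foo`), the level names, and the set of flags required per level are assumptions, not anything Fedora, glibc, or GCC define.

```python
import os
import sys

# Hypothetical levels, most demanding first; flag names as in /proc/cpuinfo.
LEVELS = [
    ("avx2", {"avx2", "fma", "cx16", "sse4_2"}),
    ("baseline", set()),
]


def pick_level(cpu_flags, installed):
    """Return the first level the CPU satisfies and for which a binary is installed."""
    for name, required in LEVELS:
        if required <= cpu_flags and name in installed:
            return name
    raise RuntimeError("no usable variant installed")


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flags of the first CPU, or an empty set if they cannot be read."""
    try:
        with open(path) as f:
            for line in f:
                key, sep, value = line.partition(":")
                if sep and key.strip() == "flags":
                    return set(value.split())
    except OSError:
        pass
    return set()


def main(variant_root="/usr/libexec/foo"):  # hypothetical install location
    installed = set(os.listdir(variant_root)) if os.path.isdir(variant_root) else set()
    level = pick_level(read_cpu_flags(), installed)
    binary = os.path.join(variant_root, level, "foo")
    os.execv(binary, [binary] + sys.argv[1:])  # replaces this process


# A real /usr/bin/foo wrapper would end by calling main().
```

Falling back to `baseline` when flags cannot be read keeps the wrapper safe on unknown hardware, at the cost of an extra exec on every start.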
On Mon, Jul 22, 2019 at 4:47 PM Stephen Gallagher wrote:
On Mon, Jul 22, 2019 at 2:52 PM Ben Cotton wrote:
== Summary == Fedora currently uses the original K8 micro-architecture (without 3DNow! and other AMD-specific parts) as the baseline for its <code>x86_64</code> architecture. This baseline dates back to 2003 and has not been updated since. As a result, performance of Fedora is not as good as it could be on current CPUs.
This change to update the micro-architecture level for the architecture to something more recent.
- Other developers: Other developers may have to adjust test suites which expect exact floating point results, and correct linking with <code>libatomic</code>. They will also have to upgrade their x86-64 machines to something that can execute AVX2 instructions.
I think it's clear just from the initial replies to this thread that even the most involved Fedorans (AKA the ones most likely to have access to newer hardware) would be unable to run a sizeable percentage of their systems if the minimum requirement became AVX2+
With my FESCo hat on, I can't support this action as currently stated. I think I'd be more inclined to consider it if the Change was proposed as a new architecture bring-up. Effectively, this would be a whole new architecture that would just happen to be largely compatible with x86_64.
That is one way. I think there might be others.
Depending on exactly what the target usecase is, we might be able to accommodate this under the Fedora project umbrella in a way that doesn't require moving the entire distro immediately here.
On Mon, Jul 22, 2019 at 05:11:23PM -0400, Josh Boyer wrote:
I think I'd be more inclined to consider it if the Change was proposed as a new architecture bring-up. Effectively, this would be a whole new architecture that would just happen to be largely compatible with x86_64.
That is one way. I think there might be others.
Depending on exactly what the target usecase is, we might be able to accommodate this under the Fedora project umbrella in a way that doesn't require moving the entire distro immediately here.
I am very much interested in finding such an approach.
* Stephen Gallagher:
With my FESCo hat on, I can't support this action as currently stated. I think I'd be more inclined to consider it if the Change was proposed as a new architecture bring-up. Effectively, this would be a whole new architecture that would just happen to be largely compatible with x86_64.
Can we make this happen at the RPM level? So that third-party RPMs install just fine even though the operating system is something else (not x86_64 anymore)? I do not see many explicit dependencies on anything “x86_64” in Fedora 30, so perhaps this is doable, assuming that packages of the other architecture would continue to provide …(…)(64bit) for soname dependencies.
Could we rebuild x86_64 Fedora with a different dist tag and different compiler flags, and release that as a new spin? And retain the x86_64 for that architecture?
Regarding doing something like the old i686 packages when we had an i586 baseline (or the ppc64p7 work that was perhaps never upstreamed to Fedora), I'm a bit worried about increasing the complexity of composes. We already see upgrade issues due to i686 packages come and go, and that could potentially multiply them. The advantage is that packaging changes themselves will be relatively minor, once we have agreement which packages should do this.
ELF multilib DSOs inside RPMs result in code duplication, affecting container image size. Packaging changes are *not* minor for this approach. It can be tricky to ensure full testing coverage if both DSOs are installed. Currently, there is no dynamic loader support for selecting an AVX2 baseline. Fixing this requires complete agreement among all involved parties what the actual CPU requirements are (currently, not even glibc and GCC agree what “haswell” means, the closest we have to an AVX2 baseline). But similar fixes are required for any baseline update.
Thanks, Florian
On Wed, Jul 24, 2019 at 1:07 PM Florian Weimer wrote:
- Stephen Gallagher:
With my FESCo hat on, I can't support this action as currently stated. I think I'd be more inclined to consider it if the Change was proposed as a new architecture bring-up. Effectively, this would be a whole new architecture that would just happen to be largely compatible with x86_64.
Can we make this happen at the RPM level? So that third-party RPMs install just fine even though the operating system is something else (not x86_64 anymore)? I do not see many explicit dependencies on anything “x86_64” in Fedora 30, so perhaps this is doable, assuming that packages of the other architecture would continue to provide …(…)(64bit) for soname dependencies.
This depends on RPM and libsolv "archpolicy". So yes, as long as dependencies do not change, it is fine. We need to keep %{?_isa} to provide x86_64.
Could we rebuild x86_64 Fedora with a different dist tag and different compiler flags, and release that as a new spin? And retain the x86_64 for that architecture?
Yes, that was my proposal.
Regarding doing something like the old i686 packages when we had an i586 baseline (or the ppc64p7 work that was perhaps never upstreamed to Fedora), I'm a bit worried about increasing the complexity of composes. We already see upgrade issues due to i686 packages come and go, and that could potentially multiply them. The advantage is that packaging changes themselves will be relatively minor, once we have agreement which packages should do this.
ELF multilib DSOs inside RPMs result in code duplication, affecting container image size. Packaging changes are *not* minor for this approach. It can be tricky to ensure full testing coverage if both DSOs are installed. Currently, there is no dynamic loader support for selecting an AVX2 baseline. Fixing this requires complete agreement among all involved parties what the actual CPU requirements are (currently, not even glibc and GCC agree what “haswell” means, the closest we have to an AVX2 baseline). But similar fixes are required for any baseline update.
Thanks, Florian _______________________________________________ devel mailing list -- To unsubscribe send an email to Fedora Code of Conduct: List Guidelines: List Archives:
"FW" == Florian Weimer writes:
FW> ELF multilib DSOs inside RPMs result in code duplication, affecting container image size.
I think it's important to quantify this kind of thing. I think we can all agree that there is very little benefit to duplicating every single library, so extra space usage would come only from libraries which meet all of:
* Compiling with AVX2 (or whatever) provides benefit
* Special runtime detection code isn't included
* Function multiversioning or the fancy target_clones attribute isn't used
And by implementing the latter two, the set can shrink.
So, really, how much space are we really talking about here?
FW> Currently, there is no dynamic loader support for selecting an AVX2 baseline. Fixing this requires complete agreement among all involved parties what the actual CPU requirements are (currently, not even glibc and GCC agree what “haswell” means, the closest we have to an AVX2 baseline). But similar fixes are required for any baseline update.
I have a hard time believing that solving that would be somehow less preferable than either making Fedora unusable on a whole class of hardware or splitting off a completely new architecture.
- J<
- Stephen Gallagher:
Can we make this happen at the RPM level? So that third-party RPMs install just fine even though the operating system is something else (not x86_64 anymore)? I do not see many explicit dependencies on anything “x86_64” in Fedora 30, so perhaps this is doable, assuming that packages of the other architecture would continue to provide …(…)(64bit) for soname dependencies.
Could we rebuild x86_64 Fedora with a different dist tag and different compiler flags, and release that as a new spin? And retain the x86_64 for that architecture?
Regarding doing something like the old i686 packages when we had an i586 baseline (or the ppc64p7 work that was perhaps never upstreamed to Fedora), I'm a bit worried about increasing the complexity of composes. We already see upgrade issues due to i686 packages come and go, and that could potentially multiply them. The advantage is that packaging changes themselves will be relatively minor, once we have agreement which packages should do this.
ELF multilib DSOs inside RPMs result in code duplication, affecting container image size. Packaging changes are *not* minor for this approach. It can be tricky to ensure full testing coverage if both DSOs are installed. Currently, there is no dynamic loader support for selecting an AVX2 baseline. Fixing this requires complete agreement among all involved parties what the actual CPU requirements are (currently, not even glibc and GCC agree what “haswell” means, the closest we have to an AVX2 baseline). But similar fixes are required for any baseline update.
Thanks, Florian
Something like this sounds like a good idea to me.
Would it be possible to work this out into a standardized way to provide support for multiple ISA levels, e.g. AVX, AVX2, AVX-512 or whatever might come up in the future? That way Fedora could stay up to date with recent CPU developments without regularly having discussions like this one.
Instead of providing multiple x86_64 spins, where each user has to figure out on their own which one to use, it might be better to only deliver installers with the baseline and let them figure out the right optimized packages at install time.
On 2019-07-22 2:51 p.m., Ben Cotton wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Here's a JSON file of all Intel CPUs that do not have any AVX extension support that were released after January 1, 2013:$filter=not%20su...
There are Pentium and Celeron CPUs being released today that don't even support AVX.
On 2019-07-22 13:12, Felix Kaechele via devel wrote:
On 2019-07-22 2:51 p.m., Ben Cotton wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Here's a JSON file of all Intel CPUs that do not have any AVX extension support that were released after January 1, 2013:$filter=not%20su...
There are Pentium and Celeron CPUs being released today that don't even support AVX.
I may have to turn in my nerd card for not being able to pull this myself, but what would this list look like if the baseline was SSSE3? Just curious.
Joseph D. Wagner
On Mon, Jul 22, 2019 at 02:40:29PM -0700, Joseph D. Wagner wrote:
I may have to turn in my nerd card for not being able to pull this myself, but what would this list look like if the baseline was SSSE3? Just curious.
Steam claims 97.8% of their userbase has a processor supporting SSSE3 (vs 88.6% for AVX..)
On the AMD side, requiring SSSE3 is nearly equivalent to requiring AVX, in other words, post-2011 CPUs only.
Now *SS*E3 is another matter, as only the 1st-gen single-core K8 parts lack support, and every Intel x86_64-capable CPU supports it.
- Solomon
On Mon, Jul 22, 2019 at 3:27 PM Ben Cotton wrote:
== Upgrade/compatibility impact == Fedora installations on systems with CPUs which are not able to execute AVX2 instructions will not be able to upgrade.
I have only two computers at home with AVX2, all the rest don't have it.
Why are we considering this vs just making it so we have AVX2 run-code that is used when the instruction is available? This seems unnecessarily bloody...
-- 真実はいつも一つ!/ Always, there's only one truth!
On Mon, 2019-07-22 at 14:51 -0400, Ben Cotton wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
This is not what I'd call a good idea. I've had to shoot it down several times on internal mailing lists for RHEL, I think it's even less a good idea for Fedora.
Skylake Pentium and Celeron models - dating from 2015 - don't have AVX at all. Why do we want to break them? Has Intel promised they're not going to pull a trick like that again?
If we really want to chase after Clear Linux benchmarks then fix to know that avx2 is a capability (like we could for i686 + sse2). Moving the baseline like this is far, far too aggressive.
- ajax
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
This is not what I'd call a good idea. I've had to shoot it down several times on internal mailing lists for RHEL, I think it's even less a good idea for Fedora.
Skylake Pentium and Celeron models - dating from 2015 - don't have AVX at all. Why do we want to break them? Has Intel promised they're not going to pull a trick like that again?
If we really want to chase after Clear Linux benchmarks then fix to know that avx2 is a capability (like we could for i686 + sse2). Moving the baseline like this is far, far too aggressive.
IBM did something to run optimised Power9 binaries on Power8. If I remember correctly it was IFUNC in glibc, so that they could have optimised paths, so I don't see why the same sort of thing couldn't be used for AVX2.
If Intel is so determined to make their processors look remotely competitive again with the likes of Spectre and Meltdown maybe they should look into things like that?
Looking at the Intel x86 devices we're supporting for IoT, even the latest available, none of them report any form of AVX.
On 23/07/2019 10:40, Peter Robinson wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
This is not what I'd call a good idea. I've had to shoot it down several times on internal mailing lists for RHEL, I think it's even less a good idea for Fedora.
Skylake Pentium and Celeron models - dating from 2015 - don't have AVX at all. Why do we want to break them? Has Intel promised they're not going to pull a trick like that again?
If we really want to chase after Clear Linux benchmarks then fix to know that avx2 is a capability (like we could for i686 + sse2). Moving the baseline like this is far, far too aggressive.
IBM did something to run optimised Power9 binaries on Power8, if I remember correctly it was IFUNC in glibc, so they could have optimised paths so I don't see why the same sort of thing couldn't be used for the AVX2.
It absolutely can, as discussed extensively on IRC last night...
There's a pretty good summary here but broadly speaking you can either use the target_clones attribute on a function to have gcc compile multiple versions for different targets along with an ifunc to choose one at run time, or if you have your own hand rolled implementations then mark them with the target attribute and let gcc create the ifunc.
Ben Cotton wrote:
Fedora currently uses the original K8 micro-architecture (without 3DNow! and other AMD-specific parts) as the baseline for its <code>x86_64</code> architecture. This baseline dates back to 2003 and has not been updated since. As a result, performance of Fedora is not as good as it could be on current CPUs.
This is the price of compatibility.
This change to update the micro-architecture level for the architecture to something more recent.
I don't see a practical benefit to requiring anything more recent than SSE2 (what we currently assume) as long as upstreams still support that baseline. Surely a few percent of gained performance are not worth tossing out entire generations of hardware!
we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
This is absolutely unacceptable and would force me to look for a new distribution. Not even my Sandy Bridge Core i7 supports AVX2, not to mention my Core 2 Duo notebook that still runs Fedora perfectly fine right now. Sure, the notebook is 11 years old and the desktop 8 years, but those machines work perfectly fine and the desktop doesn't even perform that badly. If I have to choose between replacing the computers or replacing the distribution, my choice will be made fairly quickly.
My desktop's CPU only supports AVX 1, my notebook's CPU only up to SSSE3 (no SSE4 nor AVX).
After preliminary discussions with CPU vendors,
And that pretty much says it all! Planned obsolescence anyone? No thanks!
Kevin Kofler
On Monday, 22 July 2019 20:51:27 CEST Ben Cotton wrote:
== Summary == Fedora currently uses the original K8 micro-architecture (without 3DNow! and other AMD-specific parts) as the baseline for its <code>x86_64</code> architecture. This baseline dates back to 2003 and has not been updated since. As a result, performance of Fedora is not as good as it could be on current CPUs.
This change to update the micro-architecture level for the architecture to something more recent.
== Owner ==
- Name: [[User:fweimer| Florian Weimer]]
- Email: []
== Detailed Description ==
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
I'm in the minority here as I have only one computer which supports AVX2, so I would benefit from it, but even then, I wouldn't want people not being able to update anymore because their hardware is too old. Maybe dial the proposal back a bit, for example, target instructions supported by Celerons and Pentiums?
Best regards,
Fedora installations on systems with CPUs which are not able to execute AVX2 instructions will not be able to upgrade.
So it looks like Fedora would no longer work on my laptop from 2013. I could probably switch the laptop over to CentOS, but that would restrict my ability to work on Fedora stuff. A significant part of my Fedora work over the years has been done on that laptop. With this change I would only be able to do Fedora work at home on my new workstation. When a good Internet connection is available I suppose I could SSH home and perform some tasks, but that would be more cumbersome, which would reduce the likelihood that I'd get the work done.
And no, I'm not going to dump a perfectly serviceable laptop in a landfill just to accommodate Fedora.
Björn Persson
Once upon a time, Ben Cotton said:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
There are still new systems being sold that don't support AVX, much less AVX2. I installed an Intel Atom C3758 system today, and it doesn't support AVX. With the end of i686 (which I think is reasonable), this would kill Fedora on a significant amount of hardware.
After preliminary discussions with CPU vendors...
CPU vendors want to sell CPUs, while there are still plenty of expensive high-end Sandy/Ivy Bridge machines running that would not be upgradable. Not supporting machines that are 16 years old is OK, but restricting to < 6 years (7 years when Fedora 32 will be released) is crazy. Requiring AVX would be more reasonable, as it will extend the maximum machine age to 9 years.
Plenty has already been said here about why we should not do this (and OMFG we should NOT do this), and I am in complete agreement.
(I have 0 machines, of 3 in my personal network, with AVX2 support. My current desktop I only bought a year and a half ago, and it's not AVX2 capable! My laptop and fileserver are 5 and 12 years old, respectively, so good luck with that.)
But even if we WERE to do this, then this:
== Release Notes == Release notes must mention how users can determine whether their system supports AVX2 prior to upgrading, for example by running <code>grep avx2 /proc/cpuinfo</code>.
...is not an acceptable WAY to do this.
grep /proc/cpuinfo!? Are you kidding me?
Unless it includes mechanisms in the install and upgrade process that would automatically prevent existing Fedora 31 users from upgrading their machines to a release that won't run on them, this proposal is doubly outrageous.
(I assume the F32 installer wouldn't even boot, in this proposed scenario, so at least new (non-)users would be covered. "...Yay.")
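A guard of the kind asked for here could be sketched as below. To be clear, this is not an existing dnf or installer feature: the names `missing_features` and `preflight`, the required-flag set (taken from the features the Change names: AVX2, FMA, CMPXCHG16B as `cx16`, SSE 4.2 as `sse4_2`), and refusing via a non-zero exit status are all assumptions for illustration.

```python
import sys

# Assumed baseline; the Change itself says details are still being worked out.
REQUIRED_FLAGS = ("avx2", "fma", "cx16", "sse4_2")


def missing_features(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return, sorted, the required flags that at least one logical CPU lacks."""
    missing = set()
    seen_flags_line = False
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            seen_flags_line = True
            missing |= set(required) - set(value.split())
    if not seen_flags_line:
        return sorted(required)  # no information: treat everything as missing
    return sorted(missing)


def preflight(cpuinfo_text):
    """Return an exit status: 0 to allow the upgrade, 1 to refuse it."""
    missing = missing_features(cpuinfo_text)
    if missing:
        print("Upgrade blocked; CPU lacks: " + ", ".join(missing), file=sys.stderr)
        return 1
    return 0
```

An upgrade tool could call `preflight(open("/proc/cpuinfo").read())` before downloading anything and stop on a non-zero result, so that nobody finds out only after the reboot.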
A brief survey of my hardware shows that less than half of it supports AVX2.
I can't find a single enthusiastic endorsement for this proposal in this thread, so far; but if this proposal ends up being adopted, I hope that this gets announced well in advance, including a big fat banner on, to give everyone in the community plenty of time to make their own arrangements, in lieu of that decision.
Hi Florian,
On Mon, Jul 22, 2019 at 9:28 PM Ben Cotton wrote:
== Summary == Fedora currently uses the original K8 micro-architecture (without 3DNow! and other AMD-specific parts) as the baseline for its <code>x86_64</code> architecture. This baseline dates back to 2003 and has not been updated since. As a result, performance of Fedora is not as good as it could be on current CPUs.
This change to update the micro-architecture level for the architecture to something more recent.
== Owner ==
- Name: [[User:fweimer| Florian Weimer]]
- Email: []
== Detailed Description ==
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Along with AVX2, it makes sense to enable certain other CPU features which are not strictly implied by AVX2, such as CMPXCHG16B, FMA, and earlier vector extensions such as SSE 4.2. Details are still being worked out.
It seems that Intel is still manufacturing CPUs without AVX support (not even talking about AVX2) in 2019. So this is clearly no-go for me.
But I do want to see some refreshments in this area! There are multiple options how to proceed I think:
1. Lower requirement to something like SSE4 and select other CPU features which are available in most of CPUs for last decade.
2. Build every package on x86_64 twice (one for compatible set and one for this new-features set), possibly by introducing sub-architecture in koji or using koji-shadow (that's just implementation detail). Produce an official spin which is built from these packages.
3. Invent some mechanism for selecting appropriate feature set in runtime (somebody mentioned fat binaries in this thread).
These options can be combined.
A test rebuild of a distribution largely based on Fedora 28 showed that there is only a small number of build failures due to the baseline switch. Very few packages are confused about the availability of the CMPXCHG16B instruction, leading to linking failures related to <code>-latomic</code>, and there are some hard-coded floating point results that could change due to vectorization. (The latter is within bounds of the usual cross-architecture variation for such tests.)
== Benefit to Fedora ==
Fedora will use current CPUs more efficiently, increasing performance and reducing power consumption.
Moreover, when Fedora is advertised as a distribution by a compute service provider, users can be certain that their AVX2-optimized software will run in this environment.
== Scope ==
- Proposal owners: Update the <code>gcc</code> and <code>redhat-rpm-config</code> package to implement the new compiler flags. It is expected that the new baseline will be implemented in a new GCC <code>-march=</code> option for convenience.
- Other developers: Other developers may have to adjust test suites which expect exact floating point results, and correct linking with <code>libatomic</code>. They will also have to upgrade their x86-64 machines to something that can execute AVX2 instructions.
- Release engineering: [ #8513]
** All Fedora builders need to be AVX2-capable.
** Infrastructure ticket: [ #7968]
- Policies and guidelines: No guidelines need to be changed.
- Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact == Fedora installations on systems with CPUs which are not able to execute AVX2 instructions will not be able to upgrade.
== How To Test == General system testing will provide test coverage for this change.
== User Experience == User should observe improved performance and, likely, battery life. Developers will benefit from the knowledge that code with AVX2 optimizations will run wherever Fedora runs.
== Dependencies == There are no direct dependencies on this change at this time.
== Contingency Plan == It is possible to not implement this change, or implement a smaller subset of it (adopting the CMPXCHG16B instruction only, for example).
- Contingency mechanism: Mass rebuild with different/previous compiler flags.
- Contingency deadline: Final mass rebuild.
- Blocks release? No.
- Blocks product? No.
== Documentation == The new micro-architecture baseline and the resulting requirements need to be documented.
== Release Notes == Release notes must mention how users can determine whether their system supports AVX2 prior to upgrading, for example by running <code>grep avx2 /proc/cpuinfo</code>.
-- Ben Cotton He / Him / His Fedora Program Manager Red Hat TZ=America/Indiana/Indianapolis
On Tue, Jul 23, 2019 at 4:31 AM Igor Gnatenko wrote:
Hi Florian,
On Mon, Jul 22, 2019 at 9:28 PM Ben Cotton wrote:
== Summary == Fedora currently uses the original K8 micro-architecture (without 3DNow! and other AMD-specific parts) as the baseline for its <code>x86_64</code> architecture. This baseline dates back to 2003 and has not been updated since. As a result, performance of Fedora is not as good as it could be on current CPUs.
This change to update the micro-architecture level for the architecture to something more recent.
== Owner ==
- Name: [[User:fweimer| Florian Weimer]]
- Email: []
== Detailed Description ==
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Along with AVX2, it makes sense to enable certain other CPU features which are not strictly implied by AVX2, such as CMPXCHG16B, FMA, and earlier vector extensions such as SSE 4.2. Details are still being worked out.
It seems that Intel is still manufacturing CPUs without AVX support (not even talking about AVX2) in 2019. So this is clearly no-go for me.
But I do want to see some refreshments in this area! There are multiple options how to proceed I think:
1. Lower requirement to something like SSE4 and select other CPU features which are available in most of CPUs for last decade.
2. Build every package on x86_64 twice (one for compatible set and one for this new-features set), possibly by introducing sub-architecture in koji or using koji-shadow (that's just implementation detail). Produce an official spin which is built from these packages.
Thinking about this even more, it should not be very hard thing to do:
* Define new architecture in RPM/libsolv (let's call it "haswell" or "x86_64modern")
* Define set of capabilities it should have, write appropriate check in RPM/libdnf
* Add new architecture in Fedora Koji
* Once bootstrapped, create composes
* At some point in future, merge this arch back to x86_64 and move forward
What do you think?
3. Invent some mechanism for selecting appropriate feature set in runtime (somebody mentioned fat binaries in this thread).
These options can be combined.
A test rebuild of a distribution largely based on Fedora 28 showed that there is only a small number of build failures due to the baseline switch. Very few packages are confused about the availability of the CMPXCHG16B instruction, leading to linking failures related to <code>-latomic</code>, and there are some hard-coded floating point results that could change due to vectorization. (The latter is within bounds of the usual cross-architecture variation for such tests.)
== Benefit to Fedora ==
Fedora will use current CPUs more efficiently, increasing performance and reducing power consumption.
Moreover, when Fedora is advertised as a distribution by a compute service provider, users can be certain that their AVX2-optimized software will run in this environment.
== Scope ==
- Proposal owners: Update the <code>gcc</code> and <code>redhat-rpm-config</code> package to implement the new compiler flags. It is expected that the new baseline will be implemented in a new GCC <code>-march=</code> option for convenience.
- Other developers: Other developers may have to adjust test suites which expect exact floating point results, and correct linking with <code>libatomic</code>. They will also have to upgrade their x86-64 machines to something that can execute AVX2 instructions.
- Release engineering: [ #8513]
** All Fedora builders need to be AVX2-capable.
** Infrastructure ticket: [ #7968]
- Policies and guidelines: No guidelines need to be changed.
- Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
Fedora installations on systems with CPUs which are not able to execute AVX2 instructions will not be able to upgrade.
== How To Test ==
General system testing will provide test coverage for this change.
== User Experience ==
Users should observe improved performance and, likely, battery life. Developers will benefit from the knowledge that code with AVX2 optimizations will run wherever Fedora runs.
== Dependencies ==
There are no direct dependencies on this change at this time.
== Contingency Plan ==
It is possible to not implement this change, or implement a smaller subset of it (adopting the CMPXCHG16B instruction only, for example).
- Contingency mechanism: Mass rebuild with different/previous compiler flags.
- Contingency deadline: Final mass rebuild.
- Blocks release? No.
- Blocks product? No.
== Documentation ==
The new micro-architecture baseline and the resulting requirements need to be documented.
== Release Notes ==
Release notes must mention how users can determine whether their system supports AVX2 prior to upgrading, for example by running <code>grep avx2 /proc/cpuinfo</code>.
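A minimal Python sketch of the same check, for cases where a plain `grep` is too coarse. It parses the `flags` line of `/proc/cpuinfo` and tests for exact flag names. The helper names `cpu_flags` and `supports` are invented for this example, and the extra flag names (`fma`, `cx16` for CMPXCHG16B, `sse4_2`) are the spellings Linux is understood to use on x86; the proposal itself only mentions `avx2`.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line (for example on a non-x86 system).
    return set()


def supports(cpuinfo_text, *features):
    """True if every named feature appears as an exact flag."""
    flags = cpu_flags(cpuinfo_text)
    return all(feature in flags for feature in features)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        text = path.read_text()
        # Flag names here are assumptions about the kernel's spelling.
        for feature in ("avx2", "fma", "cx16", "sse4_2"):
            print(feature, "yes" if supports(text, feature) else "no")
```

Matching whole tokens rather than substrings avoids treating `avx` as if it were `avx2`.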
-- Ben Cotton He / Him / His Fedora Program Manager Red Hat TZ=America/Indiana/Indianapolis _______________________________________________ devel-announce mailing list -- To unsubscribe send an email to Fedora Code of Conduct: List Guidelines: List Archives:
On Tue, 23 Jul 2019 at 08:30, Igor Gnatenko wrote:
On Tue, Jul 23, 2019 at 4:31 AM Igor Gnatenko wrote:
Hi Florian,
On Mon, Jul 22, 2019 at 9:28 PM Ben Cotton wrote:
Thinking about this even more, it should not be very hard thing to do:
- Define new architecture in RPM/libsolv (let's call it "haswell" or
x86_64avx2 ? or even avx2 ?
- Define set of capabilities it should have, write appropriate check in RPM/libdnf
- Add new architecture in Fedora Koji
Do we really need a whole separate architecture? I expect that enabling a few selected packages to have a second (or a third) optimized build will be enough. koji already supports this. Is this the sub-architecture the proposal is referring to? (using koji add-pkg f31 glibc --extra-arches=EXTRA_ARCHES ...). The list of packages having a second optimized build can be as large as the packages provided by the server spin and any additional packages that would opt in.
Personally, I would like to see some "numbers" of the performance with avx optimized build (using copr repo on few key packages ). And I expect optimizing some packages would have low impact, so maybe a
Nicolas (kwizart)
On Tue, 23 Jul 2019 at 09:38, Nicolas Chauvet wrote:
... benchmark needs to be done in order to enable an alternate build on a selected set of packages.
(previous email sent too early).
On Tue, Jul 23, 2019 at 10:44 AM Nicolas Chauvet wrote:
On Tue, 23 Jul 2019 at 08:30, Igor Gnatenko wrote:
On Tue, Jul 23, 2019 at 4:31 AM Igor Gnatenko wrote:
Hi Florian,
On Mon, Jul 22, 2019 at 9:28 PM Ben Cotton wrote:
Thinking about this even more, it should not be very hard thing to do:
- Define new architecture in RPM/libsolv (let's call it "haswell" or
x86_64avx2 ? or even avx2 ?
I just did not want to bikeshed :)
- Define set of capabilities it should have, write appropriate check in RPM/libdnf
- Add new architecture in Fedora Koji
Do we really need a whole separate architecture? I expect that enabling a few selected packages to have a second (or a third) optimized build will be enough. koji already supports this. Is this the sub-architecture the proposal is referring to? (using koji add-pkg f31 glibc --extra-arches=EXTRA_ARCHES ...). The list of packages having a second optimized build can be as large as the packages provided by the server spin and any additional packages that would opt in.
I would leave this decision to Florian. Strictly speaking, we are talking not only about avx2, but some other instructions / options which might affect system as a whole (like FMA).
So whether it is used by all packages or by a few selected ones requires a bit more investigation.
Personally, I would like to see some "numbers" of the performance with avx optimized build (using copr repo on few key packages ). And I expect optimizing some packages would have low impact, so maybe a
Yes, numbers would be really useful.
Nicolas (kwizart)
On Tue, 23 Jul 2019 at 08:30, Igor Gnatenko <ignatenkobrain(a)> wrote:
- Define new architecture in RPM/libsolv (let's call it "haswell" or
x86_64avx2 ? or even avx2 ?
SOMETHING, though. I can't be the only one here old enough to remember when Linux packages came in .i386, .i486, .i586, and then .i686 flavors. And then gradually the earlier generations were dropped off, and it was all just .i686. (Which some distros merged back into .i386 — a mistake, IMHO — while others kept as .i686.)
Doing this split, as much work as it would be on the infrastructure side, would also give us the numbers everyone is clamoring for. We could observe actual download/install rates for both arches, and pull the plug on legacy .x86_64 when the time is right, just as it was done with .i686.
Because if something isn't going to run on even some CURRENT x86_64 processors being sold, calling it "x86_64" is just wrong, just like calling Pentium-only packages "i386" was.
Hello, Igor Gnatenko.
Tue, 23 Jul 2019 07:34:06 +0200 you wrote:
- Define new architecture in RPM/libsolv (let's call it "haswell" or
I have a better idea: use modules to build special AVX/SSE4 enabled versions of some packages.
-- Sincerely, Vitaly Zaitsev
On Tue, 23 Jul 2019 at 08:08, Vitaly Zaitsev via devel wrote:
Hello, Igor Gnatenko.
Tue, 23 Jul 2019 07:34:06 +0200 you wrote:
- Define new architecture in RPM/libsolv (let's call it "haswell" or
I have a better idea: use modules to build special AVX/SSE4 enabled versions of some packages.
You are looking at having to have a module for every package.. because gcc and glibc are the main target to be enabled in. And once you replace gcc/glibc.. you might as well do the kernel, and..
I think either a different focus distribution (who needs AVX2 nethack?) or a different architecture would work better.
-- Sincerely, Vitaly Zaitsev
On 7/22/19 10:34 PM, Igor Gnatenko wrote:
On Tue, Jul 23, 2019 at 4:31 AM Igor Gnatenko wrote:
Thinking about this even more, it should not be very hard thing to do:
- Define new architecture in RPM/libsolv (let's call it "haswell" or
- Define set of capabilities it should have, write appropriate check in RPM/libdnf
- Add new architecture in Fedora Koji
- Once bootstrapped, create composes
- At some point in future, merge this arch back to x86_64 and move forward
What do you think?
Unless someone can show some kind of MASSIVE benefit, I'm not in favor.
It's a ton of duplication of effort, tons more disk space, tons more cpu cycles wasted, a ton more mirror disk space, a ton more bandwidth, etc.
On Tue, Jul 23, 2019 at 12:39 PM Kevin Fenzi wrote:
On 7/22/19 10:34 PM, Igor Gnatenko wrote:
On Tue, Jul 23, 2019 at 4:31 AM Igor Gnatenko wrote:
Thinking about this even more, it should not be very hard thing to do:
- Define new architecture in RPM/libsolv (let's call it "haswell" or
- Define set of capabilities it should have, write appropriate check in RPM/libdnf
- Add new architecture in Fedora Koji
- Once bootstrapped, create composes
- At some point in future, merge this arch back to x86_64 and move forward
What do you think?
Unless someone can show some kind of MASSIVE benefit, I'm not in favor.
I think too often we focus on the technical implications (performance gain, etc) and sometimes don't consider wider aspects. So I'm curious what your view is. Can you elaborate on what kind of benefit you would view as warranting this?
It's a ton of duplication of effort, tons more disk space, tons more cpu cycles wasted, a ton more mirror disk space, a ton more bandwidth, etc.
So let's look at this statement, for example. Everything listed is machine related, except the first part on duplication of effort. Machine related items are solvable with more machine resources. (That is not to be flippant, but it's far easier to solve than human impact.)
On the effort part, what if we structured it so it wasn't immediately 2x the effort. That would indeed be poor. If we assume for a minute that we have the machine resources, we can certainly come up with workflows that facilitate something like this in a manner that doesn't cause a large human overhead. I'm actually thinking of other areas that would benefit from not exactly the new architecture approach as traditionally known, but a new target space that allows the Fedora project to do new things.
Dismissing ideas like this because we thought too narrowly about the single discussion starter is something I often see Fedora do, and I honestly think it's causing the project to miss opportunities to add value in multiple ways.
On Tue, 2019-07-23 at 13:32 -0400, Josh Boyer wrote:
It's a ton of duplication of effort, tons more disk space, tons more cpu cycles wasted, a ton more mirror disk space, a ton more bandwidth, etc.
So let's look at this statement, for example. Everything listed is machine related, except the first part on duplication of effort. Machine related items are solvable with more machine resources. (That is not to be flippant, but it's far easier to solve than human impact.)
Well, sort of - except that, life being life, machines inevitably go wrong. Fans give out and they choke. Builds mysteriously fail because of some test flake or a neutrino hitting the CPU at just the wrong moment or something. Disks go wonky. And all of these things get fixed by...people. Adding an arch adds another arch worth of all those things happening and needing to be fixed by someone.
Also, we can't really solve the machine resources of mirrors. Well, I mean, I guess we *could*, but I doubt anyone in RH is going to sign off on us buying a ton of expensive storage hardware and shipping it off to random universities around the world...
On the effort part, what if we structured it so it wasn't immediately 2x the effort. That would indeed be poor. If we assume for a minute that we have the machine resources, we can certainly come up with workflows that facilitate something like this in a manner that doesn't cause a large human overhead. I'm actually thinking of other areas that would benefit from not exactly the new architecture approach as traditionally known, but a new target space that allows the Fedora project to do new things.
I agree that this would be possible, but it comes with the caveat that the people who would likely get stuck with improving the workflows are the same people currently being overworked by the bad workflows.
The 'don't do a release for a year' proposal (or whatever variations of it were discussed) was supposed to help with that kinda thing, but...that didn't happen. So, we're all still on the treadmills.
On Tue, Jul 23, 2019 at 2:37 PM Adam Williamson wrote:
So let's look at this statement, for example. Everything listed is machine related, except the first part on duplication of effort. Machine related items are solvable with more machine resources. (That is not to be flippant, but it's far easier to solve than human impact.)
Well, sort of - except that, life being life, machines inevitably go wrong. Fans give out and they choke. Builds mysteriously fail because of some test flake or a neutrino hitting the CPU at just the wrong moment or something. Disks go wonky. And all of these things get fixed by...people. Adding an arch adds another arch worth of all those things happening and needing to be fixed by someone.
Agreed. I'd like to frame the discussion less around "adding another arch" and more around "adding a new thing", but you're mostly correct. I would suggest that there is this nebulous thing called "the cloud" that mitigates a small part of that, but I also fully understand using that magical machine resource presents its own challenges.
Also, we can't really solve the machine resources of mirrors. Well, I mean, I guess we *could*, but I doubt anyone in RH is going to sign off on us buying a ton of expensive storage hardware and shipping it off to random universities around the world...
Honestly, I'm less concerned about this. Why? Because anything new like this does not immediately require the full weight of a mirror system. The level of interest is likely to be small enough at the start that we can and should approach it in a measured way.
On the effort part, what if we structured it so it wasn't immediately 2x the effort. That would indeed be poor. If we assume for a minute that we have the machine resources, we can certainly come up with workflows that facilitate something like this in a manner that doesn't cause a large human overhead. I'm actually thinking of other areas that would benefit from not exactly the new architecture approach as traditionally known, but a new target space that allows the Fedora project to do new things.
I agree that this would be possible, but it comes with the caveat that the people who would likely get stuck with improving the workflows are the same people currently being overworked by the bad workflows.
This makes the assumption that there is no influx of actual humans. Given history, maybe that's a fair assumption. I think for anything new to work, we'd need at least some way to add actual human participants. Either by freeing up existing people, or bringing new, interested people in.
The 'don't do a release for a year' proposal (or whatever variations of it were discussed) was supposed to help with that kinda thing, but...that didn't happen. So, we're all still on the treadmills.
It didn't happen, but I'm seeing different approaches already being taken to address similar issues. The current discussion around dropping a number of apps, for example.
All of these things require a cost/benefit analysis for sure. That is very hard, but it's also very healthy. Just doing the same thing forever just gets you the same thing forever, right? I haven't been a super-active Fedora participant very recently, but I'm encouraged that the project is starting to look at things in new ways and evaluating what is actually a valuable thing to do. I find it very exciting.
On Tue, 2019-07-23 at 14:57 -0400, Josh Boyer wrote:
Also, we can't really solve the machine resources of mirrors. Well, I mean, I guess we *could*, but I doubt anyone in RH is going to sign off on us buying a ton of expensive storage hardware and shipping it off to random universities around the world...
Honestly, I'm less concerned about this. Why? Because anything new like this does not immediately require the full weight of a mirror system. The level of interest is likely to be small enough at the start that we can and should approach it in a measured way.
True, but the way our build process, repo layout and mirroring system work, if you want to leave a bit of Fedora out when you're mirroring it, this is not easy. rsync bundles can't really do the job in cases like this, because of how they're based on directory structures combined with how we structure our repos. Mirrors have to use some kind of script with a filter to do this; quick-fedora-mirror helps, but you still have to maintain and write the filters. My mirror currently rejoices in this:
to try and reduce the amount of bandwidth it eats. Which is of course fun to remember about and maintain. And hey look, indeed I haven't, cos I didn't add 28 to it yet...
Of course we could completely rearrange how we build and store things, but...see under 'human resources' :)
Josh Boyer wrote:
I would suggest that there is this nebulous thing called "the cloud" that mitigates a small part of that, but I also fully understand using that magical machine resource presents its own challenges.
As the FSF puts it: "There is no cloud, just other people's computers."
Kevin Kofler
On Tue, Jul 23, 2019 at 09:50:17PM +0200, Kevin Kofler wrote:
I would suggest that there is this nebulous thing called "the cloud" that mitigates a small part of that, but I also fully understand using that magical machine resource presents its own challenges.
As the FSF puts it: "There is no cloud, just other people's computers."
Yes, and that's not necessarily always bad. In this case, it's exactly the point.
On Tue, Jul 23, 2019 at 7:37 PM Adam Williamson wrote:
On the effort part, what if we structured it so it wasn't immediately 2x the effort. That would indeed be poor. If we assume for a minute that we have the machine resources, we can certainly come up with workflows that facilitate something like this in a manner that doesn't cause a large human overhead. I'm actually thinking of other areas that would benefit from not exactly the new architecture approach as traditionally known, but a new target space that allows the Fedora project to do new things.
I agree that this would be possible, but it comes with the caveat that the people who would likely get stuck with improving the workflows are the same people currently being overworked by the bad workflows.
Completely agree here, but I've seen no concrete proposals from the leadership as to how they intend to fix this. There's proposals from the CPE team to jettison things, which makes sense for them but it's completely undefined what the wider impact on other teams or the wider community will be there, I suspect it just moves the bottleneck.
The 'don't do a release for a year' proposal (or whatever variations of it were discussed) was supposed to help with that kinda thing, but...that didn't happen. So, we're all still on the treadmills.
Part of that was because the proposal was just "stop releasing to fix stuff" but there wasn't any form of proposal of what was going to be fixed or how, there was just wide ranging hand wavy "stuff".
Josh Boyer wrote:
I think too often we focus on the technical implications (performance gain, etc) and sometimes don't consider wider aspects.
What "wider aspects" would you want to consider? What implications other than technical matter for a technical decision such as this one?
Kevin Kofler
On Tue, Jul 23, 2019 at 3:48 PM Kevin Kofler wrote:
What "wider aspects" would you want to consider? What implications other than technical matter for a technical decision such as this one?
This is much larger than a technical decision. There are big impacts, as we've seen, on who can use Fedora if we implement the change as proposed.
Igor Gnatenko wrote:
- Lower requirement to something like SSE4 and select other CPU features which are available in most of CPUs for last decade.
Sorry, but -1 to SSE4 too. One of my machines supports only up to SSSE3, and other replies in this thread have also suggested SSSE3 as the most we can assume. And if you ask me, we should just stick to SSE2 as the baseline. What are the big gains to be had from SSE3, SSSE3, SSE4.1, and SSE4.2? Especially if you limit it to packages that don't do runtime detection? (Performance-sensitive software SHOULD do runtime detection, and most of it does, e.g., OpenBLAS.)
- Build every package on x86_64 twice (one for compatible set and one for this new-features set), possibly by introducing a sub-architecture in koji or using koji-shadow (that's just an implementation detail). Produce an official spin which is built from these packages.
That would at least be tolerable, but still, I'm against it. It sounds like a huge waste of resources for very little practical gain to me.
- Invent some mechanism for selecting appropriate feature set in runtime (somebody mentioned fat binaries in this thread).
We already have 2 such mechanisms:
* several upstream software packages check CPUID directly. See, e.g., how OpenBLAS does it. Or the performance-sensitive parts of Chromium. Etc.
* you can drop optimized builds of entire shared objects (.so) into an appropriate subdirectory of %{_libdir}. Some profiles such as haswell are already supported. If we need more, they can be added.
So I don't see a need for fat binaries.
Kevin Kofler
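To make the second mechanism above concrete, here is a small Python sketch of how a loader could prefer an optimized copy of a shared object from a profile subdirectory and fall back to the baseline build. Everything in it is illustrative: the `PROFILES` table, the feature sets attached to each profile, and the `pick_library` helper are assumptions made for this example, not the dynamic loader's real definitions or search order.

```python
from pathlib import PurePosixPath

# Hypothetical profiles, most capable first. The feature sets are
# illustrative assumptions, not the dynamic loader's actual requirements.
PROFILES = [
    ("haswell", frozenset({"avx2", "fma", "bmi2"})),
    ("sse4", frozenset({"sse4_1", "sse4_2"})),
]


def pick_library(libdir, soname, cpu_flags, existing):
    """Return the path of the best build of `soname` for this CPU.

    `cpu_flags` is the set of features the CPU reports; `existing` is the
    set of paths that are actually installed (passed in so the function
    stays pure and easy to test).
    """
    base = PurePosixPath(libdir)
    for subdir, required in PROFILES:
        candidate = str(base / subdir / soname)
        # Use the optimized build only if the CPU has every required
        # feature and the package actually ships that variant.
        if required <= cpu_flags and candidate in existing:
            return candidate
    return str(base / soname)


installed = {"/usr/lib64/libfoo.so.1", "/usr/lib64/haswell/libfoo.so.1"}
print(pick_library("/usr/lib64", "libfoo.so.1", {"avx2", "fma", "bmi2"}, installed))
print(pick_library("/usr/lib64", "libfoo.so.1", {"sse2"}, installed))
```

The point of the design is that the baseline build always exists, so a CPU without the newer features still gets a working library, while only the libraries that benefit need a second build.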
On Tue, Jul 23, 2019 at 12:08 PM Kevin Kofler wrote:
Igor Gnatenko wrote:
- Lower requirement to something like SSE4 and select other CPU features which are available in most of CPUs for last decade.
Sorry, but -1 to SSE4 too. One of my machines supports only up to SSSE3, and other replies in this thread have also suggested SSSE3 as the most we can assume. And if you ask me, we should just stick to SSE2 as the baseline. What are the big gains to be had from SSE3, SSSE3, SSE4.1, and SSE4.2? Especially if you limit it to packages that don't do runtime detection? (Performance-sensitive software SHOULD do runtime detection, and most of it does, e.g., OpenBLAS.)
I used SSE4 as an example. Obviously one needs to spend time digging into all this and find appropriate set.
From what I saw, openblas does not do any runtime detection. You either compile it with avx2 or not. And in runtime it will check whether it was enabled during compilation and use some kind of fallback.
- Build every package on x86_64 twice (one for compatible set and one for this new-features set), possibly by introducing a sub-architecture in koji or using koji-shadow (that's just an implementation detail). Produce an official spin which is built from these packages.
That would at least be tolerable, but still, I'm against it. It sounds like a huge waste of resources for very little practical gain to me.
Nicolas pointed out that it is possible to add extra arches on a per-package basis. So if we choose to build just some packages with these settings enabled, we can do that. But if most of the packages would benefit from the change, building all of them sounds reasonable.
And after all, these are Fedora resources, so you probably don't have to care much about it. It is also not some uncommon architecture where 1 server costs billions. So I think if the benefit is big, we could find resources for doing this.
- Invent some mechanism for selecting appropriate feature set in
runtime (somebody mentioned fat binaries in this thread).
We already have 2 such mechanisms:
- several upstream software packages check CPUID directly. See, e.g., how OpenBLAS does it. Or the performance-sensitive parts of Chromium. Etc.
You can't do several things with this, like FMA.
- you can drop optimized builds of entire shared objects (.so) into an appropriate subdirectory of %{_libdir}. Some profiles such as haswell are already supported. If we need more, they can be added.
You are talking about libraries while I am talking about binaries.
So I don't see a need for fat binaries.
Kevin Kofler
On Tue, 23 Jul 2019 12:16:45 +0200 Igor Gnatenko wrote:
On Tue, Jul 23, 2019 at 12:08 PM Kevin Kofler wrote:
Igor Gnatenko wrote:
- Lower requirement to something like SSE4 and select other CPU
features which are available in most of CPUs for last decade.
Sorry, but -1 to SSE4 too. One of my machines supports only up to SSSE3, and other replies in this thread have also suggested SSSE3 as the most we can assume. And if you ask me, we should just stick to SSE2 as the baseline. What are the big gains to be had from SSE3, SSSE3, SSE4.1, and SSE4.2? Especially if you limit it to packages that don't do runtime detection? (Performance-sensitive software SHOULD do runtime detection, and most of it does, e.g., OpenBLAS.)
I used SSE4 as an example. Obviously one needs to spend time digging into all this and find an appropriate set.
From what I saw, openblas does not do any runtime detection. You either compile it with avx2 or not. And at runtime it will check whether it was enabled during compilation and use some kind of fallback.
OpenBLAS can do runtime CPU detection for x86, aarch64 and Power, if built accordingly.
Igor Gnatenko wrote:
From what I saw, openblas does not do any runtime detection. You either compile it with avx2 or not. And at runtime it will check whether it was enabled during compilation and use some kind of fallback.
If built with the DYNAMIC_ARCH option, which is the case in the Fedora packages, OpenBLAS actually compiles its routines several times, for many different architectures it supports (even some with the same instruction sets, but different performance characteristics), and then picks whatever is best according to the runtime CPUID information.
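The selection step described here can be illustrated with a minimal sketch. It is not OpenBLAS code: the kernel names, the flag sets, and the `select_kernel` helper are all hypothetical, and a real implementation would compile each kernel separately and query CPUID directly rather than receive a set of flag names.

```python
from typing import AbstractSet, Callable, List, Sequence, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(x: Sequence[float], y: Sequence[float]) -> float:
    """Baseline kernel: needs nothing beyond the x86_64 baseline."""
    return sum(a * b for a, b in zip(x, y))


def dot_avx2_fma(x: Sequence[float], y: Sequence[float]) -> float:
    """Stand-in for a kernel that would be built with AVX2 and FMA enabled."""
    return sum(a * b for a, b in zip(x, y))


# Ordered from most to least demanding. The flag sets are assumptions made
# for this illustration, not the ones any real library uses.
VARIANTS: List[Tuple[frozenset, Kernel]] = [
    (frozenset({"avx2", "fma"}), dot_avx2_fma),
    (frozenset(), dot_generic),
]


def select_kernel(cpu_flags: AbstractSet[str]) -> Kernel:
    """Return the first variant whose required flags are all present."""
    for required, kernel in VARIANTS:
        if required <= cpu_flags:
            return kernel
    # Not reachable while the baseline variant requires no flags.
    raise RuntimeError("no usable kernel")
```

The point of the ordering is that the choice is made once, at startup, from what the running CPU reports, so a single package can serve both old and new hardware.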
We already have 2 such mechanisms:
- several upstream software packages check CPUID directly. See, e.g., how OpenBLAS does it. Or the performance-sensitive parts of Chromium. Etc.
You can't do several things with this, like FMA.
Sure you can! OpenBLAS actually also checks for both FMA variants (FMA3 and FMA4) during the runtime detection. The routines for the CPUs that support FMA also make use of it.
You are talking about libraries while I am talking about binaries.
Then just build the binary as a library, twice, and make a dummy main program that links to the library. (See also the "kdeinit hack", which does/did something similar for different reasons.)
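As a rough illustration of the "optimized copy in a subdirectory" idea, the sketch below lists the paths a loader or launcher might try, preferring the optimized build when the CPU qualifies. The `haswell` directory name comes from the thread; the flag set attached to it, the helper name, and the library name in the usage note are assumptions for illustration only, and the real dynamic loader applies its own rules.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumed requirements for the "haswell" profile (illustrative, incomplete).
HASWELL_FLAGS = frozenset({"avx2", "fma", "bmi2"})


def library_candidates(libdir: str, soname: str, cpu_flags: Iterable[str]) -> List[str]:
    """Return library paths in search order, optimized build first if usable."""
    candidates = []
    if HASWELL_FLAGS <= set(cpu_flags):
        candidates.append(PurePosixPath(libdir) / "haswell" / soname)
    # The baseline build is always the final fallback.
    candidates.append(PurePosixPath(libdir) / soname)
    return [str(path) for path in candidates]
```

For example, `library_candidates("/usr/lib64", "libfoo.so.1", {"sse2"})` yields only the baseline path, so a machine without the newer instructions never loads the optimized object.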
Kevin Kofler
On Tue, Jul 23, 2019 at 11:05:59AM +0200, Kevin Kofler wrote:
assume. And if you ask me, we should just stick to SSE2 as the baseline.
Ie the status quo.
What are the big gains to be had from SSE3, SSSE3, SSE4.1, and SSE4.2?
Each of those individually, and from a general system library perspective, I'd wager not a whole lot. But in aggregate, there are a lot of Clear Linux benchmarks showing a sizeable bump in general purpose performance.
That said -- A reasonable argument can be made to bump the baseline to require SSE3, because all non-AMD x86_64 CPUs support it, and on the AMD side, anything beyond their 1st-gen single-core K8s supports it. (We're talking April 2005 here, versus the September 2003 introduction of the very first x86_64 processor.)
As another data point, Windows 8.x effectively required SSE3 on 64-bit CPUs as the other CPU features they required (LAHF/SAHF, CMPXCHG16B, and NX) were only implemented together on SSE3-capable processors.
(And Steam's hardware survey shows that a full 100% of their users have an SSE3-capable processor..)
- Solomon
On Tue, Jul 23, 2019 at 08:25:59AM -0400, Solomon Peachy wrote:
On Tue, Jul 23, 2019 at 11:05:59AM +0200, Kevin Kofler wrote:
assume. And if you ask me, we should just stick to SSE2 as the baseline.
Ie the status quo.
What are the big gains to be had from SSE3, SSSE3, SSE4.1, and SSE4.2?
Each of those individually, and from a general system library perspective, I'd wager not a whole lot. But in aggregate, there are a lot of Clear Linux benchmarks showing a sizeable bump in general purpose performance.
That said -- A reasonable argument can be made to bump the baseline to require SSE3, because all non-AMD x86_64 CPUs support it, and on the AMD side, anything beyond their 1st-gen single-core K8s supports it. (We're talking April 2005 here, versus the September 2003 introduction of the very first x86_64 processor.)
FWIW, in order to maintain historical guest ABI compatibility qemu defaults to a CPU model, qemu64, that lacks sse3 support.
Most apps using libvirt (virt-install, virt-manager, OpenStack, oVirt, etc) will override this historical default with something more modern. So the QEMU default is only an issue for people who manually launch QEMU without giving an explicit "-cpu" arg to pick something better.
Fortunately some work in QEMU upstream stands a good chance of letting us move to a default CPU model that is more useful/modern in the not too distant future (hopefully < 12 months)
Regards, Daniel
On Monday, July 22, 2019, Ben Cotton wrote:
== Summary ==
Fedora currently uses the original K8 micro-architecture (without 3DNow! and other AMD-specific parts) as the baseline for its <code>x86_64</code> architecture. This baseline dates back to 2003 and has not been updated since. As a result, performance of Fedora is not as good as it could be on current CPUs.
This change updates the micro-architecture level for the architecture to something more recent.
== Owner ==
- Name: [[User:fweimer| Florian Weimer]]
- Email: []
== Detailed Description ==
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Along with AVX2, it makes sense to enable certain other CPU features which are not strictly implied by AVX2, such as CMPXCHG16B, FMA, and earlier vector extensions such as SSE 4.2. Details are still being worked out.
A test rebuild of a distribution largely based on Fedora 28 showed that there is only a small number of build failures due to the baseline switch. Very few packages are confused about the availability of the CMPXCHG16B instruction, leading to linking failures related to <code>-latomic</code>, and there are some hard-coded floating point results that could change due to vectorization. (The latter is within bounds of the usual cross-architecture variation for such tests.)
== Benefit to Fedora ==
Fedora will use current CPUs more efficiently, increasing performance and reducing power consumption.
Moreover, when Fedora is advertised as a distribution by a compute service provider, users can be certain that their AVX2-optimized software will run in this environment.
== Scope ==
- Proposal owners: Update the <code>gcc</code> and
<code>redhat-rpm-config</code> package to implement the new compiler flags. It is expected that the new baseline will be implemented in a new GCC <code>-march=</code> option for convenience.
- Other developers: Other developers may have to adjust test suites
which expect exact floating point results, and correct linking with <code>libatomic</code>. They will also have to upgrade their x86-64 machines to something that can execute AVX2 instructions.
- Release engineering: [ #8513]
** All Fedora builders need to be AVX2-capable.
** Infrastructure ticket: [ #7968]
- Policies and guidelines: No guidelines need to be changed.
- Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
Fedora installations on systems with CPUs which are not able to execute AVX2 instructions will not be able to upgrade.
== How To Test ==
General system testing will provide test coverage for this change.
== User Experience ==
Users should observe improved performance and, likely, battery life. Developers will benefit from the knowledge that code with AVX2 optimizations will run wherever Fedora runs.
== Dependencies ==
There are no direct dependencies on this change at this time.
== Contingency Plan ==
It is possible to not implement this change, or implement a smaller subset of it (adopting the CMPXCHG16B instruction only, for example).
- Contingency mechanism: Mass rebuild with different/previous compiler flags.
- Contingency deadline: Final mass rebuild.
- Blocks release? No.
- Blocks product? No.
== Documentation ==
The new micro-architecture baseline and the resulting requirements need to be documented.
== Release Notes ==
Release notes must mention how users can determine whether their system supports AVX2 prior to upgrading, for example by running <code>grep avx2 /proc/cpuinfo</code>.
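For readers who want the same check in script form, here is a small sketch of what the <code>grep avx2 /proc/cpuinfo</code> test amounts to. The function names are made up for this illustration. It assumes the Linux x86 <code>/proc/cpuinfo</code> layout, where each CPU has a <code>flags</code> line listing feature names separated by whitespace.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def supports_avx2(cpuinfo_text: str) -> bool:
    """True if "avx2" appears as a whole flag name, not merely as a substring."""
    return "avx2" in cpu_flags(cpuinfo_text)
```

On a Linux system this could be called as `supports_avx2(open("/proc/cpuinfo").read())`. Matching whole flag names avoids false positives from other fields of the file.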
Please just take back this change and come back on April first if it was supposed to be a joke - if not, then submit again in about 10 years.
On 2019-07-23 07:02, drago01 wrote:
Please just take back this change and come back on April first if it was supposed to be a joke - if not, then submit again in about 10 years.
Fedora used to have the x86 repo for old hardware, and the x86_64 repo for new hardware. Now that the tech cursor has moved enough that x86 kernels are being retired, there would be nothing wrong in moving this functional split to another technical level.
Old hardware and new hardware will always exist. Trying to handle both in a single arch is just going to clash somewhere.
On Tue, Jul 23, 2019 at 11:31 AM Nicolas Mailhot via devel wrote:
On 2019-07-23 07:02, drago01 wrote:
Please just take back this change and come back on April first if it was supposed to be a joke - if not, then submit again in about 10 years.
Fedora used to have the x86 repo for old hardware, and the x86_64 repo for new hardware. Now that the tech cursor has moved enough that x86 kernels are being retired, there would be nothing wrong in moving this functional split to another technical level.
The problem with that is getting someone to do the work. The whole reason that the i686 kernel was retired was due to people not stepping up to do the maintenance of the kernel, and the kernel alone. Having been one of the few people in the community who has been involved in and led arch bring-ups, maintained architectures in the bad old days of secondary koji instances, and continues to lead the ARMv7 and aarch64 architectures, I can tell you it's not an insignificant amount of work, both the initial bootstrap and the ongoing maintenance. I've been involved in the alternate architecture projects in Fedora for ~9.5 years and it's a LOT of work, and it's not a do-it-once-and-it's-done job; it's constant and ongoing.
There needs to be a group of people prepared to put in that level of effort, and from my own personal experience with the various Arm architectures over the years (armv5, armv7 and aarch64) and the outcome of the i686 project, I can tell you there are a lot of people who demand that there must be something because they need it, and just about no one who's prepared to actually do the damn well work. People will quite happily demand you work 24*7 without sleep, complain when their pet feature doesn't work, or when it fails to boot on their £4 ten year old SD card, and insist that it must be fixed yesterday, all while contributing exactly nothing!
On 2019-07-23 12:48, Peter Robinson wrote:
On Tue, Jul 23, 2019 at 11:31 AM Nicolas Mailhot via devel wrote:
On 2019-07-23 07:02, drago01 wrote:
Please just take back this change and come back on April first if it was supposed to be a joke - if not, then submit again in about 10 years.
Fedora used to have the x86 repo for old hardware, and the x86_64 repo for new hardware. Now that the tech cursor has moved enough that x86 kernels are being retired, there would be nothing wrong in moving this functional split to another technical level.
The problem with that is getting someone to do the work. The whole reason that the i686 kernel was retired was due to people not stepping up to do the maintenance of the kernel, and the kernel alone.
I’m assuming (perhaps wrongly) that the people proposing the change would be ready to maintain the new hardware arch, and shake out the bugs associated with the new compiler flags. Because the bulk of us seem stuck with older hardware for now.
Peter Robinson wrote:
The problem with that is getting someone to do the work. The whole reason that the i686 kernel was retired was due to people not stepping up to do the maintenance of the kernel, and the kernel alone. Having been one of the few people in the community who has been involved in and led arch bring-ups, maintained architectures in the bad old days of secondary koji instances, and continues to lead the ARMv7 and aarch64 architectures, I can tell you it's not an insignificant amount of work, both the initial bootstrap and the ongoing maintenance. I've been involved in the alternate architecture projects in Fedora for ~9.5 years and it's a LOT of work, and it's not a do-it-once-and-it's-done job; it's constant and ongoing.
Well, to be fair, the initial bootstrapping wouldn't be that big an issue because the bootstrap for x86_64+AVX2 would just be a copy of the normal x86_64, which would then be gradually replaced by mass rebuilds.
The bigger issue is the resource overhead, and especially the scarce resource that is time humans have to spend for debugging.
Kevin Kofler
On Tue, Jul 23, 2019 at 7:14 PM Kevin Kofler wrote:
Peter Robinson wrote:
The problem with that is getting someone to do the work. The whole reason that the i686 kernel was retired was due to people not stepping up to do the maintenance of the kernel, and the kernel alone. Having been one of the few people in the community who has been involved in and led arch bring-ups, maintained architectures in the bad old days of secondary koji instances, and continues to lead the ARMv7 and aarch64 architectures, I can tell you it's not an insignificant amount of work, both the initial bootstrap and the ongoing maintenance. I've been involved in the alternate architecture projects in Fedora for ~9.5 years and it's a LOT of work, and it's not a do-it-once-and-it's-done job; it's constant and ongoing.
Well, to be fair, the initial bootstrapping wouldn't be that big an issue because the bootstrap for x86_64+AVX2 would just be a copy of the normal x86_64, which would then be gradually replaced by mass rebuilds.
Yep. This is how OpenMandriva bootstrapped the AMD Ryzen optimized build. Their path also involved declaring a new architecture ("znver1" is currently unknown to upstream rpm, libsolv, dnf, etc.).
The bigger issue is the resource overhead, and especially the scarce resource that is time humans have to spend for debugging.
Right. The other thing to keep in mind is that this is only slightly less scarce than distro developer friendly ARM hardware. At least for the proposed architecture, most of us don't have the hardware to use it. Regardless of the willingness of humans, most of us aren't going to spend literally thousands of dollars to get new PCs that have the necessary instructions. Many of us are likely not able to afford it, and those who can are also mindful of how much waste that causes and may not do it anyway.
-- 真実はいつも一つ!/ Always, there's only one truth!
This sounds like 'You should stop using and contributing to Fedora for x86_64' to me.
Technically, I don't have any concern.
Practically, as a user, I only have one machine that supports AVX2, which is my laptop. As a packager, the main machine that I use for building and testing my packages locally is an Intel Core 2 Duo, and I expect it to serve me another 5 years. All my virtual machines run on an i7-3770 at home, which is also not AVX2 compatible.
I'd like to share some info about my other machines as well, so you can get a better understanding of the lifetime of computers. I only retired my Pentium 4 1.4GHz machine in early 2018. And I still have an AMD Sempron 2800+ working at home without any problems, although it is not running Fedora. Besides that, my old ThinkPad with a Pentium T4200 is serving as my NAS, running Fedora 29.
People will probably suggest that I replace some of the old machines. Sure, but that is much more costly than switching to another operating system such as CentOS, unless some affordable yet upstream-friendly SBSA-compliant aarch64 machine becomes available, which is not x86_64 of course.
I believe I'm not the only one with such a long computer lifetime, especially here in China.
So I don't think it's a good idea for this to happen within the near future, for example, 3 years.
On 7/22/19 8:51 PM, Ben Cotton wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
In other words, Fedora will be unusable on most HW older than 3-4 years.
This is not helpful and further pushes Fedora toward meaninglessness.
Given the almost exclusively negative replies to this proposal: can we please just officially mark it as retracted/rejected and move on?
P.S.: all my Fedora machines would no longer be able to run Fedora >= 32, effectively ending my involvement in this community :(
Hi Ben
Considering there are new CPUs being sold by Intel today that don't even have AVX2 (case in point: Pentium Gold G5620), this sounds to me like a move that is happening way too soon. I would take the lowest common denominator of features for CPUs of at least 3 years of age, considering how long some CPUs are used in virtualized environments and at a lot of different cloud providers (I've seen 5+ year old CPUs in use at some smaller providers).
With kind regards Patrik
Patrik Mattsson wrote:
I would take the lowest common denominator of features for CPUs of at least 3 years of age, considering how long some CPUs are used in virtualized environments and at a lot of different cloud providers (I've seen 5+ year old CPUs in use at some smaller providers).
At least 10 years!
My notebook is 11 years old and still working. Here in Austria, that makes me look weird, but there are countries in this world where such machines are much more commonplace. See, e.g., Zamir Sun's reply about China. And there are even poorer countries out there.
So no, 3 years are not sufficient.
Kevin Kofler
On Tue, Jul 23, 2019 at 11:09 AM Kevin Kofler wrote:
Patrik Mattsson wrote:
I would take the lowest common denominator of features for CPUs of at least 3 years of age, considering how long some CPUs are used in virtualized environments and at a lot of different cloud providers (I've seen 5+ year old CPUs in use at some smaller providers).
At least 10 years!
My notebook is 11 years old and still working. Here in Austria, that makes me look weird, but there are countries in this world where such machines are much more commonplace. See, e.g., Zamir Sun's reply about China. And there are even poorer countries out there.
So no, 3 years are not sufficient.
I'm still getting complaints from groups about the time the same team bumped the i686 compile flags for newer processors (a proposal I somehow missed), because their 10+ year old OLPC XO laptops can't run the newer software.
Well, that would be too much. 2011-ish hardware is still in use. But there is some truth behind this; maybe the baseline should be about 2008? SSE 4.2 as a baseline makes more sense.
On 7/22/19 9:51 PM, Ben Cotton wrote:
Along with AVX2, it makes sense to enable certain other CPU features which are not strictly implied by AVX2, such as CMPXCHG16B, FMA, and earlier vector extensions such as SSE 4.2. Details are still being worked out.
And why not use FMV (function multi-versioning)? gcc/glibc are new enough, and FMV could be implemented package by package. Using the Clear Linux make-fmv-patch recipe, one can adapt the code to various capabilities (and of course the patches can be contributed upstream, tweaked and customized).
On Mon, Jul 22, 2019 at 8:52 PM Ben Cotton wrote:
== Detailed Description ==
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
We have 3 test machines in Brno's Fedora QA office, none of which support AVX2. Both my parents run Fedora on their laptops, neither of which supports AVX2. My own desktop machine at home is on the borderline (Haswell).
I consider this proposal a complete no-go.
That said, I think it might be interesting to have a way for newer CPUs to make use of the new instructions, and people proposed ways to achieve that. But first, we should gather some performance numbers, whether it's worth it at all.
On Mon, Jul 22, 2019 at 11:52 AM Ben Cotton wrote:
== Summary ==
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
Along with AVX2, it makes sense to enable certain other CPU features which are not strictly implied by AVX2, such as CMPXCHG16B, FMA, and earlier vector extensions such as SSE 4.2. Details are still being worked out.
In the interest of a productive discussion, could we maybe focus on what the benefits are, both of changing the baseline in general and of enabling any particular features? As I see it, there are a few classes of relevant features:
Features like SSE2: enabling SSE2 as the basic floating point mechanism changes the ABI drastically. But x86_64 already requires SSE2, so this is irrelevant.
Things like SSSE3, SSE3, SSE4, AVX1, AVX2, FMA, etc: for the most part, these accelerate existing algorithms. I'm sure that someone somewhere has written an algorithm that requires FMA for enhanced precision, but otherwise pretty much any code that benefits from any of these features would work just fine, if slower, without the features.
Things like CMPXCHG16B that change the set of things that can be done on the CPU. I could easily imagine programs that use algorithms that fundamentally depend on CMPXCHG16B. There is no drop-in replacement.
Things like FSGSBASE that change the way that software interacts with the kernel. Don't even get me started on FSGSBASE on Linux.
So I could see a concrete benefit to Fedora from requiring CMPXCHG16B. If FSGSBASE were actually supported and widely used, I could see that, too, but FSGSBASE is not a credible requirement for Fedora since IIRC it's not supported on Sandy Bridge. Even Intel seems to consider Sandy Bridge to be an important CPU to support.
As for the vector features, they're always a moving target. Even if Fedora did require AVX2, performance would be left on the table by not using AVX-512 where available. (And AVX2 isn't such a great thing anyway given certain microarchitectures' latency and power issues.)
I think that, for vector extensions, Fedora shouldn't require anything beyond SSE2 for basic functionality. Instead, Fedora should figure out where there are material benefits to using them and find ways to make it easier for packagers to make them available. Ideally this would all be figured out at runtime, but install-time choices could make sense, too.
On Tue, Jul 23, 2019 at 07:52:09AM -0700, Andrew Lutomirski wrote:
Things like CMPXCHG16B that change the set of things that can be done on the CPU. I could easily imagine programs that use algorithms that fundamentally depend on CMPXCHG16B. There is no drop-in replacement.
FWIW, CMPXCHG16B is a hard requirement for Windows 8.1 (August 2013!) and beyond.
In AMD-land, it seems that CMPXCHG16B support was added at the same time as SSE3 (i.e. April 2005), so requiring that would cut off the first-generation single-core x86_64 AMD parts.
All non-AMD x86_64 parts support both CMPXCHG16B and SSE3.
- Solomon
Andrew Lutomirski wrote:
Features like SSE2: enabling SSE2 as the basic floating point mechanism changes the ABI drastically. But x86_64 already requires SSE2, so this is irrelevant.
For what it's worth, only the x86_64 ABI actually makes use of this. For i686 (32-bit), even when Fedora moved to requiring SSE2, the ABI was not changed (because that would have meant bootstrapping a whole new architecture, breaking compatibility with all existing binaries in the process). So i686 is stuck with the x87 ABI, copying floating-point data back and forth between x87 and SSE registers.
Things like SSSE3, SSE3, SSE4, AVX1, AVX2, FMA, etc: for the most part, these accelerate existing algorithms. I'm sure that someone somewhere has written an algorithm that requires FMA for enhanced precision, but otherwise pretty much any code that benefits from any of these features would work just fine, if slower, without the features.
FMA can also be emulated in pure software (while still using hardware floating-point instructions!), it is even implemented in glibc. So this also falls under "would work just fine, if slower, without the features".
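A minimal sketch of why a fused multiply-add differs from a separate multiply and add, and of how the fused result can be reproduced without the instruction. This is not the glibc approach: it swaps in exact rational arithmetic from Python's standard library, and the helper name `fma_exact` is hypothetical. It assumes finite inputs and ignores signed zeros, infinities and NaNs.

```python
from fractions import Fraction


def fma_exact(a: float, b: float, c: float) -> float:
    """Return a*b + c with a single rounding, for finite inputs only."""
    # Fraction(float) is exact, so the product and the sum carry no rounding error.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # Converting back to float is the one and only rounding step.
    return float(exact)


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 in double precision,
# so the separate multiply-then-add loses the small term entirely.
unfused = a * b + c
# The single-rounding version keeps it.
fused = fma_exact(a, b, c)

print("unfused:", unfused)
print("fused:  ", fused)
```

The emulation is much slower than a hardware FMA, which matches the "would work just fine, if slower" argument: the result is available everywhere, only the speed differs.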
but FSGSBASE is not a credible requirement for Fedora since IIRC it's not supported on Sandy Bridge. Even Intel seems to consider Sandy Bridge to be an important CPU to support.
AVX2 is also not supported on Sandy Bridge.
I think that, for vector extensions, Fedora shouldn't require anything beyond SSE2 for basic functionality. Instead, Fedora should figure out where there are material benefits to using them and find ways to make it easier for packagers to make them available. Ideally this would all be figured out at runtime,
I agree.
but install-time choices could make sense, too.
I think we should just do it right and figure it out at runtime. Otherwise, the user ends up having to manually install things such as those infamous atlas-sse* subpackages, and most users will not bother doing that, or will not even know that they exist or which one to pick.
Kevin Kofler
On 7/23/19 7:52 AM, Andrew Lutomirski wrote:
In the interest of a productive discussion, could we maybe focus on what the benefits are, both of changing the baseline in general and of enabling any particular features?
As someone whose software heavily depends on SSE and AVX2 assembly code, we always do runtime detection. The SSE2 baseline of x86_64 is handy as there are a few things I can inline as a result, but there is no performance benefit to an AVX2 baseline, other than possibly the binary size dropping a bit as it no longer has to include the SSE2 versions of the functions.
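A rough sketch of the runtime-detection pattern described above: read the CPU feature flags once, then bind the best available implementation. The function names and the variant table are hypothetical, and both "kernels" are plain Python stand-ins; real projects do this in C or assembly. The flag names follow the spelling used in Linux `/proc/cpuinfo`.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def dot_generic(xs, ys):
    # Baseline implementation: must work on every supported CPU.
    return sum(x * y for x, y in zip(xs, ys))


def dot_avx2(xs, ys):
    # Stand-in for a hand-tuned AVX2/FMA kernel; same result, different code path.
    return sum(x * y for x, y in zip(xs, ys))


# Ordered best first; the last entry requires nothing and always matches.
VARIANTS = [
    ({"avx2", "fma"}, dot_avx2),
    (set(), dot_generic),
]


def select_dot(flags):
    """Pick the first variant whose required flags are all present."""
    for required, implementation in VARIANTS:
        if required <= flags:
            return implementation
    raise RuntimeError("no usable variant")  # unreachable with the table above


sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 fma\n"
dot = select_dot(parse_cpu_flags(sample))
print(dot.__name__)
```

On a Linux host the sample string would be replaced by the contents of `/proc/cpuinfo`. The selection runs once, so the per-call cost is one indirect call, which is why a higher baseline buys little for code that already dispatches this way.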
I'm afraid this turned into a bit of an essay on more useful things Fedora could do for portable performance engineering, should anyone care.
I actually have no interest in Fedora except as a requirement to work on packaging for research software around EPEL, specifically for HPC and so performance-oriented. I'm not sure how long it's worth persisting in view of how difficult it is to contribute now, but these points are mostly general.
The x86 change is clearly a non-starter, and I'm surprised to see where it came from, but I don't see anyone mentioning much on rationale or higher-level aspects, apart from some better things to do. Strikingly, there's no quantification of expected performance benefits. Anyway, they'd be rather limited by the compiler options we're supposed to use, that don't include vectorization, so you don't even get the benefit you could from SSE2. (I've been told off in review for turning that on, though an FPC member has approved it.)
I've seen lack of support for things that would help, and plenty of comments from people who clearly haven't worked in this area. At least one sensible portable performance-oriented change has been blocked in committee (interchangeable BLAS implementations), and what I've seen from Red Hat and Fedora makes it increasingly hard to justify a RHEL-ish basis for HPC. However, things could be done for computational performance where it matters.
SIMD hwcaps have been mentioned, and I'm baffled why they haven't been implemented generally. That and similar changes are actually more important for non-x86 architectures less likely to have dynamically-dispatched SIMD-specific implementations. [The value of "SIMD" includes things like FMA, and I know not just for floating point.]
However, hwcaps won't help for programs with no separate library performance component; Gromacs is an example. On a heterogeneous HPC system you need multiple parallel-installable versions with a convention for the paths they'll be on. Other than that, maintainers could look at function multi-versioning for performance-critical code where that's possible. It isn't always, specifically not for Fortran, and I'd probably look at that first if I got back to GCC maintenance. (Actually, adding FMV to BLIS wasn't effective for some reason I haven't had time to chase.)
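One way to picture "multiple parallel-installable versions with a convention for the paths" is a launcher that walks an ordered list of per-variant directories. The sketch below is only an illustration: the directory layout `<prefix>/<variant>/bin/<program>`, the variant names and the required-flag sets are all assumptions, not an existing Fedora or glibc convention.

```python
import os
import posixpath

# Hypothetical variants, best first, with the CPU flags each build requires.
VARIANT_REQUIREMENTS = [
    ("avx512", {"avx512f"}),
    ("avx2", {"avx2", "fma"}),
    ("sse2", {"sse2"}),
]


def candidate_paths(prefix, program, flags):
    """List the install paths this CPU could run, best variant first."""
    return [
        posixpath.join(prefix, name, "bin", program)
        for name, required in VARIANT_REQUIREMENTS
        if required <= flags
    ]


def resolve(prefix, program, flags, exists=os.path.exists):
    """Return the best installed variant, or None if nothing usable is installed."""
    for path in candidate_paths(prefix, program, flags):
        if exists(path):
            return path
    return None


node_flags = {"sse2", "avx", "avx2", "fma"}
for path in candidate_paths("/opt/hpc", "gmx", node_flags):
    print(path)
```

Because the lookup depends only on the node's flags, the same shared installation tree can serve every node of a heterogeneous cluster, and a node simply falls back to a slower build when a faster one is not installed.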
The "Clear Linux" stuff mentioned is unconvincing. The only worked example I've seen is for FFTW, where it actually has no effect, and I've seen no numbers. Using FMV outside performance-critical kernels -- just something GCC says is vectorizable -- is probably not a good idea, and any changes ought to be contributed explicitly for the source and support non-x86. (By the way, don't even Intel assume AVX as a baseline, not AVX2?)
There's already multi-simd support for ATLAS -- though I know no good reason to use ATLAS -- and at least one package (libxsmm) has a minimum requirement of SSE3 without complaint. (I got that down from SSE4 for the benefit of systems we had, though you wouldn't use them for anything CPU-bound.)
There are tradeoffs in satisfying a wide variety of users: a base system with the most common software that is easy to try, which can then be re-installed for performance. Flatpaks should help with easy but not performance-optimal installation of many packages. Spack may be a packaging approach that gives some performance portability - one can get a compilation recipe so that performance is reasonably good. EasyBuild is another way to go. Source-based systems such as Gentoo may give better performance if configured correctly.
On 7/23/19 7:00 PM, Dave Love wrote:
I'm afraid this turned into a bit of an essay on more useful things Fedora could do for portable performance engineering, should anyone care.
I actually have no interest in Fedora except as a requirement to work on packaging for research software around EPEL, specifically for HPC and so performance-oriented. I'm not sure how long it's worth persisting in view of how difficult it is to contribute now, but these points are mostly general.
The x86 change is clearly a non-starter, and I'm surprised to see where it came from, but I don't see anyone mentioning much on rationale or higher-level aspects, apart from some better things to do. Strikingly, there's no quantification of expected performance benefits. Anyway, they'd be rather limited by the compiler options we're supposed to use, that don't include vectorization, so you don't even get the benefit you could from SSE2. (I've been told off in review for turning that on, though an FPC member has approved it.)
I've seen lack of support for things that would help, and plenty of comments from people who clearly haven't worked in this area. At least one sensible portable performance-oriented change has been blocked in committee (interchangeable BLAS implementations), and what I've seen from Red Hat and Fedora makes it increasingly hard to justify a RHEL-ish basis for HPC. However, things could be done for computational performance where it matters.
SIMD hwcaps have been mentioned, and I'm baffled why they haven't been implemented generally. That and similar changes are actually more important for non-x86 architectures less likely to have dynamically-dispatched SIMD-specific implementations. [The value of "SIMD" includes things like FMA, and I know not just for floating point.]
However, hwcaps won't help for programs with no separate library performance component; Gromacs is an example. On a heterogeneous HPC system you need multiple parallel-installable versions with a convention for the paths they'll be on. Other than that, maintainers could look at function multi-versioning for performance-critical code where that's possible. It isn't always, specifically not for Fortran, and I'd probably look at that first if I got back to GCC maintenance. (Actually, adding FMV to BLIS wasn't effective for some reason I haven't had time to chase.)
The "Clear Linux" stuff mentioned is unconvincing. The only worked example I've seen is for FFTW, where it actually has no effect, and I've seen no numbers. Using FMV outside performance-critical kernels -- just something GCC says is vectorizable -- is probably not a good idea, and any changes ought to be contributed explicitly for the source and support non-x86. (By the way, don't even Intel assume AVX as a baseline, not AVX2?)
There's already multi-simd support for ATLAS -- though I know no good reason to use ATLAS -- and at least one package (libxsmm) has a minimum requirement of SSE3 without complaint. (I got that down from SSE4 for the benefit of systems we had, though you wouldn't use them for anything CPU-bound.)
Benson Muite writes:
There are tradeoffs in satisfying a wide variety of users: a base system with the most common software that is easy to try, which can then be re-installed for performance. Flatpaks should help with easy but not performance-optimal installation of many packages. Spack may be a packaging approach that gives some performance portability - one can get a compilation recipe so that performance is reasonably good. EasyBuild is another way to go. Source-based systems such as Gentoo may give better performance if configured correctly.
That completely misses the point, apart from one about discouraging packaging and contributing to Fedora maintenance. Please assume that I know plenty about alternative packaging systems like Spack, and non-packaging systems like EasyBuild, which don't address the issue. If I want to rebuild rpms with different CFLAGS, obviously I can. Flatpak is irrelevant, but part of an unfortunate trend. If you advocate a solution, it had better be capable of running in a manageable way across many nodes of a potentially non-x86_64 heterogeneous HPC cluster.
Dave Love wrote:
they'd be rather limited by the compiler options we're supposed to use, that don't include vectorization, so you don't even get the benefit you could from SSE2. (I've been told off in review for turning that on, though an FPC member has approved it.)
Why don't we enable -ftree-vectorize by default?
However, hwcaps won't help for programs with no separate library performance component; Gromacs is an example. On a heterogeneous HPC system you need multiple parallel-installable versions with a convention for the paths they'll be on.
As I wrote elsewhere in this huge thread: just turn the program into a library with a dummy main program.
There's already multi-simd support for ATLAS -- though I know no good reason to use ATLAS
You mean the atlas-* subpackages that one has to manually install? That's actually one big reason to NOT use ATLAS, now that we have OpenBLAS that does it right.
and at least one package (libxsmm) has a minimum requirement of SSE3 without complaint. (I got that down from SSE4 for the benefit of systems we had, though you wouldn't use them for anything CPU-bound.)
Now you have a complaint. :-)
The baseline is SSE2, so the packages are supposed to support systems with nothing beyond SSE2. Just waiting until somebody reports the inevitable SIGILL is not a nice thing to do.
Now, if upstream doesn't support non-SSE3 CPUs, it might be nontrivial to fix the issue. But in principle, a package that requires SSE3 must be considered a bug.
Kevin Kofler
* Kevin Kofler:
Dave Love wrote:
they'd be rather limited by the compiler options we're supposed to use, that don't include vectorization, so you don't even get the benefit you could from SSE2. (I've been told off in review for turning that on, though an FPC member has approved it.)
Why don't we enable -ftree-vectorize by default?
GCC upstream thinks that it is still not beneficial in general. Obviously, there will always be *some* regressions, but for GCC 9, the thinking was that the regressions still outweigh the striking benefits in some cases.
I believe Clang enables the auto-vectorizer at -O2.
However, hwcaps won't help for programs with no separate library performance component; Gromacs is an example. On a heterogeneous HPC system you need multiple parallel-installable versions with a convention for the paths they'll be on.
As I wrote elsewhere in this huge thread: just turn the program into a library with a dummy main program.
That requires manual work, so it's unclear how to do this for large parts of the distribution. And people will worry about PIC-related losses, or due to assumptions regarding symbol interposition (which affect inter-procedural analysis). The latter even affects Fedora because PIE does not turn off these optimizations.
Thanks, Florian
Florian Weimer wrote:
- Kevin Kofler:
As I wrote elsewhere in this huge thread: just turn the program into a library with a dummy main program.
That requires manual work, so it's unclear how to do this for large parts of the distribution.
I would not do this for large parts of the distribution, but only for the handful of programs where it makes sense. It is surely not worth doubling the distribution's size to have ls run maybe 1% faster on some computers.
And people will worry about PIC-related losses, or due to assumptions regarding symbol interposition (which affect inter-procedural analysis). The latter even affects Fedora because PIE does not turn off these optimizations.
Then use -fno-semantic-interposition.
Kevin Kofler
You wrote:
Dave Love wrote:
they'd be rather limited by the compiler options we're supposed to use, that don't include vectorization, so you don't even get the benefit you could from SSE2. (I've been told off in review for turning that on, though an FPC member has approved it.)
Why don't we enable -ftree-vectorize by default?
I'm happy for it not to be default, as long as sane optimization options are allowed, and people don't think that they'll get all the benefit of recent micro-architectures without them. Note that you're likely to need unrolling to benefit from vectorization in numerical code anyhow.
As I wrote elsewhere in this huge thread: just turn the program into a library with a dummy main program.
That will produce technical problems as well as big maintenance ones, and all this isn't useful without the hwcaps anyhow. Effort is best put into engineering the programs.
You mean the atlas-* subpackages that one has to manually install? That's actually one big reason to NOT use ATLAS, now that we have OpenBLAS that does it right.
The main reason not to use ATLAS is that it's not performant (or wasn't, last I checked). As far as I know, OpenBLAS still isn't competitive with BLIS for avx512, which is the main reason I packaged BLIS and made shims to subvert slower BLASes.
and at least one package (libxsmm) has a minimum requirement of SSE3 without complaint. (I got that down from SSE4 for the benefit of systems we had, though you wouldn't use them for anything CPU-bound.)
Now you have a complaint. :-)
The baseline is SSE2, so the packages are supposed to support systems with nothing beyond SSE2. Just waiting until somebody reports the inevitable SIGILL is not a nice thing to do.
Now, if upstream doesn't support non-SSE3 CPUs, it might be nontrivial to fix the issue. But in principle, a package that requires SSE3 must be considered a bug.
Too bad IMHO. The probability of anyone running cp2k on that sort of system in a mode that invokes libxsmm is too small. Meanwhile, in that space, I can't get ga rebuilt so LAMMPS will actually run on a non-Infiniband fabric, and a bunch of other things that need fixing.
Even if people aren't well disposed to engineering, "Everything in the real world is engineering tradeoffs" (Richard O'Keefe).
I was going to argue this would make us lose a lot of hardware and most likely a lot of the hardware owners as users too.
But I see that most of what I planned to say is already said, so I'll just add my: please, don't do this.
(My sample from home and work: out of 6 Fedora hosts expected to still live during F32, 2 have AVX2, 3 have AVX only, and one has neither.)
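The arithmetic behind that sample can be spelled out in a few lines. The sketch assumes a simple ordering in which AVX2 implies AVX, and uses the hypothetical label "none" for a host with neither extension.

```python
# The six hosts from the sample, labelled by the best vector extension they support.
HOSTS = ["avx2", "avx2", "avx", "avx", "avx", "none"]

# Assumed ordering: each level implies the ones before it.
LEVELS = ["none", "avx", "avx2"]


def hosts_still_supported(hosts, baseline):
    """Count the hosts whose best level is at least the proposed baseline."""
    needed = LEVELS.index(baseline)
    return sum(1 for host in hosts if LEVELS.index(host) >= needed)


for baseline in LEVELS:
    kept = hosts_still_supported(HOSTS, baseline)
    print(f"baseline {baseline}: {kept} of {len(HOSTS)} hosts keep working")
```

Under these assumptions an AVX2 baseline keeps two of the six hosts, and an AVX baseline keeps five.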
On 7/22/19 9:51 PM, Ben Cotton wrote:
After preliminary discussions with CPU vendors, we propose AVX2 as the new baseline. AVX2 support was introduced into CPUs from 2013 to 2015. See [ CPUs with AVX2].
This proposal seems mostly like an experiment in disguise to find out whether the Fedora developers can agree on *something*, and quite clearly the answer is yes, at least this once we can all agree to disagree with the proposed change.
For the record, none of the ~10 computers at my house, including my current work laptop, support AVX2. Such a change would force me to move the entire family away from Fedora, and it would be "interesting" to try to develop rpm on, say, Debian. Would make for some catchy headlines though ;)
- Panu -
Panu Matilainen wrote:
This proposal seems mostly like an experiment in disguise to find out whether the Fedora developers can agree on *something*,
This also looks to me like the tactic to ask for the moon to get a "compromise" that is still unacceptable.
and quite clearly the answer is yes, at least this once we can all agree to disagree with the proposed change.
I disagree with ANY raised vector instruction requirement, considering that:
* it would make Fedora incompatible with some hardware out there,
* the performance increase to be had is marginal, given that we are mostly talking about code written in C or C++ without even compiler vectorization (-ftree-vectorize) turned on,
* there are already mechanisms for runtime feature detection, which are already widely used in those few packages that can actually benefit from the vector instructions (because they are performance-sensitive and because they have handwritten assembly or vector intrinsics code),
* upstreams still widely support SSE2, so I don't see a burden for maintainers to keep it going (unlike the case of pre-SSE2 32-bit x86 where a few upstreams had dropped support).
Kevin Kofler
Personally, I am not at all against raising the bar for baseline x86_64. Of course, it'd be ideal to have some sort of derived x86_64_avx arch, but if we find out it'd require too much of an investment into infra/releng, I'd be +1 for just changing the base x86_64. Sure, it'd make sense to actually see some numbers from Fedora compiled with SSE4/AVX/AVX2 and not just guess from Clear Linux results.
I see AVX2 is just too high a baseline (although all my PCs and laptops have supported it for at least 2 years), but raising the baseline to something like AVX or SSE4 might make sense. I don't know why people with *not ancient* computers should have degraded performance just because we want to support everything from K8 from 2003. But as I said, it'd be nice to see some benchmarks to base the decision on and have optimized x86_64 as secondary arch, if possible.
On Wed, Jul 31, 2019 at 11:00 AM Kevin Kofler wrote:
- the performance increase to be had is marginal, given that we are mostly talking about code written in C or C++ without even compiler vectorization (-ftree-vectorize) turned on,
Are you sure? For example (and there are more of them), lots of these do not seem marginal: ,
On Wed, 31 Jul 2019 at 09:15, Frantisek Zatloukal wrote:
Personally, I am not at all against raising the bar for baseline x86_64. Of course, it'd be ideal to have some sort of derived x86_64_avx arch, but if we find out it'd require too much of an investment into infra/releng, I'd be +1 for just changing the base x86_64. Sure, it'd make sense to actually see some numbers from Fedora compiled with SSE4/AVX/AVX2 and not just guess from Clear Linux results.
I see AVX2 is just too high a baseline (although all my PCs and laptops have supported it for at least 2 years), but raising the baseline to something like AVX or SSE4 might make sense. I don't know why people with *not ancient* computers should have degraded performance just because we want to support everything from K8 from 2003. But as I said, it'd be nice to see some benchmarks to base the decision on and have optimized x86_64 as secondary arch, if possible.
On Wed, Jul 31, 2019 at 11:00 AM Kevin Kofler wrote:
- the performance increase to be had is marginal, given that we are mostly talking about code written in C or C++ without even compiler vectorization (-ftree-vectorize) turned on,
Are you sure? For example (and there are more of them), lots of these do not seem marginal: ,
The problem with words like marginal is that what Kevin has in his head and what you have in your head probably mean two different things. Also, when I see such statistics, I usually wonder "Are they repeatable?" Not just in the case that someone else runs Clear Linux and gets similar timings, but if I compile my code with those options, do I get those numbers, or do I need to use Clear Linux to do so because there are other changes not taken into account by just compiling things with an option?
This was something we ran into several times in the past with the race to keep up with Mandrake or SuSE during the i486/i586/i686 days.. and again with various super computer rebuilds years later. We can compile the code with the same options but you may not get the same speeds. There can be other changes in the structure of the executable chain from kernel down to file node structure. All of those need to be taken into account to 'duplicate' test results.
Without doing that testing and confirming that we know all the options, we are no better off than the person who says they compile everything with -funroll-loops.
Frantisek Zatloukal writes:
On Wed, Jul 31, 2019 at 11:00 AM Kevin Kofler wrote:
- the performance increase to be had is marginal, given that we are mostly talking about code written in C or C++ without even compiler vectorization (-ftree-vectorize) turned on,
Are you sure? For example (and there are more of them), lots of these do not seem marginal: ,
I see typically useless benchmarks without enough information, or profiles, that provide no real insight. These things rarely measure what they purport to; also error bars -- we've heard of them. Numpy is presumably dominated by level 3 BLAS (a library which is swappable on Ubuntu, as it should be in Fedora, with potentially two orders of magnitude performance difference in DGEMM), and whatever threading is used. I suspect similarly for the other things. First take Intel proprietary stuff out of the equation, and think about those numbers taken at face value.
Note that using avx2 can be worse than sse2/4, and cache effects are often more important (as in optimized BLAS).
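As a small illustration of the "error bars" point, the sketch below repeats a timing several times and reports the mean together with the sample standard deviation, then applies a deliberately crude test for whether two measurements differ. The helper names `measure` and `differs` are hypothetical, and the overlap test is a rule of thumb rather than a proper statistical test.

```python
import statistics
import timeit


def measure(func, repeats=5, number=1000):
    """Time func() in several independent runs; return (mean, stdev) per call in seconds."""
    samples = timeit.repeat(func, repeat=repeats, number=number)
    per_call = [sample / number for sample in samples]
    return statistics.mean(per_call), statistics.stdev(per_call)


def differs(a, b):
    """Crude check: the means are further apart than the two standard deviations combined."""
    (mean_a, stdev_a), (mean_b, stdev_b) = a, b
    return abs(mean_a - mean_b) > stdev_a + stdev_b


def build_with_loop():
    return [i * i for i in range(200)]


def build_with_map():
    return list(map(lambda i: i * i, range(200)))


loop_result = measure(build_with_loop)
map_result = measure(build_with_map)
print("loop:", loop_result)
print("map: ", map_result)
print("distinguishable:", differs(loop_result, map_result))
```

A single number per benchmark, with no spread and no description of the build, gives no way to tell a real speedup from run-to-run noise.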
I don't agree with the proposal, and am only interested in EPEL, but:
Kevin Kofler writes:
I disagree with ANY raised vector instruction requirement, considering that:
- it would make Fedora incompatible with some hardware out there,
That's already so for hardware which is at least of similar age to SSE2-only x86_64, i.e. POWER7; my build logs show -mcpu=power8.
- the performance increase to be had is marginal, given that we are mostly talking about code written in C or C++ without even compiler vectorization (-ftree-vectorize) turned on,
I forget the details, but libxsmm is something that depends on an instruction introduced with SSE3, and is a good example of portable performance engineering over a wide range of (x86_64) processors.
- there are already mechanisms for runtime feature detection, which are already widely used in those few packages that can actually benefit from the vector instructions (because they are performance-sensitive and because they have handwritten assembly or vector intrinsics code),
I disagree that dynamic dispatch is sufficiently widely used in scientific code (probably can't be with Fortran). Also recent GCC can provide decent performance for specific targets without target-specific programming. BLIS' portable C version DGEMM got around 60%(?) the speed of the hand-tuned implementation built for haswell, as reported somewhere in the BLIS issues. For people who don't know, DGEMM (generalized matrix-matrix multiplication) is as SIMD-intensive as it gets, with high enough floating point intensity relative to memory access for large enough dimensions; non-matrix-matrix linear algebra typically doesn't have that intensity if the data doesn't fit in cache.
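The remark about floating point intensity can be made concrete with a back-of-the-envelope model. The sketch assumes square n-by-n double-precision operands and an idealized memory system in which every matrix or vector element is moved exactly once; real implementations only approach this with careful cache blocking. The function names are hypothetical.

```python
BYTES_PER_DOUBLE = 8


def dgemm_intensity(n):
    """Flops per byte for C = A*B + C with n-by-n matrices, each element moved once."""
    flops = 2 * n**3                        # n^3 multiplies and n^3 adds
    traffic = 4 * n**2 * BYTES_PER_DOUBLE   # read A, B and C, write C
    return flops / traffic


def dgemv_intensity(n):
    """Flops per byte for y = A*x + y with an n-by-n matrix, each element moved once."""
    flops = 2 * n**2                              # n^2 multiplies and n^2 adds
    traffic = (n**2 + 3 * n) * BYTES_PER_DOUBLE   # read A, x and y, write y
    return flops / traffic


for n in (100, 1000, 10000):
    print(n, dgemm_intensity(n), dgemv_intensity(n))
```

In this model the matrix-matrix product performs n/16 flops per byte, so it grows with the problem size, while the matrix-vector product stays below 0.25 flops per byte however large n is. That is why the former can keep wide SIMD units busy and the latter is limited by memory bandwidth once the data no longer fits in cache.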
I disagree with ANY raised vector instruction requirement, considering that:
- it would make Fedora incompatible with some hardware out there,
That's already so for hardware which is at least of similar age to SSE2-only x86_64, i.e. POWER7; my build logs show -mcpu=power8.
For ppc64le, which is the only Power64 architecture Fedora now supports, the first hardware supported for running Linux in little-endian mode was Power8, so that is exactly as expected. By contrast, ppc64, which is big endian, still supported whatever generation the Power Mac G5 was until it was retired.
- the performance increase to be had is marginal, given that we are mostly talking about code written in C or C++ without even compiler vectorization (-ftree-vectorize) turned on,
I forget the details, but libxsmm is something that depends on an instruction introduced with SSE3, and is a good example of portable performance engineering over a wide range of (x86_64) processors.
- there are already mechanisms for runtime feature detection, which are already widely used in those few packages that can actually benefit from the vector instructions (because they are performance-sensitive and because they have handwritten assembly or vector intrinsics code),
I disagree that dynamic dispatch is sufficiently widely used in scientific code (probably can't be with Fortran). Also recent GCC can provide decent performance for specific targets without target-specific programming. BLIS' portable C version DGEMM got around 60%(?) the speed of the hand-tuned implementation built for haswell, as reported somewhere in the BLIS issues. For people who don't know, DGEMM (generalized matrix-matrix multiplication) is as SIMD-intensive as it gets, with high enough floating point intensity relative to memory access for large enough dimensions; non-matrix-matrix linear algebra typically doesn't have that intensity if the data doesn't fit in cache.
Dave Love wrote:
I forget the details, but libxsmm is something that depends on an instruction introduced with SSE3, and is a good example of portable performance engineering over a wide range of (x86_64) processors.
According to the documentation, libxsmm actually also supports a generic/SSE2 code path (LIBXSMM_X86_GENERIC), with runtime detection. So I do not see a valid reason to require SSE3 in libxsmm.
Kevin Kofler