Hi,
Earlier this year on this list several of us were discussing the possibility of adjustments being made to the architecture optimizations of the Fedora Core packages. One of the suggestions thrown around that had zero opposition was changing the base architecture from i386 to i486. Why has this not been done? This is especially pertinent since Fedora Core 1 finally broke i386 compatibility with its kernel.
I'm sending a related note on the Pentium 4.
Peace, William
On Sun, 28 Nov 2004, William M. Quarles wrote:
Earlier this year on this list several of us were discussing the possibility of adjustments being made to the architecture optimizations of the Fedora Core packages. One of the suggestions thrown around that had zero opposition was changing the base architecture from i386 to i486. Why has this not been done? This is especially pertinent since Fedora Core 1 finally broke i386 compatibility with its kernel.
Lots of pain for little gain (especially if we're just moving to i486 as the base arch).
-- Elliot
Elliot Lee wrote:
On Sun, 28 Nov 2004, William M. Quarles wrote:
Earlier this year on this list several of us were discussing the possibility of adjustments being made to the architecture optimizations of the Fedora Core packages. One of the suggestions thrown around that had zero opposition was changing the base architecture from i386 to i486. Why has this not been done? This is especially pertinent since Fedora Core 1 finally broke i386 compatibility with its kernel.
Lots of pain for little gain (especially if we're just moving to i486 as the base arch).
What kind of pain are we talking about here?
Peace, William
On Sun, Nov 28, 2004 at 05:54:43PM -0500, Jeff Spaleta wrote:
On Sun, 28 Nov 2004 17:47:35 -0500, William M. Quarles
What kind of pain are we talking about here?
just as importantly... what kind of gain do you expect to see? Since the issue raised was gain to pain.... is there really any useful gain in moving to i486 as the base arch?
Indeed:
http://www.ee.oulu.fi/~pp/faqentry
(submitted to the fedora faq some time ago, didn't hear anything back and it's potentially a bit too complex for that context).
For the instruction set bits, Chapter 17 of http://www.intel.com/design/pentiumii/manuals/243192.htm has details on the instruction set differences between the different x86 iterations.
Just some ballpark figures on how often gcc gets to use these instructions, and this is glibc, which might have used these in handcoded assembly: (objdump --disassemble /lib/i686/libc.so.6 | grep <instruction> | wc -l)
cmpxchg: 7
xadd: 8
bswap: 136
cmov: 1099 (and this already limits us to non-VIA C3 i686)
Total lines: 297992
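The `grep <instruction> | wc -l` pipeline counts lines of disassembly that contain the mnemonic anywhere. A minimal C sketch of that same matching rule, assuming the objdump output is already in memory; `count_lines_containing` is a hypothetical helper. Like grep, it also counts longer mnemonics that contain the string ("cmov" matches "cmove" and "cmovne") and any symbol names that happen to contain it.

```c
#include <stddef.h>
#include <string.h>

/* Count lines of `text` that contain `needle` at least once,
 * mirroring `grep needle | wc -l`. Hypothetical helper. */
size_t count_lines_containing(const char *text, const char *needle)
{
    size_t count = 0;
    size_t nlen = strlen(needle);
    const char *line = text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Search only within this line; stop at the first hit. */
        for (size_t i = 0; nlen > 0 && i + nlen <= len; i++) {
            if (memcmp(line + i, needle, nlen) == 0) {
                count++;
                break;
            }
        }
        if (end == NULL)
            break;
        line = end + 1;
    }
    return count;
}
```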
Doesn't take into account how often this code is called and how much slower the i386 instruction set alternative is in reality. My guess is "unmeasurable".
Does someone feel like doing an experiment on some real code? glibc isn't really representative of typical code. Just compile some large package with different -march= options (keeping mtune at pentium4) and see what non-i386 instructions it actually generates. Bonus points for listing the functions and showing whether they are in the oprofile/gprof top 10 or not.
Pekka Pietikainen wrote:
On Sun, Nov 28, 2004 at 05:54:43PM -0500, Jeff Spaleta wrote:
On Sun, 28 Nov 2004 17:47:35 -0500, William M. Quarles
What kind of pain are we talking about here?
just as importantly... what kind of gain do you expect to see? Since the issue raised was gain to pain.... is there really any useful gain in moving to i486 as the base arch?
Indeed:
http://www.ee.oulu.fi/~pp/faqentry
(submitted to the fedora faq some time ago, didn't hear anything back and it's potentially a bit too complex for that context).
For the instruction set bits, Chapter 17 of http://www.intel.com/design/pentiumii/manuals/243192.htm has details on the instruction set differences between the different x86 iterations.
Just some ballpark figures on how often gcc gets to use these instructions, and this is glibc, which might have used these in handcoded assembly: (objdump --disassemble /lib/i686/libc.so.6 | grep <instruction> | wc -l)
cmpxchg: 7
xadd: 8
bswap: 136
cmov: 1099 (and this already limits us to non-VIA C3 i686)
Total lines: 297992
Doesn't take into account how often this code is called and how much slower the i386 instruction set alternative is in reality. My guess is "unmeasurable".
Does someone feel like doing an experiment on some real code? glibc isn't really representative of typical code. Just compile some large package with different -march= options (keeping mtune at pentium4) and see what non-i386 instructions it actually generates. Bonus points for listing the functions and showing whether they are in the oprofile/gprof top 10 or not.
I would, but are there any free ways of doing benchmarks? Not to mention I'm not really much of a programmer, so I don't know what oprofile/gprof are.
BTW, I think that you mean -mcpu, not -mtune, as long as we are talking about ix86 processors.
---- Peace, William
On Sun, 2004-11-28 at 19:46 -0500, William M. Quarles wrote:
I would, but are there any free ways of doing benchmarks? Not to mention I'm not really much of a programmer, so I don't know what oprofile/gprof are.
for what it's worth... cmov isn't faster on newer (pM/pIV/amd64 level) CPUs than the open coded conditional jump anymore.... so there no longer really is a reason to use cmov-only code.
On Mon, 2004-11-29 at 09:11 +0100, Arjan van de Ven wrote:
for what it's worth... cmov isn't faster on newer (pM/pIV/amd64 level) CPUs than the open coded conditional jump anymore.... so there no longer really is a reason to use cmov-only code.
CMOVcc will use less space in the instruction cache than the Jcc/MOV pair, though.
On Mon, Nov 29, 2004 at 01:02:46AM -0800, Nicholas Miell wrote:
On Mon, 2004-11-29 at 09:11 +0100, Arjan van de Ven wrote:
for what it's worth... cmov isn't faster on newer (pM/pIV/amd64 level) CPUs than the open coded conditional jump anymore.... so there no longer really is a reason to use cmov-only code.
CMOVcc will use less space in the instruction cache than the Jcc/MOV pair, though.
only sometimes.... since cmov doesn't work on all register/memory combinations, extra code might be needed to glue that together...
.... and we're suddenly talking about 0.01% performance ;)
On Mon, 2004-11-29 at 10:16 +0100, Arjan van de Ven wrote:
On Mon, Nov 29, 2004 at 01:02:46AM -0800, Nicholas Miell wrote:
CMOVcc will use less space in the instruction cache than the Jcc/MOV pair, though.
only sometimes.... since cmov doesn't work on all register/memory combinations, extra code might be needed to glue that together...
.... and we're suddenly talking about 0.01% performance ;)
Well, yeah. :)
There's also branch prediction and decode bandwidth issues that I didn't bother to mention.
But, if you're going to optimize for i686 or better for other reasons, there's no reason not to use CMOVcc instead of Jcc/MOV, where possible. Not that you'll ever notice the difference...
On Mon, Nov 29, 2004 at 01:30:11AM -0800, Nicholas Miell wrote:
On Mon, 2004-11-29 at 10:16 +0100, Arjan van de Ven wrote:
On Mon, Nov 29, 2004 at 01:02:46AM -0800, Nicholas Miell wrote:
CMOVcc will use less space in the instruction cache than the Jcc/MOV pair, though.
only sometimes.... since cmov doesn't work on all register/memory combinations, extra code might be needed to glue that together...
.... and we're suddenly talking about 0.01% performance ;)
Well, yeah. :)
There's also branch prediction and decode bandwidth issues that I didn't bother to mention.
branch prediction on p4 actually takes hints from the compiler now ;)
But, if you're going to optimize for i686 or better for other reasons,
we *already* optimize for i686 even in the i386 rpms
there's no reason not to use CMOVcc instead of Jcc/MOV, where possible.
there is a reason... it keeps running on older hw and on via C3's :)
On Mon, 2004-11-29 at 10:32 +0100, Arjan van de Ven wrote:
branch prediction on p4 actually takes hints from the compiler now ;)
That turned out to be useless, Intel no longer recommends it (or even documents it, IIRC). I'm not sure if gcc still generates the hints when targeting P4s.
we *already* optimize for i686 even in the i386 rpms
I meant in terms of using i686-specific instructions (e.g. SSE, SSE2), not i686 instruction scheduling.
Nicholas Miell wrote:
On Mon, 2004-11-29 at 10:32 +0100, Arjan van de Ven wrote:
branch prediction on p4 actually takes hints from the compiler now ;)
That turned out to be useless, Intel no longer recommends it (or even documents it, IIRC). I'm not sure if gcc still generates the hints when targeting P4s.
we *already* optimize for i686 even in the i386 rpms
I meant in terms of using i686-specific instructions (e.g. SSE, SSE2), not i686 instruction scheduling.
SSE is not an i686 instruction set, it's a Pentium-III-and-better instruction set. SSE2 is Pentium-4-and-better. Other i686 processors are not compatible with those instruction sets. I hope that you don't run into further problems over this when building packages.
---- Peace, William
Arjan van de Ven wrote:
On Mon, Nov 29, 2004 at 01:30:11AM -0800, Nicholas Miell wrote:
<snip>
But, if you're going to optimize for i686 or better for other reasons,
we *already* optimize for i686 even in the i386 rpms
<snip>
Not in all of them. Mozilla's spec file does not include $RPM_OPT_FLAGS in any of the CFLAGS-type variables. I'm sure that there are several other packages like that. Somebody should fix that.
---- Peace, William
On Mon, Nov 29, 2004 at 01:30:11AM -0800, Nicholas Miell wrote:
On Mon, 2004-11-29 at 10:16 +0100, Arjan van de Ven wrote:
On Mon, Nov 29, 2004 at 01:02:46AM -0800, Nicholas Miell wrote:
CMOVcc will use less space in the instruction cache than the Jcc/MOV pair, though.
only sometimes.... since cmov doesn't work on all register/memory combinations, extra code might be needed to glue that together...
.... and we're suddenly talking about 0.01% performance ;)
Well, yeah. :)
There's also branch prediction and decode bandwidth issues that I didn't bother to mention.
Although the P4 has the ds/cs segment prefixes for static branch prediction, the non-preproduction chips actually don't use them, so they only make code bigger. http://gcc.gnu.org/ml/gcc-patches/2004-07/msg02200.html
But, if you're going to optimize for i686 or better for other reasons, there's no reason not to use CMOVcc instead of Jcc/MOV, where possible.
Well, there is a reason aside from some CPUs not having those insns at all: on some recent Intel CPUs CMOVcc is actually slower than Jcc/MOV.
Jakub
On Mon, Nov 29, 2004 at 01:30:11AM -0800, Nicholas Miell wrote:
There's also branch prediction and decode bandwidth issues that I didn't bother to mention.
The performance is the same on the newer CPUs, including branch prediction and all the other stuff. You have to predict for a CMOV just like a jump.
Arjan van de Ven wrote:
On Sun, 2004-11-28 at 19:46 -0500, William M. Quarles wrote:
I would, but are there any free ways of doing benchmarks? Not to mention I'm not really much of a programmer, so I don't know what oprofile/gprof are.
for what it's worth... cmov isn't faster on newer (pM/pIV/amd64 level) CPUs than the open coded conditional jump anymore.... so there no longer really is a reason to use cmov-only code.
More terminology that I am not aware of... cmov? I know that I'm a novice about development, you don't have to further prove it to me.
---- Peace, William
On Tue, 2004-11-30 at 01:33 -0500, William M. Quarles wrote:
Arjan van de Ven wrote:
On Sun, 2004-11-28 at 19:46 -0500, William M. Quarles wrote:
I would, but are there any free ways of doing benchmarks? Not to mention I'm not really much of a programmer, so I don't know what oprofile/gprof are.
for what it's worth... cmov isn't faster on newer (pM/pIV/amd64 level) CPUs than the open coded conditional jump anymore.... so there no longer really is a reason to use cmov-only code.
More terminology that I am not aware of... cmov?
cmov is a conditional move instruction on x86. Basically a C code construct like this
if (some_condition == 5) A = B;
normally gets translated into (pseudo asm)
    compare some_condition, 5
    jump_if_not_equal label
    move B into A
  label:
    ... the rest of the program
The "jump_if_not_equal" instruction is a conditional instruction, which means that the CPU cannot look ahead and decide what the next instruction is until the actual compare is finished. With the current deeply pipelined CPUs that is sort of a problem (the solution is that the CPU guesses what it'll be based on past decisions for this line of code, and if the guess is wrong, it backtracks).
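The "guess based on past decisions" can be illustrated with the simplest textbook scheme, a two-bit saturating counter per branch. The Pentium 4 and Athlon predictors are considerably more elaborate, so this C sketch only shows the idea; `predictor`, `predict_taken` and `resolve_branch` are hypothetical names.

```c
#include <stdbool.h>

/* Two-bit saturating counter: states 0,1 predict "not taken",
 * states 2,3 predict "taken". One surprise does not flip a
 * strongly biased prediction. */
typedef struct {
    unsigned state; /* 0..3 */
} predictor;

bool predict_taken(const predictor *p)
{
    return p->state >= 2;
}

/* Called once the compare has resolved with the real outcome.
 * Returns true if the earlier guess was wrong, i.e. the CPU would
 * have to throw away the speculatively executed instructions. */
bool resolve_branch(predictor *p, bool taken)
{
    bool mispredicted = predict_taken(p) != taken;

    if (taken && p->state < 3)
        p->state++;
    else if (!taken && p->state > 0)
        p->state--;
    return mispredicted;
}
```

A loop branch that is almost always taken costs two mispredictions while the counter warms up, after which a single not-taken outcome is absorbed without changing the prediction.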
Now with cmov, the code looks like
    compare some_condition, 5
    move_if_equal B into A
    ... the rest of the program
and in theory there is no question about which instructions will be executed when, so the "cost" of having an empty pipeline until the decision is known wouldn't be there. And that's mostly true for PPro/PII-level CPUs.
However, newer ones (both AMD and Intel) operate in such a way that this is no longer an advantage: in effect they need to know the result anyway (and they also make a guess about the "if" result).
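In C terms, the construct Arjan translates can be written either as an if statement or as a select expression. Both functions below have identical semantics, and the C source does not decide between a conditional jump and cmov: that is up to the compiler, its version, the optimization level and -march (with -march=i386 or -march=i486 it cannot use cmov at all, since those CPUs lack it). The function names are illustrative only.

```c
/* The construct from the explanation, written as an if statement.
 * A compiler may turn this into compare + conditional jump + move. */
int select_branchy(int some_condition, int a, int b)
{
    if (some_condition == 5)
        a = b;
    return a;
}

/* Same semantics written as a select. When targeting i686 or later,
 * GCC *may* emit compare + cmove here instead of a jump; nothing in
 * the C code forces either choice. */
int select_branchless(int some_condition, int a, int b)
{
    return (some_condition == 5) ? b : a;
}
```

To see what your compiler actually picked, compile with `gcc -O2 -march=i686 -S` and look for `cmov` in the assembly output, then repeat with `-march=i386` for comparison.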
I know that I'm a novice about development, you don't have to further prove it to me.
I absolutely don't mean it in that way.
Arjan van de Ven wrote:
On Tue, 2004-11-30 at 01:33 -0500, William M. Quarles wrote:
Arjan van de Ven wrote:
On Sun, 2004-11-28 at 19:46 -0500, William M. Quarles wrote:
I would, but are there any free ways of doing benchmarks? Not to mention I'm not really much of a programmer, so I don't know what oprofile/gprof are.
for what it's worth... cmov isn't faster on newer (pM/pIV/amd64 level) CPUs than the open coded conditional jump anymore.... so there no longer really is a reason to use cmov-only code.
More terminology that I am not aware of... cmov?
cmov is a conditional move instruction on x86. Basically a C code construct like this
if (some_condition == 5) A = B;
normally gets translated into (pseudo asm)
    compare some_condition, 5
    jump_if_not_equal label
    move B into A
  label:
    ... the rest of the program
The "jump_if_not_equal" instruction is a conditional instruction, which means that the CPU cannot look ahead and decide what the next instruction is until the actual compare is finished. With the current deeply pipelined CPUs that is sort of a problem (the solution is that the CPU guesses what it'll be based on past decisions for this line of code, and if the guess is wrong, it backtracks).
Now with cmov, the code looks like
    compare some_condition, 5
    move_if_equal B into A
    ... the rest of the program
and in theory there is no question about which instructions will be executed when, so the "cost" of having an empty pipeline until the decision is known wouldn't be there. And that's mostly true for PPro/PII-level CPUs.
Thanks, that cleared things up a lot.
As someone who was still using a Pentium II a year ago, and is now only using a Pentium III computer (which would still have this benefit, since the only difference between the PII and the PIII is an additional, yet in this case irrelevant, instruction set), I'd really like to see that performance difference. Now, were you saying that this cmov is part of i486 or i686? I'd like to try rebuilding everything to see if it makes a difference.
Would I have to worry about any trademark problems?
Is there a script that Red Hat uses to automate the build process? Does it use SRPMS or do the sources and specs have to be "preinstalled?"
However, newer ones (both AMD and Intel) operate in such a way that this is no longer an advantage: in effect they need to know the result anyway (and they also make a guess about the "if" result).
Is the advantage still there, or is it just no longer a significant advantage? Or is it a cost? Unless it's a cost, I don't see what's wrong with putting in the advantage, unless it's a lot more work on your part. Is it more work?
I know that I'm a novice about development, you don't have to further prove it to me.
I absolutely don't mean it in that way.
I guess that I should have put a smiley face next to that one. :-)
---- Peace, William
On Tue, Nov 30, 2004 at 12:05:51PM -0500, William M. Quarles wrote:
this case irrelevant instruction set), I'd really like to see that performance difference. Now were you saying that this cmov is part of i486 or i686? I'd like to try rebuilding everything to see if it makes a difference.
cmov is an optional instruction for i686 - it turns up on the PPro, PII, PIII, PIV and Pentium M at least. By the PIV it's as slow as, or slower than, not using it.
Is the advantage still there, or is it just no longer a significant advantage? Or is it a cost? Unless it's a cost, I don't see what's wrong with putting in the advantage, unless it's a lot more work on your part. Is it more work?
It's a cost because some processors we support today lack this instruction but are otherwise 686.
Alan
On Tue, 2004-11-30 at 12:05 -0500, William M. Quarles wrote:
performance difference. Now were you saying that this cmov is part of i486 or i686?
i686
Would I have to worry about any trademark problems?
?????
Is there a script that Red Hat uses to automate the build process? Does it use SRPMS or do the sources and specs have to be "preinstalled?"
srpms; what you can do yourself is install most of the rpms (so that all buildrequires are met), then get the src.rpms in one dir and do
for i in *.src.rpm ; do rpmbuild --rebuild --target i686 $i ; done
for a first cut approximation... takes 24 to 36 hours though.
However, newer ones (both AMD and Intel) operate in such a way that this is no longer an advantage: in effect they need to know the result anyway (and they also make a guess about the "if" result).
Is the advantage still there
not anymore
, or is it just no longer a significant advantage? Or is it a cost? Unless it's a cost, I don't see what's
on newest P4 cores it seems to be a cost even (given that cmov can't do all the addressing combinations normal mov can, gcc may have to add some slight additional glue code which is a cost)
wrong with putting in the advantage, unless it's a lot more work on your part. Is it more work?
it means that you don't run on cpus without cmov...
Arjan van de Ven wrote:
On Tue, 2004-11-30 at 12:05 -0500, William M. Quarles wrote:
performance difference. Now were you saying that this cmov is part of i486 or i686?
i686
What differences between i386 and i486 would become a benefit if the base architecture was changed from i386 to i486?
---- Thanks, William
On Tue, Nov 30, 2004 at 09:56:23PM -0500, William M. Quarles wrote:
What differences between i386 and i486 would become a benefit if the base architecture was changed from i386 to i486?
The only thing 486 might give is the ability to consign the old linuxthread stuff to the dustbin of back compatibility.
Alan Cox wrote:
On Tue, Nov 30, 2004 at 09:56:23PM -0500, William M. Quarles wrote:
What differences between i386 and i486 would become a benefit if the base architecture was changed from i386 to i486?
The only thing 486 might give is the ability to consign the old linuxthread stuff to the dustbin of back compatibility.
I'm guessing that's supposed to be a joke?
---- Peace, William
On Tue, 2004-11-30 at 23:06 -0500, William M. Quarles wrote:
Alan Cox wrote:
On Tue, Nov 30, 2004 at 09:56:23PM -0500, William M. Quarles wrote:
What differences between i386 and i486 would become a benefit if the base architecture was changed from i386 to i486?
The only thing 486 might give is the ability to consign the old linuxthread stuff to the dustbin of back compatibility.
I'm guessing that's supposed to be a joke?
No, the x86 NPTL implementation requires the CMPXCHG instruction, which was introduced in the i486.
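To make the CMPXCHG point concrete: the core primitive behind a user-space lock is an atomic compare-and-swap. A minimal C11 sketch, assuming a plain spin lock rather than NPTL's actual futex-based mutex; `cas_lock` and its functions are hypothetical names. The statement that this becomes `lock cmpxchg` describes typical x86 compiler output, not something the C standard guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal lock built on compare-and-swap.
 * On x86, compilers normally lower atomic_compare_exchange_strong
 * to `lock cmpxchg`, an instruction the i386 does not have. */
typedef struct {
    atomic_int locked; /* 0 = free, 1 = held */
} cas_lock;

void cas_lock_init(cas_lock *l)
{
    atomic_init(&l->locked, 0);
}

/* Atomically: if locked == 0, set it to 1 and report success. */
bool cas_try_lock(cas_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->locked, &expected, 1);
}

void cas_lock_acquire(cas_lock *l)
{
    while (!cas_try_lock(l))
        ; /* busy-wait; a real mutex would sleep in the kernel instead */
}

void cas_unlock(cas_lock *l)
{
    atomic_store(&l->locked, 0);
}
```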
On Tue, Nov 30, 2004 at 11:06:21PM -0500, William M. Quarles wrote:
The only thing 486 might give is the ability to consign the old linuxthread stuff to the dustbin of back compatibility.
I'm guessing that's supposed to be a joke?
No, it's quite serious. 486 adds a lot of the SMP instructions like XADD that good threading in user space wants.
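XADD is an atomic fetch-and-add, the building block for things like reference counts in threaded code. A minimal C11 sketch with hypothetical names; x86 compilers typically emit `lock xadd` when the old value is used, but that is a code-generation detail rather than a guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object header. */
typedef struct {
    atomic_int refs;
} refcounted;

void ref_init(refcounted *o)
{
    atomic_init(&o->refs, 1);
}

/* Returns the count *before* the increment. Because the old value
 * is needed, x86 compilers usually use `lock xadd` here (i486+). */
int ref_get(refcounted *o)
{
    return atomic_fetch_add(&o->refs, 1);
}

/* Returns true when the caller has just dropped the last reference. */
bool ref_put(refcounted *o)
{
    return atomic_fetch_sub(&o->refs, 1) == 1;
}
```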
On Wed, 2004-12-01 at 08:12 -0500, Alan Cox wrote:
On Tue, Nov 30, 2004 at 11:06:21PM -0500, William M. Quarles wrote:
The only thing 486 might give is the ability to consign the old linuxthread stuff to the dustbin of back compatibility.
I'm guessing that's supposed to be a joke?
No, it's quite serious. 486 adds a lot of the SMP instructions like XADD that good threading in user space wants.
So actually that could be a good reason why we should switch over to i486 as base arch?
Arjan van de Ven wrote:
So actually that could be a good reason why we should switch over to i486 as base arch?
for the glibc package... sure ;)
for all other packages... none will use the new instructions afaics
Absolutely agreed, more pain that gain for s/i386/i486/ everywhere.
However -- as always -- the problem is one of perception, and '*.i386.rpm' signifies something quite different than '*.i486.rpm' no matter what instructions are actually used within the package.
Perhaps we need to change the perception in order to lose linux threads forever.
Again, I know quite well it's more pain than gain: teaching all the bleeping scripts that have embedded i386/* and/or *.i386.rpm to know about 'i486' is gonna break a whole lot of process.
But perhaps it's time to signify Something Else Instead.
73 de Jeff
On Wed, Dec 01, 2004 at 03:19:33PM +0100, Arjan van de Ven wrote:
On Wed, Dec 01, 2004 at 09:18:08AM -0500, Jeff Johnson wrote:
scripts that have embedded i386/* and/or *.i386.rpm to know about 'i486' is gonna break a whole lot of process.
yeah just make the i686 rpms be i386 ones with a Requires: cpu(cmov)
Ignoring the severe technical problems that it would create (It'd break _everything_ and be way too much work to implement), how about .ia32.rpm for everything that is .i386 now and use i?86.rpm for stuff that requires a specific cpu?
Too much work for just ending some FAQs for no technical benefit, certainly ;)
Pekka Pietikainen wrote:
On Wed, Dec 01, 2004 at 03:19:33PM +0100, Arjan van de Ven wrote:
On Wed, Dec 01, 2004 at 09:18:08AM -0500, Jeff Johnson wrote:
scripts that have embedded i386/* and/or *.i386.rpm to know about 'i486' is gonna break a whole lot of process.
yeah just make the i686 rpms be i386 ones with a Requires: cpu(cmov)
Ignoring the severe technical problems that it would create (It'd break _everything_ and be way too much work to implement), how about .ia32.rpm for everything that is .i386 now and use i?86.rpm for stuff that requires a specific cpu?
No, Requires: cpu(cmov) breaks nothing, it's just another strcmp to rpm. Your expectations are what is confusing you. Yes, there would be a lot of confusion for a short period of time, but as Arjan has pointed out, there are only a handful of packages that need to carry the dependency.
s/i386/ia32/ is marketing hype and fluff, much like s/x86_64/amd64/, vendors need to signify Newer! better! Bestest! somehow.
Too much work for just ending some FAQs for no technical benefit, certainly ;)
The real reason for doing Requires: cpu(cmov) is that it identifies the reason for the dependency quite clearly, narrowly, and objectively, which will be easier to support, maintain, and adjusts expectations to what is needed, rather than endless learned discussions of what an 'i686' actually means these days.
73 de Jeff
yeah just make the i686 rpms be i386 ones with a Requires: cpu(cmov)
Ignoring the severe technical problems that it would create (It'd break _everything_ and be way too much work to implement), how about .ia32.rpm for everything that is .i386 now and use i?86.rpm for stuff that requires a specific cpu?
No, Requires: cpu(cmov) breaks nothing, it's just another strcmp to rpm. Your expectations are what is confusing you. Yes, there would be a lot of confusion for a short period of time, but as Arjan has pointed out, there are only a handful of packages that need to carry the dependency.
RPM isn't the only thing that touches files though. Lots of stuff doesn't expect that it'll find 2 i386 packages, so it'll dump them to the same filename. I know it sounds silly, but having the arch difference means it's easy to generate non-colliding filenames.
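Peter's filename argument in miniature: with the usual name-version-release.arch.rpm naming, the arch field is the only thing that separates an i386 build from an i686 build of the same package. A C sketch; `rpm_filename` and `same_rpm_path` are hypothetical helpers, not rpm library calls.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build "name-version-release.arch.rpm" into buf.
 * Returns what snprintf returns (the length it wanted to write). */
int rpm_filename(char *buf, size_t size, const char *name,
                 const char *version, const char *release, const char *arch)
{
    return snprintf(buf, size, "%s-%s-%s.%s.rpm",
                    name, version, release, arch);
}

/* Two builds land on the same path exactly when every field,
 * arch included, is identical. */
bool same_rpm_path(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With a made-up glibc-2.3.3-74, the i386 and i686 builds get distinct names; relabel the i686 build as i386 and any tool that derives paths from these fields writes both to the same file.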
That's not to say we shouldn't do the "Requires: cpu(cmov)" as _well_, since it could obviously be helpful in many situations.
Peter Jones wrote:
yeah just make the i686 rpms be i386 ones with a Requires: cpu(cmov)
Ignoring the severe technical problems that it would create (It'd break _everything_ and be way too much work to implement), how about .ia32.rpm for everything that is .i386 now and use i?86.rpm for stuff that requires a specific cpu?
No, Requires: cpu(cmov) breaks nothing, it's just another strcmp to rpm. Your expectations are what is confusing you. Yes, there would be a lot of confusion for a short period of time, but as Arjan has pointed out, there are only a handful of packages that need to carry the dependency.
RPM isn't the only thing that touches files though. Lots of stuff doesn't expect that it'll find 2 i386 packages, so it'll dump them to the same filename. I know it sounds silly, but having the arch difference means it's easy to generate non-colliding filenames.
That's not to say we shouldn't do the "Requires: cpu(cmov)" as _well_, since it could obviously be helpful in many situations.
OK, to summarize:
a) There's a whole lot of pain and not much gain messing with package file names (and file paths and scripts and ... )
b) The dependency Requires: cpu(cmov) (or equivalent token) might (*will* imho) be useful for identifying packages that actually use, say, cmov. (Note: there's more than cmov that needs marking; generalizing the cpu(...) name space is quite straightforward.)
c) Users want a clear call on what package file name to install, as some *.i386.rpm will not run on arch i386, very confusing.
d) linuxthreads needs to die! die! die! (but that's just me ;-)
Name your poison (if any) please.
73 de Jeff
On Wed, 2004-12-01 at 12:07 -0500, Jeff Johnson wrote:
OK, to summarize:
a) There's a whole lot of pain and not much gain messing with package file names (and file paths and scripts and ... )
b) The dependency Requires: cpu(cmov) (or equivalent token) might (*will* imho) be useful for identifying packages that actually use, say, cmov. (Note: there's more than cmov that needs marking; generalizing the cpu(...) name space is quite straightforward.)
c) Users want a clear call on what package file name to install, as some *.i386.rpm will not run on arch i386, very confusing.
d) linuxthreads needs to die! die! die! (but that's just me ;-)
Name your poison (if any) please.
What's going to Provide: cpu(cmov)?
Nicholas Miell wrote:
On Wed, 2004-12-01 at 12:07 -0500, Jeff Johnson wrote:
OK, to summarize:
a) There's a whole lot of pain and not much gain messing with package file names (and file paths and scripts and ... )
b) The dependency Requires: cpu(cmov) (or equivalent token) might (*will* imho) be useful for identifying packages that actually use, say, cmov. (Note: there's more than cmov that needs marking; generalizing the cpu(...) name space is quite straightforward.)
c) Users want a clear call on what package file name to install, as some *.i386.rpm will not run on arch i386, very confusing.
d) linuxthreads needs to die! die! die! (but that's just me ;-)
Name your poison (if any) please.
What's going to Provide: cpu(cmov)?
Can be done by a simple string in the per-arch kernel package right now, although that doesn't really solve the problem adequately, as dependencies in packages are static content. But even a static dependency would be as good as, say, Provides: kernel-abi = 2.6
Prolly the strongest mechanism is to attach a run-time probe dependency to the "cpu(...)" name space and parse /proc/cpuinfo for the relevant info. That mostly works, but will have problems in chroots w/o /proc mounted, and will be kinda weird if/when, say, the mobo or disk is moved amongst machines, to mention just 2 possible problems off the top of my head.
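The /proc/cpuinfo probe amounts to looking for a whole-word token on the `flags` line (on x86 Linux it looks like `flags : fpu vme de pse tsc msr ... cmov ...`). A minimal C sketch that works on the file's text rather than opening the file, so it can be exercised without /proc; `cpuinfo_has_flag` is a hypothetical name, and it only inspects the first `flags` line, which assumes every CPU in the box reports the same flags.

```c
#include <stdbool.h>
#include <string.h>

static bool is_sep(char c)
{
    return c == ' ' || c == '\t';
}

/* Return true if `flag` appears as a whole word on the first line
 * that starts with "flags" in /proc/cpuinfo-style text. */
bool cpuinfo_has_flag(const char *cpuinfo, const char *flag)
{
    size_t flen = strlen(flag);
    const char *line = cpuinfo;

    if (flen == 0)
        return false;

    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len >= 5 && strncmp(line, "flags", 5) == 0) {
            const char *p = memchr(line, ':', len);
            const char *stop = line + len;

            if (p == NULL)
                return false;
            /* Walk the whitespace-separated tokens after the colon. */
            for (p++; p < stop; ) {
                while (p < stop && is_sep(*p))
                    p++;
                const char *tok = p;
                while (p < stop && !is_sep(*p))
                    p++;
                if ((size_t)(p - tok) == flen && memcmp(tok, flag, flen) == 0)
                    return true;
            }
            return false;
        }
        line = end ? end + 1 : NULL;
    }
    return false;
}
```

Whole-word matching matters here: a plain substring search would treat a hypothetical flag "cmo" as present on any CPU that has "cmov".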
I suspect those deficiencies can be lived with, and are no worse than existing arch based tests.
It's not like a missing dependency is gonna explode somebody's monitor or anything really seriously deadly or risky ... NPTL was far far riskier than what I am humbly proposing ;-)
73 de Jeff
On Wed, Dec 01, 2004 at 11:10:04PM -0500, Jeff Johnson wrote:
Prolly the strongest mechanism is to attach a run-time probe dependency to the "cpu(...)" name space and parse /proc/cpuinfo for the relevant info. That mostly works, but will have problems in chroots w/o /proc mounted, and will be kinda weird if/when, say, the mobo or disk is moved amongst machines, to mention just 2 possible problems off the top of my head.
I suspect those deficiencies can be lived with, and are no worse than existing arch based tests.
Would there be any complications if the thing was generalized and you could do Requires-Return-Code: /usr/lib/rpm/check-x86-cpu-flags cmov (or whatever the syntax would be, basic idea is that there's some external thing that exits with 0 or 1).
In any case, such a thing should be easy to override, even when using a frontend like yum.
Pekka Pietikainen wrote:
On Wed, Dec 01, 2004 at 11:10:04PM -0500, Jeff Johnson wrote:
Prolly the strongest mechanism is to attach a run-time probe dependency to the "cpu(...)" name space and parse /proc/cpuinfo for the relevant info. That mostly works, but will have problems in chroots w/o /proc mounted, and will be kinda weird if/when, say, the mobo or disk is moved amongst machines, to mention just 2 possible problems off the top of my head.
I suspect those deficiencies can be lived with, and are no worse than existing arch based tests.
Would there be any complications if the thing was generalized and you could do Requires-Return-Code: /usr/lib/rpm/check-x86-cpu-flags cmov (or whatever the syntax would be, basic idea is that there's some external thing that exits with 0 or 1).
Well, not a script please, scripts break way too often to be reliable.
But yes, return codes like
    0 == condition is TRUE
    1 == condition is FALSE
from a function that is passed the {N,EVR,FLAGS} dependency triple, which is dispatched iff the 'cpu(...)' name space wrapper is detected.
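A sketch of that dispatch, assuming only the name (N) of the {N,EVR,FLAGS} triple matters for cpu(...) probes: detect the `cpu(` ... `)` wrapper, extract the flag, and return 0 (TRUE) or 1 (FALSE). `check_cpu_dependency`, `flag_probe` and `demo_probe` are hypothetical names for illustration and are not part of rpm; a real probe would query the running CPU (for example via /proc/cpuinfo) instead of a fixed list.

```c
#include <stddef.h>
#include <string.h>

/* Probe callback: nonzero if the CPU has the `len`-byte flag name. */
typedef int (*flag_probe)(const char *flag, size_t len);

/* Returns 0 (TRUE) or 1 (FALSE) for names of the form "cpu(xxx)",
 * or -1 if the name is outside the cpu(...) name space and should
 * go through ordinary Provides: matching instead. */
int check_cpu_dependency(const char *name, flag_probe have_flag)
{
    static const char prefix[] = "cpu(";
    size_t plen = sizeof prefix - 1;
    size_t nlen = strlen(name);

    if (nlen < plen + 2 || strncmp(name, prefix, plen) != 0 ||
        name[nlen - 1] != ')')
        return -1;
    return have_flag(name + plen, nlen - plen - 1) ? 0 : 1;
}

/* Example probe for illustration: pretends the CPU has these flags. */
int demo_probe(const char *flag, size_t len)
{
    static const char *const have[] = { "fpu", "cx8", "cmov", NULL };

    for (size_t i = 0; have[i] != NULL; i++)
        if (strlen(have[i]) == len && strncmp(have[i], flag, len) == 0)
            return 1;
    return 0;
}
```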
Kinda like 'rpmlib(...)' tracking dependencies, which are also a run-time probe dependency wrapped in a name space.
In any case, such a thing should be easy to override, even when using a frontend like yum.
Yep.
73 de Jeff
[ Stuff about what's going to 'Provide: cpu(cmov)' omitted. ]
Ok, now that that's settled, how are packages going to get the "Requires: cpu(cmov)" dependency?
And why can't we just say that i686 packages all require an i686 variant that provides CMOVcc?
Sure, there are i686 variants that don't, but what's stopping them from using the generic i386 version (which is optimized for i686, anyway)?
Nicholas Miell wrote:
[ Stuff about what's going to 'Provide: cpu(cmov)' omitted. ]
Ok, now that that's settled, how are packages going to get the "Requires: cpu(cmov)" dependency?
Again, there's nothing (well you are gonna need a matching Provides: somehow) stopping anyone from adding Requires: cpu(cmov) to a package spec file, presumably because cmov is known to be used within package executable.
The process can be done automagically as well, basically disassembling every elf file and grepping for known i686 specific opcodes. That mechanism is crude enough that perhaps some compiler geek would suggest a better mechanism almost instantly ;-)
And why can't we just say that i686 packages all require an i686 variant that provides CMOVcc?
Because that is exactly where rpm is right now, with murky and implicit assumptions about what is provided and what is not, and no clear way to identify without the artifact of inventing bogus i686 arch names (kinda like ppc* is today, there's way too many ppc* arches, and I certainly have no idea what distinguishing properties each has, other than different letters after "ppc")
Sure, there are i686 variants that don't, but what's stopping them from using the generic i386 version (which is optimized for i686, anyway)?
And that is the lowest-common-denominator package naming currently being used in FC4: some packages have ".i386.rpm" suffixes yet will not run on hw arch i386, causing user confusion about every 3 months or so, and we discuss this topic Yet Again.
73 de Jeff
On Thu, 2004-12-02 at 20:27 -0500, Jeff Johnson wrote:
Nicholas Miell wrote:
Ok, now that that's settled, how are packages going to get the "Requires: cpu(cmov)" dependency?
Again, there's nothing (well, you are gonna need a matching Provides: somehow) stopping anyone from adding Requires: cpu(cmov) to a package spec file, presumably because cmov is known to be used within the package's executables.
The process can be done automagically as well, basically disassembling every elf file and grepping for known i686 specific opcodes. That mechanism is crude enough that perhaps some compiler geek would suggest a better mechanism almost instantly ;-)
Both options appear to be fragile and error prone.
Requiring the manual addition of a dependency to the spec means lots of package authors are going to forget to do it, making it effectively useless as an indicator of what packages actually require (especially because i686 packages already require CMOVcc).
Automagic dependencies are going to screw up for any piece of software that selects different CPU optimized routines at run-time, some of which use CMOVcc.
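Here is a minimal C sketch of run-time CPU dispatch that illustrates the problem. It assumes GCC or Clang on x86, both of which provide __builtin_cpu_init and __builtin_cpu_supports. The function names are hypothetical. In a real package, the cmov-friendly variant would usually live in a file built with a higher -march. A disassembly scan would then find CMOVcc in the binary, even though the program never executes it on a CPU without cmov.

```c
#include <limits.h>
#include <stddef.h>

typedef int (*max_fn)(const int *v, size_t n);

/* Portable version with an explicit branch; runs on any x86. */
static int max_generic(const int *v, size_t n)
{
    int m = INT_MIN;
    for (size_t i = 0; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

/* Select-style version; a compiler targeting i686 or later may lower the
 * conditional expression to CMOVcc. Returns the same result. */
static int max_select(const int *v, size_t n)
{
    int m = INT_MIN;
    for (size_t i = 0; i < n; i++)
        m = v[i] > m ? v[i] : m;
    return m;
}

/* Choose an implementation once, at run time, from the CPU's feature flags. */
max_fn pick_max(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("cmov"))
        return max_select;
#endif
    return max_generic;
}
```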
And why can't we just say that i686 packages all require an i686 variant that provides CMOVcc?
Because that is exactly where rpm is right now: murky and implicit assumptions about what is provided and what is not, and no clear way to identify it without the artifact of inventing bogus i686 arch names (kinda like ppc* is today; there are way too many ppc* arches, and I certainly have no idea what distinguishing properties each has, other than different letters after "ppc").
Well, instead of weird Requires that have to be manually inserted by the spec author and Provides that are mysteriously provided by nothing, why not just document exactly what the different RPM architectures mean?
This isn't as hard as you may initially think -- just say that each target requires whatever gcc requires when optimizing for that target with the default RPM build flags.
Sure, there are i686 variants that don't, but what's stopping them from using the generic i386 version (which is optimized for i686, anyway)?
And that is the lowest-common-denominator package naming currently being used in FC4: some packages have ".i386.rpm" suffixes yet will not run on hw arch i386, causing user confusion about every 3 months or so, and we discuss this topic Yet Again.
Well, that's a bug, those packages should be fixed.
And if you get tired of user confusion, make a FAQ. Saying "It's in the FAQ, dimwit." or something to that effect can be rather satisfying.
Nicholas Miell wrote:
On Thu, 2004-12-02 at 20:27 -0500, Jeff Johnson wrote:
Nicholas Miell wrote:
Ok, now that that's settled, how are packages going to get the "Requires: cpu(cmov)" dependency?
Again, there's nothing (well, you are gonna need a matching Provides: somehow) stopping anyone from adding Requires: cpu(cmov) to a package spec file, presumably because cmov is known to be used within the package's executables.
The process can be done automagically as well, basically disassembling every elf file and grepping for known i686 specific opcodes. That mechanism is crude enough that perhaps some compiler geek would suggest a better mechanism almost instantly ;-)
Both options appear to be fragile and error prone.
Requiring the manual addition of a dependency to the spec means lots of package authors are going to forget to do it, making it effectively useless as an indicator of what packages actually require (especially because i686 packages already require CMOVcc).
Automagic dependencies are going to screw up for any piece of software that selects different CPU optimized routines at run-time, some of which use CMOVcc.
And why can't we just say that i686 packages all require an i686 variant that provides CMOVcc?
Because that is exactly where rpm is right now: murky and implicit assumptions about what is provided and what is not, and no clear way to identify it without the artifact of inventing bogus i686 arch names (kinda like ppc* is today; there are way too many ppc* arches, and I certainly have no idea what distinguishing properties each has, other than different letters after "ppc").
Well, instead of weird Requires that have to be manually inserted by the spec author and Provides that are mysteriously provided by nothing, why not just document exactly what the different RPM architectures mean?
The package maintainers for the handful of packages that use, say, cmov are quite knowledgeable and competent. Requiring manual entry is also a viable solution to the general problem, because a missing Requires: changes nothing that is not already happening in practice; indeed, there are implicit and undocumented dependencies in existing packages.
But I'm all in favor of automation, do not think that I am suggesting manual package edits as the best possible solution.
The rpm implementation does not control what strings are used to identify arch in packages. For example, PLD is using "pentium3", "pentium4", and "amd64" while Red Hat is using "i586", "i686" and "x86_64" with essentially the same meanings, and all of those strings are being carried in default rpm configuration.
Documenting what is meant by all those strings that are used differently is a vendor or distro, not an rpm, problem. The rpm implementation supports only the strcmp mechanism, not the deeper semantic meaning of arch.
And the "weird Requires" is actually an attempt to rationalize this mess, as arch in rpm is entirely the wrong name space to tag packages with, there are way too many problems to do anything other than just abandon arch as a meaningful objective identifier of rpm package content imho.
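This minimal C sketch shows what "only strcmp" means in practice. It assumes a hypothetical compatibility table in the spirit of rpmrc's arch_compat lines. The table below is illustrative and is not copied from any shipped rpmrc, and arch_ok is not an rpm function. Arch names are matched by exact string equality along the chain, so nothing in the mechanism knows that a pentium3 is an i686 unless a table entry says so.

```c
#include <string.h>

/* Hypothetical compat table: each arch can also run packages built for
 * the next (less capable) arch in its chain. Illustrative only. */
struct arch_compat { const char *arch; const char *compat; };

static const struct arch_compat compat_table[] = {
    { "i686", "i586" },
    { "i586", "i486" },
    { "i486", "i386" },
    { "i386", "noarch" },
};

/* Returns 1 if a package labelled pkg_arch may be installed on host_arch.
 * Only string equality is used; the names carry no other meaning. */
int arch_ok(const char *host_arch, const char *pkg_arch)
{
    const char *a = host_arch;
    size_t n = sizeof compat_table / sizeof compat_table[0];

    while (a) {
        if (strcmp(a, pkg_arch) == 0)
            return 1;
        const char *next = NULL;
        for (size_t i = 0; i < n; i++) {
            if (strcmp(compat_table[i].arch, a) == 0) {
                next = compat_table[i].compat;
                break;
            }
        }
        a = next;   /* NULL ends the walk when the arch has no entry */
    }
    return 0;
}
```

For example, arch_ok("pentium3", "i686") returns 0 with this table, because "pentium3" has no entry of its own.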
This isn't as hard as you may initially think -- just say that each target requires whatever gcc requires when optimizing for that target with the default RPM build flags.
That has already been attempted: the configured optflags (which are all that rpm has ever attempted to control for when building a package) have been in packages for years.
That of course doesn't work at all when packages do not use $RPM_OPT_FLAGS.
Attempting to reason from what gcc uses also does not "work" when there are, say, several different sets of compilation flags used within a single package, so there is no single "target requires whatever gcc requires"; the mapping is from many files within the package, compiled by gcc, onto a single package arch identifier.
Sure, there are i686 variants that don't, but what's stopping them from using the generic i386 version (which is optimized for i686, anyway)?
And that is the lowest-common-denominator package naming currently being used in FC4: some packages have ".i386.rpm" suffixes yet will not run on hw arch i386, causing user confusion about every 3 months or so, and we discuss this topic Yet Again.
Well, that's a bug, those packages should be fixed.
How, by changing the label from "i386" to "i486" or by changing the compilation to agree with the label?
And if you get tired of user confusion, make a FAQ. Saying "It's in the FAQ, dimwit." or something to that effect can be rather satisfying.
Easier than a FAQ is not making any change at all (i.e. leaving packages as "i386.rpm"), which appears to be where this thread is ending up, not surprisingly.
Talk to you in a couple of months, when the "i386" vs. "i486" issue will surely arise again, same Bat time, same Bat channel ;-)
73 de Jeff
The rpm implementation does not control what strings are used to identify arch in packages. For example, PLD is using "pentium3", "pentium4", and "amd64" while Red Hat is using "i586", "i686" and "x86_64" with essentially the same meanings, and all of those strings are being carried in default rpm configuration.
The Pentium 3 is definitely an i686 arch... the P2 and Pentium Pro as well...
Kyrre Ness Sjobak wrote:
The rpm implementation does not control what strings are used to identify arch in packages. For example, PLD is using "pentium3", "pentium4", and "amd64" while Red Hat is using "i586", "i686" and "x86_64" with essentially the same meanings, and all of those strings are being carried in default rpm configuration.
The Pentium 3 is definitely an i686 arch... the P2 and Pentium Pro as well...
Right!
But "pentium3" is most definitely not an "i686" arch for rpm, because the strings are not identical, and because FC and RHEL do not build *.pentium3.rpm packages.
73 de Jeff
Alan Cox wrote:
On Tue, Nov 30, 2004 at 11:06:21PM -0500, William M. Quarles wrote:
The only thing 486 might give is the ability to consign the old linuxthread stuff to the dustbin of back compatibility.
I'm guessing that's supposed to be a joke?
No, it's quite serious. The 486 adds a lot of the SMP instructions, like XADD, that good threading in user space wants.
True. However, they could be emulated inefficiently on the 386. I remember seeing a patch (I think from Andi Kleen) that added 386 support to NPTL.
Also note that http://cobind.com/ supports 386
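This short C11 sketch shows the difference under discussion. It assumes a C11 compiler with <stdatomic.h>. On the i486 and later, an atomic fetch-and-add is typically a single LOCK XADD. The 386 has neither XADD nor CMPXCHG, but it does have XCHG. One inefficient emulation is therefore to guard an ordinary read-modify-write with an XCHG-based spinlock. The function names are hypothetical, and the sketch is not meant to describe the NPTL patch mentioned above.

```c
#include <stdatomic.h>

/* i486+ path: one atomic fetch-and-add, which x86 compilers typically
 * emit as LOCK XADD. Returns the previous value. */
int fetch_add_xadd(atomic_int *p, int v)
{
    return atomic_fetch_add(p, v);
}

/* i386-style emulation: serialize a plain read-modify-write behind a
 * spinlock. atomic_flag_test_and_set is typically lowered to XCHG on
 * x86, which the 386 does have. Returns the previous value. */
static atomic_flag add_lock = ATOMIC_FLAG_INIT;

int fetch_add_emulated(int *p, int v)
{
    while (atomic_flag_test_and_set(&add_lock))
        ;   /* spin until the lock is free */
    int old = *p;
    *p = old + v;
    atomic_flag_clear(&add_lock);
    return old;
}
```

The emulated version uses a single global lock, so every caller contends on it. That contention is a large part of what "inefficiently" means here.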
Pádraig.
On Wed, 01.12.2004 at 03.56, William M. Quarles wrote:
Arjan van de Ven wrote:
On Tue, 2004-11-30 at 12:05 -0500, William M. Quarles wrote:
performance difference. Now were you saying that this cmov is part of i486 or i686?
i686
What differences between i386 and i486 would become a benefit if the base architecture was changed from i386 to i486?
Or to i586? Who is running newish Fedora/Red Hat things on anything older than a Pentium 1 anyway?
Sorry if I am sounding stupid... but really...
Kyrre
On Wed, 2004-12-01 at 17:48 +0100, Kyrre Ness Sjobak wrote:
On Wed, 01.12.2004 at 03.56, William M. Quarles wrote:
What differences between i386 and i486 would become a benefit if the base architecture was changed from i386 to i486?
Or to i586? Who is running newish Fedora/Red Hat things on anything older than a Pentium 1 anyway?
Sorry if I am sounding stupid... but really...
There were several long threads on this in May and June of this year. Perhaps the entry point most relevant to your question is here: http://www.redhat.com/archives/fedora-devel-list/2004-June/msg00011.html
Read the whole discussion so you know which pieces of information were factual and which were later corrected.
-Toshio
Pekka Pietikainen wrote:
On Sun, Nov 28, 2004 at 05:54:43PM -0500, Jeff Spaleta wrote:
On Sun, 28 Nov 2004 17:47:35 -0500, William M. Quarles
What kind of pain are we talking about here?
just as importantly... what kind of gain do you expect to see? Since the issue raised was gain to pain.... is there really any useful gain in moving to i486 as the base arch?
Indeed:
http://www.ee.oulu.fi/~pp/faqentry
(submitted to the fedora faq some time ago, didn't hear anything back and it's potentially a bit too complex for that context).
For the instruction set bits, Chapter 17 of http://www.intel.com/design/pentiumii/manuals/243192.htm has details on the instruction set differences between the different x86 iterations.
Just some ballpark figures on how often gcc gets to use these instructions (and this is glibc, which might have used them in hand-coded assembly), counted with: objdump --disassemble /lib/i686/libc.so.6 | grep <instruction> | wc -l
cmpxchg: 7, xadd: 8, bswap: 136, cmov: 1099 (and using cmov already limits us to non-VIA C3 i686). Total lines: 297992
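As a rough worked ratio, dividing those counts by the total gives the share of disassembly lines involved. cmpxchg, xadd and bswap arrived with the i486; cmov arrived with the i686. These are static line counts, not executed instructions, so they say nothing about how hot those lines are.

```latex
\frac{7 + 8 + 136}{297992} = \frac{151}{297992} \approx 0.051\%
\qquad
\frac{1099}{297992} \approx 0.37\%
```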
Interesting.
However, grep does not take into account which code paths do what; I'm sure that run-time statistics would be more revealing. Perhaps there is some way to trick oprofile into revealing how often the instructions are actually used, dunno.
Doesn't take into account how often this code is called and how much slower the i386 instruction set alternative is in reality. My guess is "unmeasurable".
What you said ;-)
Perhaps only objective tests can reveal all.
Does someone feel like doing an experiment on some real code? glibc isn't really representative of typical code. Just compile some large package with different -march= options (keeping -mtune at pentium4) and see what non-i386 instructions it actually generates. Bonus points for listing the functions and showing whether they are in the oprofile/gprof top #10 or not.
All that being said, just about the only remaining impediment to claiming that FC4 is "Only i486 and above." is the "i386" string in the package file names and directory structures. Most, if not all, of the packages in RH distros have been compiled with tunings more appropriate for i486 and above for years. That shouldn't surprise anyone, because that's what most users are using. Nor should it surprise anyone that instructions have crept into various packages that prevent execution on i386 (rdtsc in rpm, so I don't have to stare at gettimeofday in straces, comes to mind), as there are very, very few operational i386 boxen within RH these days, and hence no explicit QA checks that only i386-appropriate instructions are used.
Changing "i386" in package file names everywhere is a great deal of pain for almost no gain imho.
73 de Jeff
On Sun, Nov 28, 2004 at 09:36:45PM -0500, Jeff Johnson wrote:
And it also shouldn't surprise that there are indeed instructions that have crept into various packages that prevent execution on i386; rdtsc in rpm (so I don't have to stare at gettimeofday in straces) comes to mind.
Hopefully you're checking the cpuid feature flags to make sure 'tsc' is there first, and falling back to gettimeofday() if not present? If not, this is horribly broken on:
- Lots of 586s: Cyrix and, IIRC, early AMDs didn't have TSC.
- Any CPU with errata making the TSC unusable. The WinChip C6 (a 586) was one such beast; there may be others too.
- Some NUMA boxes have big problems keeping TSCs in sync, and fall back to alternative timing sources.
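Here is a minimal C sketch of the suggested check. It assumes GCC or Clang on x86, where <cpuid.h> provides __get_cpuid and <x86intrin.h> provides __rdtsc, and it falls back to gettimeofday() everywhere else. The TSC flag is bit 4 of EDX for CPUID leaf 1. The names have_tsc and stamp are hypothetical. The CPUID flag only covers the first case in the list above. Errata and unsynchronized TSCs on NUMA boxes would need separate handling, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#include <x86intrin.h>
#endif

#define CPUID1_EDX_TSC (1u << 4)   /* TSC feature flag: CPUID leaf 1, EDX bit 4 */

/* Returns 1 if CPUID reports a time stamp counter, 0 otherwise
 * (including non-x86 builds and CPUs where leaf 1 is unavailable). */
int have_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (edx & CPUID1_EDX_TSC) != 0;
#endif
    return 0;
}

/* Rough timestamp: TSC ticks when use_tsc is set, otherwise microseconds
 * from gettimeofday(). The units differ, so only compare values taken
 * with the same setting. */
uint64_t stamp(int use_tsc)
{
#if defined(__i386__) || defined(__x86_64__)
    if (use_tsc)
        return (uint64_t)__rdtsc();
#else
    (void)use_tsc;
#endif
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}
```

A caller would call have_tsc() once at startup and pass the result to every stamp() call.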
Come to think of it, why is rpm needing to do this anyway?
Dave
Dave Jones wrote:
On Sun, Nov 28, 2004 at 09:36:45PM -0500, Jeff Johnson wrote:
And it also shouldn't surprise that there are indeed instructions that have crept into various packages that prevent execution on i386; rdtsc in rpm (so I don't have to stare at gettimeofday in straces) comes to mind.
Hopefully you're checking the cpuid feature flags to make sure 'tsc' is there first, and falling back to gettimeofday() if not present? If not, this is horribly broken on:
- Lots of 586s: Cyrix and, IIRC, early AMDs didn't have TSC.
- Any CPU with errata making the TSC unusable. The WinChip C6 (a 586) was one such beast; there may be others too.
- Some NUMA boxes have big problems keeping TSCs in sync, and fall back to alternative timing sources.
Yep.
Come to think of it, why is rpm needing to do this anyway?
Because I'm asked continuously and repeatedly "Why is rpm slow?", and no one is willing to hear the answer: "Because packages and rpm features are getting fatter and fatter."
There is one remaining (and excruciatingly painful to fix) bottleneck in rpm, known to you as "Preparing ============ ...".
Add --stats to any command to measure your own bottlenecks. But it won't work on any of the platforms you mention above.
73 de Jeff
Jeff Johnson wrote:
<snip>
Changing "i386" in package file names everywhere is a great deal of pain for almost no gain imho.
What's the pain? You aren't changing the filename, you are changing what architecture you compile for. You change the RPM configuration line for i486 to -march=i486 -mcpu=i686 or pentium4, whatever you guys are using now, and run rpmbuild. The filenames would automatically come out with i486 in them, the same way that they automatically come out with i386 in them.
---- Peace, William