Folks,
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l)
*). Cortex-A8 or higher fully compliant core(s)
*). ARM VFP3 hardware floating point (h)
*). ARM NEON Architecture
*). Thumb2 interworking
*). Your suggestion here?
I think we should build for ARM (as opposed to Thumb2) but we should support interworking with Thumb2 code through the toolchain options. We should then later consider implementing some Thumb2 optimization. It's more armv7thl, but the (t) is implied since it's ARMv7 anyway.
Several folks have begun looking at toolchain bringup based on the F-15 toolchain applied to an F-13 userspace initially. But I'd like us to discuss options/requirements for toolchains before we go too far.
Once I get some feedback, I'll be updating the wiki, along with some more F-15 goals and (hopefully) generally useful stuff.
Jon.
On Wed, May 4, 2011 at 1:46 PM, Jon Masters jonathan@jonmasters.org wrote:
Folks,
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s) *). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture
Btw, if you make NEON mandatory, then we don't have to worry about the "vfp3-d16" limitation of the tegra2 / dove and can therefore go the full "vfp3-d32"... (if you consider that the Cortex-A15 will have NEON by default, it's a good minimal requirement..)
*). Thumb2 interworking *). Your suggestion here?
I think we should build for ARM (as opposed to Thumb2) but we should support interworking with Thumb2 code through the toolchain options. We should then later consider implementing some Thumb2 optimization. It's more armv7thl, but the (t) is implied since it's ARMv7 anyway.
Several folks have begun looking at toolchain bringup based on the F-15 toolchain applied to an F-13 userspace initially. But I'd like us to discuss options/requirements for toolchains before we go too far.
Once I get some feedback, I'll be updating the wiki, along with some more F-15 goals and (hopefully) generally useful stuff.
Jon.
Regards,
On Wed, May 4, 2011 at 7:50 PM, Robert Nelson robertcnelson@gmail.com wrote:
On Wed, May 4, 2011 at 1:46 PM, Jon Masters jonathan@jonmasters.org wrote:
Folks,
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s) *). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture
Btw, if you make NEON mandatory, then we don't have to worry about the "vfp3-d16" limitation of the tegra2 / dove and therefor go the full "vfp3-d32"... (if you consider the Cortex-15 will have NEON by default, it's a good minimal requirement..)
I believe NEON is run-time detectable just like SSE is, so there's no need to explicitly compile for NEON. If the processor has the capability and the code is able to optimise for it (ORC, pixman and cairo come to mind), there are code paths that will just use it.
In terms of dual-core devices, in particular tablets, Tegra2 is currently one of the most used A9-based chipsets. NEON isn't a hard requirement of ARMv7 like hardfp is, so there's nothing to say there won't be other chips that come without it (I believe there are others that don't have NEON too).
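A minimal sketch of the kind of run-time check described above, assuming a Linux system whose /proc/cpuinfo carries a "Features" line as ARM kernels print it. The helper names (cpu_features, has_neon) are made up for illustration, and real libraries such as pixman do their own detection rather than calling anything like this.

```python
def cpu_features(cpuinfo_text):
    """Return the feature names from the first 'Features' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            return set(value.split())
    return set()


def has_neon(cpuinfo_text):
    """True when the kernel reports NEON for this CPU."""
    return "neon" in cpu_features(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc not mounted
    print("NEON available:", has_neon(text))
```

On a non-ARM machine there is no "Features" line, so the check simply reports that NEON is absent.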
Peter
On Wed, 2011-05-04 at 13:50 -0500, Robert Nelson wrote:
Btw, if you make NEON mandatory, then we don't have to worry about the "vfp3-d16" limitation of the tegra2 / dove and therefor go the full "vfp3-d32"... (if you consider the Cortex-15 will have NEON by default, it's a good minimal requirement..)
This is the problem. By all accounts A15 will also be heavily optimized for Thumb2 code. So the real question is whether we want to focus on really great A15 support at the cost of losing a few v7 machines and perhaps not really seeing the benefit from Thumb2 on A8/A9. One thing we do have to keep in mind is that we have an opportunity here to make a total blue-sky choice.
Jon.
On 05/04/2011 07:46 PM, Jon Masters wrote:
Folks,
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Agreed, as long as armv5tel remains supported.
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s) *). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture *). Thumb2 interworking *). Your suggestion here?
I think NEON is one step too far because Tegra doesn't support it, and Tegra is rapidly becoming very prolific in convenient form factors, such as these:
http://uk.computers.toshiba-europe.com/innovation/jsp/SUPPORTSECTION/discont...
http://www.motorola.com/Consumers/GB-EN/Consumer-Products-and-Services/Mobil...
I think we should build for ARM (as opposed to Thumb2) but we should support interworking with Thumb2 code through the toolchain options. We should then later consider implementing some Thumb2 optimization. It's more armv7thl, but the (t) is implied since it's ARMv7 anyway.
What reason is there to not use Thumb2 on ARMv7?
Gordan
I think NEON is one step too far because Tegra doesn't support it, and Tegra is rapidly becoming very prolific in convenient form factors, such as these:
http://uk.computers.toshiba-europe.com/innovation/jsp/SUPPORTSECTION/discont...
http://www.motorola.com/Consumers/GB-EN/Consumer-Products-and-Services/Mobil...
As far as I know, Tegra is not the only part in the big Cortex family that lacks NEON support. There are other Cortex chips without NEON.
Anyway, I would consider an armv7 port (with VFP) a very good thing (if that is what we are talking about), excellent for most of the new evaluation boards.
I'm something of a newbie in the ARM world, so please excuse me if I sometimes get something wrong.
Bye Alexjan.
On Thu, May 5, 2011 at 7:35 AM, Gordan Bobic gordan@bobich.net wrote:
On 05/04/2011 07:46 PM, Jon Masters wrote:
Folks,
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Agreed, as long as armv5tel remains supported.
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s)
Definitely compile with -march=armv7-a to enable the Cortex-A8 errata workarounds. Tuning is another story. Code tuned for an A8 will work quite well on an A9, but code tuned for an A9 may do poorly on an A8.
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture *). Thumb2 interworking
I don't think you have to do anything explicit here. Thumb-2 and ARM code interoperate just fine.
*). Your suggestion here?
*) Hard float. Never worse than soft float, sometimes much better.
I think NEON is one step too far because Tegra doesn't support it
I'm quite fond of my AC100. I'd recommend VFPv3-D16 and then using IFUNC or hwcaps to pull in NEON optimised libraries for things like ffmpeg.
I think we should build for ARM (as opposed to Thumb2) but we should support interworking with Thumb2 code through the toolchain options. We should then later consider implementing some Thumb2 optimization. It's more armv7thl, but the (t) is implied since it's ARMv7 anyway.
What reason is there to not use Thumb2 on ARMv7?
Thumb-2 code is a bit slower but significantly smaller. Benchmarks in Thumb-2 mode run at around 93% of the speed of the same code in ARM mode and take around 75% of the space. For some applications Thumb-2 should be faster, as more of the hot loops will fit in the I-cache. I haven't seen that in any benchmarks so far.
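To make those two ratios concrete, here is a small worked calculation. It only rescales a measured ARM-mode run time and code size by the rough 93% / 75% figures quoted above. The function name and the sample inputs are invented for illustration, and real results will vary by workload.

```python
def thumb2_estimate(arm_seconds, arm_bytes, speed_ratio=0.93, size_ratio=0.75):
    """Scale an ARM-mode run time and code size to a rough Thumb-2 estimate.

    speed_ratio is Thumb-2 speed relative to ARM mode, so the run time
    grows by a factor of 1 / speed_ratio; size_ratio scales the code size.
    """
    return arm_seconds / speed_ratio, arm_bytes * size_ratio


# Hypothetical inputs: a 100 s benchmark with 1,000,000 bytes of code.
seconds, size = thumb2_estimate(100.0, 1000000)
print(f"{seconds:.1f} s, {size:.0f} bytes")  # prints "107.5 s, 750000 bytes"
```

In other words, under these figures a job pays roughly 7.5% more run time for a quarter less code.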
See also Dave's email at: http://lists.linaro.org/pipermail/linaro-dev/2011-April/004106.html
for the difference between an ARM and Thumb-2 kernel.
I recommend Thumb-2. It's also where we're putting our efforts in Linaro.
-- Michael
On Thu, 2011-05-05 at 08:23 +1200, Michael Hope wrote:
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture *). Thumb2 interworking
I don't think you have to anything explicit here. Thumb-2 and ARM code interoperate just fine.
*). Your suggestion here?
*) Hard float. Never worse than soft float, sometimes much better.
Note that hard float is implied by VFPv3, and by the subject being "armv7hl". The intention is to use the aapcs-vfp binding but the only real debate is whether to use the D16/32 option. That comes down to whether we want to focus on really great A15 plus or be more inclusive of various minimal ARMv7 systems that are already out there today.
Perhaps we need a matrix of known capabilities in v7 hardware. I guess that's another wiki page I should create, starting with my Beagle, Panda and Efika systems, but a pointer to AC100 specs would be cool (I found various reviews on it, and wow it's pricey compared to e.g. Efika) :)
Jon.
On Thu, May 5, 2011 at 3:50 PM, Jon Masters jonathan@jonmasters.org wrote:
On Thu, 2011-05-05 at 08:23 +1200, Michael Hope wrote:
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture *). Thumb2 interworking
I don't think you have to anything explicit here. Thumb-2 and ARM code interoperate just fine.
*). Your suggestion here?
*) Hard float. Never worse than soft float, sometimes much better.
Note that hard float is implied by VFPv3, and by the subject being "armv7hl". The intention is to use the aapcs-vfp binding but the only real debate is whether to use the D16/32 option. That comes down to whether we want to focus on really great A15 plus or be more inclusive of various minimal ARMv7 systems that are already out there today.
Ah, thanks.
Perhaps we need a matrix of known capabilities in v7 hardware. I guess that's another wiki page I should create, starting with my Beagle, Panda and Efika systems, but a pointer to AC100 specs would be cool (I found various reviews on it, and wow it's pricey compared to e.g. Efika) :)
Have a look at: http://wiki.debian.org/ArmHardFloatPort/VfpComparison#FPU
-- Michael
Jon Masters wrote:
On Thu, 2011-05-05 at 08:23 +1200, Michael Hope wrote:
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture *). Thumb2 interworking
I don't think you have to anything explicit here. Thumb-2 and ARM code interoperate just fine.
*). Your suggestion here?
*) Hard float. Never worse than soft float, sometimes much better.
Note that hard float is implied by VFPv3, and by the subject being "armv7hl". The intention is to use the aapcs-vfp binding but the only real debate is whether to use the D16/32 option. That comes down to whether we want to focus on really great A15 plus or be more inclusive of various minimal ARMv7 systems that are already out there today.
Perhaps we need a matrix of known capabilities in v7 hardware. I guess that's another wiki page I should create, starting with my Beagle, Panda and Efika systems, but a pointer to AC100 specs would be cool (I found various reviews on it, and wow it's pricey compared to e.g. Efika) :)
Only since the Efika price dropped by 40%. The pricing was reasonably comparable before (bearing in mind that the AC100 is 2x 1GHz vs 1x 800MHz). Sadly, the AC100 has been discontinued, but other machines based on the same SoC are just coming out, e.g. Trimslice.
Gordan
On Thu, May 5, 2011 at 4:50 AM, Jon Masters jonathan@jonmasters.org wrote:
On Thu, 2011-05-05 at 08:23 +1200, Michael Hope wrote:
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture *). Thumb2 interworking
I don't think you have to anything explicit here. Thumb-2 and ARM code interoperate just fine.
*). Your suggestion here?
*) Hard float. Never worse than soft float, sometimes much better.
Note that hard float is implied by VFPv3, and by the subject being "armv7hl". The intention is to use the aapcs-vfp binding but the only real debate is whether to use the D16/32 option. That comes down to whether we want to focus on really great A15 plus or be more inclusive of various minimal ARMv7 systems that are already out there today.
Perhaps we need a matrix of known capabilities in v7 hardware. I guess that's another wiki page I should create, starting with my Beagle, Panda and Efika systems, but a pointer to AC100 specs would be cool (I found various reviews on it, and wow it's pricey compared to e.g. Efika) :)
The AC100 is based on the NVIDIA Tegra 250. It's also used in a lot of the Android 3 tablets and in devices like the ASUS Transformer and Slider [1]. At the moment it seems to be the chip of choice for a lot of the latest ARM consumer devices. It also seems there's a Marvell Dove processor that uses D16.
While I like the idea of optimising for the A15 processor, all the literature I've seen on it indicates that the first silicon won't be available until sometime in 2012, and devices running it not until 2013, although I would like to be proved wrong. Also, it appears that the A15 will be using VFPv4 [2], so I suspect that to make best use of it we'll need to compile for it specifically, and it will therefore be different again from A8/A9.
Peter
[1] http://en.wikipedia.org/wiki/Nvidia_Tegra
[2] http://www.arm.com/products/processors/cortex-a/cortex-a15.php
[3] http://wiki.debian.org/ArmHardFloatPort/VfpComparison
Quoting Peter Robinson pbrobinson@gmail.com:
While I like the idea of optimising for the A15 processor all literature I've seen on it indicates that the first silicon won't be available until sometime in 2012 and devices running them until 2013 although I would like to be proved wrong. Also it appears that the A15 will be using VFPv4 [2] so I suspect to make best use of that we'll need to compile for it and therefore be different again to A8/A9.
What version of ARM is Microsoft's baseline going to be?
http://windows8news.com/2011/01/05/windows-8-arm-press-release-microsoft/ "Microsoft demonstrated the next version of Windows running on new SoC platforms from Intel running on x86 architecture and from NVIDIA, Qualcomm and Texas Instruments on ARM architecture."
"The technology demonstration included Windows client support across a range of scenarios, such as hardware-accelerated graphics and media playback, hardware-accelerated Web browsing ..."
I'm guessing that since it is Windows 8, it will be the A15s. (I don't think they will release 8 completely in the next year or so anyway.)
Do all of these (tegra2, omap4, snapdragon) have GPGPUs too?
On Thu, May 05, 2011 at 09:41:32AM -0500, omalleys@msu.edu wrote:
Quoting Peter Robinson pbrobinson@gmail.com:
While I like the idea of optimising for the A15 processor all literature I've seen on it indicates that the first silicon won't be available until sometime in 2012 and devices running them until 2013 although I would like to be proved wrong. Also it appears that the A15 will be using VFPv4 [2] so I suspect to make best use of that we'll need to compile for it and therefore be different again to A8/A9.
What version of ARM is Microsofts baseline going to be?
Even if someone here did know, I'm sure they couldn't tell the world.
On Thu, May 5, 2011 at 3:41 PM, omalleys@msu.edu wrote:
Quoting Peter Robinson pbrobinson@gmail.com:
While I like the idea of optimising for the A15 processor all literature I've seen on it indicates that the first silicon won't be available until sometime in 2012 and devices running them until 2013 although I would like to be proved wrong. Also it appears that the A15 will be using VFPv4 [2] so I suspect to make best use of that we'll need to compile for it and therefore be different again to A8/A9.
What version of ARM is Microsofts baseline going to be?
The demo of IE 10 that was given was running on an NVIDIA Tegra 250 A9 processor, so at the moment it's running on A9 processors. Obviously that could change (and honestly it doesn't bother me at all).
Peter
Hi Jon,
On Wed, May 4, 2011 at 7:46 PM, Jon Masters jonathan@jonmasters.org wrote:
Folks,
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s) *). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture *). Thumb2 interworking *). Your suggestion here?
I think we should build for ARM (as opposed to Thumb2) but we should support interworking with Thumb2 code through the toolchain options. We should then later consider implementing some Thumb2 optimization. It's more armv7thl, but the (t) is implied since it's ARMv7 anyway.
Several folks have begun looking at toolchain bringup based on the F-15 toolchain applied to an F-13 userspace initially. But I'd like us to discuss options/requirements for toolchains before we go too far.
What are the possible issues and impact of building the F-15 toolchain on F-13, versus building with an F-14 toolchain as mainline F-15 on x86 would have been?
Once I get some feedback, I'll be updating the wiki, along with some more F-15 goals and (hopefully) generally useful stuff.
Thanks for kicking off this conversation. It's been something discussed on IRC and other channels.
I would like, if it's possible, to get to a reasonable decision ASAP so that whatever changes are necessary can get into dist-f15 final (simply so we follow the F-15 release as closely as possible). That covers redhat-rpm-config or whatever other packages are needed to compile for and support the HW platform, including things like whatever the rpm naming ends up being.
Peter
On Wed, 2011-05-04 at 14:46 -0400, Jon Masters wrote:
I think we should build for ARM (as opposed to Thumb2) but we should support interworking with Thumb2 code through the toolchain options. We should then later consider implementing some Thumb2 optimization. It's more armv7thl, but the (t) is implied since it's ARMv7 anyway.
The thumb2 situation needs a bit more investigation. I had someone run tests on thumb vs non-thumb last year. There was no question about thumb1 on v5 (thumb caused a major performance hit), but it was not nearly as clear for thumb2 on v7: the (very preliminary) results, using a BeagleBoard, indicated the same or slightly better performance for thumb2 over non-thumb2, and obviously a smaller binary size. This was for packages built using the rpm/mock toolchain with what were otherwise normal Fedora flags. I think some comprehensive testing is needed.
-Chris
On Wed, 2011-05-04 at 18:38 -0400, Chris Tyler wrote:
On Wed, 2011-05-04 at 14:46 -0400, Jon Masters wrote:
I think we should build for ARM (as opposed to Thumb2) but we should support interworking with Thumb2 code through the toolchain options. We should then later consider implementing some Thumb2 optimization. It's more armv7thl, but the (t) is implied since it's ARMv7 anyway.
The thumb2 situation needs a bit more investigation.
I spoke with some folks this evening about this, and other optimization criteria. Apparently, we can expect up to a 30% improvement from Thumb2 but the reality is more like 18-24% for high cache-hit/non-data heavy cases where we could really get an improvement. However, it depends on the micro-architecture in use. We'll possibly see great improvements in Thumb2 performance come A15 but may not notice them on A8 and A9MP.
I'm leaning toward...
I think some comprehensive testing is needed.
And I agree. Do you have someone looking into Thumb2 some more?
As to VFPv3, I'm coming around to the idea of settling on VFPv3-D16 since it seems to be the lowest common denominator on v7. We could do with some comparisons in terms of performance between the two AAPCS variants, if someone is willing to do the investigation. And of course it would really vary from application to application somewhat.
My current revised thought, until proven otherwise:
*). Little endian (obviously, but worth stating) (l)
*). Cortex-A8 or higher fully compliant core(s)
*). ARM VFP3 hardware floating point (h) (D16 though)
*). Not required, but some optimized libs for ARM NEON Architecture
*). Thumb2 interworking support but not built for Thumb2
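One way to read the revised list above as GCC options is sketched below. The option names themselves (-march=armv7-a, -mfloat-abi=hard, -mfpu=vfpv3-d16, -marm, -mthumb, -mtune=...) are standard GCC ARM options, but the mapping and the helper baseline_cflags are assumptions made for illustration. The thread has not settled on a final flag set.

```python
def baseline_cflags(thumb2=False, fpu="vfpv3-d16", tune=None):
    """Build a candidate armv7hl flag list (an illustration, not a decision)."""
    flags = [
        "-march=armv7-a",    # Cortex-A8 or higher
        "-mfloat-abi=hard",  # the (h): hard-float calling convention
        "-mfpu=" + fpu,      # VFPv3 with 16 double registers by default
    ]
    # ARM instruction set by default; Thumb-2 only if explicitly requested.
    flags.append("-mthumb" if thumb2 else "-marm")
    if tune:
        flags.append("-mtune=" + tune)
    return flags


print(" ".join(baseline_cflags()))
# prints "-march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -marm"
```

Little endian needs no flag here because it is the default for these targets. A NEON-optimized library could be built with fpu="neon" while the base distribution stays on vfpv3-d16.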
Jon.
On Wed, May 04, 2011 at 01:46:06PM -0500, Jon Masters wrote:
Folks,
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s)
Is there a measurable difference in code optimized for A8 vs A9 when running on A9? If so, my inclination would be to build for the future - A9. It's not like A9 hardware is hard to come by these days.
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture
If libraries using NEON can detect NEON-capable CPUs and switch to using those functions, all the better. Unfortunately, not all A9 products include NEON, but then again, not all apps can make use of NEON SIMD instructions anyhow.
On Thu, 2011-05-05 at 22:36 -0500, Matt Domsch wrote:
On Wed, May 04, 2011 at 01:46:06PM -0500, Jon Masters wrote:
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s)
Is there a measureable difference in code optimized for A8 vs A9 when running on A9? If so, my inclination would be to build for the future - A9. It's not like A9 hardware is hard to come by these days.
There's a difference in performance at the microarchitecture level between Cortex A8/A9 and A15, but AFAIK not much between A8 and A9. I spoke with some friends from ARM the other evening about this and I'll follow up on the subject of tuning at the LDS event next week.
Anyway, for now I'm inclined to optimize for the future. We have a clean slate, and you know I'm all about portability and compatibility, but not if there's nothing already out there to worry about ;) We should make sure whatever we do that we're sane on A15 when it comes out. And I am inclined to realize the value of not harming support for Tegra, so that (in my mind) seals the deal with regard to VFP-D16 vs. D32.
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture
If libraries using NEON can detect NEON-capable CPUs and switch to using those functions, all the better. Unfortunately, not all A9 products include NEON, but then again, not all apps can make use of NEON SIMD instructions anyhow.
Yea. I like the SSE analogy. If you see my followup emails later in the thread you'll see we decided that it wasn't worth requiring NEON and instead use the hwcaps and similar mechanisms to pull in the libraries as necessary. Worst case, we do a few optimized libraries one can optionally install, I guess similar to how x86 did i686 glibc.
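A rough model of the "pull in an optimized library when the hardware has NEON" idea follows. It is not how the dynamic loader actually implements hwcaps. The directory layout (a "neon" subdirectory under the library directory), the function name pick_library and the library names are all hypothetical.

```python
import os
import posixpath


def pick_library(name, features, base_dir="/usr/lib", exists=os.path.exists):
    """Prefer base_dir/neon/<name> when the CPU reports NEON and that file exists.

    'features' is a set of CPU feature names, e.g. parsed from /proc/cpuinfo.
    'exists' is injectable so the choice can be exercised without real files.
    """
    if "neon" in features:
        candidate = posixpath.join(base_dir, "neon", name)
        if exists(candidate):
            return candidate
    # Baseline (VFPv3-D16) build: always a valid fallback.
    return posixpath.join(base_dir, name)
```

The point of the fallback is that a Tegra2-class machine and a NEON-capable machine can share one package set, with only a handful of libraries shipped twice.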
btw, I managed to unplug myself (well, physically re-located to a Starbucks and was only marginally distracted by IRC/email on my phone) for long enough earlier to read all of the AAPCS, relevant IEEE 754 and so forth, so I now feel that I have a good enough understanding of how the ABI is actually implemented. Think of aapcs-vfp as more of what they intended, and the non-hardfp "base" standard as just accepting the reality that VFP wasn't always present historically.
If you read how VFPv3 actually implements its registers, you'll see that the extra 16 (for D32) really are totally separate anyway as far as the ABI is concerned, so I remain unconvinced we lose a lot by being D16.
Jon.
Jon Masters wrote:
On Thu, 2011-05-05 at 22:36 -0500, Matt Domsch wrote:
On Wed, May 04, 2011 at 01:46:06PM -0500, Jon Masters wrote:
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s)
Is there a measureable difference in code optimized for A8 vs A9 when running on A9? If so, my inclination would be to build for the future - A9. It's not like A9 hardware is hard to come by these days.
There's a difference in performance at the microarchitecture level between Cortex A8/A9 and A15, but AFAIK not much between A8/A9. I spoke with some friends from ARM the other evening about this and I'll followup on the subject of tuning at the LDS event next week.
Anyway, for now I'm inclined to optimize for the future. We have a clean slate, and you know I'm all about portability and compatibility, but not if there's nothing already out there to worry about ;)
How important do we deem optimal operation on Efika and Beagleboard?
We should make sure whatever we do that we're sane on A15 when it comes out. And I am inclined to realize the value of not harming support for Tegra, so that (in my mind) seals the deal with regard to VFP-D16 vs. D32.
Is there an ABI change required when going from A8/A9 to A15 to get most of the benefit from it, like there is between softfp and hardfp? If there isn't, I'm not sure it's worth worrying too much about. Target the most basic platform and have optimized packages for things that benefit from it most.
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture
If libraries using NEON can detect NEON-capable CPUs and switch to using those functions, all the better. Unfortunately, not all A9 products include NEON, but then again, not all apps can make use of NEON SIMD instructions anyhow.
Yea. I like the SSE analogy. If you see my followup emails later in the thread you'll see we decided that it wasn't worth requiring NEON and instead use the hwcaps and similar mechanisms to pull in the libraries as necessary. Worst case, we do a few optimized libraries one can optionally install, I guess similar to how x86 did i686 glibc.
Persactly. :)
btw, I managed to unplug myself (well, physically re-located to a Starbucks and was only marginally distracted by IRC/email on my phone) for long enough earlier to read all of the AAPCS, relevant IEEE 754 and so forth, so I now feel that I have a good enough understanding of how the ABI is actually implemented. Think of aapcs-vfp as more of what they intended, and the non hardfp "base" standard as just an accepting the reality that VFP wasn't always present historically.
If you read how VFPv3 actually implements its registers, you'll see that the extra 16 (for D32) really are totally separate anyway as far as the ABI is concerned, so I remain unconvinced we lose a lot by being D16.
Can you clarify that? Does that mean that we can have a D16 distro with a D32 kernel/glibc?
Gordan
On Fri, May 6, 2011 at 6:57 PM, Gordan Bobic gordan@bobich.net wrote:
Jon Masters wrote:
If you read how VFPv3 actually implements its registers, you'll see that the extra 16 (for D32) really are totally separate anyway as far as the ABI is concerned, so I remain unconvinced we lose a lot by being D16.
Can you clarify that? Does that mean that a we can have a D16 distro with D32 kernel/glibc?
VFPv3-D16 and VFPv3-D32 code are compatible. You can call D16 code from D32 code and vice-versa.
-- Michael
On Sat, 2011-05-07 at 00:09 +1200, Michael Hope wrote:
On Fri, May 6, 2011 at 6:57 PM, Gordan Bobic gordan@bobich.net wrote:
Jon Masters wrote:
If you read how VFPv3 actually implements its registers, you'll see that the extra 16 (for D32) really are totally separate anyway as far as the ABI is concerned, so I remain unconvinced we lose a lot by being D16.
Can you clarify that? Does that mean that a we can have a D16 distro with D32 kernel/glibc?
VFPv3-D16 and VFPv3-D32 code are compatible. You can call D16 code from D32 code and vice-versa.
This isn't stated in AAPCS but I assumed it to be the case since the register context preservation extends only to the first 16 registers. So, Philippe can correct us if we're wrong, but this sounds right.
As far as I know, the first 16 are the only registers mandated for passing parameters, etc. which is why there is compatibility. And the other 16 registers would only be used at the local function level, so they would not really buy you very much, especially in non-floating point code! Especially in the kernel, I don't see any realized benefit from building with D32, but some math library might benefit marginally.
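The register roles behind that compatibility argument can be written down as a small lookup. This is a sketch based on one reading of the AAPCS VFP variant (d0-d7 carry arguments and results, d8-d15 are preserved by the callee, d16-d31 are scratch where present). Check the specification before relying on it. The function names are invented for illustration.

```python
def vfp_register_role(n):
    """Role of double-precision register d<n> under the AAPCS VFP variant."""
    if not 0 <= n <= 31:
        raise ValueError("d registers are numbered 0..31")
    if n <= 7:
        return "argument/result, caller-saved"
    if n <= 15:
        return "callee-saved"
    return "scratch, caller-saved (absent on D16 parts)"


def shared_by_d16_and_d32(n):
    """True for registers that exist on both VFPv3-D16 and VFPv3-D32."""
    return 0 <= n <= 15
```

Everything that carries arguments or must survive a call sits in d0-d15, which both variants have. That is why D16 and D32 code can call each other, as long as D32 code only ever runs on hardware that actually has the upper registers.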
We can optimize for A9+ VFPv3-D16 and then there can be packages that are D32 later on, but I suspect it would be more likely that one would create optimized NEON binaries and perhaps one day switch to a D32 base set of configuration flags once nobody is using D16 v7 devices.
Philippe: If I'm crazy, let me know ;)
Jon.
On Fri, May 6, 2011 at 11:01 AM, Jon Masters jonathan@jonmasters.org wrote:
On Sat, 2011-05-07 at 00:09 +1200, Michael Hope wrote:
On Fri, May 6, 2011 at 6:57 PM, Gordan Bobic gordan@bobich.net wrote:
Jon Masters wrote:
We can optimize for A9+ VFPv3-D16 and then there can be packages that are D32 later on, but I suspect it would be more likely that one would create optimized NEON binaries and perhaps one day switch to a D32 base set of configuration flags once nobody is using D16 v7 devices.
Marvell ARMADA 200 series and Tegra2...
Philippe: If I'm crazy, let me know ;)
Matt Sealey matt@genesi-usa.com Product Development Analyst, Genesi USA, Inc.
From: arm-bounces@lists.fedoraproject.org [mailto:arm-bounces@lists.fedoraproject.org] On Behalf Of Jon Masters
Sent: 06 May 2011 05:52
To: Matt Domsch
Cc: arm@lists.fedoraproject.org
Subject: Re: [fedora-arm] armv7hl requirements
On Thu, 2011-05-05 at 22:36 -0500, Matt Domsch wrote:
On Wed, May 04, 2011 at 01:46:06PM -0500, Jon Masters wrote:
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s)
Is there a measureable difference in code optimized for A8 vs A9 when running on A9? If so, my inclination would be to build for the future - A9. It's not like A9 hardware is hard to come by these days.
There's a difference in performance at the microarchitecture level between Cortex A8/A9 and A15, but AFAIK not much between A8/A9. I spoke with some friends from ARM the other evening about this and I'll followup on the subject of tuning at the LDS event next week.
Anyway, for now I'm inclined to optimize for the future. We have a clean slate, and you know I'm all about portability and compatibility, but not if there's nothing already out there to worry about ;) We should make sure whatever we do that we're sane on A15 when it comes out. And I am inclined to realize the value of not harming support for Tegra, so that (in my mind) seals the deal with regard to VFP-D16 vs. D32.
Using ARMv7 VFP-D16 as baseline would be best. In terms of platforms, there are now enough A9 based platforms to use this as default build target with Fedora.
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture
If libraries using NEON can detect NEON-capable CPUs and switch to using those functions, all the better. Unfortunately, not all A9 products include NEON, but then again, not all apps can make use of NEON SIMD instructions anyhow.
Yea. I like the SSE analogy. If you see my followup emails later in the thread you'll see we decided that it wasn't worth requiring NEON and instead use the hwcaps and similar mechanisms to pull in the libraries as necessary. Worst case, we do a few optimized libraries one can optionally install, I guess similar to how x86 did i686 glibc.
Using hwcaps mechanism should enable you to pull in the right optimized libraries for NEON.
Regards,
Philippe
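As a rough illustration of the runtime-detection idea discussed above, here is a minimal sketch that reads the "Features" line an ARM kernel exposes in /proc/cpuinfo and prefers a NEON build only when the CPU advertises it. The function names and the "/usr/lib/neon" directory are hypothetical, chosen for illustration; this is not glibc's actual hwcaps search logic.

```python
# Sketch only: choose a library variant from the CPU features the kernel reports.
# The directory layout is a hypothetical example, not Fedora's real one.

def parse_features(cpuinfo_text):
    """Return the set of feature flags found on the 'Features' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def pick_libdir(features, base="/usr/lib"):
    """Prefer a NEON-optimized build when available, else the baseline build."""
    if "neon" in features:
        return base + "/neon"  # hypothetical directory for NEON builds
    return base                # baseline (e.g. VFPv3-D16) build
```

On a real board one would pass the contents of /proc/cpuinfo to parse_features(). A Tegra2-style CPU without NEON simply falls through to the baseline directory, which is the behaviour the thread is asking for.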
Matt Domsch wrote:
Is there a measurable difference in code optimized for A8 vs A9 when running on A9? If so, my inclination would be to build for the future - A9. It's not like A9 hardware is hard to come by these days.
I think the key thing to consider is cost/benefit. If the code optimized for A9 runs poorly on A8, but the code optimized for A8 runs on A9 almost as well as the code optimized for the A9, then it would be better to optimize for the A8.
Since A8 doesn't have OoOE, it wouldn't surprise me if latter was the case (A8 code running imperceptibly worse on A9 but A9 code running much worse on A8).
There is also the more philosophical issue - A9 is already faster than A8, so it may be better to target the A8 on the basis that it is more in need of that extra boost.
Having said that, the only real hardware platforms that are likely to be affected here are the Genesi Efika and the Beagleboard. Everything else that is popular seems to be either ARMv5 (Sheeva/Guru plugs) or A9 (AC100, Pandaboard, Trimslice).
*). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture
If libraries using NEON can detect NEON-capable CPUs and switch to using those functions, all the better. Unfortunately, not all A9 products include NEON, but then again, not all apps can make use of NEON SIMD instructions anyhow.
Agreed - NEON should definitely be optional rather than mandatory, especially if there is no ABI change required. This should be no different to earlier versions of x86 Fedora where we had an i386 distro with i686 kernel and glibc. We can have optional NEON/D32/VFP4 packages for things that benefit from it, if the platform supports it.
Gordan
On Fri, May 6, 2011 at 7:51 AM, Gordan Bobic gordan@bobich.net wrote:
Having said that, the only real hardware platforms that are likely to be affected here are the Genesi Efika and the Beagleboard. Everything else that is popular seems to be either ARMv5 (Sheeva/Guru plugs) or A9 (AC100, Pandaboard, Trimslice).
Actually, there are dozens of devices that use A8 processors: the Nokia N900 and dozens of other phones. They may or may not be desirable to support, but there are also a number of tablet and netbook devices that it would be nice to be able to use.
Peter
On 05/05/2011 11:51 PM, Gordan Bobic wrote:
I think the key thing to consider is cost/benefit. If the code optimized for A9 runs poorly on A8, but the code optimized for A8 runs on A9 almost as well as the code optimized for the A9, then it would be better to optimize for the A8.
Since A8 doesn't have OoOE, it wouldn't surprise me if latter was the case (A8 code running imperceptibly worse on A9 but A9 code running much worse on A8).
There is also the more philosophical issue - A9 is already faster than A8, so it may be better to target the A8 on the basis that it is more in need of that extra boost.
Two other thoughts on backward/forward compatibility and tuning:
Since there is still going to be an ARMv5 Fedora distribution, no device will be left behind. Some devices may not realize their full potential, but they will continue to work.
There have been a number of iterations of what Fedora x86 tunes for as the years have gone by. Maybe Fedora 17+ should be tuned for A15, but we can make the decision when the time comes. As long as we aren't doing something that is going to require ABI breakage later, when A15 is common, why not support the widest range of available ARMv7 hardware there is for the next 6 months? With this in mind, I suggest tuning for A8 on F15 and reevaluating the target again in Fedora 16.
Hi Jon,
On Wed, May 04 2011, Jon Masters wrote:
I'd like to kick off a discussion about flags for ARMv7. My proposal here is that we treat v7hl as an entirely different architecture, and don't try any multi-arch kind of hacks (there isn't the established user base for Fedora ARM to justify doing any of those things at the moment).
Things I think we should consider as a minimum:
*). Little endian (obviously, but worth stating) (l) *). Cortex-A8 or higher fully compliant core(s) *). ARM VFP3 hardware floating point (h) *). ARM NEON Architecture *). Thumb2 interworking *). Your suggestion here?
I think we should build for ARM (as opposed to Thumb2) but we should support interworking with Thumb2 code through the toolchain options. We should then later consider implementing some Thumb2 optimization. It's more armv7thl, but the (t) is implied since it's ARMv7 anyway.
Several folks have begun looking at toolchain bringup based on the F-15 toolchain applied to an F-13 userspace initially. But I'd like us to discuss options/requirements for toolchains before we go too far.
Once I get some feedback, I'll be updating the wiki, along with some more F-15 goals and (hopefully) generally useful stuff.
Just for the record, this sounds great from OLPC's perspective; +1. (I expect we'd rather build for Thumb2, even if only for the size benefit.)
- Chris.
I've started building some f15 rpms with hardfp.
I set in redhat-rpm-config: -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard -mthumb
I'm using meego as a base to bootstrap; we will need to build a couple of times to get everything bootstrapped right with the full set of flags. meego dropped some things like selinux. I'm slowly making some progress. I want to get to having a fedora minimal buildroot by the end of the week, though that might be a bit hard since gcc will take some time to compile.
Dennis
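To make the flag choices in this thread easier to compare, here is a small sketch that assembles the GCC target flags. The defaults reproduce the set quoted in the message above; the `fpu` and `tune` parameters are only there to show the variations the thread debates (NEON, A8 vs A9 tuning) and are not agreed settings. The helper name is hypothetical.

```python
# Sketch only: build the GCC target flag string for a hard-float ARMv7-A build.
# armv7hl_cflags is a hypothetical helper, not part of redhat-rpm-config.

def armv7hl_cflags(fpu="vfpv3-d16", thumb=True, tune=None):
    """Return GCC target flags as a single space-separated string."""
    flags = ["-march=armv7-a", "-mfpu=" + fpu, "-mfloat-abi=hard"]
    # -mthumb selects Thumb-2 code generation on ARMv7-A; -marm selects ARM mode.
    flags.append("-mthumb" if thumb else "-marm")
    if tune:
        # Tuning changes scheduling only; the result still runs on any ARMv7-A core.
        flags.append("-mtune=" + tune)
    return " ".join(flags)


print(armv7hl_cflags())
print(armv7hl_cflags(fpu="neon", thumb=False, tune="cortex-a8"))
```

Keeping `-march` and `-mfloat-abi` fixed while varying only `-mfpu` and `-mtune` mirrors the point made earlier in the thread: the ABI stays the same, so optional optimized packages can be added later without a rebuild of everything else.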
Does that mean we are skipping F14 altogether? I'm not against the idea, just curious. Anything that helps close the gap to primary distros is a good thing. :)
Gordan
Quoting Gordan Bobic gordan@bobich.net:
Does that mean we are skipping F14 altogether? I'm not against the idea, just curious. Anything that helps close the gap to primary distros is a good thing. :)
I don't think armv5 is skipping f14.
It is probably a good idea to skip F14 for armv7 though and start off at least in the general ballpark of the mainline distro.
The rpm tweaks need to make it upstream though. :)
On Tue, May 10, 2011 at 4:28 PM, omalleys@msu.edu wrote:
Quoting Gordan Bobic gordan@bobich.net:
Does that mean we are skipping F14 altogether? I'm not against the idea, just curious. Anything that helps close the gap to primary distros is a good thing. :)
I don't think armv5 is skipping f14.
No definitely not skipping armv5 F-14. I'm working on it but having a few issues. All help and assistance appreciated.
It is probably a good idea to skip F14 for armv7 though and start off at least in the general ballpark of the mainline distro.
The armv7+hardfp would be hard for F-14 as it would need a lot of patches to gcc etc. armv7+soft, as we have now, probably won't provide massive improvements. Ultimately I think we'll end up with an armv5tel for maximum hardware support, as we have now, and then an armv7hl (plus possibly thumb) for best performance on armv7 platforms.
The rpm tweaks need to make it upstream though. :)
Correct, and it would be good to get this into there soon assuming the above settings are what we all agree upon for v7 + hardfp for F-15.
Peter
Jon asked me to join this list, here's my first contribution...
though that might be a bit hard since gcc will take some time to compile.
Here's a trick to speeding up arm compiles. Install distcc on the arm device. On your local x86 machines, install a hacked distccd that runs an arm cross compiler.
Now you can add your x86 boxen, which tend to be faster than arm devices, as additional build hosts, or in my case, the only build host - I let the arm device worry about preprocessing et al and let the desktop do the compiling.
Since there are no hacks on the arm side, this doesn't break the "Fedora must build native" rule - as far as the arm side knows, it *is* running native, and simply disabling distcc still works, just slower.
Use a non-standard port for these distcc's so you won't confuse any native distcc users.
Here's the distccd hack you need (change the path as appropriate for your local machine):
--- distcc-2.18.3/src/arg.c  2004-11-30 07:13:53.000000000 -0500
+++ distcc-2.18.3-dj/src/arg.c  2011-03-14 18:48:55.000000000 -0400
@@ -147,6 +147,19 @@ int dcc_scan_args(char *argv[], char **i
 
     *input_file = *output_file = NULL;
 
+#define ARMPATH "/envy/dj/ges/arm/install/bin/armv5tel-redhat-linux-gnueabi-"
+    if (strcmp (argv[0], "gcc") == 0
+        || strcmp (argv[0], "cc") == 0) {
+        argv[0] = strdup( ARMPATH "gcc");
+    }
+    if (strcmp (argv[0], "g++") == 0
+        || strcmp (argv[0], "c++") == 0) {
+        argv[0] = strdup(ARMPATH "g++");
+    }
+    if (strcmp (argv[0], "as") == 0) {
+        argv[0] = strdup(ARMPATH "as");
+    }
+
     for (i = 0; (a = argv[i]); i++) {
         if (a[0] == '-') {
             if (!strcmp(a, "-E")) {
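For readers who want the logic of the patch without reading C, the renaming it performs can be sketched as a lookup table. This is an illustrative restatement, not part of distcc: the prefix below is a made-up path, and `CROSS_TOOLS` and `rewrite_argv` are hypothetical names. As in the patch, only an exact match on argv[0] is rewritten.

```python
# Sketch only: redirect generic compiler names to a cross toolchain.
# Hypothetical prefix; substitute the cross-toolchain path on your own x86 host.
ARMPATH = "/opt/cross/bin/armv5tel-redhat-linux-gnueabi-"

# Generic tool names the ARM client may send, mapped to the cross tool suffix.
CROSS_TOOLS = {"gcc": "gcc", "cc": "gcc", "g++": "g++", "c++": "g++", "as": "as"}


def rewrite_argv(argv, prefix=ARMPATH):
    """Return a copy of argv with argv[0] pointed at the cross tool, if known."""
    if not argv:
        return []
    tool = CROSS_TOOLS.get(argv[0])
    if tool is None:
        return list(argv)  # unknown tools are passed through untouched
    return [prefix + tool] + list(argv[1:])


print(rewrite_argv(["cc", "-c", "foo.c"]))
```

Because the ARM side still invokes plain "gcc" or "cc", nothing changes there; only the x86 daemon substitutes the cross compiler.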
On Tuesday, May 10, 2011 01:13:42 PM DJ Delorie wrote:
Jon asked me to join this list, here's my first contribution...
though that might be a bit hard since gcc will take some time to compile.
Here's a trick to speeding up arm compiles. Install distcc on the arm device. On your local x86 machines, install a hacked distccd that runs an arm cross compiler.
We don't allow the use of any cross compilation for any Fedora bits. One of the requirements is full native compilation.
thanks for the tip and idea. people doing their own compilation can choose to use it :)
Dennis
We don't allow the use of any cross compilation for any Fedora bits. One of the requirements is full native compilation.
Yup, I understand that. I'm riding the philosophical fine line between "actual full native" and "indistinguishable from full native", but agree that the official distributions should involve no cross compilers. My hack allows individuals to choose which side of the line they want to be on, but doesn't require any changes that *break* full native.
My goal (our goal?) is to get a minimal set of RPMs in the shortest time, since that milestone enables so many more people to get stuff done. It's a blocker for massive parallelization. And we can always rebuild the "tainted" RPMs later.
I must say that I like this idea A LOT! :D This might just make things like LibreOffice builds actually plausible.
Gordan
Quoting Gordan Bobic gordan@bobich.net:
I must say that I like this idea A LOT! :D
I like this idea a lot too. It will speed up development. :)
If the -main- development is done this way, and then a final compile for the release is done on actual hardware with the actual toolchain, it would also meet the specs of the Fedora Project. This is less frustrating to developers, and it should only require one recompile since the build issues should already be fixed.
While adding the armv7l flags, can we also double check for the existence of the armv5tel flags and add them if necessary especially in the f15 and f16 packages? :)
On Fri, May 13, 2011 at 1:44 PM, omalleys@msu.edu wrote:
If the -main- development is done this way, and then a final compile for the release is done on actual hardware with the actual toolchain, it would also meet the specs of the Fedora Project. This is less frustrating to developers, and it should only require one recompile since the build issues should already be fixed.
Ultimately anything compiled in koji that is not a scratch build could conceivably become part of the final compose of the OS so there's no way of telling what is dev and what is final. In reality once things settle down all development should be done upstream in Fedora and koji-shadow then just follows mainline.
While adding the armv7l flags, can we also double check for the existence of the armv5tel flags and add them if necessary especially in the f15 and f16 packages? :)
They are already there. Nothing should have changed recently. Ultimately they're an addition; arches don't randomly get removed.
Peter
Peter Robinson wrote:
Ultimately anything compiled in koji that is not a scratch build could conceivably become part of the final compose of the OS so there's no way of telling what is dev and what is final. In reality once things settle down all development should be done upstream in Fedora and koji-shadow then just follows mainline.
I didn't think this was a suggestion for the official build koji, was it? The point is that it is very hard to meaningfully troubleshoot a process that takes 3 days to fail on a 1.2GHz ARM, and then has to be re-started from scratch. The incremental progress becomes too painfully slow. But if we can get that 3 day process into a 3 hour process (e.g. I've only got about 4GHz worth of ARM cores, but about 40GHz worth of x86-64 cores), then the problem is at least transitioned from the realm of unworkable into the realm of slow.
Once you know it builds and works, waiting for 3+ days isn't so bad because at least you know something usable will come out of it.
Gordan
On Fri, May 13, 2011 at 2:06 PM, Gordan Bobic gordan@bobich.net wrote:
I didn't think this was a suggestion for the official build koji, was it? The point is that it is very hard to meaningfully troubleshoot a process that takes 3 days to fail on a 1.2GHz ARM, and then has to be re-started from scratch. The incremental progress becomes too painfully slow. But if we can get that 3 day process into a 3 hour process (e.g. I've only got about 4GHz worth of ARM cores, but about 40GHz worth of x86-64 cores), then the problem is at least transitioned from the realm of unworkable into the realm of slow.
Well there's nothing stopping people from doing that themselves.
I've tried QEMU emulation on an IBM x3850x5 I have at work with 64 2.8 GHz threads and 256 GB of RAM. It's still slower than the real ARM boxes.
Once you know it builds and works, waiting for 3+ days isn't so bad because at least you know something usable will come out of it.
Believe me, I am painfully aware of that. I have got gcc to compile on one of the ARM boxes in mock. Using koji on the same device it doesn't compile, and I have no idea why, so it seems it's still not a guarantee, very unfortunately!
Peter
Believe me, I am painfully aware of that. I have got gcc to compile on one of the ARM boxes in mock. Using koji on the same device it doesn't compile, and I have no idea why, so it seems it's still not a guarantee, very unfortunately!
I've built gcc RPMs on my pandaboard. Try overriding the -j2 to -j1; my board was running out of memory during certain key steps in the build.
On Fri, May 13, 2011 at 4:48 PM, DJ Delorie dj@redhat.com wrote:
I've built gcc RPMs on my pandaboard. Try overriding the -j2 to -j1; my board was running out of memory during certain key steps in the build.
Yes, this board has 3 GB (koji3 in the arm cluster). When monitoring with top, the build topped out at around 2 GB in the gcj part of the build. Unfortunately it's only armv5 spec. Jared (of FPL fame) and I got some ARM contacts at the RH Summit and we're working to get some armv7 multi-core, multi-gigabyte devices we can add to koji and, as appropriate, provide other access to.
I don't want to override unless really necessary, as ultimately we need to be able to build native upstream Fedora, whether that means getting patches/changes upstream or changing build arguments.
Peter
On 05/13/2011 06:50 PM, Peter Robinson wrote:
Yes, this board has 3Gb (koji3 in the arm cluster).
What ARM hardware has that much RAM on it??
Gordan
On Fri, May 13, 2011 at 7:47 PM, Gordan Bobic gordan@bobich.net wrote:
What ARM hardware has that much RAM on it??
It's a Marvell development board provided to OLPC to assist in their development of the XO 1.75.
Peter
On 05/13/2011 08:11 PM, Peter Robinson wrote:
It's a Marvell development board provided to OLPC to assist in their development of the XO 1.75.
Ah, so not something one could actually just buy, I take it...
Gordan
Hi,
On Fri, May 13 2011, Gordan Bobic wrote:
Yes, this board has 3Gb (koji3 in the arm cluster).
What ARM hardware has that much RAM on it??
It's a Marvell development board provided to OLPC to assist in their development of the XO 1.75.
Ah, so not something one could actually just buy, I take it...
Yeah, that's correct.
- Chris.
slow. But if we can get that 3 day process into a 3 hour process (e.g.
I don't think you'll see *that* kind of improvement, unless you have a *really* slow arm board, or a *really* compile intensive build.
For a kernel RPM build, for example, distcc reduces my build time from 100 minutes to 60 minutes, a 40% improvement. The remaining time is preprocessing, linking, scripts, etc, and can't easily be done off-board. The x86 host is pretty much idle even during "heavy" compile periods - it's a 3.5GHz i7 so I'm counting compile time as "essentially zero" (native kernel "make -j12 zImage" takes about 45 seconds).
One other trick I'd like to try is to boot the arm board off an nfsroot, and wrap gcc itself to run on the x86 box - including preprocessing and linking. This will require careful mapping of arm and x86 filesystems, though.
No, I don't expect this to be used for official builds ;-)
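The kernel numbers above can be turned into a rough back-of-the-envelope estimate in the style of Amdahl's law. The sketch below assumes, as the message does, that offloaded compile time is essentially zero, and it further assumes (this is a guess, not a measurement) that the same on-board fraction would apply to a hypothetical 3-day build. Function names are illustrative.

```python
# Sketch only: estimate what distcc offloading can save when part of the
# build (preprocessing, linking, scripts) must still run on the ARM board.

def serial_fraction(native_minutes, distcc_minutes):
    """Fraction of the build that stays on the board, treating offloaded compiles as free."""
    return distcc_minutes / native_minutes


def projected_hours(native_hours, fraction):
    """Best-case wall time if only the on-board fraction remains."""
    return native_hours * fraction


frac = serial_fraction(100, 60)       # kernel RPM: 100 min native, 60 min with distcc
speedup = 1 / frac                    # upper bound on the speedup, roughly 1.67x
estimate = projected_hours(72, frac)  # a 3-day (72 h) build under the same split
print(round(frac, 2), round(speedup, 2), round(estimate, 1))
```

Under these assumptions a 3-day build would shrink to somewhat under two days rather than to three hours, however many x86 cores are added, because the on-board 60% is untouched. A more compile-heavy package would have a smaller on-board fraction and benefit more.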
Quoting DJ Delorie dj@redhat.com:
For a kernel RPM build, for example, distcc reduces my build time from 100 minutes to 60 minutes, a 40% improvement.
It took me about 50-60 minutes to compile the kernel on the guruplug, on a mounted NFS filesystem (hosted on x86 Linux) with syncing(?), security, etc. turned off, shared over GigE.
(that was a straight kernel compile, not an rpm.)