glibc performs a quick test run using valgrind as part of the build process.
Lately, this started crashing:
+ elf/ld.so --library-path .:elf:nptl:dlfcn /usr/bin/valgrind elf/ld.so --library-path .:elf:nptl:dlfcn /usr/bin/true
==924== Memcheck, a memory error detector
==924== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==924== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==924== Command: elf/ld.so --library-path .:elf:nptl:dlfcn /usr/bin/true
==924==
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==924== valgrind: Unrecognised instruction at address 0x11f548.
==924==    at 0x11F548: init_cpu_features (cpu-features.c:32)
==924==    by 0x11F548: dl_platform_init (dl-machine.h:241)
==924==    by 0x11F548: _dl_sysdep_start (dl-sysdep.c:231)
==924==    by 0x10981B: _dl_start_final (rtld.c:412)
==924==    by 0x109AAB: _dl_start (rtld.c:520)
==924==    by 0x108F47: ??? (in /builddir/build/BUILD/glibc-2.25-545-g9649350/build-aarch64-redhat-linux/elf/ld.so)
The line in question is:
asm volatile ("mrs %0, midr_el1" : "=r"(id));
That seems to match the instruction bit pattern, too. There is a check around it:
if (hwcap & HWCAP_CPUID)
  {
    register uint64_t id = 0;
    asm volatile ("mrs %0, midr_el1" : "=r"(id));
    cpu_features->midr_el1 = id;
  }
else
  cpu_features->midr_el1 = 0;
I think this code is fine. Unfortunately, I don't know if I'll be able to get a disassembly or debug this any further. There are a few potential causes: GLRO (dl_hwcap) is not initialized correctly in glibc; HWCAP_CPUID is not masked by the kernel or valgrind despite the lack of support; or GCC schedules the volatile asm statement before the condition.
Is anyone else seeing this?
I will disable the valgrind sanity test during the Fedora build for the time being.
Thanks, Florian
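The bit pattern in the valgrind output can be checked by hand against the AArch64 system-register move encoding. The following is a minimal sketch, not code from valgrind or glibc: the struct and function names are made up for illustration, and it only handles the MRS/MSR (register) form. It splits 0xD5380000 into its op0/op1/CRn/CRm/op2/Rt fields, which come out as 3, 0, 0, 0, 0 and 0, i.e. a read of MIDR_EL1 (S3_0_C0_C0_0) into x0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of an AArch64 MRS/MSR (register) instruction.
   Hypothetical helper type, for illustration only.  */
struct sysreg_access
{
  bool is_read;                 /* true for MRS, false for MSR */
  unsigned op0, op1, crn, crm, op2;
  unsigned rt;                  /* general-purpose register number */
};

/* Return true and fill *OUT if INSN is MRS or MSR (register form).  */
bool
decode_sysreg_access (uint32_t insn, struct sysreg_access *out)
{
  assert (out != NULL);

  /* Bits 31..22 are fixed at 1101010100 and bit 20 is always set;
     bit 21 (L) selects read vs. write and is left out of the mask.  */
  if ((insn & 0xFFD00000u) != 0xD5100000u)
    return false;

  out->is_read = (insn >> 21) & 1;
  out->op0 = 2 + ((insn >> 19) & 1);
  out->op1 = (insn >> 16) & 0x7;
  out->crn = (insn >> 12) & 0xf;
  out->crm = (insn >> 8) & 0xf;
  out->op2 = (insn >> 5) & 0x7;
  out->rt = insn & 0x1f;
  return true;
}

/* MIDR_EL1 is the system register S3_0_C0_C0_0.  */
bool
is_midr_el1_read (uint32_t insn)
{
  struct sysreg_access a;
  return decode_sysreg_access (insn, &a)
         && a.is_read
         && a.op0 == 3 && a.op1 == 0
         && a.crn == 0 && a.crm == 0 && a.op2 == 0;
}
```

Calling is_midr_el1_read (0xD5380000u) returns true, which agrees with the source line the backtrace points at.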
On 22/06/17 12:14, Florian Weimer wrote:
glibc performs a quick test run using valgrind as part of the build process.
Lately, this started crashing:
[...]
The line in question is:
asm volatile ("mrs %0, midr_el1" : "=r"(id));
Note that this instruction is now emulated by the kernel (the register itself is not readable from userspace), so it is understandable that valgrind does not handle it yet.
On Thu, Jun 22, 2017 at 12:14 PM, Florian Weimer fweimer@redhat.com wrote:
glibc performs a quick test run using valgrind as part of the build process.
Lately, this started crashing:
[...]
The line in question is:
asm volatile ("mrs %0, midr_el1" : "=r"(id));
This instruction actually traps to the kernel and comes back with the right value for MIDR with some emulation in the kernel. I suspect you are looking at a valgrind issue here.
regards Ramana
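For reference, the guarded read under discussion can be reproduced outside glibc. This is a sketch under assumptions: it uses getauxval (AT_HWCAP) rather than glibc's internal GLRO (dl_hwcap), the helper names are hypothetical, and on anything other than aarch64 Linux the read is compiled out and the function returns 0. The two field helpers follow the architectural MIDR_EL1 layout (implementer in bits 31:24, part number in bits 15:4).

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>           /* getauxval, AT_HWCAP */
#endif

#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11) /* value quoted in the thread: 0x800 */
#endif

static_assert (HWCAP_CPUID == 0x800, "HWCAP_CPUID is bit 11");

/* Read MIDR_EL1 only when the kernel advertises HWCAP_CPUID;
   return 0 otherwise (hypothetical helper, not glibc code).  */
uint64_t
read_midr_el1_if_advertised (void)
{
#if defined(__aarch64__) && defined(__linux__)
  if (getauxval (AT_HWCAP) & HWCAP_CPUID)
    {
      uint64_t id;
      /* Traps to the kernel, which emulates the read.  */
      __asm__ volatile ("mrs %0, midr_el1" : "=r"(id));
      return id;
    }
#endif
  return 0;
}

/* Implementer code: MIDR_EL1 bits 31:24.  */
unsigned
midr_implementer (uint64_t midr)
{
  return (midr >> 24) & 0xff;
}

/* Primary part number: MIDR_EL1 bits 15:4.  */
unsigned
midr_partnum (uint64_t midr)
{
  return (midr >> 4) & 0xfff;
}
```

On a kernel that sets HWCAP_CPUID the mrs traps and the kernel supplies the value; under a valgrind that does not know the instruction, the same call would end in the "unhandled instruction" report shown at the start of the thread.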
On Thursday 22 June 2017 04:44 PM, Florian Weimer wrote:
glibc performs a quick test run using valgrind as part of the build process.
Lately, this started crashing:
[...]
The line in question is:
asm volatile ("mrs %0, midr_el1" : "=r"(id));
That seems to match the instruction bit pattern, too. There is a check around it:
This needs a valgrind patch to identify reading midr_el1 (in VEX/priv/guest_arm64_toIR.c AFAICT), which is trapped and emulated by the kernel since 4.11.
Is someone already working on this or would you like me to do this?
Siddhesh
On 06/22/2017 01:28 PM, Siddhesh Poyarekar wrote:
This needs a valgrind patch to identify reading midr_el1 (in VEX/priv/guest_arm64_toIR.c AFAICT), which is trapped and emulated by the kernel since 4.11.
I'm puzzled why we are seeing this now. Has Linux 4.11 started to signal CPUID support in HWCAP? Our last successful build was on kernel 4.10.
Is someone already working on this or would you like me to do this?
Mark says you just volunteered. :)
Thanks, Florian
On Thursday 22 June 2017 05:08 PM, Florian Weimer wrote:
I'm puzzled why we are seeing this now. Has Linux 4.11 started to signal CPUID support in HWCAP? Our last successful build was on kernel 4.10.
Is someone already working on this or would you like me to do this?
Mark says you just volunteered. :)
I should have known it was a trap!
Siddhesh
On 22/06/17 12:38, Florian Weimer wrote:
On 06/22/2017 01:28 PM, Siddhesh Poyarekar wrote:
This needs a valgrind patch to identify reading midr_el1 (in VEX/priv/guest_arm64_toIR.c AFAICT), which is trapped and emulated by the kernel since 4.11.
I'm puzzled why we are seeing this now. Has Linux 4.11 started to signal CPUID support in HWCAP? Our last successful build was on kernel 4.10.
Yes, it's in 4.11:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/diff...
On Thu, 2017-06-22 at 16:58 +0530, Siddhesh Poyarekar wrote:
On Thursday 22 June 2017 04:44 PM, Florian Weimer wrote:
glibc performs a quick test run using valgrind as part of the build process.
Lately, this started crashing:
[...]
The line in question is:
asm volatile ("mrs %0, midr_el1" : "=r"(id));
That seems to match the instruction bit pattern, too. There is a check around it:
This needs a valgrind patch to identify reading midr_el1 (in VEX/priv/guest_arm64_toIR.c AFAICT), which is trapped and emulated by the kernel since 4.11.
Is someone already working on this or would you like me to do this?
As far as I know, nobody is working on this yet, so if you can, please do. Even if you don't know how to fix the valgrind side, just filing a bug report describing the expected behavior of the missing emulation would be appreciated: https://bugs.kde.org/enter_bug.cgi?product=valgrind
Maybe valgrind should just filter out the HWCAP_CPUID? Trying to figure out why this started failing. I assume 4.10 didn't set that, but that it is now set by the 4.11 kernel?
Thanks,
Mark
On Thursday 22 June 2017 05:09 PM, Mark Wielaard wrote:
As far as I know, nobody is working on this yet. So if you can then please do. Even if you don't know how to fix the valgrind side just filing a bug report with the expected behavior of the missing emulation would be appreciated. https://bugs.kde.org/enter_bug.cgi?product=valgrind
I'll do that today.
Maybe valgrind should just filter out the HWCAP_CPUID? Trying to figure out why this started failing. I assume 4.10 didn't set that, but that it is now set by the 4.11 kernel?
Yes, HWCAP_CPUID is set in 4.11.
Siddhesh
On Thu, 2017-06-22 at 13:14 +0200, Florian Weimer wrote:
glibc performs a quick test run using valgrind as part of the build process.
Lately, this started crashing: [...] I think this code is fine. Unfortunately, I don't know if I'll be able to get a disassembly or debug this any further. There are a couple of potential causes (GLRO (dl_hwcap) is not initialized correctly in glibc, HWCAP_CPUID is not masked by the kernel or valgrind despite the lack of support, GCC schedules the volatile asm statement before the condition).
Is anyone else seeing this?
I will disable the valgrind sanity test during the Fedora build for the time being.
For now the auxv HWCAP is masked off on arm64 in valgrind fedora. https://bugzilla.redhat.com/show_bug.cgi?id=1464211#c1 So you can reenable the sanity check again on fedora rawhide.
Upstream bug is https://bugs.kde.org/show_bug.cgi?id=381556 ("arm64: Handle feature registers access on 4.11 Linux kernel or later").
Thanks,
Mark
On Friday 23 June 2017 04:31 PM, Mark Wielaard wrote:
For now the auxv HWCAP is masked off on arm64 in valgrind fedora. https://bugzilla.redhat.com/show_bug.cgi?id=1464211#c1 So you can reenable the sanity check again on fedora rawhide.
Upstream bug is https://bugs.kde.org/show_bug.cgi?id=381556 arm64: Handle feature registers access on 4.11 Linux kernel or later
A better workaround would be to mask out HWCAP_CPUID (0x800) from the HWCAP.
Siddhesh
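As a rough illustration of what masking a single bit out of the auxiliary vector means, here is a sketch that walks an auxv-style array and clears HWCAP_CPUID in the AT_HWCAP entry. It is not valgrind's code: the struct, the AUX_-prefixed constants and the function name are stand-ins defined locally (AT_NULL is 0 and AT_HWCAP is 16 in the ELF auxv numbering), and the layout assumes a 64-bit target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the ELF auxv definitions (64-bit layout).  */
#define AUX_AT_NULL      0
#define AUX_AT_HWCAP     16
#define AUX_HWCAP_CPUID  (UINT64_C (1) << 11)   /* 0x800 */

struct auxv_entry
{
  uint64_t type;
  uint64_t value;
};

/* Clear BITS in every AT_HWCAP entry of the AT_NULL-terminated
   vector AUXV.  Return the number of entries that were changed.  */
size_t
auxv_clear_hwcap_bits (struct auxv_entry *auxv, uint64_t bits)
{
  size_t changed = 0;

  assert (auxv != NULL);
  for (; auxv->type != AUX_AT_NULL; auxv++)
    if (auxv->type == AUX_AT_HWCAP && (auxv->value & bits) != 0)
      {
        auxv->value &= ~bits;
        changed++;
      }
  return changed;
}
```

With an AT_HWCAP value of 0x8ff, clearing AUX_HWCAP_CPUID leaves 0xff, so the guest still sees every other feature bit the kernel reported.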
On 06/23/2017 01:10 PM, Siddhesh Poyarekar wrote:
On Friday 23 June 2017 04:31 PM, Mark Wielaard wrote:
For now the auxv HWCAP is masked off on arm64 in valgrind fedora. https://bugzilla.redhat.com/show_bug.cgi?id=1464211#c1 So you can reenable the sanity check again on fedora rawhide.
Upstream bug is https://bugs.kde.org/show_bug.cgi?id=381556 arm64: Handle feature registers access on 4.11 Linux kernel or later
A better workaround would be to mask out HWCAP_CPUID (0x800) from the HWCAP.
valgrind needs to mask out all unknown/unimplemented flags. And I thought it was 1? LD_HWCAP_MASK=1 acts as a workaround, after all.
Thanks, Florian
On Friday 23 June 2017 04:43 PM, Florian Weimer wrote:
valgrind needs to mask out all unknown/unimplemented flags. And I thought it was 1? LD_HWCAP_MASK=1 acts as a workaround, after all.
The remaining flags shouldn't actually matter to glibc since they're essentially assumed features (asimd, fp) but there may be programs out there that might read them.
Siddhesh
On Friday 23 June 2017 04:47 PM, Siddhesh Poyarekar wrote:
On Friday 23 June 2017 04:43 PM, Florian Weimer wrote:
valgrind needs to mask out all unknown/unimplemented flags. And I thought it was 1? LD_HWCAP_MASK=1 acts as a workaround, after all.
The remaining flags shouldn't actually matter to glibc since they're essentially assumed features (asimd, fp) but there may be programs out there that might read them.
To be clear, it's a technicality and not an actual known issue; it is your (and Mark's) call in the end. If you want to only enable the known features then mask = 0x7 should do it, I think.
Siddhesh
On Fri, 2017-06-23 at 16:47 +0530, Siddhesh Poyarekar wrote:
On Friday 23 June 2017 04:43 PM, Florian Weimer wrote:
valgrind needs to mask out all unknown/unimplemented flags. And I thought it was 1? LD_HWCAP_MASK=1 acts as a workaround, after all.
The remaining flags shouldn't actually matter to glibc since they're essentially assumed features (asimd, fp) but there may be programs out there that might read them.
I found the following arm HWCAP bits in the kernel:
#define HWCAP_FP        (1 << 0)
#define HWCAP_ASIMD     (1 << 1)
#define HWCAP_EVTSTRM   (1 << 2)
#define HWCAP_AES       (1 << 3)
#define HWCAP_PMULL     (1 << 4)
#define HWCAP_SHA1      (1 << 5)
#define HWCAP_SHA2      (1 << 6)
#define HWCAP_CRC32     (1 << 7)
#define HWCAP_ATOMICS   (1 << 8)
#define HWCAP_FPHP      (1 << 9)
#define HWCAP_ASIMDHP   (1 << 10)
#define HWCAP_CPUID     (1 << 11)
#define HWCAP_ASIMDRDM  (1 << 12)
#define HWCAP_JSCVT     (1 << 13)
#define HWCAP_FCMA      (1 << 14)
#define HWCAP_LRCPC     (1 << 15)
BTW the glibc linux/aarch64/bitshwcap.h only goes up to HWCAP_ASIMDRDM.
Is there a corresponding ARM ABI document that maps those values to the corresponding arm64 CPU instruction sets? Valgrind supports some, but certainly not all. Since valgrind emulates/translates all instructions explicitly it makes sense to mask off anything unknown.
Thanks,
Mark
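To make the mask values mentioned in this thread easier to read, the bit list above can be turned into a small name table. This is only a sketch: the short names and the function are invented for illustration, and the table covers just the sixteen bits listed here, so any higher bits are silently ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Names for bits 0..15, in the order of the list above.  */
static const char *const hwcap_names[] = {
  "fp", "asimd", "evtstrm", "aes", "pmull", "sha1", "sha2", "crc32",
  "atomics", "fphp", "asimdhp", "cpuid", "asimdrdm", "jscvt", "fcma",
  "lrcpc",
};

/* Write the space-separated names of the bits set in HWCAP into BUF.
   Return 0 on success, -1 if BUF (SIZE bytes) is too small.  */
int
hwcap_describe (uint64_t hwcap, char *buf, size_t size)
{
  size_t used = 0;

  assert (buf != NULL && size > 0);
  buf[0] = '\0';
  for (size_t bit = 0;
       bit < sizeof hwcap_names / sizeof hwcap_names[0]; bit++)
    {
      if ((hwcap & (UINT64_C (1) << bit)) == 0)
        continue;
      int n = snprintf (buf + used, size - used, "%s%s",
                        used != 0 ? " " : "", hwcap_names[bit]);
      if (n < 0 || (size_t) n >= size - used)
        return -1;              /* output would be truncated */
      used += (size_t) n;
    }
  return 0;
}
```

For example, a mask of 0x7 describes as "fp asimd evtstrm" and 0x800 as "cpuid".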
On Friday 23 June 2017 05:10 PM, Mark Wielaard wrote:
On Fri, 2017-06-23 at 16:47 +0530, Siddhesh Poyarekar wrote:
On Friday 23 June 2017 04:43 PM, Florian Weimer wrote:
valgrind needs to mask out all unknown/unimplemented flags. And I thought it was 1? LD_HWCAP_MASK=1 acts as a workaround, after all.
The remaining flags shouldn't actually matter to glibc since they're essentially assumed features (asimd, fp) but there may be programs out there that might read them.
[...]
Yeah I assumed that anything before CPUID was probably implemented in valgrind already, but if that's the conservative way to go then so be it.
So does this mean that if there are specific hwcaps we know are implemented in valgrind (now or in future), that the flags should be enabled one by one? For example, if valgrind disables hwcap_cpuid then bugs in micro-architecture specific routines may get masked out since they will never get called (unless you're using the not-merged-yet glibc.tune.cpu tunable) and it would change program behaviour considerably. So once support for midr_el1 is in place, maybe hwcap_cpuid should be brought back. Likewise for other flags.
Siddhesh
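The "enable the flags one by one" idea amounts to an allow-list that grows as support lands. The sketch below shows the shape of it; the two SUPPORTED_ sets are hypothetical and are not taken from valgrind's sources, and only the bit positions come from the list quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as listed in the thread.  */
#define HWCAP_FP       (UINT64_C (1) << 0)
#define HWCAP_ASIMD    (UINT64_C (1) << 1)
#define HWCAP_EVTSTRM  (UINT64_C (1) << 2)
#define HWCAP_CPUID    (UINT64_C (1) << 11)

/* HYPOTHETICAL: what a translator claims to handle today.  */
#define SUPPORTED_NOW  (HWCAP_FP | HWCAP_ASIMD | HWCAP_EVTSTRM)

/* HYPOTHETICAL: the same set once MIDR_EL1 reads are handled.  */
#define SUPPORTED_WITH_CPUID  (SUPPORTED_NOW | HWCAP_CPUID)

static_assert (SUPPORTED_NOW == 0x7,
               "same value as the 0x7 mask suggested in the thread");

/* Report to the guest only the host bits that are on the allow-list;
   unknown or unimplemented bits are dropped.  */
uint64_t
filter_hwcap (uint64_t host_hwcap, uint64_t supported)
{
  return host_hwcap & supported;
}
```

With a host value of 0x8ff, the first set yields 0x7 and the second yields 0x807, so bringing HWCAP_CPUID back later is a one-line change to the allow-list rather than a change to the filtering logic.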
On 23/06/17 12:13, Florian Weimer wrote:
On 06/23/2017 01:10 PM, Siddhesh Poyarekar wrote:
On Friday 23 June 2017 04:31 PM, Mark Wielaard wrote:
For now the auxv HWCAP is masked off on arm64 in valgrind fedora. https://bugzilla.redhat.com/show_bug.cgi?id=1464211#c1 So you can reenable the sanity check again on fedora rawhide.
Upstream bug is https://bugs.kde.org/show_bug.cgi?id=381556 arm64: Handle feature registers access on 4.11 Linux kernel or later
A better workaround would be to mask out HWCAP_CPUID (0x800) from the HWCAP.
valgrind needs to mask out all unknown/unimplemented flags. And I thought it was 1? LD_HWCAP_MASK=1 acts as a workaround, after all.
That works because LD_HWCAP_MASK selects the 'important' hwcap bits, and its default on aarch64 is now 0x800 (HWCAP_CPUID); if you clear that bit, the cpuid check is disabled.
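The effect of the mask can be modelled in a few lines. This is a simplified model and not glibc's implementation: it assumes the kernel-provided hwcap is ANDed with the mask before HWCAP_CPUID is tested, it takes the 0x800 default from the statement above, and the function names and the string parameter standing in for the environment variable are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HWCAP_CPUID_BIT     (UINT64_C (1) << 11)
#define DEFAULT_HWCAP_MASK  HWCAP_CPUID_BIT     /* 0x800, per the thread */

static_assert (DEFAULT_HWCAP_MASK == 0x800, "default mask is bit 11");

/* ASSUMPTION: the dynamic loader only looks at hwcap bits that are
   also set in the mask.  MASK_TEXT models the LD_HWCAP_MASK value;
   NULL or "" means the variable is unset.  */
uint64_t
effective_hwcap (uint64_t kernel_hwcap, const char *mask_text)
{
  uint64_t mask = DEFAULT_HWCAP_MASK;

  if (mask_text != NULL && *mask_text != '\0')
    mask = strtoull (mask_text, NULL, 0);
  return kernel_hwcap & mask;
}

/* Nonzero if the guarded mrs would be executed under this model.  */
int
cpuid_read_allowed (uint64_t kernel_hwcap, const char *mask_text)
{
  return (effective_hwcap (kernel_hwcap, mask_text) & HWCAP_CPUID_BIT) != 0;
}
```

Under this model, a kernel hwcap of 0x8ff with the mask unset allows the read, while the same hwcap with a mask of "1" does not, which matches LD_HWCAP_MASK=1 acting as a workaround.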