On 1/4/24 10:47, Florian Weimer wrote:
* Jarek Prokop:

This spawns a few questions for me:
1. Since [1] the `-mbranch-protection=pac-ret` is needed in both
CFLAGS and ASFLAGS, I am unsure how it interacts with the Fedora
defaults,
I see default CFLAGS contain `-mbranch-protection=standard` and the
flag with pac-ret seems to be appended to libruby.so in the case of
the upstream fix [2].

From what I understand, it shouldn't cause problems to have these 2
flags at the same time on the correct compilation artifacts, is that
correct?
I wouldn't use both flags at the same time because the meaning is
unclear.  Is BTI still enabled if you do that?

Not sure how I would check that.

But `-mbranch-protection=standard` should currently be BTI+PAC:

"standard turns on all types of branch protection.
    Currently standard implies pac-ret+bti."
-- https://developer.arm.com/documentation/102433/0200/Applying-these-techniques-to-real-code

Whether we end up with generated code that's also BTI is not certain. In any case it actually seems like
we want to not use that configure branch at all for now, as we can use just =standard and achieve better effect
than with the uncertain result of the combined flags that is happening now.

and if I understand the text correctly, we on Fedora with that =standard do not actually therefore need =pac-ret
for the -mbranch-protection flag.

Digging deeper in Arm's documentation regarding this, it seems the generated assembly is actually reasonably compatible
across CPU variants, or well... has the ability to be if it is handwritten


Can't you put -mbranch-protection=standard into ASFLAGS?
We probably can, and it might be the way to go for Fedora Ruby based on the mentioned ARM developer article. At least for the moment.
I'll inspect and see.

The check is a bit weird:

diff --git a/configure.ac b/configure.ac
index 9286946fc15f4..18b4247991d42 100644
--- a/configure.ac
+++ b/configure.ac
@@ -830,7 +830,10 @@ AS_IF([test "$GCC" = yes], [
 	AS_FOR(option, opt, [-mbranch-protection=pac-ret -msign-return-address=all], [
             RUBY_TRY_CFLAGS(option, [branch_protection=yes], [branch_protection=no])
             AS_IF([test "x$branch_protection" = xyes], [
+                # C compiler and assembler must be consistent for -mbranch-protection
+                # since they both check `__ARM_FEATURE_PAC_DEFAULT` definition.
                 RUBY_APPEND_OPTION(XCFLAGS, option)
+                RUBY_APPEND_OPTION(ASFLAGS, option)
                 break
             ])
         ])

<https://github.com/ruby/ruby/commit/1f89a670381859d1c1d1c640359f169b6db6759c.patch>

The reliable way to do this would be to compile a C file and check
whether that enables __ARM_FEATURE_PAC_DEFAULT, and if that's the case,
define a *different* macro for use in the assembler implementation.
This way, you don't need to care about the exact name of the option.
That does sound better, I'll do some exploring for this.
Thanks for taking a look.

2. Since files compiled with `-mbranch-protection=pac-ret` seem to end
up in the .so library and Ruby binary extensions link against that
solib, do the binary extensions also have to be compiled with that
exact option?
It depends on how fiber handling is implemented.  If fiber switching
relies on code in header files or that is statically linked into Ruby
extensions, then yes, these extensions will need to be compiled with PAC
support as well.  But I expect that this isn't the case, but haven't
checked it.  The relevant code should be in the Ruby DSO only and be
linked dynamically.
That seems to be our case, we do not ship static artifacts to link against AFAICT.
It also seems unlikely that many C extensions would be aware of fiber
switching, but I haven't checked that, either.
Quickly grepping through sources suggests that if extensions are aware of fiber switching,
they are doing it through the Ruby C-api that has responsible code in the DSO.

So I can probably scratch that concern off of my list.
3. If we do not fix this bug in Ruby 3.3.0 but wait with this for
3.3.1 where the fix will most probably land, will we by effect exclude
a subset of ARM CPUs, that actually have the PAC capability, for that
in-between period?
I think you should fix this with a backport.  It's going to impact quite
a few users.

4. Why do koji and copr have CPU flag set that differs so much? Is our
koji infra OK?
It's different machines, so they are expect to have different
capabilities.  This isn't even aarch64-specific.
I see. I didn't know how much different actually which was answered by other people here.

5. Why does it fail on copr and does not fail on koji? It seems the
paca/pacg have to be present and set on the CPU flags for the
segfaults to occur.
PAC is not process-switched on Linux.  If it is turned on at the
kernel/hypervisor level, it's currently impossible to turn off.  This is
not something the buildroot can control.  It's not actually a hardware
limitation, it's just how it has been implemented on Linux.  (Although
we would turn it on for the Fedora buildroots at this point.)

Thanks for your insight into this issue!

Regards,
Jarek