We currently use -mno-omit-leaf-frame-pointer on various architectures. I think we should remove it because it does not work as expected.
Obviously, this does not work for many glibc string functions because they use hand-written assembly that does not set up a frame pointer.
The happy path of the malloc implementation in Fedora rawhide looks like this:
Dump of assembler code for function __GI___libc_malloc:
   <+0>:    endbr64
   # special case for zero size argument
   <+4>:    test   %rdi,%rdi
   <+7>:    js     0x82af8 <__GI___libc_malloc+264>
   # compute true allocation size
   <+13>:   lea    0x17(%rdi),%rdx
   <+17>:   mov    %rdx,%rax
   <+20>:   and    $0xfffffffffffffff0,%rax
   <+24>:   cmp    $0x1f,%rdx
   <+28>:   mov    $0x20,%edx
   <+33>:   cmovbe %rdx,%rax
   # check if size argument is in tcache range
   <+37>:   cmp    0x1647d4(%rip),%rax        # 0x1e71f0 <mp_+112>
   <+44>:   jae    0x82a80 <__GI___libc_malloc+144>
   # load tcache pointer
   <+46>:   mov    0x16435b(%rip),%rdx        # 0x1e6d80
   <+53>:   mov    %fs:(%rdx),%rcx
   # check for tcache initialization
   <+57>:   test   %rcx,%rcx
   <+60>:   je     0x82b10 <__GI___libc_malloc+288>
   # compute tcache bin
   <+66>:   mov    %rax,%rdx
   <+69>:   shr    $0x4,%rdx
   <+73>:   lea    -0x2(%rdx),%rsi
   # check for basic tcache range
   <+77>:   cmp    $0x3f,%rsi
   <+81>:   ja     0x82a88 <__GI___libc_malloc+152>
   <+83>:   add    $0x10,%rdx
   # load tcache free list and check if it is empty
   <+87>:   mov    0x8(%rcx,%rdx,8),%rax
   <+92>:   test   %rax,%rax
   <+95>:   je     0x82a80 <__GI___libc_malloc+144>
   # allocation alignment check
   <+97>:   test   $0xf,%al
   <+99>:   jne    0x82b40 <__GI___libc_malloc+336>
   # pointer decoding
   <+105>:  mov    %rax,%rdi
   <+108>:  shr    $0xc,%rdi
   <+112>:  xor    (%rax),%rdi
   # remove from tcache free list and record that there is more room in it
   <+115>:  mov    %rdi,0x8(%rcx,%rdx,8)
   <+120>:  addw   $0x1,(%rcx,%rsi,2)
   # prevent leakage of double-free token in user allocation
   <+125>:  movq   $0x0,0x8(%rax)
   <+133>:  ret
This is compiler-generated code. The -mno-omit-leaf-frame-pointer flag was used in the build flags as requested:
gcc -m32 malloc.c -c -std=gnu11 -fgnu89-inline -O2 -g -Wall -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -fasynchronous-unwind-tables -fstack-clash-protection -Wall -Wwrite-strings -Wundef -Wimplicit-fallthrough -Werror -fmerge-all-constants -frounding-math -fstack-protector-strong -fno-common -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -Wstrict-prototypes -Wold-style-definition -Wfree-labels -Wmissing-parameter-name -fmath-errno -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fPIC -DMORECORE_CLEARS=2 -ftls-model=initial-exec -I../include -I…/build-x86_64-redhat-linux-32/malloc -I…/build-x86_64-redhat-linux-32 -I../sysdeps/unix/sysv/linux/i386/i686 -I../sysdeps/i386/i686/nptl -I../sysdeps/unix/sysv/linux/i386 -I../sysdeps/unix/sysv/linux/x86/include -I../sysdeps/unix/sysv/linux/x86 -I../sysdeps/x86/nptl -I../sysdeps/i386/nptl -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux -I../sysdeps/nptl -I../sysdeps/pthread -I../sysdeps/gnu -I../sysdeps/unix/inet -I../sysdeps/unix/sysv -I../sysdeps/unix/i386 -I../sysdeps/unix -I../sysdeps/posix -I../sysdeps/i386/i686/fpu/multiarch -I../sysdeps/i386/i686/fpu -I../sysdeps/i386/i686/multiarch -I../sysdeps/i386/i686 -I../sysdeps/i386/fpu -I../sysdeps/x86/fpu -I../sysdeps/i386 -I../sysdeps/x86/include -I../sysdeps/x86 -I../sysdeps/wordsize-32 -I../sysdeps/ieee754/float128 -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96 -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/ieee754 -I../sysdeps/generic -I.. -I../libio -I. -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/15/include -isystem /usr/include -D_LIBC_REENTRANT -include …/build-x86_64-redhat-linux-32/libc-modules.h -DMODULE_NAME=libc -include ../include/libc-symbols.h -DPIC -DSHARED -DUSE_TCACHE=1 -DTOP_NAMESPACE=glibc -o …/build-x86_64-redhat-linux-32/malloc/malloc.os -MD -MP -MF …/build-x86_64-redhat-linux-32/malloc/malloc.os.dt -MT …/build-x86_64-redhat-linux-32/malloc/malloc.os
But clearly it does not have the desired effect of setting up a frame pointer. The reason is what GCC calls shrink-wrapping. It's related to this old optimization:
Minimizing register usage penalty at procedure calls https://dl.acm.org/doi/abs/10.1145/53990.53999
Basically it pushes optional function prologue instructions (such as saving callee-saved registers, or setting up the frame pointer) to the code paths that actually need them.
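To make the shape of such a function concrete, here is a minimal sketch in C. The function names are hypothetical and not taken from glibc. The early-return path touches no stack memory and makes no calls, while the other path needs a stack buffer and a call. That makes it a candidate for shrink-wrapping, but whether a given GCC version actually moves the prologue depends on the target and flags, so the disassembly has to be checked in each case.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical slow path: it needs a stack buffer and makes a call,
   so it requires a real prologue.  Being static and small, it will
   likely be inlined into its caller.  */
static size_t formatted_length(const char *s)
{
    char buf[64];
    /* snprintf returns the length the full output would have had.  */
    int n = snprintf(buf, sizeof buf, "%s", s);
    return n < 0 ? 0 : (size_t) n;
}

/* Candidate for shrink-wrapping: the early return needs neither a
   stack frame nor callee-saved registers, so the compiler may emit
   the prologue only on the path that reaches the slow code.  */
size_t printed_length(const char *s)
{
    if (s == NULL || s[0] == '\0')
        return 0;                   /* fast path */
    return formatted_length(s);     /* path that needs the prologue */
}
```

Compiling this with -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer and looking at where the frame pointer setup ends up in printed_length is one way to observe the behaviour described above.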
Given that most of the hot glibc functions do not use frame pointers (at least not on their happy paths), PLT stubs do not have them, and there is a race condition at the start of functions until the frame pointer is set up, tools need to be aware that the top-most interrupted frame (or the first frame after a signal frame) may use the caller's frame pointer. This does not prevent frame pointer based unwinding because in the impacted frames, the frame pointer register is not changed. Without countermeasures, the immediate caller's frame just vanishes from backtraces.
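The vanishing caller can be illustrated with a small simulation. This is only a model under simplifying assumptions, not how any real profiler is implemented: frame records are plain structs, function names stand in for code addresses, and the call chain start -> main -> parse -> malloc is made up for the example.

```c
#include <stddef.h>

/* Simplified model of the frame record created by
   "push %rbp; mov %rsp,%rbp": the saved frame pointer of the caller,
   next to the return address.  */
struct frame_record {
    const struct frame_record *caller_fp; /* saved frame pointer */
    const char *return_to;                /* function we return into */
};

/* Hypothetical call chain: start -> main -> parse -> malloc.  */
static const struct frame_record main_rec   = { NULL,       "start" };
static const struct frame_record parse_rec  = { &main_rec,  "main"  };
static const struct frame_record malloc_rec = { &parse_rec, "parse" };

/* Walk the chain like a frame-pointer-based unwinder: report the
   interrupted function, then follow the saved frame pointers.
   Returns the number of entries stored in out[].  */
size_t fp_backtrace(const char *interrupted_in,
                    const struct frame_record *fp,
                    const char **out, size_t max)
{
    size_t n = 0;
    if (n < max)
        out[n++] = interrupted_in;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_to;
        fp = fp->caller_fp;
    }
    return n;
}
```

If malloc has set up its own frame, the walk starts at &malloc_rec and yields malloc, parse, main, start. If malloc is interrupted before (or without) setting up a frame, the frame pointer register still points at &parse_rec, and the walk yields malloc, main, start: parse is missing because its return address lives only on the stack, where this kind of unwinder does not look.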
Given how pervasive this effect is and that no problems have been reported so far, I think we can drop -mno-omit-leaf-frame-pointer from the build flags.
Thoughts?
Thanks, Florian
On Sun, Jun 22, 2025 at 11:41:48AM +0200, Florian Weimer wrote:
The reason is what GCC calls shrink-wrapping. It's related to this old optimization:
Minimizing register usage penalty at procedure calls https://dl.acm.org/doi/abs/10.1145/53990.53999
Basically it pushes optional function prologue instructions (such as saving callee-saved registers, or setting up the frame pointer) to the code paths that actually need them.
Given that most of the hot glibc functions do not use frame pointers (at least not on their happy paths), PLT stubs do not have them, and there is a race condition at the start of functions until the frame pointer is set up, tools need to be aware that the top-most interrupted frame (or the first frame after a signal frame) may use the caller's frame pointer. This does not prevent frame pointer based unwinding because in the impacted frames, the frame pointer register is not changed. Without countermeasures, the immediate caller's frame just vanishes from backtraces.
Given how pervasive this effect is and that no problems have been reported so far, I think we can drop -mno-omit-leaf-frame-pointer from the build flags.
Thoughts?
Seems reasonable.
The benefit of frame pointers is to get fast, very low overhead stack traces that point you to the general area of the problem. It's not an issue (in practice) that they're not completely accurate at the very tip of the stack.
Unrelated to this, do you know how SFrame support (ie. in userspace) is going? This was/is the great hope for fast, low overhead stack traces without frame pointers, but it's all gone a bit quiet. The last thing I heard was a talk by Steve Rostedt a few years ago.
Rich.
* Richard W. M. Jones:
On Sun, Jun 22, 2025 at 11:41:48AM +0200, Florian Weimer wrote:
The reason is what GCC calls shrink-wrapping. It's related to this old optimization:
Minimizing register usage penalty at procedure calls https://dl.acm.org/doi/abs/10.1145/53990.53999
Basically it pushes optional function prologue instructions (such as saving callee-saved registers, or setting up the frame pointer) to the code paths that actually need them.
Given that most of the hot glibc functions do not use frame pointers (at least not on their happy paths), PLT stubs do not have them, and there is a race condition at the start of functions until the frame pointer is set up, tools need to be aware that the top-most interrupted frame (or the first frame after a signal frame) may use the caller's frame pointer. This does not prevent frame pointer based unwinding because in the impacted frames, the frame pointer register is not changed. Without countermeasures, the immediate caller's frame just vanishes from backtraces.
Given how pervasive this effect is and that no problems have been reported so far, I think we can drop -mno-omit-leaf-frame-pointer from the build flags.
Thoughts?
Seems reasonable.
Thanks for the feedback.
The benefit of frame pointers is to get fast, very low overhead stack traces that point you to the general area of the problem. It's not an issue (in practice) that they're not completely accurate at the very tip of the stack.
I think it depends on the application. Some developers targeting specific libraries (instead of full applications) might be more interested in the top of the stack.
Unrelated to this, do you know how SFrame support (ie. in userspace) is going? This was/is the great hope for fast, low overhead stack traces without frame pointers, but it's all gone a bit quiet. The last thing I heard was a talk by Steve Rostedt a few years ago.
I plan to write up a Fedora 43 proposal for that. For realistic experiments, we need at least one distribution that does not have frame pointers (otherwise the SFrame data is rather different and not representative at all of a frame-pointer-less environment), so I plan to propose to disable them again in ELN.
Thanks, Florian
On 2025-06-22 16:17, Florian Weimer wrote:
Unrelated to this, do you know how SFrame support (ie. in userspace) is going? This was/is the great hope for fast, low overhead stack traces without frame pointers, but it's all gone a bit quiet. The last thing I heard was a talk by Steve Rostedt a few years ago.
I plan to write up a Fedora 43 proposal for that. For realistic experiments, we need at least one distribution that does not have frame pointers (otherwise the SFrame data is rather different and not representative at all of a frame-pointer-less environment), so I plan to propose to disable them again in ELN.
Would it make sense to leverage the CentOS ISA SIG instead to run these experiments? While CentOS Stream 11 is still a ways out, it seems premature to disable frame pointers in ELN now, as my understanding is that while SFrame is being developed, it is not a viable replacement yet, and it might be a while until it gets there.
Cheers Davide
* Davide Cavalca:
On 2025-06-22 16:17, Florian Weimer wrote:
Unrelated to this, do you know how SFrame support (ie. in userspace) is going? This was/is the great hope for fast, low overhead stack traces without frame pointers, but it's all gone a bit quiet. The last thing I heard was a talk by Steve Rostedt a few years ago.
I plan to write up a Fedora 43 proposal for that. For realistic experiments, we need at least one distribution that does not have frame pointers (otherwise the SFrame data is rather different and not representative at all of a frame-pointer-less environment), so I plan to propose to disable them again in ELN.
Would it make sense to leverage the CentOS ISA SIG instead to run these experiments? While CentOS Stream 11 is still a ways out, it seems premature to disable frame pointers in ELN now, as my understanding is that while SFrame is being developed, it is not a viable replacement yet, and it might be a while until it gets there.
Are you suggesting to move RHEL 11 development away from Fedora/ELN to CentOS Stream? Wouldn't that be implied by this approach?
I think we should align the ELN build configuration with the current plans for RHEL 11 instead. Enabling frame pointers on x86-64 in ELN when there is no such plan for RHEL has always been against ELN's purpose to approximate a RHEL configuration.
Thanks, Florian
On 2025-06-24 22:02, Florian Weimer wrote:
Are you suggesting to move RHEL 11 development away from Fedora/ELN to CentOS Stream? Wouldn't that be implied by this approach?
Not at all. My point was that if you want a platform to run experiments, the CentOS ISA SIG seems like a better fit, as it would give you more freedom to try things out. Changing things in ELN has a larger blast radius, and IMO it'd be best to do that once we're closer to a settled solution.
I think we should align the ELN build configuration with the current plans for RHEL 11 instead. Enabling frame pointers on x86-64 in ELN when there is no such plan for RHEL has always been against ELN's purpose to approximate a RHEL configuration.
For reference, frame pointers were enabled in ELN in https://src.fedoraproject.org/rpms/redhat-rpm-config/c/a65dee421c05707d6aaae... based on the discussion in https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/... with the plan to revisit things closer to RHEL 11 branching.
Cheers Davide
* Davide Cavalca:
On 2025-06-24 22:02, Florian Weimer wrote:
Are you suggesting to move RHEL 11 development away from Fedora/ELN to CentOS Stream? Wouldn't that be implied by this approach?
Not at all. My point was that if you want a platform to run experiments, the CentOS ISA SIG seems like a better fit, as it would give you more freedom to try things out. Changing things in ELN has a larger blast radius, and IMO it'd be best to do that once we're closer to a settled solution.
It's an experiment only in the sense that all RHEL 11 development is experimental at this stage.
I think we should align the ELN build configuration with the current plans for RHEL 11 instead. Enabling frame pointers on x86-64 in ELN when there is no such plan for RHEL has always been against ELN's purpose to approximate a RHEL configuration.
For reference, frame pointers were enabled in ELN in https://src.fedoraproject.org/rpms/redhat-rpm-config/c/a65dee421c05707d6aaae... based on the discussion in https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/... with the plan to revisit things closer to RHEL 11 branching.
And new information has come up since then: enabling frame pointers produces completely different unwind information. As a result, a distribution built with frame pointers and SFrame does not give us much data whether a distribution built without frame pointers and SFrame could serve the same use cases.
The SFrame work needs an x86-64 distribution built without frame pointers. Not having frame pointers is still the plan on record for RHEL 11. Why doesn't ELN match? Why should we build another distribution like ELN for RHEL 11 development?
Thanks, Florian
Richard W.M. Jones wrote:
On Sun, Jun 22, 2025 at 11:41:48AM +0200, Florian Weimer wrote:
The reason is what GCC calls shrink-wrapping. It's related to this old optimization:
Minimizing register usage penalty at procedure calls https://dl.acm.org/doi/abs/10.1145/53990.53999
Basically it pushes optional function prologue instructions (such as saving callee-saved registers, or setting up the frame pointer) to the code paths that actually need them.
Given that most of the hot glibc functions do not use frame pointers (at least not on their happy paths), PLT stubs do not have them, and there is a race condition at the start of functions until the frame pointer is set up, tools need to be aware that the top-most interrupted frame (or the first frame after a signal frame) may use the caller's frame pointer. This does not prevent frame pointer based unwinding because in the impacted frames, the frame pointer register is not changed. Without countermeasures, the immediate caller's frame just vanishes from backtraces.
Given how pervasive this effect is and that no problems have been reported so far, I think we can drop -mno-omit-leaf-frame-pointer from the build flags.
Thoughts?
Seems reasonable.
The benefit of frame pointers is to get fast, very low overhead stack traces that point you to the general area of the problem. It's not an issue (in practice) that they're not completely accurate at the very tip of the stack.
Unrelated to this, do you know how SFrame support (ie. in userspace) is going? This was/is the great hope for fast, low overhead stack traces without frame pointers, but it's all gone a bit quiet. The last thing I heard was a talk by Steve Rostedt a few years ago.
Rich.
For updated information about SFrame, see https://sourceware.org/binutils/wiki/sframe. Many of the projects involving SFrame are under active development at this time, and the status is evolving.
Indu
On Tue, Jun 24, 2025 at 07:48:56AM -0000, Indu Bhagat wrote:
For updated information about SFrame, see https://sourceware.org/binutils/wiki/sframe. Many of the projects involving SFrame are under active development at this time, and the status is evolving.
Bookmarked it, thanks!
Rich.
LWN this week has a good article on the state of SFrame:
https://lwn.net/Articles/1029189/ (gift link: https://lwn.net/SubscriberLink/1029189/03199ce0a0862a83/)
Rich.
On Sun, Jun 22, 2025 at 11:41:48AM +0200, Florian Weimer wrote:
We currently use -mno-omit-leaf-frame-pointer on various architectures. I think we should remove it because it does not work as expected.
I was confused for a minute as my eye tuned out the word '-leaf' in the arg above, so I mistakenly thought you were proposing to remove frame pointers entirely.
Given how pervasive this effect is and that no problems have been reported so far, I think we can drop -mno-omit-leaf-frame-pointer from the build flags.
Thoughts?
I recall the previous discussion around frame pointers was almost exclusively around the main concept of frame pointers in general (i.e. the main -fno-omit-frame-pointer arg). I don't recall any specific points around the usage of the extra -mno-omit-leaf-frame-pointer arg, though it was indeed mentioned in the Fedora change proposal[1].
I guess I'd want to hear from the owners of the original change proposal about their rationale / experience with enabling both args, as opposed to only -fno-omit-frame-pointer.
With regards, Daniel
[1] https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer
* Daniel P. Berrangé:
Given how pervasive this effect is and that no problems have been reported so far, I think we can drop -mno-omit-leaf-frame-pointer from the build flags.
Thoughts?
I recall the previous discussion around frame pointers was almost exclusively around the main concept of frame pointers in general (i.e. the main -fno-omit-frame-pointer arg). I don't recall any specific points around the usage of the extra -mno-omit-leaf-frame-pointer arg, though it was indeed mentioned in the Fedora change proposal[1].
I brought up -mno-omit-leaf-frame-pointer at one point to illustrate that -fno-omit-frame-pointer doesn't really do what was expected. But it seems that this didn't matter to most people. They only want the frame pointer register dedicated to the frame pointer if changed, and -fno-omit-frame-pointer achieves that, with or without -mno-omit-leaf-frame-pointer.
I guess I'd want to hear from the owners of the original change proposal about their rationale / experience with enabling both args, as opposed to only -fno-omit-frame-pointer.
I tried to contact at least one of them (now dropped from Cc:). The change proposal didn't link to Fedora wiki/accounts for the others.
Thanks, Florian
On Sun, Jun 22, 2025 at 11:43 AM Florian Weimer fweimer@redhat.com wrote:
We currently use -mno-omit-leaf-frame-pointer on various architectures. I think we should remove it because it does not work as expected.
Given how pervasive this effect is and that no problems have been reported so far, I think we can drop -mno-omit-leaf-frame-pointer from the build flags.
Thoughts?
So my semi-naive interpretation of the problem here is that glibc is special in that it doesn't provide any support for real-time tracing or profiling. I'm not sure that justifies removing this flag globally, as there are plenty of middleware stacks that do their own thing or even things that don't use glibc string functions at all.
So if you think it makes sense to omit the flag for glibc, sure, but I am not convinced it should be done system-wide.
-- 真実はいつも一つ!/ Always, there's only one truth!
On Tue, Jun 24, 2025 at 12:17:18PM +0200, Neal Gompa wrote:
So my semi-naive interpretation of the problem here is that glibc is special in that it doesn't provide any support for real-time tracing or profiling. I'm not sure that justifies removing this flag globally, as there are plenty of middleware stacks that do their own thing or even things that don't use glibc string functions at all.
So if you think it makes sense to omit the flag for glibc, sure, but I am not convinced it should be done system-wide.
My understanding of Florian's post [and maybe I'm wrong about that] was that the compiler flag doesn't do what we think it does, and that the problem is more general and affects more than just glibc: any C code compiled by GCC. glibc has a separate issue that hot paths which are written in assembler don't include frame pointers, and I think we all understand why that is and are fine with it.
Florian, could you clarify if that's right?
Rich.
* Neal Gompa:
So my semi-naive interpretation of the problem here is that glibc is special in that it doesn't provide any support for real-time tracing or profiling. I'm not sure that justifies removing this flag globally, as there are plenty of middleware stacks that do their own thing or even things that don't use glibc string functions at all.
Sorry, I tried to make clear that this is regular C code compiled with the -mno-omit-leaf-frame-pointer flag, like the rest of the distribution.
The only way in which glibc is special is that Wilco Dijkstra from Arm has recently optimized malloc to make the happy paths as fast as possible, intentionally triggering the shrink-wrapping optimization through careful use of tail-calls to no-inline functions. Any other packages can do the same, either manually or even automatically in some cases, with profile-guided optimization.
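As an illustration of the pattern (not the actual malloc code), here is a minimal sketch with a hypothetical cache lookup. The slow path is marked noinline and is reached through a call in tail position, so the fast path contains no calls at all. The noinline attribute is a GCC/Clang extension. Whether the compiler really emits a jump instead of a call, and whether it skips the frame setup on the fast path, depends on the compiler version and flags.

```c
#include <stddef.h>

#define CACHE_SLOTS 4

/* Hypothetical cache standing in for a tcache-like structure.  */
static int  cache_valid[CACHE_SLOTS];
static long cache_value[CACHE_SLOTS];

/* Slow path, kept out of line so that its register saves and frame
   setup are not merged into the caller.  */
__attribute__((noinline))
static long lookup_slow(size_t slot)
{
    long v = 0;
    for (size_t i = 0; i <= slot; i++)  /* placeholder for real work */
        v += (long) (i * i);
    cache_valid[slot] = 1;
    cache_value[slot] = v;
    return v;
}

/* Fast path: two checks and a load.  The slow path is a call in tail
   position, which an optimizing compiler can turn into a plain jump.  */
long lookup(size_t slot)
{
    if (slot >= CACHE_SLOTS)
        return -1;
    if (cache_valid[slot])
        return cache_value[slot];
    return lookup_slow(slot);
}
```

The first lookup(3) goes through lookup_slow and fills the cache; later calls with the same slot return from the fast path.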
I think even before these malloc changes, typical workloads had like 10% of their profiling samples in code regions where the frame pointer was not yet set up. And dropping -mno-omit-leaf-frame-pointer might not even increase that percentage because short functions without memory accesses are more likely to drop from profiles altogether.
Thanks, Florian
On 2025-06-24 12:06, Florian Weimer wrote:
> Sorry, I tried to make clear that this is regular C code compiled with the -mno-omit-leaf-frame-pointer flag, like the rest of the distribution.
> The only way in which glibc is special is that Wilco Dijkstra from Arm has recently optimized malloc to make the happy paths as fast as possible, intentionally triggering the shrink-wrapping optimization through careful use of tail-calls to no-inline functions. Any other packages can do the same, either manually or even automatically in some cases, with profile-guided optimization.
> I think even before these malloc changes, typical workloads had like 10% of their profiling samples in code regions where the frame pointer was not yet set up. And dropping -mno-omit-leaf-frame-pointer might not even increase that percentage because short functions without memory accesses are more likely to drop from profiles altogether.
I agree that based on these findings it can make sense to drop -mno-omit-leaf-frame-pointer for glibc, but I don't think it justifies dropping it distro-wide, at least not without further testing and collecting data on the potential impact.
Cheers Davide
On Tue, Jun 24, 2025 at 11:08 AM Davide Cavalca <dcavalca@fedoraproject.org> wrote:
> On 2025-06-24 12:06, Florian Weimer wrote:
>> Sorry, I tried to make clear that this is regular C code compiled with the -mno-omit-leaf-frame-pointer flag, like the rest of the distribution.
>> The only way in which glibc is special is that Wilco Dijkstra from Arm has recently optimized malloc to make the happy paths as fast as possible, intentionally triggering the shrink-wrapping optimization through careful use of tail-calls to no-inline functions. Any other packages can do the same, either manually or even automatically in some cases, with profile-guided optimization.
>> I think even before these malloc changes, typical workloads had like 10% of their profiling samples in code regions where the frame pointer was not yet set up. And dropping -mno-omit-leaf-frame-pointer might not even increase that percentage because short functions without memory accesses are more likely to drop from profiles altogether.
> I agree that based on these findings it can make sense to drop -mno-omit-leaf-frame-pointer for glibc, but I don't think it justifies dropping it distro-wide, at least not without further testing and collecting data on the potential impact.
+1, I'd start with omitting leaf frame pointers only in glibc, if we absolutely must.
Unfortunately I'm not a compiler hacker, so I can't really address -mno-omit-leaf-frame-pointer not working (at least in some situations). But in general, do you notice a measurable effect from removing -mno-omit-leaf-frame-pointer performance-wise? Or is there any other associated downside to leaving it on, even if it might not kick in in some leaf functions?
In our (Meta) production, we have both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer, and I haven't seen any particular complaints around that. So unless there is a really good reason to drop leaf stuff, I'd keep it, of course.
> Cheers Davide
* Andrii Nakryiko:
> On Tue, Jun 24, 2025 at 11:08 AM Davide Cavalca <dcavalca@fedoraproject.org> wrote:
>> On 2025-06-24 12:06, Florian Weimer wrote:
>>> Sorry, I tried to make clear that this is regular C code compiled with the -mno-omit-leaf-frame-pointer flag, like the rest of the distribution.
>>> The only way in which glibc is special is that Wilco Dijkstra from Arm has recently optimized malloc to make the happy paths as fast as possible, intentionally triggering the shrink-wrapping optimization through careful use of tail-calls to no-inline functions. Any other packages can do the same, either manually or even automatically in some cases, with profile-guided optimization.
>>> I think even before these malloc changes, typical workloads had like 10% of their profiling samples in code regions where the frame pointer was not yet set up. And dropping -mno-omit-leaf-frame-pointer might not even increase that percentage because short functions without memory accesses are more likely to drop from profiles altogether.
>> I agree that based on these findings it can make sense to drop -mno-omit-leaf-frame-pointer for glibc, but I don't think it justifies dropping it distro-wide, at least not without further testing and collecting data on the potential impact.
> +1, I'd start with omitting leaf frame pointers only in glibc, if we absolutely must.
I explained multiple times that we are effectively doing this already for the hottest functions in glibc because -mno-omit-leaf-frame-pointer does not have the intended effect. It's about the rest of the distribution.
I don't understand why people assume this is about glibc. I just used it as an example because it was easily accessible to me.
> Unfortunately I'm not a compiler hacker, so I can't really address -mno-omit-leaf-frame-pointer not working (at least in some situations). But in general, do you notice a measurable effect from removing -mno-omit-leaf-frame-pointer performance-wise? Or is there any other associated downside to leaving it on, even if it might not kick in in some leaf functions?
> In our (Meta) production, we have both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer, and I haven't seen any particular complaints around that. So unless there is a really good reason to drop leaf stuff, I'd keep it, of course.
I think there are multiple reasons:
It's an unusual configuration, not promoted by upstream and not even used by people who build with -fno-omit-frame-pointer. Fedora usually goes with upstream defaults.
The option does not have the intended effect because we do not disable GCC's shrink-wrapping optimization.
Due to the way the x86-64 architecture works, there is always a number of instructions at the beginning of the functions where the frame pointer has not been set up. Tools already have to accept the frame lossage, or use DWARF or heuristics to detect this situation.
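The frame loss at function entry can be illustrated with a small simulation. The sketch below is not a real unwinder: it models the stack as an array of words, uses small integers (FN_MAIN, FN_PARENT, FN_LEAF) as stand-ins for code addresses, and assumes for simplicity that main runs with rbp = 0 as the end of the chain. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { FN_MAIN = 1, FN_PARENT = 2, FN_LEAF = 3 };  /* stand-ins for code addresses */

struct cpu { uint64_t rip, rbp; };                 /* sampled register state */

/* Frame-pointer walk: stack[rbp] holds the caller's saved rbp and
   stack[rbp + 1] holds the return address. rbp == 0 ends the chain. */
size_t fp_unwind(const uint64_t *stack, struct cpu c, uint64_t *out, size_t max)
{
    size_t n = 0;
    if (n < max)
        out[n++] = c.rip;                /* innermost frame comes from rip */
    while (c.rbp != 0 && n < max) {
        out[n++] = stack[c.rbp + 1];     /* return address into the caller */
        c.rbp = stack[c.rbp];            /* follow the saved frame pointer */
    }
    return n;
}

/* Simulate main -> parent -> leaf and take one sample inside leaf,
   either before or after leaf has run "push %rbp; mov %rsp,%rbp". */
size_t sample(int leaf_prologue_done, uint64_t *out, size_t max)
{
    uint64_t stack[16] = {0};
    struct cpu c = { FN_LEAF, 0 };

    stack[9] = FN_MAIN;      /* call parent: return address into main */
    stack[8] = 0;            /* parent: push %rbp (main's rbp is 0 here) */
    c.rbp = 8;               /* parent: mov %rsp,%rbp */
    stack[7] = FN_PARENT;    /* call leaf: return address into parent */
    if (leaf_prologue_done) {
        stack[6] = c.rbp;    /* leaf: push %rbp */
        c.rbp = 6;           /* leaf: mov %rsp,%rbp */
    }
    return fp_unwind(stack, c, out, max);
}
```

With the prologue done, the walk yields leaf, parent, main. Sampled at the first instruction of leaf, rbp still points at parent's frame record, so the walk yields leaf, main and the immediate caller is lost. The same thing happens for any path through a function that never sets up the frame pointer at all.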
Thanks, Florian
On Tue, Jun 24, 2025 at 10:13 PM Florian Weimer <fweimer@redhat.com> wrote:
> * Andrii Nakryiko:
>> On Tue, Jun 24, 2025 at 11:08 AM Davide Cavalca <dcavalca@fedoraproject.org> wrote:
>>> On 2025-06-24 12:06, Florian Weimer wrote:
>>>> Sorry, I tried to make clear that this is regular C code compiled with the -mno-omit-leaf-frame-pointer flag, like the rest of the distribution.
>>>> The only way in which glibc is special is that Wilco Dijkstra from Arm has recently optimized malloc to make the happy paths as fast as possible, intentionally triggering the shrink-wrapping optimization through careful use of tail-calls to no-inline functions. Any other packages can do the same, either manually or even automatically in some cases, with profile-guided optimization.
>>>> I think even before these malloc changes, typical workloads had like 10% of their profiling samples in code regions where the frame pointer was not yet set up. And dropping -mno-omit-leaf-frame-pointer might not even increase that percentage because short functions without memory accesses are more likely to drop from profiles altogether.
>>> I agree that based on these findings it can make sense to drop -mno-omit-leaf-frame-pointer for glibc, but I don't think it justifies dropping it distro-wide, at least not without further testing and collecting data on the potential impact.
>> +1, I'd start with omitting leaf frame pointers only in glibc, if we absolutely must.
> I explained multiple times that we are effectively doing this already for the hottest functions in glibc because -mno-omit-leaf-frame-pointer does not have the intended effect. It's about the rest of the distribution.
> I don't understand why people assume this is about glibc. I just used it as an example because it was easily accessible to me.
>> Unfortunately I'm not a compiler hacker, so I can't really address -mno-omit-leaf-frame-pointer not working (at least in some situations). But in general, do you notice a measurable effect from removing -mno-omit-leaf-frame-pointer performance-wise? Or is there any other associated downside to leaving it on, even if it might not kick in in some leaf functions?
>> In our (Meta) production, we have both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer, and I haven't seen any particular complaints around that. So unless there is a really good reason to drop leaf stuff, I'd keep it, of course.
> I think there are multiple reasons:
> It's an unusual configuration, not promoted by upstream and not even used by people who build with -fno-omit-frame-pointer. Fedora usually goes with upstream defaults.
I think Meta engineers are people, so your statement is not very accurate. But no offense taken :)
> The option does not have the intended effect because we do not disable GCC's shrink-wrapping optimization.
For my own education: would -mno-omit-leaf-frame-pointer always have no effect (100%, guaranteed) because of that shrink-wrapping optimization? Or is it that it *might not* have an effect?
But even if it's 100% of the time, can't we have some projects opt out of shrink-wrapping and thus get the benefit of -mno-omit-leaf-frame-pointer? And where the shrink-wrapping optimization is active, this setting doesn't hurt, right?
So I guess I'm just confused why we need to fix something that isn't broken...
> Due to the way the x86-64 architecture works, there is always a number of instructions at the beginning of the functions where the frame pointer has not been set up. Tools already have to accept the frame lossage, or use DWARF or heuristics to detect this situation.
Yes, that's true and is a fact of life with frame pointers. But that doesn't mean that we should artificially worsen the situation by removing leaf frame pointers, right?
> Thanks, Florian
Neal Gompa <ngompa13@gmail.com> writes:
> [...] So my semi-naive interpretation of the problem here is that glibc is special in that it doesn't provide any support for real-time tracing or profiling. [...]
Note that glibc does support unwinding-based tracing/profiling via the .cfi directives even in assembly code, which let an unwinder compute backtraces from fully-optimized binaries. Recent work in elfutils & sysprof has shown lowish and decreasing overheads of that approach.
- FChE
On Mon, Jul 7, 2025 at 6:43 PM Frank Ch. Eigler <fche@redhat.com> wrote:
> Neal Gompa <ngompa13@gmail.com> writes:
>> [...] So my semi-naive interpretation of the problem here is that glibc is special in that it doesn't provide any support for real-time tracing or profiling. [...]
> Note that glibc does support unwinding-based tracing/profiling via the .cfi directives even in assembly code, which let an unwinder compute backtraces from fully-optimized binaries. Recent work in elfutils & sysprof has shown lowish and decreasing overheads of that approach.
But those are hand-written directives, no? That approach doesn't scale at all.
* Kevin Kofler via devel:
> Neal Gompa wrote:
>> But those are hand-written directives, no? That approach doesn't scale at all.
> Handwritten unwinding information for handwritten assembly. C code has compiler-generated unwinding information.
And if you don't manipulate the stack pointer, you only have to add .cfi_startproc and .cfi_endproc. It's not that bad.
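As a rough illustration of that minimal case, here is a sketch of a hand-written x86-64 routine embedded in a C file via top-level asm. The function add_one is hypothetical and is not taken from glibc. It never touches the stack pointer, so the only unwinding annotations are .cfi_startproc and .cfi_endproc. The assembly is guarded so that other targets fall back to plain C; it assumes a GNU-compatible assembler on Linux.

```c
/* Hypothetical routine; declared here so C callers can use it. */
long add_one(long x);

#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
/* Hand-written assembly. The stack pointer is never modified, so the
   default rule from .cfi_startproc (return address at the top of the
   stack on entry) stays valid for every instruction in the function. */
__asm__(
    ".pushsection .text\n"
    ".globl add_one\n"
    ".type add_one, @function\n"
    "add_one:\n"
    "    .cfi_startproc\n"
    "    leaq 1(%rdi), %rax\n"      /* return x + 1 */
    "    ret\n"
    "    .cfi_endproc\n"
    ".size add_one, .-add_one\n"
    ".popsection\n"
);
#else
/* Fallback for other targets: the compiler generates the unwind info. */
long add_one(long x)
{
    return x + 1;
}
#endif
```

A routine that pushes registers or adjusts the stack pointer would additionally need directives such as .cfi_adjust_cfa_offset or .cfi_def_cfa_offset at each point where the stack pointer changes, which is where the hand-written effort goes.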
Thanks, Florian