On Tue, Dec 06, 2022 at 05:46:11PM -0000, Andrii Nakryiko wrote:
Depends on how large functions are. If you have lots of small functions,
it can be more than 50% of instructions you get the frame pointer wrong.
You might be technically correct, but if some application is spending 50% of the time
entering and exiting functions, then it's not doing a ton of useful work efficiently
anyways and should be rethought and improved. And profiling data will point this out very
clearly.
Let's just not use these convoluted unrealistic corner cases as a generalized claims
about how frame pointers are useless, ok?
You haven't looked carefully enough.
Oh, I didn't claim I studied glibc's assembly code very carefully for sure. I did
a very similar grepping to yours and found just the same **very few** functions that do
use %rbp.
find . -name \*.S | xargs grep -l %rbp | xargs grep -L
movq.*%rsp.*%rbp
./sysdeps/x86_64/fpu/multiarch/svml_s_tanhf16_core_avx512.S
./sysdeps/x86_64/fpu/multiarch/svml_s_tanhf8_core_avx2.S
./sysdeps/x86_64/fpu/multiarch/svml_s_atanhf16_core_avx512.S
./sysdeps/x86_64/fpu/multiarch/svml_s_atanhf8_core_avx2.S
./sysdeps/x86_64/fpu/svml_d_sincos2_core.S
./sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
./sysdeps/x86_64/addmul_1.S
./sysdeps/x86_64/__longjmp.S
./sysdeps/x86_64/setjmp.S
./sysdeps/x86_64/_mcount.S
./sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S
./sysdeps/unix/sysv/linux/x86_64/setcontext.S
./sysdeps/unix/sysv/linux/x86_64/swapcontext.S
./sysdeps/unix/sysv/linux/x86_64/getcontext.S
Out of all of these, we have addmul_1.S as seemingly generic. But grepping around glibc
sources I could find any place that's actually calling into __mpn_addmul_1, so there
must be some more magic happening. I'll believe you that it's used somewhere in
printf family of functions, though I certainly didn't see any mention of
__mpn_addmul_1 in llvm-objdump and llvm-readelf outputs for a sample program that is just
doing printf or snprintf.
But all that aside. You've found 14 files that do use %rbp and again are using this as
a point that frame pointers are useless.
And I'm telling you that **in practice**, in real world applications, they are very
much useful and are relied upon, even if in some circumstances we fail to get frame
pointers. I don't understand how one can honestly can use this line of argument to say
that frame pointers are useless.
And by enabling frame pointers even wider we make them even more useful.
E.g. mpn_addmul_1 is used by printf, which certainly shows up a lot in
normal workloads. The above cases mean the bp register contains total
garbage if one tries to use it for backtrace purposes. And obviously
glibc isn't the only library with inline assembly out there...
As I said, you can't rely on frame pointers making sense, with unwind info
See my replies. Yes, frame pointers might be broken in some rare situations, yes we
don't have 100% success rate capturing stack traces precisely because there is
embedded asm that clobbers %rbp, or binaries and libraries are build with
-fomit-frame-pointers. We are asking to minimize the latter.
And for the former, it's on a case-by-case basis to try to mitigate this, but
ultimately no one expect that stack traces will be captured with 100% success rate. We
want this rate to be as high as possible though. And arguments like yours are not helping,
but also misleading people that are not familiar with the matters at hand, but trust
people like you just because you are "a compiler/toolchain engineer with considerable
experience", while it's clear that compiler/toolchain engineers would rather
spend time arguing about petty low-probability counter-examples than be helpful and see
the problem in a wider context.
such as in DWARF you know that in this case rbp is used as a frame
pointer
and in this case it is not, use some other register or this offset from
stack pointer etc.
AFAIK systemtap already uses DWARF unwinder in the kernel and as I said, for
the profiling purposes one can use only a small subset of that for the vast
majority of cases.
This does not work for any of multitude of BPF-based tools. And even with perf, that
supports very expensive DWARF-based unwinding, this doesn't work very well in
practice. There is evidence even from people commenting in this thread. And all this was
explained at length in
https://pagure.io/fesco/issue/2817.
>
> Jakub