That is the problem right here: .eh_frame-based unwinding is too slow, so it
has to be done offline in userspace. What about instead adding ORC information
to userspace? That would be much faster to use.
I'm not familiar with ORC, but a few things come to mind when considering
such a solution.
First, are there any examples of perf referencing ORC data that comes from
user-space, or is it currently limited to PERF_CONTEXT_KERNEL? For
system-wide profiling, we still need the kernel to do high-velocity unwinding
across address contexts.
My (limited) understanding of ORC is that objtool produces a set of unwind
tables, but those tables require further processing by the kernel at boot.
Again, my understanding is limited, but wouldn't something similar need to be
done whenever a process is spawned and its executable pages are loaded? There
are two sections, .orc_unwind and .orc_unwind_ip, and both need to be sorted.
I don't know which layer would be responsible for that, or how it would cope
with dlopen(), double-mapped pages such as those libffi uses, etc., but I'm
sure people will have opinions about it.
I don't know whether this applies only when generating ORC data from DWARF,
but the orc-unwinder documentation also mentions difficulties with inline
assembly. That suggests this could end up being a lot of work and still not
fix the minor annoyance of strlen() and friends not showing up correctly.
There is also a risk that ORC data cannot represent GCC's ever-increasing
set of optimizations.