On 7/8/22 15:29, Christian Hergert wrote:
> Frank Ch. Eigler mentions that elfutils has a more modern
> unwinding library.
> Could that perhaps solve your performance issues with libunwind?
I don't think so. The problem is two-fold.
First, we have to capture enough of the stack to do offline unwinding. I think the
default many people use here is about 8 KB of stack. While the instruction pointer array
might fit in a couple of cachelines, you now have a few additional pages to copy as well.
And you probably want those pages aligned in your capture format, so now you need to
interleave multiple types of data frames while padding for alignment.
Now do that a few thousand times a second.
The overhead here can be so great that it obscures what you're trying to find.
Furthermore, there's a good chance that you'll cause CPU packages to spin up to a
higher frequency, thus hiding the very performance issues you want to find, or want to
reduce precisely to avoid that.
Now, say you've done the work and captured stacks (turning what was a few-MB
recording into a few-GB recording). You still need to decode them. We keep many
lookaside maps/interval trees in Sysprof to keep this overhead low, but now you have to
reference .eh/DWARF data. This is the slowest part of the whole process. What currently
takes a second or two could easily take 10 minutes.
That is the problem right here: .eh_frame-based unwinding is too slow, so it has to be
done offline in userspace. What about instead adding ORC information to userspace? That
would be much faster to use.
--
Sincerely,
Demi Marie Obenour (she/her/hers)