> I strongly prefer the latter approach.  I believe the unwinder
> executes in NMI context, meaning that it must not block and must finish
> executing in a bounded amount of time.  Furthermore, any oops becomes
> an immediate kernel panic.  The eBPF verifier can trivially guarantee
> that the unwinder satisfies the properties needed here.  For security
> reasons, submitting eBPF programs is a privileged operation, but some
> programs could be compiled into the kernel and thus considered trusted.
> Such programs could be used without any special privileges.
>
> The key advantage of this approach is that privileged user-mode
> profiling tools, such as sysprof, can submit their own eBPF unwinders.
> This means that the kernel does not need to support whatever unwind
> info format userspace uses.  One could use DWARF, ORC, or any other
> format one wishes.

As far as I know, BPF programs cannot read arbitrary ELF sections. Every eBPF
unwinder that I've found preprocesses the unwind information in userspace
and stores the result in BPF maps so that the BPF program can access it.
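
To make the shape of that approach concrete, here is a rough userspace model
in Python. It is only a sketch under stated assumptions, not how any
particular unwinder works: the row layout (start_pc, cfa_offset, ra_offset),
the names build_table/lookup/unwind and the loop bounds are all made up for
illustration, and a real implementation would keep the table in a BPF map and
do the lookup in restricted C.

```python
from typing import Dict, List, NamedTuple, Optional


class UnwindRow(NamedTuple):
    """Hypothetical simplified unwind row (not a real ORC or DWARF layout)."""
    start_pc: int    # first instruction address this row applies to
    cfa_offset: int  # CFA = SP + cfa_offset while executing in this range
    ra_offset: int   # return address is stored at CFA + ra_offset


MAX_SEARCH_STEPS = 32  # fixed bound, standing in for a verifier-friendly loop
MAX_FRAMES = 16        # fixed bound on the number of frames we will walk


def build_table(rows: List[UnwindRow]) -> List[UnwindRow]:
    """Userspace preprocessing step: sort rows so the lookup can bisect."""
    return sorted(rows, key=lambda r: r.start_pc)


def lookup(table: List[UnwindRow], pc: int) -> Optional[UnwindRow]:
    """Bounded binary search for the last row with start_pc <= pc."""
    lo, hi = 0, len(table)
    for _ in range(MAX_SEARCH_STEPS):
        if lo >= hi:
            break
        mid = (lo + hi) // 2
        if table[mid].start_pc <= pc:
            lo = mid + 1
        else:
            hi = mid
    return table[lo - 1] if lo > 0 else None


def unwind(table: List[UnwindRow], pc: int, sp: int,
           memory: Dict[int, int]) -> List[int]:
    """Walk at most MAX_FRAMES frames; `memory` models readable stack words."""
    frames = []
    for _ in range(MAX_FRAMES):
        frames.append(pc)
        row = lookup(table, pc)
        if row is None:
            break  # no unwind info for this address
        cfa = sp + row.cfa_offset
        ra = memory.get(cfa + row.ra_offset)
        if ra is None:
            break  # stack word not readable
        pc, sp = ra, cfa
    return frames
```

The point of the sketch is that the table has to exist, already sorted,
before the first sample arrives for a given binary.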

In practice, this means that every tool that wants to unwind in BPF has
to do this preprocessing and store all the required information in BPF
maps. When you don't know in advance which program you'll be requesting a
stack trace for, userspace has to provide this information for every
program that might run on the system. While this might work for dedicated
long-running system profiling daemons, it is not an option for software
such as perf or bpftrace, since it would drastically increase their
startup time as well as their overall resource usage.

Cheers,

Daan

________________________________________
From: Demi Marie Obenour <demiobenour@gmail.com>
Sent: 09 July 2022 04:02
To: devel@lists.fedoraproject.org
Subject: Re: F37 proposal: Add -fno-omit-frame-pointer to default compilation flags (System-Wide Change proposal)

On 7/8/22 20:18, Christian Hergert wrote:
>> That is the problem right here: .eh_frame-based unwinding is too slow, so it has to be
>> done offline in userspace.  What about instead adding ORC information to userspace?  That
>> would be much faster to use.
>
> I'm not familiar with ORC, but there are a few things that initially come to
> mind in looking towards such a solution.
>
> First, are there any examples of perf being able to reference ORC data coming
> from user-space or is it currently limited to PERF_CONTEXT_KERNEL? For
> system-wide profiling, we still require that the kernel can do high-velocity
> unwinding across address contexts.

Why does the unwinding need to happen in the kernel?  The kernel can
already asynchronously invoke userspace code in the form of signal
handlers.  Is the problem that it is necessary to collect profiling
information in the middle of a system call, where another syscall
would see inconsistent (and potentially exploitable) kernel state?

> My (limited) understanding of ORC is that the result produced by objtool gets
> you a series of unwind tables, but those tables require further processing by
> the kernel at boot.
>
> Again, I have limited understanding, but wouldn't something need to
> be processed as part of spawning and loading executable pages? There are both
> .orc_unwind and .orc_unwind_ip sections, both of which need to be sorted. I
> don't know what layer would be responsible for that, or how it adapts to
> dlopen(), double-mapping pages like libffi, etc... but I'm sure people will
> have opinions about it.

Ouch.  That is a serious problem for a number of reasons, not least
of which is security.  Having the kernel parse even more complex
untrusted input in C is a horrible idea.

I can think of at least two better options:

1. Wait for Rust support to be merged, and write the unwinder in Rust.
2. Implement the unwinder as an eBPF program.

I strongly prefer the latter approach.  I believe the unwinder
executes in NMI context, meaning that it must not block and must finish
executing in a bounded amount of time.  Furthermore, any oops becomes
an immediate kernel panic.  The eBPF verifier can trivially guarantee
that the unwinder satisfies the properties needed here.  For security
reasons, submitting eBPF programs is a privileged operation, but some
programs could be compiled into the kernel and thus considered trusted.
Such programs could be used without any special privileges.

The key advantage of this approach is that privileged user-mode
profiling tools, such as sysprof, can submit their own eBPF unwinders.
This means that the kernel does not need to support whatever unwind
info format userspace uses.  One could use DWARF, ORC, or any other
format one wishes.

Christian, would this be sufficient for your needs?
--
Sincerely,
Demi Marie Obenour (she/her/hers)