On Mon, Oct 23, 2023 at 12:15:06PM +0200, Michal Schmidt wrote:
> On Sat, Oct 21, 2023 at 3:06 PM Richard W.M. Jones
> <rjones@redhat.com> wrote:
> > I was asked about the topic in the subject, and I think it's not very
> > well known. The news is that since Fedora 38, whole-system
> > performance analysis is now easy to do. This can be used to identify
> > hot spots in single applications, or to see what the whole computer is
> > really doing during lengthy operations.
> >
> > You can visualise these in various ways - my favourite is Brendan
> > Gregg's Flame Graphs tools, but perf has many alternate ways to
> > capture and display the data:
> >
> >   https://www.brendangregg.com/linuxperf.html
> >   https://www.brendangregg.com/flamegraphs.html
> >   https://perf.wiki.kernel.org/index.php/Tutorial
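[For reference, the basic capture pipeline with those tools looks roughly
like this. A sketch, not taken from the talk: it assumes perf is installed,
root privileges, and a clone of the FlameGraph repo in the current
directory; it is guarded so nothing runs unless you set RUN_CAPTURE=yes.]

```shell
# Whole-system profile -> flame graph, roughly. Guarded no-op by default;
# the ./FlameGraph/ paths are assumptions about where you cloned the repo.
RUN_CAPTURE=${RUN_CAPTURE:-no}
if [ "$RUN_CAPTURE" = yes ]; then
  perf record -a -g -F 99 -- sleep 30      # sample all CPUs at 99 Hz for 30 s
  perf script > out.stacks
  ./FlameGraph/stackcollapse-perf.pl out.stacks > out.folded
  ./FlameGraph/flamegraph.pl out.folded > whole-system.svg
fi
```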
> >
> > I did a 15 min talk on this topic, actually to an internal Red Hat
> > audience, but I guess it's fine to open it up to everyone:
> >
> >   http://oirase.annexia.org/tmp/2023-03-08-flamegraphs.mp4 [57M, 15m41s]
> Hello Richard,
> Thank you for posting this.
> In the talk you mentioned that the "--off-cpu" option was not yet available.
> Has there been any progress to enable it since the talk was recorded?
> I have just tried it in Rawhide. perf is still built without it:
>   Warning: option `off-cpu' is being ignored because no BUILD_BPF_SKEL=1
> What is blocking the enablement of this feature? Are there some trade-offs?
> Is there a thread or a Bugzilla ticket where it is discussed?
There have been a few technical issues. Here is, I think, the latest
patch series:
https://lkml.org/lkml/2023/9/14/970
You can see from the link in that email that it missed Linux 6.4
because it caused a build error. Linus went on to describe the whole
effort as "garbage", but there was further discussion starting around
here:
https://lore.kernel.org/lkml/ZFOSUab5XEJD0kxj@kernel.org/
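In the meantime, anyone who wants to experiment before the Fedora package
enables it can build perf from a kernel source tree with the BPF skeletons
turned on. A sketch only: it needs clang and libbpf installed, the record
step needs root, and it is guarded so nothing runs unless you set
RUN_OFFCPU=yes.

```shell
# Sketch: build perf with BPF skeletons (the BUILD_BPF_SKEL=1 the Rawhide
# warning refers to), then try off-CPU profiling. Guarded no-op by default.
RUN_OFFCPU=${RUN_OFFCPU:-no}
if [ "$RUN_OFFCPU" = yes ]; then
  make -C tools/perf BUILD_BPF_SKEL=1
  sudo ./tools/perf/perf record --off-cpu -a -- sleep 10
  sudo ./tools/perf/perf report --stdio | head -20
fi
```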
Rich.
> Michal
> > To show the kind of thing which is possible I have captured three
> > whole-system flame graphs. The first comes from doing "make -j32" in the
> > qemu build tree:
> >
> >   http://oirase.annexia.org/tmp/2023-gcc-with-lto.svg
> >
> > 8% of the time is spent running the assembler. I seem to recall that
> > Clang uses a different approach of integrating the assembler into the
> > compiler, and I guess it probably avoids most of that overhead.
> >
> > The second is an rpmbuild of the Fedora Rawhide kernel package:
> >
> >   http://oirase.annexia.org/tmp/2023-kernel-build.svg
> >
> > I think it's interesting that 6% of the time is spent compressing the
> > RPMs, and another 6% running pahole (debuginfo generation?). But the
> > most surprising thing is that it appears 42% of the time is spent just
> > parsing C code [if I'm reading that right, I actually can't believe
> > parsing takes so much time]. If true, there must be opportunities to
> > optimize things here.
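[One way to sanity-check a percentage like that is to sum a symbol's
sample counts straight from the folded-stacks file that
stackcollapse-perf.pl emits. A sketch on made-up sample data; "c_parser"
here stands in for GCC's C-parser symbols, substitute whatever you see in
your own capture.]

```shell
# Sketch: estimate one symbol family's share of samples from a folded-stacks
# file (lines of "frame;frame;leaf count"). The stacks below are fabricated
# example data, not real profile output.
cat > sample.folded <<'EOF'
cc1;main;c_parser_translation_unit 42
cc1;main;ggc_collect 8
as;main;write_object 50
EOF
share=$(awk '{ total += $NF; if (index($0, "c_parser")) hit += $NF }
             END { if (total) printf "%.0f%%", 100 * hit / total }' sample.folded)
echo "c_parser share: $share"
```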
> >
> > Captures work across userspace and kernel code, as shown in the third
> > example, which is a KVM (ie. hardware-assisted) virtual machine doing
> > some highly parallel work inside:
> >
> >   http://oirase.annexia.org/tmp/2023-kvm-build.svg
> >
> > You can clearly see the 8 virtual (guest) CPUs on the left side, using
> > KVM. More interesting is that this guest uses a qcow2 file for its
> > disk, and there's a heck of an overhead writing to that file. There's
> > nothing to fix here -- qcow2 files shouldn't be used in this
> > situation; for best performance it would be better to use a local
> > block device to back the guest.
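[For completeness, the two backings being compared look roughly like this.
A sketch: the LVM device path is a made-up example, and the qemu-img step
only runs where qemu-img is installed.]

```shell
# Sketch: qcow2 file vs. raw block device as the guest disk backing.
if command -v qemu-img >/dev/null 2>&1; then
  qemu-img create -f qcow2 guest.qcow2 10G   # convenient, but costly on heavy writes
fi
# Lower-overhead alternative: hand the guest a block device directly, e.g.
#   -drive file=/dev/vg0/guest1,format=raw,if=virtio
# where /dev/vg0/guest1 is a hypothetical LVM volume dedicated to the guest.
```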
> >
> > The overhead of frame pointers in my measurements is about 1%, so this
> > enhanced visibility into the system seems well worthwhile. I use this
> > all the time. This year I've used it to suggest optimizations in
> > qemu, nbdkit and the kernel.
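[For code you build yourself, the compiler switch behind this - the one
Fedora 38 turned on distro-wide - is -fno-omit-frame-pointer. A minimal
sketch; fp_demo.c is just a throwaway example program, and the compile
step only runs where gcc is installed.]

```shell
# Sketch: compile with frame pointers kept, so perf's default
# (frame-pointer) unwinder can walk full stacks through your code.
cat > fp_demo.c <<'EOF'
int main(void) { return 0; }
EOF
if command -v gcc >/dev/null 2>&1; then
  gcc -O2 -fno-omit-frame-pointer -o fp_demo fp_demo.c
fi
```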
> >
> > Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v