On 09. 11. 22 12:37, Petr Viktorin wrote:
tl;dr: Python 3.12 should be built with no-omit-frame-pointer if upstream recommends it.
Hello, You might be aware of a Fedora change proposal [0] (discussed on fedora-devel [1] and in FESCo [2]) to turn on C compiler flags that help with performance *measurement*, but might hurt performance itself: `-fno-omit-frame-pointer` and `-mno-omit-leaf-frame-pointer`. Apparently there are some benchmarks that make Python look extra slow when the flags are turned on -- which I don't quite understand.
Update: Andrii Nakryiko looked at the disassembly: https://pagure.io/fesco/issue/2817#comment-826636
Looks like the slowdown comes from a single function, _PyEval_EvalFrameDefault, which essentially *is* the Python interpreter -- so it's very unlikely to be indicative of other code in the distro (including Python extension modules). That function is being overhauled for Python 3.12 (in the main branch it's now autogenerated [5], which should hopefully allow optimizations across any common code, and perhaps things like switching to a register-based VM [6]).
Given these massive planned changes in the affected function, I think we should treat Python 3.11 and 3.12 as entirely separate when it comes to performance with no-omit-frame-pointer.
And since the Python slowdown comes from a single weird function, I think that Fedora should ignore the Python benchmarks when evaluating the distro default -- and if Fedora switches to no-omit-frame-pointer, Python 3.11 should be an exception (to be re-evaluated for 3.12). (Most of the current benchmarks [7] are from Python, so more might be needed.)
[5]: https://github.com/python/cpython/issues/98831
[6]: https://github.com/faster-cpython/ideas/issues/489
[7]: https://github.com/DaanDeMeyer/fpbench
Meanwhile, on the upstream side, Python 3.12 (due next year, main Python for Fedora 39 [3]) has support for `perf`. Upstream plans to recommend compiling with these flags when measuring performance [4], and AFAIK, the plan is to recommend *always* compiling with them. The idea is that possible speedups from "allowing anyone to profile/optimize their workflow" are worth the initial slowdown.
I'm not much of a performance expert myself, but I do get drawn into the relevant discussions on the CPython side. As far as I can see, performance geeks are enthusiastic about `perf` support, and I'd like to get them to (continue to) use Fedora builds. If CPython upstream does recommend these flags (or makes them the default), I'm considering turning the no-omit options on for Python 3.12 even if Fedora as a whole doesn't. Note that even a 2% slowdown will likely be won back by general performance improvements -- the Faster CPython team is targeting a 20% average speedup for pure-Python code in 3.12, on top of the ~25% for 3.11. And the people responsible for this speedup have a say in the upstream recommendations.
Technically there are four separate places where the flags can be set; I think we should turn them on everywhere:
- CPython itself & its standard library
- The debug build (/usr/bin/python3-debug)
- Default for libraries in Fedora (RPM macros)
- Default for libraries built by users (sysconfig settings)
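
To check what a given interpreter would hand to user-built extensions, one can look at the build variables recorded in `sysconfig`. This is only a rough sketch: the helper names (`frame_pointer_flags_present`, `report`) are made up for illustration, and the choice of variables to inspect (`CFLAGS`, `PY_CFLAGS`, `OPT`) is my assumption about where the flags would usually show up, not a statement about how the Fedora package sets them.

```python
import sysconfig

# The two flags discussed in the change proposal.
FRAME_POINTER_FLAGS = ("-fno-omit-frame-pointer", "-mno-omit-leaf-frame-pointer")


def frame_pointer_flags_present(flags_string):
    """Return {flag: bool} for each frame-pointer flag in a CFLAGS-style string."""
    # Config variables can be missing (None) on some platforms, e.g. Windows.
    tokens = flags_string.split() if isinstance(flags_string, str) else []
    return {flag: flag in tokens for flag in FRAME_POINTER_FLAGS}


def report():
    """Check a few build variables of the running interpreter (assumed set)."""
    result = {}
    for var in ("CFLAGS", "PY_CFLAGS", "OPT"):
        result[var] = frame_pointer_flags_present(sysconfig.get_config_var(var))
    return result


if __name__ == "__main__":
    for var, found in report().items():
        print(var, found)
```

Running this with both `/usr/bin/python3` and `/usr/bin/python3-debug` would show whether the two builds agree. It does not cover the RPM macros, which are set separately from the interpreter's own build configuration.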