https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup
== Summary ==
Python 3 in Fedora has traditionally been built with a shared library, libpython3.?.so, and the final python3 binary was dynamically linked against it. This change builds a static library as well and links the final python3 binary against it, which provides a significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
== Owner ==
* Name: [[User:Cstratak| Charalampos Stratakis]], [[User:Vstinner| Victor Stinner]], [[User:Churchyard| Miro Hrončok]]
* Email: python-maint@redhat.com
== Detailed Description ==
When we compile the python3 package on Fedora (prior to this change), we create the libpython3.?.so shared library, and the final python3 binary (<code>/usr/bin/python3</code>) is dynamically linked against it. However, by building the libpython3.?.a static library and statically linking the final binary against it, we can achieve a performance gain of 5% to 27% depending on the workload. Link-time optimization and profile-guided optimization also have a greater impact when python3 is linked statically.
Since Python 3.8, [https://docs.python.org/3.8/whatsnew/3.8.html#debug-build-uses-the-same-abi-... C extensions must no longer be linked to libpython by default]. Applications embedding Python now need to use the <code>--embed</code> flag of python3-config to be linked to libpython. During the [[Changes/Python3.8|Python 3.8 upgrade and rebuilds]] we uncovered various cases of packages linking to libpython implicitly through hacks within their build systems, and fixed as many as possible. However, there are legitimate reasons to link an application to libpython, and for those cases libpython should be provided so that applications embedding Python can continue to do so.
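To illustrate the new default, the extra flags an embedding build must now request can be recovered from the interpreter's build data via the standard-library <code>sysconfig</code> module. This is a sketch only: the helper name below is ours, and the supported interface on Fedora remains <code>python3-config --embed</code>:

```python
import sysconfig

def embed_link_flags():
    # Hypothetical helper approximating `python3-config --libs --embed`:
    # since Python 3.8, an embedding application must ask for -lpython3.X
    # explicitly; plain `python3-config --libs` no longer emits it.
    ldversion = sysconfig.get_config_var('LDVERSION')  # e.g. '3.8'
    libs = (sysconfig.get_config_var('LIBS') or '').split()
    return ['-lpython' + ldversion] + libs

print(' '.join(embed_link_flags()))
```

A C extension module, by contrast, should be linked with none of these flags, so that the symbols resolve against whichever interpreter loads it.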
This mirrors the Debian/Ubuntu way of building Python, where they offer a statically linked binary and an additional libpython subpackage. The libpython subpackage will be created and python3-devel will depend on it, so packages that embed Python will keep working.
The change was first done in Debian and Ubuntu years ago, and upstream Python 3.8 followed. The manylinux1 and manylinux2010 ABIs don't link C extensions to libpython either (to support Debian/Ubuntu).
By applying this change, libpython's namespace will be separated from Python's, so '''C extensions which are still linked to libpython''' might experience side effects or break.
There is one exception for C extensions. If an application is linked to libpython in order to embed Python, C extensions used only within this application can continue to be linked to libpython.
Currently there is no upstream option to build both the static library and the shared one and statically link the final binary, so we have to rely on a downstream patch to achieve it. We plan to work with upstream to incorporate the changes there as well.
Before the change, python3.8 is dynamically linked to libpython3.8:
<pre>
+-------------------+
|                   |
|                   |       +--------------------+
| libpython3.8.so <---------+ /usr/bin/python3.8 |
|                   |       +--------------------+
|                   |
+-------------------+
</pre>
After the change, python3.8 is statically linked to libpython3.8:
<pre>
                        +-----------------------+
                        |                       |
                        |  /usr/bin/python3.8   |
                        |                       |
+-------------------+   | +-------------------+ |
|                   |   | |                   | |
|                   |   | |                   | |
|  libpython3.8.so  |   | |  libpython3.8.a   | |
|                   |   | |                   | |
|                   |   | |                   | |
+-------------------+   | +-------------------+ |
                        +-----------------------+
</pre>
As a negative side effect, when both libpython3.8.so and /usr/bin/python3.8 are installed, the filesystem footprint will be slightly increased (libpython3.8.so for Python 3.8.0 on x86_64 is ~3.4 MB). OTOH, only a very small number of packages will depend on libpython3.8.so.
== Benefit to Fedora ==
Python's performance will increase significantly depending on the workload. Since many core components of the OS also depend on Python, this could lead to an increase in their performance as well; however, individual benchmarks will need to be conducted to verify the performance gain for those components.
[https://pyperformance.readthedocs.io/ pyperformance] results, ignoring differences smaller than 5%:
(see wiki page for table)
== Scope ==
* Proposal owners:
** Review and merge the [https://src.fedoraproject.org/rpms/python3/pull-request/133 pull request with the implementation].
** Go through the Python C extension packages that are linked to libpython and test whether things work correctly. A Copr repository will be provided for testing.
* Other developers: Other developers are encouraged to test the new statically linked python3 and check whether their packages work as expected.
* Release engineering: [https://pagure.io/releng/issue/8953 #8953] This change does not require a mass rebuild; however, a rebuild of the affected packages will be required. The affected packages will be rebuilt in Copr first.
* Policies and guidelines: The packaging guidelines will need to be updated to explicitly mention that C extensions should not be linked to libpython and that the python3 binary is statically linked.
* Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
Affected package maintainers should verify that their packages work as expected; the only impact end users should see is a performance increase for workloads relying on Python.
== How To Test ==
Copr repo with instructions: https://copr.fedorainfracloud.org/coprs/g/python/Python3_statically_linked/
=== Package changes test ===
The change will bring the new <code>libpython3</code> subpackage as a dependency of <code>python3-devel</code>.
Test that it's installed:
<pre>
$ rpm -q libpython3
</pre>
Test that it's uninstalled if <code>python3-devel</code> is removed:
<pre>
$ dnf remove python3-devel
</pre>
Test that <code>python3-libs</code> no longer includes the libpython shared library:
<pre>
$ rpm -ql python3-libs | grep libpython3
</pre>
=== Dynamic linker test ===
To check that the python3.8 program is not linked to libpython, ldd can be used. For example, Python 3.7 will still be linked to libpython:
<pre>
$ ldd /usr/bin/python3.7|grep libpython
	libpython3.7m.so.1.0 => /lib64/libpython3.7m.so.1.0 (0x00007fbb57333000)
</pre>
But python3.8 will no longer be linked to libpython:
<pre> $ ldd /usr/bin/python3.8|grep libpython </pre>
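The same check can also be done from inside a running interpreter. The following is a small Linux-only sketch of ours (not part of any official test procedure) that inspects <code>/proc/self/maps</code> instead of running ldd:

```python
def interpreter_links_libpython():
    # Scan the interpreter's own memory mappings for a loaded libpython
    # shared object (Linux-only; same idea as
    # `ldd /usr/bin/python3 | grep libpython`).
    with open('/proc/self/maps') as maps:
        return any('libpython' in line for line in maps)

print(interpreter_links_libpython())
```

A statically linked python3.8 should print False here, while a dynamically linked python3.7 should print True.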
=== Performance test ===
The performance speedup can be measured using the official Python benchmark suite [https://pyperformance.readthedocs.io/ pyperformance]: see [https://pyperformance.readthedocs.io/usage.html#run-benchmarks Run benchmarks].
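For a quick, informal spot check (not a substitute for pyperformance), a call-heavy microbenchmark can be timed with the standard library. The workload below is an arbitrary example of ours:

```python
import timeit

def workload():
    # Function-call-heavy loop: the kind of code where the statically
    # linked interpreter tends to show the largest speedups.
    return sum(abs(i - 500) for i in range(1000))

elapsed = timeit.timeit(workload, number=2000)
print('%d calls in %.3f s' % (2000, elapsed))
```

Run the same script on a dynamically and a statically linked build and compare the times; differences of a few percent are within noise, which is why pyperformance (with its statistical machinery) remains the authoritative measurement.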
=== Namespace test ===
The following script can be used to verify that the change is in effect:
<pre>
import ctypes
import sys

EMPTY_TUPLE_SINGLETON = ()

def get_empty_tuple(lib):
    # Call PyTuple_New(0)
    func = lib.PyTuple_New
    func.argtypes = (ctypes.c_ssize_t,)
    func.restype = ctypes.py_object
    return func(0)

def test_lib(libname, lib):
    obj = get_empty_tuple(lib)
    if obj is EMPTY_TUPLE_SINGLETON:
        print("%s: SAME namespace" % libname)
    else:
        print("%s: DIFFERENT namespace" % libname)

def test():
    program = ctypes.pythonapi
    if hasattr(sys, 'abiflags'):
        abiflags = sys.abiflags
    else:
        # Python 2
        abiflags = ''
    ver = sys.version_info
    filename = ('libpython%s.%s%s.so.1.0'
                % (ver.major, ver.minor, abiflags))
    libpython = ctypes.cdll.LoadLibrary(filename)
    test_lib('program', program)
    test_lib('libpython', libpython)

test()
</pre>
Output before the change:
<pre>
program: SAME namespace
libpython: SAME namespace
</pre>
Output after the change:
<pre>
program: SAME namespace
libpython: DIFFERENT namespace
</pre>
== User Experience ==
Python-based workloads should see a performance gain of up to 27%.
== Dependencies ==
While this specific change is not dependent on anything else, we would like to ensure that all the packages that link to libpython continue to work as expected.
Currently (as of 2019-10-30), 118 packages in Rawhide depend on libpython.
Result of the <code>repoquery --repo=rawhide --source --whatrequires 'libpython3.8.so.1.0()(64bit)'</code> command on Fedora Rawhide, x86_64:
*COPASI *Io-language *OpenImageIO *YafaRay *antimony *blender *boost *calamares *calibre *cantor *ceph *clingo *condor *createrepo_c *csound *cvc4 *dionaea *dmlite *domoticz *fontforge *freecad *gdb *gdcm *gdl *getdp *glade *globus-net-manager *glom *gnucash *gpaw *hamlib *hokuyoaist *hugin *insight *kdevelop-python *kicad *kitty *krita *lammps *ldns *libCombine *libarcus https://src.fedoraproject.org/rpms/libarcus/pull-request/8 *libarcus-lulzbot *libbatch *libcec *'''libcomps''' *'''libdnf''' *libftdi *libkml *libkolabxml *libldb *libnuml *libpeas *libplist *libreoffice *librepo *libsavitar *libsbml *libsedml *libtalloc *libyang *libyui-bindings *link-grammar *lldb *mathgl *med *mod_wsgi *nautilus-python *nbdkit *nest *netgen-mesher *neuron *nextpnr *nordugrid-arc *nwchem *openbabel *openscap *opentrep *openvdb *pam_wrapper *paraview *perl-Inline-Python *pidgin *pitivi *plplot *postgresql *pynac *pyotherside *pythia8 *python-gstreamer1 *python-jep *python-qt5 *<del>python3</del> *qgis *qpid-dispatch *qpid-proton *rdkit *renderdoc *rmol *root *samba *scidavis *sigil *swift-lang *texworks *thunarx-python *trademgen *trellis *unbound *uwsgi *vdr-epg-daemon *vigra *'''vim''' *vrpn *vtk *weechat *znc
Packages in '''bold''' are the ones present in the default docker/podman "fedora:rawhide" image.
== Contingency Plan ==
* Contingency mechanism: If issues appear that cannot be fixed in a timely manner, the change can be easily reverted and will be considered again for the next Fedora release. A proper upgrade path will also be provided in case of reversion, since libpython3.?.so will be in a separate package with this change.
* Contingency deadline: Before the beta freeze of Fedora 32 (2020-02-25)
* Blocks release? Yes
* Blocks product? None
== Documentation ==
The change will be reflected in the updated Python packaging guidelines.
Python 3 traditionally in Fedora was built with a shared library libpython3.?.so and the final binary was dynamically linked against that shared library. This change is about creating the static library and linking the final python3 binary against it,
I oppose this change, because this is yet another size increase:
As a negative side effect, when both libpython3.8.so and /usr/bin/python3.8 are installed, the filesystem footprint will be slightly increased (libpython3.8.so on Python 3.8.0, x86_64 is ~3.4M).
and while:
OTOH only a very small amount of packages will depend on libpython3.8.so.
in practice, that does not help because some of those packages are installed by default, e.g., the ones you mentioned being installed by default even on the Docker image:
*'''libcomps''' *'''libdnf''' *'''vim'''
but there are more, such as gdb, libreoffice, krita, boost, etc. that are installed on various live images, and calamares, which is popular on remixes. So all those images will be bloated as a result of your code duplicating change.
In addition:
By applying this change, libpython's namespace will be separated from Python's, so '''C extension which are still linked to libpython''' might experience side effects or break.
so compatibility is an issue too.
Kevin Kofler
On Tue, 2019-11-05 at 19:41 +0100, Kevin Kofler wrote:
I oppose this change, because this is yet another size increase:
Up to ~27% speed increase for extra ~3.4 MB storage used seems like a good trade-off to me...
devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Tuesday, November 5, 2019 12:09:55 PM MST Martin Kolman wrote:
Up to ~27% speed increase for extra ~3.4 MB storage used seems like a good trade-off to me...
Anyone that has ever worked with CD images understands that every megabyte counts.
On 11/8/19 5:16 PM, John M. Harris Jr wrote:
Anyone that has ever worked with CD images understands that every megabyte counts.
I would almost always take speed over disk size.
On Friday, November 8, 2019 3:20:33 PM MST Daniel Walsh wrote:
I would almost always take speed over disk size.
If you would like such a thing, perhaps a non-default python3-static package is the way to go then.
* John M. Harris, Jr.:
Anyone that has ever worked with CD images understands that every megabyte counts.
It's clearly not a priority for Fedora. It wouldn't be too difficult to replace glibc-all-langpacks with glibc-locale-source in the installer, going from 26 MiB to less than 5 MiB compressed and from 208 MiB to about 20 MiB uncompressed (which, as I understand it, would affect memory requirements during installation). The proposal was rejected.
Thanks, Florian
On Mon, Nov 11, 2019 at 5:23 AM Florian Weimer fweimer@redhat.com wrote:
It's clearly not a priority for Fedora. It wouldn't be too difficult to replace glibc-all-langpacks with glibc-locale-source in the installer, going from 26 MiB to less than 5 MiB compressed and from 208 MiB to about 20 MiB uncompressed (which, as I understand it, would affect memory requirements during installation). The proposal was rejected.
It *is* a priority, but having fewer runtime scriptlets is a higher priority. Your proposal would force scriptlets to be added for handling langpacks everywhere. We're trying to drive toward statelessness, not more weird non-deterministic stateful things during installations and upgrades.
On 05. 11. 19 19:41, Kevin Kofler wrote:
Python 3 traditionally in Fedora was built with a shared library libpython3.?.so and the final binary was dynamically linked against that shared library. This change is about creating the static library and linking the final python3 binary against it,
I oppose this change, because this is yet another size increase:
It is a trade: performance vs. size. Some use cases will not gain more performance, but most will. Some use cases will be affected by the size increase, but most won't. Details below.
That said, it is a fair point. When Fedora decides whether to do this, this needs to be considered.
As a negative side effect, when both libpython3.8.so and /usr/bin/python3.8 are installed, the filesystem footprint will be slightly increased (libpython3.8.so on Python 3.8.0, x86_64 is ~3.4M).
and while:
OTOH only a very small amount of packages will depend on libpython3.8.so.
in practice, that does not help because some of those packages are installed by default, e.g., the ones you mentioned being installed by default even on the Docker image:
*'''libcomps''' *'''libdnf''' *'''vim'''
I haven't checked vim (but work can be done to get rid of the dependency; it is vim-minimal -> libpython). For the dnf stack, it is mostly a matter of adapting the CMake files to not link extension modules to libpython. An example:
https://src.fedoraproject.org/rpms/libarcus/pull-request/8
Not being able to make the packages listed in bold libpython-less is a problem that would activate the contingency plan (revert).
but there are more, such as gdb, libreoffice, krita, boost, etc. that are installed on various live images, and calamares, which is popular on remixes. So all those images will be bloated as a result of your code duplicating change.
gdb Python support is optional.
krita is IMHO big enough to not notice the filesize increase.
So is libreoffice, but in fact only libreoffice-pyuno is doing this and it might be adapted or the dependency of libreoffice on libreoffice-pyuno might be made optional.
For boost, only the Python modules are affected, and I'm confident it's the same problem as in most of the rest of the list.
Extension modules should not link to libpython. Packages need to be adapted.
Only applications embedding Python need to link to libpython. That is most likely what software like krita or blender is doing.
In addition:
By applying this change, libpython's namespace will be separated from Python's, so '''C extension which are still linked to libpython''' might experience side effects or break.
so compatibility is an issue too.
It is an issue. We will look into it and provide help fixing the affected software. Python extension modules should not link to libpython, and the packages need to be adapted not to do that.
Only Python extension modules that embed Python will truly be problematic to handle.
On Tue, Nov 5, 2019 at 2:17 PM Miro Hrončok mhroncok@redhat.com wrote:
It is an issue. We will look into this issue and provide help fixing the affected software. Python extension modules should not link to libpython and the packages need to be adapted not to do that.
We need a way to autogenerate the Python language ABI dependency then. So far, no solution has been presented, and that needs to be fixed before this can be implemented. Without that, and with no library dependency, we have no way of knowing what to rebuild.
On 05. 11. 19 21:11, Neal Gompa wrote:
There are basically 3 cases I can think of:
A) You build an extension module into sitearch: the ABI dependency is generated.
B) You build anything that still links to libpython: a dependency on the specific libpython3.X.so is generated.
C) You build an extension module into a custom directory: the ABI dependency needs to be manually added now.
For C), we should generate the dependency based on the filename (*.cpython-38-{arch}-linux-gnu.so), but that would leave out cases where the filename is "simply" foo.so. For those, we might be able to figure out that it is a Python extension by some other means.
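The filename-based generation for case C) could be sketched along these lines; this is a hypothetical illustration of ours, not the actual RPM dependency generator:

```python
import re

# Match the ABI tag in an extension-module filename such as
# foo.cpython-38-x86_64-linux-gnu.so (first digit is the major
# version, the remaining digits the minor version).
TAG_RE = re.compile(r'\.cpython-(\d)(\d+)[^./]*\.so$')

def abi_requirement(filename):
    """Return a python(abi) Requires string, or None for a plain foo.so."""
    m = TAG_RE.search(filename)
    if m is None:
        return None  # "simply" foo.so: undetectable by name alone
    return 'python(abi) = %s.%s' % (m.group(1), m.group(2))

print(abi_requirement('_ldb.cpython-38-x86_64-linux-gnu.so'))
```

The None branch is exactly the gap described above: a module named foo.so carries no version information in its filename, so some other detection would be needed.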
Miro Hrončok wrote:
Only applications embedding Python need to link to libpython. That is what software like krita or blender
… and Calamares …
are most likely doing.
Kevin Kofler
On 11/5/19 4:59 PM, Kevin Kofler wrote:
… and Calamares …
... and Domoticz (Fedora), and Kodi (RPMFusion)...
Will this be as simple as a BR change in the spec or will application patches be necessary?
On 06. 11. 19 0:26, Michael Cronenworth wrote:
On 11/5/19 4:59 PM, Kevin Kofler wrote:
… and Calamares …
... and Domoticz (Fedora), and Kodi (RPMFusion)...
Will this be as simple as a BR change in the spec or will application patches be necessary?
Not for most cases. See this list:
Python extension modules that currently are unnecessarily linked to libpython:
- changes to cmake/autotools are needed; a sed in the spec might do
- if not changed, it still works, but drags in the extra 3.4 MB (shared)

Non-extension software embedding Python and linking to libpython:
- no changes necessary at all
- but drags in the extra 3.4 MB (shared)

Python extension modules embedding Python and linking to libpython:
- needs to be evaluated case by case
- changes to cmake/autotools are needed
- changes in code might be necessary as well
- if not changed, might misbehave
- Python Maint will provide help if asked
On Wed, 06 Nov 2019, Miro Hrončok wrote:
Do you have a list of affected packages?
Samba (and thus SSSD and FreeIPA) is affected. It is pretty fundamental that Samba modules link to libpython, and I think it was designed that way by you guys (the Python team at Red Hat) when we ported the Samba bindings to Python 3.
# find /usr/lib64/python3.7/site-packages/samba -name '*.so' | xargs -n1 -I '{}' sh -c "ldd {} | egrep -q libpython && echo 'LINKED: {}'"
LINKED: /usr/lib64/python3.7/site-packages/samba/_glue.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/_ldb.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/auth.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/werror.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/credentials.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/crypto.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/winbind.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/atsvc.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/windows_event_ids.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/auth.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/winreg.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/base.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/winspool.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/dcerpc.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/dfs.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/dns.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/dnsp.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/drsblobs.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/drsuapi.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/echo.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/epmapper.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/idmap.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/initshutdown.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/irpc.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/krb5pac.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/lsa.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/messaging.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/mgmt.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/misc.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/nbt.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/netlogon.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/ntlmssp.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/preg.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/samr.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/security.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/server_id.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/smb_acl.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/spoolss.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/srvsvc.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/svcctl.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/unixinfo.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/witness.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/wkssvc.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/dcerpc/xattr.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/gensec.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/gpo.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/messaging.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/net.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/netbios.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/ntstatus.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/param.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/policy.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/posix_eadb.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/registry.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/samba3/libsmb_samba_internal.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/samba3/param.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/samba3/passdb.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/samba3/smbd.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/security.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/xattr_native.cpython-37m-x86_64-linux-gnu.so
LINKED: /usr/lib64/python3.7/site-packages/samba/xattr_tdb.cpython-37m-x86_64-linux-gnu.so
On 06. 11. 19 11:41, Alexander Bokovoy wrote:
Python extension modules embedding Python and linking to libpython
- need to be evaluated case by case
- changes to cmake/autotools are needed
- changes in code might be necessary as well
- if not changed, might misbehave
- Python Maint will provide help if asked for
Do you have a list of affected packages?
We anticipate that the number of affected packages that actually need to link to libpython from extension modules is (very close to) 0.
But no, we don't have a list yet. We intend to go package by package (see the list in the proposal) and examine the reason the file is linked to libpython.
We are aware about samba linking to libpython and we anticipate changes will be needed. This was already semi-discussed when samba libs didn't build with Python 3.8.
Samba (and thus SSSD and FreeIPA) is affected. It is pretty fundamental that Samba modules link to libpython and I think it was designed so by you guys (Python team at Red Hat) when we ported Samba bindings to Python3.
Why is it fundamental to link extension modules to libpython? I wasn't directly involved with porting Samba, looping Lumír in. However note that when samba was ported, it was common to link Python extensions to libpython by default.
# find /usr/lib64/python3.7/site-packages/samba -name '*.so' | xargs -n1 -I '{}' sh -c "ldd {} | egrep -q libpython && echo 'LINKED: {}' " LINKED: /usr/lib64/python3.7/site-packages/samba/_glue.cpython-37m-x86_64-linux-gnu.so ...
On Python 3.7, extension modules are linked to libpython by default. On Python 3.8, extension modules are only linked to libpython if explicitly told so by the build system, such as samba's waf.
Extension modules built with distutils/setuptools are not linked to libpython.
Important pointer:
https://bugzilla.redhat.com/show_bug.cgi?id=1711638#c8 ... #c34
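As a quick self-check, the running interpreter can report from Python itself whether it was built around a shared libpython. A minimal sketch using the standard sysconfig module (the config variable names are CPython build-time settings, not anything specific to this proposal):

```python
import sysconfig

# Py_ENABLE_SHARED is 1 when CPython was configured with --enable-shared
# (i.e. a libpython shared library was built); LDLIBRARY names the library
# the interpreter was linked with (libpython3.X.so vs. libpython3.X.a).
print('Py_ENABLE_SHARED:', sysconfig.get_config_var('Py_ENABLE_SHARED'))
print('LDLIBRARY:', sysconfig.get_config_var('LDLIBRARY'))
```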
On ke, 06 marras 2019, Miro Hrončok wrote:
On 06. 11. 19 11:41, Alexander Bokovoy wrote:
Python extension modules embedding Python and linking to libpython
- need to be evaluated case by case
- changes to cmake/autotools are needed
- changes in code might be necessary as well
- if not changed, might misbehave
- Python Maint will provide help if asked for
Do you have a list of affected packages?
We anticipate that the number of affected packages that actually need to link to libpython from extension modules is (very close to) 0.
But no, we don't have a list yet. We intend to go package by package (see the list in the proposal) and examine the reason the file is linked to libpython.
We are aware about samba linking to libpython and we anticipate changes will be needed. This was already semi-discussed when samba libs didn't build with Python 3.8.
If you'd be able to help us removing this linking dependency, that would be great.
Samba (and thus SSSD and FreeIPA) is affected. It is pretty fundamental that Samba modules link to libpython and I think it was designed so by you guys (Python team at Red Hat) when we ported Samba bindings to Python3.
Why is it fundamental to link extension modules to libpython? I wasn't directly involved with porting Samba, looping Lumír in. However note that when samba was ported, it was common to link Python extensions to libpython by default.
We had to support two different Python builds in parallel and had to do a lot to link them properly to both runtimes in a correct way.
# find /usr/lib64/python3.7/site-packages/samba -name '*.so' | xargs -n1 -I '{}' sh -c "ldd {} | egrep -q libpython && echo 'LINKED: {}' " LINKED: /usr/lib64/python3.7/site-packages/samba/_glue.cpython-37m-x86_64-linux-gnu.so ...
On Python 3.7, extension modules are linked to libpython by default. On Python 3.8, extension modules are only linked to libpython if explicitly told so by the build system, such as samba's waf.
Extension modules built with distutils/setuptools are not linked to libpython.
Important pointer:
https://bugzilla.redhat.com/show_bug.cgi?id=1711638#c8 ... #c34
Thanks for the link, will read later.
On 06. 11. 19 17:44, Alexander Bokovoy wrote:
If you'd be able to help us removing this linking dependency, that would be great.
We would. However, we'd only invest the time and energy into it if this change is accepted, not before that. If samba and/or FreeIPA breaks, that would be a Fedora 32 blocker, so we would revert this change until a fix or viable workaround is found.
Dne 05. 11. 19 v 16:03 Ben Cotton napsal(a):
https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup
== Summary == Python 3 in Fedora has traditionally been built with a shared library, libpython3.?.so, and the final binary was dynamically linked against that shared library. This change is about creating a static library and linking the final python3 binary against it, as this provides a significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
== Owner ==
- Name: [[User:Cstratak| Charalampos Stratakis]], [[User:Vstinner|
Victor Stinner]], [[User:Churchyard| Miro Hrončok]]
- Email: python-maint@redhat.com
== Detailed Description ==
When we compile the python3 package on Fedora (prior to this change), we create the libpython3.?.so shared library and the final python3 binary (<code>/usr/bin/python3</code>) is dynamically linked against it. However, by building the libpython3.?.a static library and statically linking the final binary against it, we can achieve a performance gain of 5% to 27% depending on the workload.
Where are these numbers coming from? And what is the reason for the performance hit for dynamically linked Python?
Vít
Link time optimizations and profile guided optimizations also have a greater impact when python3 is linked statically.
Since Python 3.8, [https://docs.python.org/3.8/whatsnew/3.8.html#debug-build-uses-the-same-abi-... C extensions must no longer be linked to libpython by default]. Applications embedding Python now need to utilize the --embed flag for python3-config to be linked to libpython. During the [[Changes/Python3.8|Python 3.8 upgrade and rebuilds]] we've uncovered various cases of packages linking to libpython implicitly through various hacks within their buildsystems and fixed as many as possible. However, there are legitimate reasons to link an application to libpython and for those cases libpython should be provided so applications that embed Python can continue to do so.
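For build scripts that must support both pre-3.8 and post-3.8 interpreters, the version branch can be sketched in Python. This is a hypothetical helper for illustration, not part of the proposal; it only assembles the python3-config arguments, relying on the documented 3.8 behavior that -lpython3.X is only emitted with --embed:

```python
import sys

def python_config_libs_args():
    """Return python3-config arguments for linking an embedding application.

    Since Python 3.8, python3-config only emits -lpython3.X when the
    --embed flag is passed; older versions do not know the flag.
    """
    args = ['--libs']
    if sys.version_info >= (3, 8):
        args.append('--embed')
    return args

print(python_config_libs_args())
```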
This mirrors the Debian/Ubuntu way of building Python, where they offer a statically linked binary and an additional libpython subpackage. The libpython subpackage will be created and python3-devel will depend on it, so packages that embed Python will keep working.
The change was first done in Debian and Ubuntu years ago, followed by Python 3.8. manylinux1 and manylinux2010 ABI don't link C extensions to libpython either (to support Debian/Ubuntu).
By applying this change, libpython's namespace will be separated from Python's, so '''C extensions which are still linked to libpython''' might experience side effects or break.
There is one exception for C extensions. If an application is linked to libpython in order to embed Python, C extensions used only within this application can continue to be linked to libpython.
Currently there is no upstream option to build the static library alongside the shared one and statically link the final binary against it, so we have to rely on a downstream patch to achieve this. We plan to work with upstream to incorporate the changes there as well.
Before the change, python3.8 is dynamically linked to libpython3.8:
<pre>
+-------------------+
|                   |
|                   |       +--------------------+
| libpython3.8.so <---------+ /usr/bin/python3.8 |
|                   |       +--------------------+
|                   |
+-------------------+
</pre>
After the change, python3.8 is statically linked to libpython3.8:
<pre>
                      +-----------------------+
                      |                       |
                      |  /usr/bin/python3.8   |
                      |                       |
+-------------------+ | +-------------------+ |
|                   | | |                   | |
|                   | | |                   | |
| libpython3.8.so   | | | libpython3.8.a    | |
|                   | | |                   | |
|                   | | |                   | |
+-------------------+ | +-------------------+ |
                      +-----------------------+
</pre>
As a negative side effect, when both libpython3.8.so and /usr/bin/python3.8 are installed, the filesystem footprint will be slightly increased (libpython3.8.so on Python 3.8.0, x86_64 is ~3.4M). OTOH only a very small number of packages will depend on libpython3.8.so.
== Benefit to Fedora == Python's performance will increase significantly depending on the workload. Since many core components of the OS also depend on Python this could lead to an increase in their performance as well, however individual benchmarks will need to be conducted to verify the performance gain for those components.
[https://pyperformance.readthedocs.io/ pyperformance] results, ignoring differences smaller than 5%:
(see wiki page for table)
== Scope ==
- Proposal owners:
** Review and merge the [https://src.fedoraproject.org/rpms/python3/pull-request/133 pull request with the implementation].
** Go through the Python C extension packages that are linked to libpython and test if things work correctly. A copr repository will be provided for testing.
- Other developers: Other developers are encouraged to test the new
statically linked python3 and check if their package works as expected
- Release engineering: [https://pagure.io/releng/issue/8953 #8953]
This change does not require a mass rebuild, however a rebuild of the affected packages will be required. The affected packages will be rebuilt in copr first.
- Policies and guidelines: The packaging guidelines will need to be
updated to explicitly mention that C extensions should not be linked to libpython, and that the python3 binary is statically linked.
- Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact == Affected package maintainers should verify that their packages work as expected and the only impact the end users should see is a performance increase for workloads relying on Python.
== How To Test == Copr repo with instructions: https://copr.fedorainfracloud.org/coprs/g/python/Python3_statically_linked/
=== Package changes test === The change will bring the new <code>libpython3</code> subpackage as a dependency of <code>python3-devel</code>.
Test that it's installed:
<pre> $ rpm -q libpython3 </pre>
Test that it's uninstalled if <code>python3-devel</code> is removed:
<pre> $ dnf remove python3-devel </pre>
Test that <code>python3-libs</code> no longer includes the libpython shared library.
<pre> $ rpm -ql python3-libs | grep libpython3 </pre>
=== Dynamic linker test ===
To check that the python3.8 program is not linked to libpython, ldd can be used. For example, Python 3.7 will still be linked to libpython:
<pre>
$ ldd /usr/bin/python3.7|grep libpython
	libpython3.7m.so.1.0 => /lib64/libpython3.7m.so.1.0 (0x00007fbb57333000)
</pre>
But python3.8 will no longer be linked to libpython:
<pre> $ ldd /usr/bin/python3.8|grep libpython </pre>
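The same check can also be made from inside the interpreter, without ldd. A Linux-only sketch that scans the process's own memory maps for a loaded libpython:

```python
# Linux-only sketch: if the interpreter is dynamically linked to libpython,
# the shared object shows up in the process's own memory maps.
with open('/proc/self/maps') as maps:
    linked = any('libpython' in line for line in maps)

print('libpython mapped:', linked)
```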
=== Performance test ===
The performance speedup can be measured using the official Python benchmark suite [https://pyperformance.readthedocs.io/ pyperformance]: see [https://pyperformance.readthedocs.io/usage.html#run-benchmarks Run benchmarks].
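For a quick smoke test without installing pyperformance, a crude timeit run gives a rough signal; run the same script under both builds and compare. This is only an illustrative stand-in, not a substitute for the full suite:

```python
import timeit

# Time a small CPU-bound workload; repeat the same measurement on the
# dynamically linked and the statically linked interpreter.
elapsed = timeit.timeit('sum(i * i for i in range(1000))', number=2000)
print('elapsed: %.3f s' % elapsed)
```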
=== Namespace test ===
The following script can be used to verify that the change is in effect:
<pre>
import ctypes
import sys

EMPTY_TUPLE_SINGLETON = ()

def get_empty_tuple(lib):
    # Call PyTuple_New(0)
    func = lib.PyTuple_New
    func.argtypes = (ctypes.c_ssize_t,)
    func.restype = ctypes.py_object
    return func(0)

def test_lib(libname, lib):
    obj = get_empty_tuple(lib)
    if obj is EMPTY_TUPLE_SINGLETON:
        print("%s: SAME namespace" % libname)
    else:
        print("%s: DIFFERENT namespace" % libname)

def test():
    program = ctypes.pythonapi
    if hasattr(sys, 'abiflags'):
        abiflags = sys.abiflags
    else:
        # Python 2
        abiflags = ''
    ver = sys.version_info
    filename = ('libpython%s.%s%s.so.1.0'
                % (ver.major, ver.minor, abiflags))
    libpython = ctypes.cdll.LoadLibrary(filename)
    test_lib('program', program)
    test_lib('libpython', libpython)

test()
</pre>
Output before the change:
<pre> program: SAME namespace libpython: SAME namespace </pre>
Output after the change:
<pre> program: SAME namespace libpython: DIFFERENT namespace </pre>
== User Experience == Python based workloads should see a performance gain of up to 27%.
== Dependencies == While this specific change is not dependent on anything else, we would like to ensure that all the packages that link to libpython continue to work as expected.
Currently (30/10/2019) 118 packages on rawhide depend on libpython.
Result of the "repoquery --repo=rawhide --source --whatrequires 'libpython3.8.so.1.0()(64bit)' " command on Fedora Rawhide, x86_64:
*COPASI *Io-language *OpenImageIO *YafaRay *antimony *blender *boost *calamares *calibre *cantor *ceph *clingo *condor *createrepo_c *csound *cvc4 *dionaea *dmlite *domoticz *fontforge *freecad *gdb *gdcm *gdl *getdp *glade *globus-net-manager *glom *gnucash *gpaw *hamlib *hokuyoaist *hugin *insight *kdevelop-python *kicad *kitty *krita *lammps *ldns *libCombine *libarcus https://src.fedoraproject.org/rpms/libarcus/pull-request/8 *libarcus-lulzbot *libbatch *libcec *'''libcomps''' *'''libdnf''' *libftdi *libkml *libkolabxml *libldb *libnuml *libpeas *libplist *libreoffice *librepo *libsavitar *libsbml *libsedml *libtalloc *libyang *libyui-bindings *link-grammar *lldb *mathgl *med *mod_wsgi *nautilus-python *nbdkit *nest *netgen-mesher *neuron *nextpnr *nordugrid-arc *nwchem *openbabel *openscap *opentrep *openvdb *pam_wrapper *paraview *perl-Inline-Python *pidgin *pitivi *plplot *postgresql *pynac *pyotherside *pythia8 *python-gstreamer1 *python-jep *python-qt5 *<del>python3</del> *qgis *qpid-dispatch *qpid-proton *rdkit *renderdoc *rmol *root *samba *scidavis *sigil *swift-lang *texworks *thunarx-python *trademgen *trellis *unbound *uwsgi *vdr-epg-daemon *vigra *'''vim''' *vrpn *vtk *weechat *znc
Packages in '''bold''' are the ones present in the default docker/podman "fedora:rawhide" image.
== Contingency Plan ==
- Contingency mechanism: If issues appear that cannot be fixed in a
timely manner the change can be easily reverted and will be considered again for the next Fedora release. Also, a proper upgrade path mechanism will be provided in case of reversion, since libpython3.?.so will be a separate package with this change.
- Contingency deadline: Before the beta freeze of Fedora 32 (2020-02-25)
- Blocks release? Yes
- Blocks product? None
== Documentation == The documentation will be reflected in the changes for the python packaging guidelines.
On Wed, 06 Nov 2019 11:49:18 +0100, Vít Ondruch wrote:
we can achieve a performance gain of 5% to 27% depending on the workload.
Where are these numbers coming from? And what is the reason for the performance hit for dynamically linked Python?
Yes, it looks suspicious. -fPIC was a performance hit for i686 but it no longer affects x86_64.
Jan
On Wed, Nov 06, 2019 at 12:12:54PM +0100, Jan Kratochvil wrote:
On Wed, 06 Nov 2019 11:49:18 +0100, Vít Ondruch wrote:
we can achieve a performance gain of 5% to 27% depending on the workload.
Where are these numbers coming from? And what is the reason for the performance hit for dynamically linked Python?
Yes, it looks suspicious. -fPIC was a performance hit for i686 but it no longer affects x86_64.
No. It is a smaller performance hit on x86_64, no need to compute the PIC base register at the start of functions and waste a register for it, but still significant (mainly that indirect addressing is used for anything not know to bind to variables in the current TU, while otherwise direct addressing could be used).
Jakub
On Wed, 6 Nov 2019 at 05:50, Vít Ondruch wrote:
Dne 05. 11. 19 v 16:03 Ben Cotton napsal(a):
When we compile the python3 package on Fedora (prior to this change), we create the libpython3.?.so shared library and the final python3 binary (<code>/usr/bin/python3</code>) is dynamically linked against it. However, by building the libpython3.?.a static library and statically linking the final binary against it, we can achieve a performance gain of 5% to 27% depending on the workload.
Where are these numbers coming from? And what is the reason for the performance hit for dynamically linked Python?
Yea. This sounds like a bug/deficiency in the linking system, and the problem is possibly attacked from the wrong direction.
Orcan
Where are these numbers coming from?
There are pyperformance results: https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup#Benefit_to_Fedora
It's the official benchmark suite to measure the Python performance on speed.python.org.
I ran the benchmarks on my laptop using CPU isolation (isolcpus and rcu_nocbs Linux kernel parameters).
And what is the reason for the performance hit for dynamically linked Python?
Honestly, the speedup doesn't make any sense to me :-D But I only trust benchmark results, not beliefs or documentation about compilers and linkers.
I looked at the assembly to compare statically linked and dynamically linked "python3.8" binaries. I noticed two main differences:
* function calls in dynamically linked Python go through the PLT: it's not a direct function call, there is an indirection
* I see inlining more often in the statically linked Python
Reminder: currently (dynamically linked Python), /usr/bin/python3.8 is basically just a single function call to Py_BytesMain(argc, argv). ALL Python code lives in libpython.
I cannot explain why inlining cannot be done more often in libpython.
I cannot explain why PLT is needed when a libpython function calls a libpython function.
Yea. This sounds like a bug/deficiency in the linking system, and the problem is possibly attacked from the wrong direction.
IMHO compilers and linkers are doing their best to optimize libpython, but the nature of libpython (a dynamic .so library) prevents some kinds of optimizations.
It seems like putting all code into an *application* allows the compiler to go further in terms of optimization.
By the way, the two binaries that I analyzed are optimized using LTO (Link Time Optimization) *and* PGO (Profile Guided Optimization). These are the most advanced optimization techniques!
Victor
On 07/11/2019 14:59, Victor Stinner wrote:
I cannot explain why PLT is needed when a libpython function calls a libpython function.
Because an exported symbol in an ELF shared library is subject to potential interposition using LD_PRELOAD so the calls need to go through the PLT to be resolved.
Tom
Dne 07. 11. 19 v 16:05 Tom Hughes napsal(a):
On 07/11/2019 14:59, Victor Stinner wrote:
I cannot explain why PLT is needed when a libpython function calls a libpython function.
Because an exported symbol in an ELF shared library is subject to potential interposition using LD_PRELOAD so the calls need to go through the PLT to be resolved.
Not sure what PLT is (pre load table?), but is it something that could be disabled?
This sounds like the whole system could be 25% faster if we link statically.
Vít
On Thu, Nov 07, 2019 at 05:15:18PM +0100, Vít Ondruch wrote:
This sounds like the whole system could be 25% faster if we link statically.
Yeah, that's the advantage of static linking. This brings us stuff like statically linked distributions - https://sta.li/faq/ Generally, the advantages of dynamic libraries prevail over speed.
On 07. 11. 19 17:15, Vít Ondruch wrote:
Dne 07. 11. 19 v 16:05 Tom Hughes napsal(a):
On 07/11/2019 14:59, Victor Stinner wrote:
I cannot explain why PLT is needed when a libpython function calls a libpython function.
Because an exported symbol in an ELF shared library is subject to potential interposition using LD_PRELOAD so the calls need to go through the PLT to be resolved.
Not sure what PLT is (pre load table?), but is it something that could be disabled?
This sounds like the whole system could be 25% faster if we link statically.
If we statically linked against other libraries, it would be a can of worms. What needs to be said about this change is that we don't statically link against different libraries; we just build the CPython source into one "fat" executable instead of splitting it into a tiny wrapper and a "fat" libpython.
Once upon a time, Miro Hrončok mhroncok@redhat.com said:
If we statically linked against other libraries, it would be a can of worms. What needs to be said about this change is that we don't statically link against different libraries; we just build the CPython source into one "fat" executable instead of splitting it into a tiny wrapper and a "fat" libpython.
It might be useful to see how other interpreters that are built like this perform; I know perl has used libperl.so for ages (maybe for all of perl5's lifetime?). Does it have the same performance impact, and if so, can/should it be switched to /usr/bin/perl linking the core statically?
Alternately, is there some way to reduce the overhead of the dynamic library (that could help multiple languages)?
Chris Adams wrote:
Alternately, is there some way to reduce the overhead of the dynamic library (that could help multiple languages)?
-fno-semantic-interposition
Can this please be tried on the dynamically linked Python?
Kevin Kofler
* Vít Ondruch:
Dne 07. 11. 19 v 16:05 Tom Hughes napsal(a):
On 07/11/2019 14:59, Victor Stinner wrote:
I cannot explain why PLT is needed when a libpython function calls a libpython function.
Because an exported symbol in an ELF shared library is subject to potential interposition using LD_PRELOAD so the calls need to go through the PLT to be resolved.
Not sure what PLT is (pre load table?), but is it something that could be disabled?
Procedure Linkage Table.
It can be disabled by using hidden symbols. For internal symbols, that is easy. For symbols that are used externally, I do not think we have good toolchain support. Link-time optimization can detect truly internal symbols and make them hidden. Some targets can also perform relaxation of relocations, eliminating most of the PLT indirection overhead if it turns out a function is not exported after all and therefore cannot be interposed. But that needs a version script, and it can't work for calls to functions that are in fact public.
In glibc, we create hidden aliases for public functions which should not be interposed. It's not too bad if you have preprocessor macros for this task, but you do need to annotate each function declaration and definition separately.
This sounds like the whole system could be 25% faster if we link statically.
Not really, quite a few system components already do this kind of optimization.
Toolchain support for this is quite poor however. Ideally, we would have a compilation mode that reuses the annotations that Windows uses, but given that our system headers currently lack __dllimport specifiers (or whatever they are called), even with that approach, it's quite a lot of work. I might be mistaken about this, but I think there was a huge conflict about some intermediate visibility setting (protected?) that might help with this, basically creating non-interposable function symbols, but I don't think it's usable for that in its current state.
Thanks, Florian
Dne 07. 11. 19 v 23:08 Florian Weimer napsal(a):
- Vít Ondruch:
Dne 07. 11. 19 v 16:05 Tom Hughes napsal(a):
On 07/11/2019 14:59, Victor Stinner wrote:
I cannot explain why PLT is needed when a libpython function calls a libpython function.
Because an exported symbol in an ELF shared library is subject to potential interposition using LD_PRELOAD so the calls need to go through the PLT to be resolved.
Not sure what PLT is (pre load table?), but is it something that could be disabled?
Procedure Linkage Table.
It can be disabled by using hidden symbols. For internal symbols, that is easy. For symbols that are used externally, I do not think we have good toolchain support. Link-time optimization can detect truly internal symbols and make them hidden. Some targets can also perform relaxation of relocations, eliminating most of the PLT indirection overhead if it turns out a function is not exported after all and therefore cannot be interposed. But that needs a version script, and it can't work for calls to functions that are in fact public.
In glibc, we create hidden aliases for public functions which should not be interposed. It's not too bad if you have preprocessor macros for this task, but you do need to annotate each function declaration and definition separately.
This sounds like the whole system could be 25% faster if we link statically.
Not really, quite a few system components already do this kind of optimization.
Toolchain support for this is quite poor however. Ideally, we would have a compilation mode that reuses the annotations that Windows uses, but given that our system headers currently lack __dllimport specifiers (or whatever they are called), even with that approach, it's quite a lot of work. I might be mistaken about this, but I think there was a huge conflict about some intermediate visibility setting (protected?) that might help with this, basically creating non-interposable function symbols, but I don't think it's usable for that in its current state.
Thx for explanation Florian.
Generally, I am against this change proposal, because:
1) Apparently, there is some work which needs to be done on the toolchain. Applying workarounds just hides the issues and we won't move forward ever.
2) I am asking these questions because I believe that Ruby might suffer from the same issue, and others are concerned about Perl. Applying this just to Python is not systematic.
However, if part of this change proposal were actually collecting information on what has to be done for dynamically linked libraries to reach performance similar to static linking, if there were a push to improve the toolchain, and if there were generally a better understanding of the issue, then I would not mind if this were accepted as a temporary measure. Unfortunately, nothing like this is mentioned in the change proposal.
Vít
Thanks, Florian
----- Original Message -----
From: "Vít Ondruch" vondruch@redhat.com Cc: devel@lists.fedoraproject.org Sent: Friday, November 8, 2019 10:01:47 AM Subject: Re: Fedora 32 System-Wide Change proposal: Build Python 3 to statically link with libpython3.8.a for better performance
Dne 07. 11. 19 v 23:08 Florian Weimer napsal(a):
- Vít Ondruch:
Dne 07. 11. 19 v 16:05 Tom Hughes napsal(a):
On 07/11/2019 14:59, Victor Stinner wrote:
I cannot explain why PLT is needed when a libpython function calls a libpython function.
Because an exported symbol in an ELF shared library is subject to potential interposition using LD_PRELOAD so the calls need to go through the PLT to be resolved.
Not sure what PLT is (pre load table?), but is it something that could be disabled?
Procedure Linkage Table.
It can be disabled by using hidden symbols. For internal symbols, that is easy. For symbols that are used externally, I do not think we have good toolchain support. Link-time optimization can detect truly internal symbols and make them hidden. Some targets can also perform relaxation of relocations, eliminating most of the PLT indirection overhead if it turns out a function is not exported after all and therefore cannot be interposed. But that needs a version script, and it can't work for calls to functions that are in fact public.
In glibc, we create hidden aliases for public functions which should not be interposed. It's not too bad if you have preprocessor macros for this task, but you do need to annotate each function declaration and definition separately.
This sounds like the whole system could be 25% faster if we link statically.
Not really, quite a few system components already do this kind of optimization.
Toolchain support for this is quite poor however. Ideally, we would have a compilation mode that reuses the annotations that Windows uses, but given that our system headers currently lack __dllimport specifiers (or whatever they are called), even with that approach, it's quite a lot of work. I might be mistaken about this, but I think there was a huge conflict about some intermediate visibility setting (protected?) that might help with this, basically creating non-interposable function symbols, but I don't think it's usable for that in its current state.
Thanks for the explanation, Florian.
Generally, I am against this change proposal, because:
- Apparently, there is some work which needs to be done on the toolchain. Applying workarounds just hides the issues, and we will never move forward.
I think it's more reasonable to do a small SPEC change in Python to achieve a 27% performance boost than to wait for the toolchain to catch up on things that are not yet well defined. I don't see that as a valid reason for not accepting the change, although you might want to elaborate further here.
- I am asking these questions because I believe Ruby might suffer from the same issue, and others are concerned about Perl. Applying this just to Python is not systematic.
Maybe. But whether this is systematic compared to other dynamic languages is not really relevant to this change taking effect. I don't know about Perl's or Ruby's architectural design, but is there a reason to keep them in line in that aspect? Or in any other aspect at all, apart from the general packaging guidelines?
However, if part of this change proposal were actually collecting information on what has to be done to get similar performance from dynamically linked libraries compared to static linking, if there were a push to improve the toolchain, and if there were generally a better understanding of the issue, then I would not mind if this were accepted as a temporary measure. Unfortunately, nothing like this is mentioned in the change proposal.
No, this is not intended to be temporary, hence why it is not mentioned as such. The information has been collected as a case for the change. If other languages would like to conduct similar benchmarks and experiments they are free to do so, but the scope of this change is just Python. It is also not intended as a push for certain toolchain changes/optimizations, although those would be more than welcome. In addition, this change is not a case for the benefits of static linking in general, as we have not experimented with other cases; we outlined a specific case, which is in line with the packaging guidelines, since statically linking a binary against its own library is permitted.
This change intends to speed up Python by a significant margin in the context of the current toolchain, with the outlined effects on compatibility and package changes. We can debate that; the rest seems irrelevant in the context of accepting it or not.
Vít
Thanks, Florian
devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Fri, Nov 8, 2019 at 6:01 PM Charalampos Stratakis cstratak@redhat.com wrote:
----- Original Message -----
From: "Vít Ondruch" vondruch@redhat.com Cc: devel@lists.fedoraproject.org Sent: Friday, November 8, 2019 10:01:47 AM Subject: Re: Fedora 32 System-Wide Change proposal: Build Python 3 to statically link with libpython3.8.a for better performance
No, this is not intended to be temporary, hence why it is not mentioned as such. The information has been collected as a case for the change. If other languages would like to conduct similar benchmarks and experiments they are free to do so, but the scope of this change is just Python. It is also not intended as a push for certain toolchain changes/optimizations, although those would be more than welcome. In addition, this change is not a case for the benefits of static linking in general, as we have not experimented with other cases; we outlined a specific case, which is in line with the packaging guidelines, since statically linking a binary against its own library is permitted.
This change intends to speed up Python by a significant margin in the context of the current toolchain, with the outlined effects on compatibility and package changes. We can debate that; the rest seems irrelevant in the context of accepting it or not.
I agree. The benefit of this change for Python (and hence, a large amount of software that runs on both users' *and* Fedora infra machines) is rather big, while the impact on other packages is either non-existent or pretty small. Additionally, I think Miro mentioned that this change was already done in other distros, so we're also not the first ones doing this.
(Side note: I wouldn't even have objected to this being a Self-Contained Change, since it basically only affects one package - albeit an important one (python3) - and the Change owners have offered to help with adapting other packages, in the very few cases where that would be necessary. I also trust the members of the Python SIG have done / will do their homework here, as they have done with both the python 3.8 transition and python2 retirement, which were / are both being executed almost flawlessly.)
Fabio
-- Regards,
Charalampos Stratakis Software Engineer Python Maintenance Team, Red Hat
On 08. 11. 19 20:40, Fabio Valentini wrote:
(Side note: I wouldn't even have objected to this being a Self-Contained Change, since it basically only affects one package - albeit an important one (python3)
Doing this as a system-wide change was my decision; Harris (Charalampos) wanted to do this as self-contained, IIRC.
I believe that Fedora is at a point where almost every non-cosmetic Python change should be treated as system wide. In this case, changes need to be done in the dnf stack and in the samba stack, so the decision was clear for me.
On 08. 11. 19 at 20:40, Fabio Valentini wrote:
On Fri, Nov 8, 2019 at 6:01 PM Charalampos Stratakis cstratak@redhat.com wrote:
----- Original Message -----
From: "Vít Ondruch" vondruch@redhat.com Cc: devel@lists.fedoraproject.org Sent: Friday, November 8, 2019 10:01:47 AM Subject: Re: Fedora 32 System-Wide Change proposal: Build Python 3 to statically link with libpython3.8.a for better performance
I agree. The benefit of this change for Python (and hence, a large amount of software that runs on both users' *and* Fedora infra machines) is rather big, while the impact on other packages is either non-existent or pretty small. Additionally, I think Miro mentioned that this change was already done in other distros, so we're also not the first ones doing this.
Now it looks like I am strongly against this change proposal, but what I wanted is this:
~~~
However, if part of this change proposal was actually collecting the information what have to be done to have similar performance for the dynamically linked libraries comparing to static linking, if there is push to improve the toolchain and if there is generally better understanding of the issue, then I would not mind if this is accepted as temporary measure. Unfortunately, nothing like this is mentioned in the change proposal.
~~~
I don't see anything from these points being part of the change proposal. I don't see that the Python team really understands the issue. If they do, they don't do a good job explaining it. All I see is "others do this and it improves speed, so let's do it as well".
It is also surprising that from all these points the only response I got was to the word "temporary", which is admittedly an easy target to pick on if my arguments are to be shot down. In this context, I don't mind if temporary means "years", but it should not be forgotten why the change was done, and it should be reverted as soon as the toolchain improves.
Vít
Charalampos Stratakis wrote:
From: "Vít Ondruch" vondruch@redhat.com
- Apparently, there is some work which needs to be done on the
toolchain. Applying workarounds just hides the issues and we won't move forward ever.
I think it's more reasonable to do a small SPEC change in Python to achieve a 27% performance boost, than wait for the toolchain to catch up on things that are not well defined yet. I don't see that as a valid reason for not accepting the change, although you might want to elaborate further here.
Sorry, but I'm with Vít there. If Python is running into toolchain limitations, the goal should be to work on improving the toolchain, not to add a hack with side effects (bloat, compatibility issues) to the Python package, a hack which we will then be stuck with forever (because you admitted yourself that you do not intend it to be temporary).
Kevin Kofler
On Sat, Nov 9, 2019 at 8:31 AM Kevin Kofler kevin.kofler@chello.at wrote:
Sorry, but I'm with Vít there. If Python is running into toolchain limitations, the goal should be to work on improving the toolchain, not to add a hack with side effects (bloat, compatibility issues) to the Python package, a hack with which we will then get stuck with forever (because you admitted yourself that you do not intend it to be temporary).
How is a statically linked libpython a hack? It's just a different way to do it, isn't it? And if the toolchain needs some improving, fine, but why should we have lower performance and keep waiting on it if there is a solution available right now? A solution that's been tested and proven to work in other high-profile distributions (Ubuntu, Debian). And the size increase? It's so tiny, I can't imagine why it should matter at all.
Also, this is a change to the Python ecosystem in Fedora; it does not depend on Ruby, Perl, and others.
Anyhow, I am very much for this proposal.
Frantisek Zatloukal wrote:
How is statically linked libpython hack? It's just a different way to do it, isn't it?
It means you are shipping 2 copies of the Python interpreter, one statically linked into the python3 binary and one as a shared library. This is much less elegant than shipping a single shared copy of the code.
You also need to build all the code twice to actually get the performance improvements, because if you just statically link the PIC objects (built for the shared library) into the binary, the performance will not noticeably improve.
And if toolchain needs some improving, fine, but why should we have lower performance and keep waiting on it if there is a solution available right now?
Because sometimes it is better to wait a bit for an elegant solution than to rush out a quick hack that we then end up stuck with.
And size increase? It's so tiny, I can't imagine why should that matter at all.
We are talking about megabytes! That is not tiny at all!
Each size increase always gets waved off with the same "it's so tiny" excuse, except that several of those "tiny" size increases (even the ones that are actually tiny, in the kilobyte range) end up adding up to dozens of megabytes of bloat, to the point where our live images keep growing and growing, increasing download sizes for all users and making some images unsuitable for the physical media they were originally intended for. (CD size already seems out of reach for most images, but if this trend continues, we will end up blowing past DVD size as well!)
The Fedora 31 KDE Spin is 1 854 996 480 bytes. A decade ago, the size target was CD size, i.e., 700 000 000 bytes. Then the size target was bumped to 1 000 000 000 bytes, and it went upwards from there. The size has grown by a factor of almost 3 in only a decade! So I am really fed up with all those "so tiny, I can't imagine why it should matter at all" size increases.
Also, this is change to Python ecosystem in Fedora, it does not depend on Ruby, Perl and others.
I never claimed otherwise. (Though, if those decide to implement the same hack, the bloat will become even worse.)
Kevin Kofler
On Sunday, November 10, 2019, Kevin Kofler kevin.kofler@chello.at wrote:
Frantisek Zatloukal wrote:
And size increase? It's so tiny, I can't imagine why should that matter at all.
We are talking about megabytes! That is not tiny at all!
It is. Even SSD storage is reasonably cheap nowadays. Other changes like "upgrade foo to n+1" most likely also increase the size, but no one seems to care because it's not written in the change proposal.
Anyway, the performance win here easily justifies the tiny space increase that no one would notice in practice.
On Thu, 07 Nov 2019 15:59:41 +0100, Victor Stinner wrote:
I cannot explain why inlining cannot be done more often in libpython.
I cannot explain why PLT is needed when a libpython function calls a libpython function.
Could you re-run the benchmark with the shared library but with -fno-semantic-interposition? I have run it locally, but it takes a lot of time.
Thanks, Jan
On Thu, 07 Nov 2019 22:36:44 +0100, Jan Kratochvil wrote:
On Thu, 07 Nov 2019 15:59:41 +0100, Victor Stinner wrote:
I cannot explain why inlining cannot be done more often in libpython.
I cannot explain why PLT is needed when a libpython function calls a libpython function.
Could you re-run the benchmark with shared library but with -fno-semantic-interposition? I have run it locally but it takes a lot of time.
nbody python3-3.7.5-1.fc30.x86_64: Mean +- std dev: 217 ms +- 2 ms
nbody -fno-semantic-interposition: Mean +- std dev: 203 ms +- 3 ms (-6.9%)
nbody static linkage claim: -27%
So -fno-semantic-interposition does help but it is not the whole static gain.
Jan
Hi Jan,
With the help of Florian Weimer and Charalampos Stratakis, we also agreed to test this flag as a priority. I understood that it disables the LD_PRELOAD feature: it's no longer possible to override symbols in libpython with LD_PRELOAD. Thanks to that, the compiler can avoid PLT indirection for function calls and can inline more functions in libpython. I'm talking about function calls from libpython to libpython: something which is very common in Python. Basically, almost all function calls are calls from libpython to libpython.
I'm impressed. Thanks to -fno-semantic-interposition, I get the same speedup on a dynamically linked Python (libpython) compared to statically linked Python!
Yesterday, I tried on a vanilla Python compiled manually:
./configure --enable-optimizations --with-lto --enable-shared CFLAGS="-fno-semantic-interposition" LDFLAGS="-fno-semantic-interposition"
I saw the same speedup as when avoiding --enable-shared. Today I validated this result using the RPM generated by Charalampos's PR: https://src.fedoraproject.org/rpms/python38/pull-request/53
In short, https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup is useless: there is no need to modify Python to statically link it to libpython. We can keep the dynamic library libpython and keep Python dynamically linked against it. We only need to pass -fno-semantic-interposition in the compiler and linker flags when building Python!
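The kind of call this flag optimizes can be shown with a tiny sketch (the names `refcount` and `is_alive` are hypothetical, not CPython's API). In a -fPIC shared-library build with default flags, the cross-function call below must go through the PLT because the exported symbol might be interposed at run time; with -fno-semantic-interposition the compiler is allowed to assume the local definition is the one that runs, and may inline it:

```c
/* refcount() stands in for any exported libpython-style function. */
int refcount(void) { return 1; }

/* With default -fPIC semantics, this call is a PLT call to a
 * possibly-interposed symbol. With -fno-semantic-interposition the
 * compiler may bind it directly to the definition above and inline it. */
int is_alive(void) { return refcount() > 0; }
```

The behavior is identical either way; only the generated call sequence (and thus the speed of libpython-to-libpython calls) differs.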
I'm not sure if we need a Fedora change just for a compiler flag. Again, the only drawback is that we will no longer be able to override a symbol using LD_PRELOAD. Honestly, I never did that. I don't see any use case for that. But I have used LD_PRELOAD on libc multiple times, for example to mock the system clock.
If someone really needs LD_PRELOAD, it's quite easy to build a custom Python without -fno-semantic-interposition.
Victor
On 15. 11. 19 at 10:21, Victor Stinner wrote:
I'm not sure if we need a Fedora change just for a compiler flag. Again, the only drawback is that we will no longer be able to override a symbol using LD_PRELOAD. Honestly, I never did that. I don't see any use case for that. But I used LD_PRELOAD on the libc multiple times to mock the system clock for example.
If someone really needs LD_PRELOAD, it's quite easy to build a custom Python without -fno-semantic-interposition.
Mock's Nosync plugin uses LD_PRELOAD: https://github.com/rpm-software-management/mock/wiki/Feature-nosync
On Fri, Nov 15, 2019 at 01:23:09PM +0100, Miroslav Suchý wrote:
Mock's Nosync plugin use LD_PRELOAD: https://github.com/rpm-software-management/mock/wiki/Feature-nosync
IIUC mock would not be affected by this change.
The LD_PRELOAD limitation described applies to symbols that are in the libpython.so library.
Those docs suggest mock is replacing the fsync() API in glibc with its LD_PRELOAD, so that should continue to work as normal.
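For reference, an LD_PRELOAD shim like mock's nosync boils down to something like this (a simplified sketch, not the plugin's actual code; the real feature covers more calls than fsync):

```c
#include <unistd.h>

/* Built as a shared object and listed in LD_PRELOAD, this definition
 * wins symbol resolution over glibc's fsync(), turning it into a no-op. */
int fsync(int fd) {
    (void)fd;   /* ignore the descriptor entirely */
    return 0;   /* report success without touching the disk */
}
```

Built with something like `gcc -shared -fPIC nosync.c -o nosync.so` and run via `LD_PRELOAD=./nosync.so`, every fsync() call in the process becomes a no-op. Because the interposed symbol lives in glibc, not in libpython, how Python itself is compiled does not affect it.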
Regards, Daniel
On Fri, 2019-11-15 at 12:31 +0000, Daniel P. Berrangé wrote:
Thinking aloud: does anyone ever use symbol overriding for anything other than glibc?
What would it do to distro-wide performance if -fno-semantic-interposition were added to the default rpm build flags (and glibc added -fsemantic-interposition to override this)?
Basically, change the default distro-wide to libraries opting in to being able to be interposed, rather than opting out (-fsemantic-interposition appears to be on by default, looking at the source for gcc).
Would other workloads get benefit? How much would break?
(Not that I'm volunteering to run the experiment myself)
Hope this is constructive, Dave
On 15. 11. 19 at 15:51, David Malcolm wrote:
Thinking aloud: does anyone ever use symbol overriding for anything other than glibc?
What would it do to distro-wide performance if -fno-semantic-interposition were added to the default rpm build flags (and glibc added -fsemantic-interposition to override this)?
Basically, change the default distro-wide to libraries opting in to being able to be interposed, rather than opting out (-fsemantic-interposition appears to be on by default, looking at the source for gcc).
+1
Because this was my concern from the beginning. Why do it just for Python if possibly the whole distribution could benefit?
Vít
On 15. 11. 19 16:20, Vít Ondruch wrote:
Dne 15. 11. 19 v 15:51 David Malcolm napsal(a):
On Fri, 2019-11-15 at 12:31 +0000, Daniel P. Berrangé wrote:
On Fri, Nov 15, 2019 at 01:23:09PM +0100, Miroslav Suchý wrote:
Dne 15. 11. 19 v 10:21 Victor Stinner napsal(a):
I'm not sure if we need a Fedora change just for a compiler flag. Again, the only drawback is that we will no longer be able to override a symbol using LD_PRELOAD. Honestly, I never did that. I don't see any use case for that. But I used LD_PRELOAD on the libc multiple times to mock the system clock for example.
If someone really needs LD_PRELOAD, it's quite easy to build a custom Python without -fno-semantic-interposition.
Mock's Nosync plugin uses LD_PRELOAD: https://github.com/rpm-software-management/mock/wiki/Feature-nosync
IIUC mock would not be affected by this change.
The LD_PRELOAD limitation described applies to symbols that are in the libpython.so library.
Those docs suggest mock is replacing the fsync() API in glibc with its LD_PRELOAD, so that should continue to work as normal.
Regards, Daniel
Thinking aloud: does anyone ever use symbol overriding for anything other than glibc?
What would it do to distro-wide performance if -fno-semantic-interposition were added to the default rpm build flags (and glibc added -fsemantic-interposition to override this)?
Basically, change the distro-wide default so that libraries opt in to being interposable, rather than opt out (-fsemantic-interposition appears to be on by default, judging from the GCC source).
+1
Because this was from the beginning my concern. Why do it just for Python if possibly the whole distribution could benefit.
I'm not saying we shouldn't. It's a good idea (to explore).
Why not start with Python and, if it proves to work, continue from there?
The benefit is that we would handle the change within Python, and a revert, in case unforeseen problems occur, would affect just one package.
On Fri, 2019-11-15 at 16:28 +0100, Miro Hrončok wrote:
On 15. 11. 19 16:20, Vít Ondruch wrote:
Dne 15. 11. 19 v 15:51 David Malcolm napsal(a):
On Fri, 2019-11-15 at 12:31 +0000, Daniel P. Berrangé wrote:
On Fri, Nov 15, 2019 at 01:23:09PM +0100, Miroslav Suchý wrote:
Dne 15. 11. 19 v 10:21 Victor Stinner napsal(a):
I'm not sure if we need a Fedora change just for a compiler flag. Again, the only drawback is that we will no longer be able to override a symbol in libpython using LD_PRELOAD. Honestly, I never did that and I don't see a use case for it. I have, however, used LD_PRELOAD on libc multiple times, for example to mock the system clock.
If someone really needs LD_PRELOAD, it's quite easy to build a custom Python without -fno-semantic-interposition.
Mock's Nosync plugin uses LD_PRELOAD: https://github.com/rpm-software-management/mock/wiki/Feature-nosync
IIUC mock would not be affected by this change.
The LD_PRELOAD limitation described applies to symbols that are in the libpython.so library.
Those docs suggest mock is replacing the fsync() API in glibc with its LD_PRELOAD, so that should continue to work as normal.
Regards, Daniel
Thinking aloud: does anyone ever use symbol overriding for anything other than glibc?
What would it do to distro-wide performance if -fno-semantic-interposition were added to the default rpm build flags (and glibc added -fsemantic-interposition to override this)?
Basically, change the distro-wide default so that libraries opt in to being interposable, rather than opt out (-fsemantic-interposition appears to be on by default, judging from the GCC source).
+1
Because this was from the beginning my concern. Why do it just for Python if possibly the whole distribution could benefit.
I'm not saying we shouldn't. It's a good idea (to explore).
Why not start with Python and, if it proves to work, continue from there?
The benefit is that we would handle the change within Python, and a revert, in case unforeseen problems occur, would affect just one package.
Indeed, it would be a massive scope creep compared to your feature; I just thought it worth mentioning as an idea - I don't want to derail your work (and thanks for speeding up python!)
Dave
On Fri, Nov 15, 2019 at 09:51:45AM -0500, David Malcolm wrote:
On Fri, 2019-11-15 at 12:31 +0000, Daniel P. Berrangé wrote:
On Fri, Nov 15, 2019 at 01:23:09PM +0100, Miroslav Suchý wrote:
Dne 15. 11. 19 v 10:21 Victor Stinner napsal(a):
I'm not sure if we need a Fedora change just for a compiler flag. Again, the only drawback is that we will no longer be able to override a symbol in libpython using LD_PRELOAD. Honestly, I never did that and I don't see a use case for it. I have, however, used LD_PRELOAD on libc multiple times, for example to mock the system clock.
If someone really needs LD_PRELOAD, it's quite easy to build a custom Python without -fno-semantic-interposition.
Mock's Nosync plugin uses LD_PRELOAD: https://github.com/rpm-software-management/mock/wiki/Feature-nosync
IIUC mock would not be affected by this change.
The LD_PRELOAD limitation described applies to symbols that are in the libpython.so library.
Those docs suggest mock is replacing the fsync() API in glibc with its LD_PRELOAD, so that should continue to work as normal.
Regards, Daniel
Thinking aloud: does anyone ever use symbol overriding for anything other than glibc?
What would it do to distro-wide performance if -fno-semantic-interposition were added to the default rpm build flags (and glibc added -fsemantic-interposition to override this)?
Basically, change the distro-wide default so that libraries opt in to being interposable, rather than opt out (-fsemantic-interposition appears to be on by default, judging from the GCC source).
Would other workloads get benefit? How much would break?
It'd break libvirt's entire test suite. We rely on being able to mock symbols inside libvirt.so, as well as libc, for unit testing.
Regards, Daniel
* David Malcolm:
What would it do to distro-wide performance if -fno-semantic-interposition were added to the default rpm build flags (and glibc added -fsemantic-interposition to override this)?
glibc already does the equivalent of -fno-semantic-interposition manually. We even have a test case that only certain select symbols are exempted (mostly malloc). But you cannot interpose the open function and expect that it will alter the behavior of fopen, or anything else that calls fopen under the covers. This kind of internal interposition is also inhibited by -fno-semantic-interposition in combination with LTO and controls on symbol visibility within the linker.
I'm sure there have been previous discussions about -Bsymbolic, which does something similar at the linker/dynamic loader level. I wouldn't want Fedora to switch the default here, the toolchain default should change first, for cross-distribution consistency.
Thanks, Florian
On 2019-11-15 at 14:51 UTC, David Malcolm wrote:
Thinking aloud: does anyone ever use symbol overriding for anything other than glibc?
Yes. It is particularly useful for "spear fishing" debugging of lower-level interfaces in large, complex multi-process applications. By some means you determine that [part of] the bug involves a bad parameter to a particular API, but a conditional breakpoint in gdb has too much overhead (if you can figure out at all how to invoke gdb in the cloud of processes.) So: LD_PRELOAD a .so which overrides the API and checks the parameter. If no problem then pass control to the original implementation via RTLD_NEXT. If bad, then raise an alarm, prepare a backtrace, pause or spin until rescued by manual attach of gdb, etc.
* John Reiser:
On 2019-11-15 at 14:51 UTC, David Malcolm wrote:
Thinking aloud: does anyone ever use symbol overriding for anything other than glibc?
Yes. It is particularly useful for "spear fishing" debugging of lower-level interfaces in large, complex multi-process applications. By some means you determine that [part of] the bug involves a bad parameter to a particular API, but a conditional breakpoint in gdb has too much overhead (if you can figure out at all how to invoke gdb in the cloud of processes.) So: LD_PRELOAD a .so which overrides the API and checks the parameter. If no problem then pass control to the original implementation via RTLD_NEXT. If bad, then raise an alarm, prepare a backtrace, pause or spin until rescued by manual attach of gdb, etc.
That only seems to need shallow interposition, though. In most cases, I doubt you are interested in API calls from the library itself, because those are probably unproblematic.
Thanks, Florian
Thinking aloud: does anyone ever use symbol overriding for anything other than glibc?
Yes. It is particularly useful for "spear fishing" debugging of lower-level interfaces in large, complex multi-process applications.
That only seems to need shallow interposition, though. In most cases, I doubt you are interested in API calls from the library itself, because those are probably unproblematic.
One actual case: why exp(600.0)? Yes, the first override was shallow and in libm (part of glibc). But the caller was deep within a scientific library, and the second override was not shallow at all.
* John Reiser:
Thinking aloud: does anyone ever use symbol overriding for anything other than glibc?
Yes. It is particularly useful for "spear fishing" debugging of lower-level interfaces in large, complex multi-process applications.
That only seems to need shallow interposition, though. In most cases, I doubt you are interested in API calls from the library itself, because those are probably unproblematic.
One actual case: why exp(600.0)? Yes, the first override was shallow and in libm (part of glibc). But the caller was deep within a scientific library, and the second override was not shallow at all.
That's still unaffected. What I meant is that you can still alter calls at library boundaries. Only purely internal calls are gone.
Thanks, Florian
This Change has been withdrawn and replaced with https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
Discussion is at https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
Hi
On Fri, Nov 15, 2019 at 10:22 AM Victor Stinner vstinner@redhat.com wrote:
Hi Jan,
With the help of Florian Weimer and Charalampos Stratakis, we also agreed to test this flag as a priority. I understood that it disables the LD_PRELOAD feature: it's no longer possible to override symbols in libpython with LD_PRELOAD. Thanks to that, the compiler can avoid PLT indirection for function calls and can inline more functions within libpython. I'm talking about function calls from libpython into libpython: something very common in Python. Basically, almost all function calls are calls from libpython to libpython.
I'm impressed. Thanks to -fno-semantic-interposition, I get the same speedup on a dynamically linked Python (libpython) compared to statically linked Python!
Yesterday, I tried on a vanilla Python compiled manually:
./configure --enable-optimizations --with-lto --enable-shared CFLAGS="-fno-semantic-interposition" LDFLAGS="-fno-semantic-interposition"
I saw the same speedup as when building without --enable-shared. Today I validated this result using the RPM generated by Charalampos's PR: https://src.fedoraproject.org/rpms/python38/pull-request/53
In short, https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup is useless: there is no need to modify Python to statically link it to libpython. We can keep the dynamic library libpython and keep Python dynamically linked against it. We only need to pass -fno-semantic-interposition in the compiler and linker flags when building Python!
I'm not sure if we need a Fedora change just for a compiler flag. Again, the only drawback is that we will no longer be able to override a symbol in libpython using LD_PRELOAD. Honestly, I never did that and I don't see a use case for it. I have, however, used LD_PRELOAD on libc multiple times, for example to mock the system clock.
Please do file a Change. It works not only as a coordination tracker, but also as a tool to inform everyone about changes happening in Fedora.
The discussion on the topic has been very interesting, as well as the outcome. I think it would be nice to see the summary with the estimated impact and highlight it via Release Notes.
If someone really needs LD_PRELOAD, it's quite easy to build a custom Python without -fno-semantic-interposition.
Victor
On 15. 11. 19 13:24, Aleksandra Fedorova wrote:
Hi
On Fri, Nov 15, 2019 at 10:22 AM Victor Stinner vstinner@redhat.com wrote:
Hi Jan,
With the help of Florian Weimer and Charalampos Stratakis, we also agreed to test this flag as a priority. I understood that it disables the LD_PRELOAD feature: it's no longer possible to override symbols in libpython with LD_PRELOAD. Thanks to that, the compiler can avoid PLT indirection for function calls and can inline more functions within libpython. I'm talking about function calls from libpython into libpython: something very common in Python. Basically, almost all function calls are calls from libpython to libpython.
I'm impressed. Thanks to -fno-semantic-interposition, I get the same speedup on a dynamically linked Python (libpython) compared to statically linked Python!
Yesterday, I tried on a vanilla Python compiled manually:
./configure --enable-optimizations --with-lto --enable-shared CFLAGS="-fno-semantic-interposition" LDFLAGS="-fno-semantic-interposition"
I saw the same speedup as when building without --enable-shared. Today I validated this result using the RPM generated by Charalampos's PR: https://src.fedoraproject.org/rpms/python38/pull-request/53
In short, https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup is useless: there is no need to modify Python to statically link it to libpython. We can keep the dynamic library libpython and keep Python dynamically linked against it. We only need to pass -fno-semantic-interposition in the compiler and linker flags when building Python!
I'm not sure if we need a Fedora change just for a compiler flag. Again, the only drawback is that we will no longer be able to override a symbol in libpython using LD_PRELOAD. Honestly, I never did that and I don't see a use case for it. I have, however, used LD_PRELOAD on libc multiple times, for example to mock the system clock.
Please do file a Change. It works not only as a coordination tracker, but also as a tool to inform everyone about changes happening in Fedora.
Already working on it.
The discussion on the topic has been very interesting, as well as the outcome. I think it would be nice to see the summary with the estimated impact and highlight it via Release Notes.
Yes indeed.
----- Original Message -----
From: "Victor Stinner" vstinner@redhat.com
To: devel@lists.fedoraproject.org
Sent: Friday, November 15, 2019 10:21:44 AM
Subject: Re: Fedora 32 System-Wide Change proposal: Build Python 3 to statically link with libpython3.8.a for better performance
Hi Jan,
With the help of Florian Weimer and Charalampos Stratakis, we also agreed to test this flag as a priority. I understood that it disables the LD_PRELOAD feature: it's no longer possible to override symbols in libpython with LD_PRELOAD. Thanks to that, the compiler can avoid PLT indirection for function calls and can inline more functions within libpython. I'm talking about function calls from libpython into libpython: something very common in Python. Basically, almost all function calls are calls from libpython to libpython.
I'm impressed. Thanks to -fno-semantic-interposition, I get the same speedup on a dynamically linked Python (libpython) compared to statically linked Python!
Yesterday, I tried on a vanilla Python compiled manually:
./configure --enable-optimizations --with-lto --enable-shared CFLAGS="-fno-semantic-interposition" LDFLAGS="-fno-semantic-interposition"
I saw the same speedup as when building without --enable-shared. Today I validated this result using the RPM generated by Charalampos's PR: https://src.fedoraproject.org/rpms/python38/pull-request/53
In short, https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup is useless: there is no need to modify Python to statically link it to libpython. We can keep the dynamic library libpython and keep Python dynamically linked against it. We only need to pass -fno-semantic-interposition in the compiler and linker flags when building Python!
I'm not sure if we need a Fedora change just for a compiler flag. Again, the only drawback is that we will no longer be able to override a symbol in libpython using LD_PRELOAD. Honestly, I never did that and I don't see a use case for it. I have, however, used LD_PRELOAD on libc multiple times, for example to mock the system clock.
If someone really needs LD_PRELOAD, it's quite easy to build a custom Python without -fno-semantic-interposition.
Victor
Thanks Victor for running the benchmarks.
The change will be withdrawn and another, self-contained one will be created. I think losing the ability to override symbols in the system Python is a better tradeoff than the size cost and possible incompatibilities of static linking.
Side note: the list of packages that still link to libpython without embedding the interpreter will be used to mass file bugs in order to unlink them.
* Ben Cotton:
https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup
== Summary == Python 3 traditionally in Fedora was built with a shared library libpython3.?.so and the final binary was dynamically linked against that shared library. This change is about creating the static library and linking the final python3 binary against it, as it provides significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
Will python still be PIE? Or will you disable hardening and build it as a position-dependent binary?
Thanks, Florian
Will python still be PIE? Or will you disable hardening and build it as a position-dependent binary?
Yes, the python ELF binary still uses PIE (Position Independent Executable). I checked the patched package:
$ file /usr/bin/python3.8 /usr/bin/python3.8: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=b69aa38762233169fa21b3943e1ca62f86b2358b, stripped
$ rpm -q python38 python38-3.8.0-666.fc30.x86_64
Victor
On 11/5/19, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup
== Summary == Python 3 traditionally in Fedora was built with a shared library libpython3.?.so and the final binary was dynamically linked against that shared library. This change is about creating the static library and linking the final python3 binary against it, as it provides significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
<<snip>>
There are alternatives that provide gradations in the tradeoffs.
0) Include _Py_UnixMain in libpython3.POINTVER.so, also set ElfXX_Ehdr.e_entry and include enough -startfiles so that execve(libpython3.POINTVER.so, ...) will act as if execve(python, ...). Compare execve("/lib64/libc.so.6", ...) which prints the credits for glibc. Then python3 and libpython3.POINTVER.so can be hardlinked or symlinked. This removes one DT_NEEDED from the startup of python3.
1) Do not flag python3 and libpython3.POINTVER.so with DT_BIND_NOW or DF_BIND_NOW. This removes the need to perform relocation processing for every slot in the PLT at process startup.
2) Use -Wl,-Bsymbolic during the build (static bind) of libpython3.POINTVER.so. This removes all intra-library symbolic relocations (hence PLT slots) at the cost of also removing the ability to override (interpose) them.
3) Compile and build libpython3.POINTVER.so as ET_EXEC (without -fPIC, without -shared, without -fPIE), static bind with -Wl,-Ttext-segment=$(< /proc/sys/vm/mmap-min-addr) to put the library below the pages of any default ET_EXEC, static bind with --export-dynamic (or --dynamic-list=) to make visible all Python primitives, and enhance the dynamic linker ld-linux to dlopen(ET_EXEC, ...) as if ET_DYN but OR-in MAP_FIXED when mmap() of PT_LOAD. The dynamic linker can be tricked today by changing ElfXX_Ehdr.e_type from ET_EXEC to ET_DYN, as long as the linux kernel honors the hint of mmap(non_zero, ...) without MAP_FIXED.
Today's /lib64/libpython3.7m.so.1.0 occupies about 3.4 MB of pages, which fits between default mmap-min-addr of 64K and default -Ttext-segment of 4M.
Dne 05. 11. 19 v 16:03 Ben Cotton napsal(a):
== Summary == Python 3 traditionally in Fedora was built with a shared library libpython3.?.so and the final binary was dynamically linked against that shared library. This change is about creating the static library and linking the final python3 binary against it, as it provides significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
It seems that we have one group of people who prefer speed and another group of people who prefer saved space.
Instead of trying to build a Swiss Army knife that satisfies everybody (which will not work), can we have python3-static **and** python3-dynamic (*) packages, let users decide which one gets installed, and handle `/usr/bin/python3` using `alternatives(8)`? Then FESCo would "only" have to decide which one is the default. That is far less controversial than deciding whether everyone is forced into the time-saving or the space-saving solution.
(*) The names can vary. It would be probably something like python3 vs. python3-dynamic or python3 vs. python3-static.
On 12. 11. 19 14:00, Miroslav Suchý wrote:
Dne 05. 11. 19 v 16:03 Ben Cotton napsal(a):
== Summary == Python 3 traditionally in Fedora was built with a shared library libpython3.?.so and the final binary was dynamically linked against that shared library. This change is about creating the static library and linking the final python3 binary against it, as it provides significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
It seems that we have one group of people who prefer speed and another group of people who prefer saved space.
Instead of trying to build a Swiss Army knife that satisfies everybody (which will not work), can we have python3-static **and** python3-dynamic (*) packages, let users decide which one gets installed, and handle `/usr/bin/python3` using `alternatives(8)`? Then FESCo would "only" have to decide which one is the default. That is far less controversial than deciding whether everyone is forced into the time-saving or the space-saving solution.
While I realize that this might actually be a clever thing to do, as the Python maintainer I don't want this, for various reasons. Most importantly, it means we would need to "support" twice as many Python interpreters.
It would also create a problem in RPM requirements.
Suppose a package needs /usr/bin/python3.8 to be dynamically linked. How do I express that? Would it need to hardcode some kind of /usr/libexec/python3.8-dynamic? Would this require custom shebangs... etc.? I really don't want to go that way. It's bad enough on RHEL 8 already, with "platform-python".
Note that this is my personal opinion, not a team opinion.
On 12. 11. 19 14:18, Miro Hrončok wrote:
On 12. 11. 19 14:00, Miroslav Suchý wrote:
Dne 05. 11. 19 v 16:03 Ben Cotton napsal(a):
== Summary == Python 3 traditionally in Fedora was built with a shared library libpython3.?.so and the final binary was dynamically linked against that shared library. This change is about creating the static library and linking the final python3 binary against it, as it provides significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
It seems that we have one group of people who prefer speed and another group of people who prefer saved space.
Instead of trying to build a Swiss Army knife that satisfies everybody (which will not work), can we have python3-static **and** python3-dynamic (*) packages, let users decide which one gets installed, and handle `/usr/bin/python3` using `alternatives(8)`? Then FESCo would "only" have to decide which one is the default. That is far less controversial than deciding whether everyone is forced into the time-saving or the space-saving solution.
While I realize that this might actually be a clever thing to do, as the Python maintainer I don't want this, for various reasons. Most importantly, it means we would need to "support" twice as many Python interpreters.
It would also create a problem in RPM requirements.
Suppose a package needs /usr/bin/python3.8 to be dynamically linked. How do I express that? Would it need to hardcode some kind of /usr/libexec/python3.8-dynamic? Would this require custom shebangs... etc.? I really don't want to go that way. It's bad enough on RHEL 8 already, with "platform-python".
Note that this is my personal opinion, not a team opinion.
I've confirmed this with the team. We are not going to do this, sorry.
We either do the change or don't. I'm personally fine with both options.
On Tuesday, November 12, 2019 6:18:00 AM MST Miro Hrončok wrote:
On 12. 11. 19 14:00, Miroslav Suchý wrote:
Dne 05. 11. 19 v 16:03 Ben Cotton napsal(a):
== Summary == Python 3 traditionally in Fedora was built with a shared library libpython3.?.so and the final binary was dynamically linked against that shared library. This change is about creating the static library and linking the final python3 binary against it, as it provides significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
It seems that we have one group of people who prefer speed and another group of people who prefer saved space.
Instead of trying to build a Swiss Army knife that satisfies everybody (which will not work), can we have python3-static **and** python3-dynamic (*) packages, let users decide which one gets installed, and handle `/usr/bin/python3` using `alternatives(8)`? Then FESCo would "only" have to decide which one is the default. That is far less controversial than deciding whether everyone is forced into the time-saving or the space-saving solution.
While I realize that this might actually be a clever thing to do, as the Python maintainer I don't want this, for various reasons. Most importantly, it means we would need to "support" twice as many Python interpreters.
It would also create a problem in RPM requirements.
Suppose a package needs /usr/bin/python3.8 to be dynamically linked. How do I express that? Would it need to hardcode some kind of /usr/libexec/python3.8-dynamic? Would this require custom shebangs... etc.? I really don't want to go that way. It's bad enough on RHEL 8 already, with "platform-python".
Note that this is my personal opinion, not a team opinion.
-- Miro Hrončok -- Phone: +420777974800 IRC: mhroncok
If that software was to be packaged, in this case, you'd simply: Requires: python3
change the shebang to /path/to/my/python3
However, I believe there's a third option here. It could be as simple as providing a python3-static in addition, and NOT using `alternatives`. This way, packages and scripts that actually need the performance improvements can directly call python3-static, and everything else just continues to work as it does now.
On 12. 11. 19 23:21, John M. Harris Jr wrote:
If that software was to be packaged, in this case, you'd simply: Requires: python3
change the shebang to /path/to/my/python3
I am strongly against any proposals that involve /path/to/my/python3.
However, I believe there's a third option here. It could be as simple as providing a python3-static in addition, and NOT using `alternatives`. This way, packages and scripts that actually need the performance improvements can directly call python3-static, and everything else just continues to work as it does now.
The idea here was to speed up Python for the benefit of the entire distro, not for those who choose to use it.
If a piece of software is written in Python and its maintainers want to speed it up, there are more explicit actions they can take.
We don't want to invest our energy into this for the couple of packages that would opt in. We want to invest it into making all Fedora software that happens to run on Python generally faster.
And it is completely understandable that some trade-offs are OK for somebody and not OK for somebody else. We are reading this thread and I will urge FESCo to pay special attention to the negative responses.
John M. Harris Jr wrote:
However, I believe there's a third option here. It could be as simple as providing a python3-static in addition, and NOT using `alternatives`. This way, packages and scripts that actually need the performance improvements can directly call python3-static, and everything else just continues to work as it does now.
But the wasted space will be even more, because now you have libpython, the dynamic python3 linked against it, AND the python3-static binary. So it does not address the issue at all.
Kevin Kofler
On Tuesday, November 12, 2019 4:21:03 PM MST Kevin Kofler wrote:
But the wasted space will be even more, because now you have libpython, the dynamic python3 linked against it, AND the python3-static binary. So it does not address the issue at all.
Yes, that would be the case if something in one of the installer images used that python3-static package, which I admittedly did not consider.
Dne 13. 11. 19 v 0:21 Kevin Kofler napsal(a):
But the wasted space will be even more, because now you have libpython, the dynamic python3 linked against it, AND the python3-static binary. So it does not address the issue at all.
+1
"Requires: /path/to/my/python3" is a no-go, because no maintainer knows what a user prefers: speed or space. And you may end up with a mixed environment where you waste both space and CPU.
Both packages should provide "python3", and it should be the user's responsibility which Python flavor gets installed on the system.
On 11/12/19 2:21 PM, John M. Harris Jr wrote:
However, I believe there's a third option here. It could be as simple as providing a python3-static in addition, and NOT using `alternatives`.
Is that an option, though? From the discussion, I was under the impression that static vs. dynamic Python affected whether or not Python extensions need to be linked to libpython*.*m.so. I'm unclear on that, though, because I see some modules today that aren't linked to that library, even though most of the ones I checked are.
On 14. 11. 19 2:48, Gordon Messmer wrote:
On 11/12/19 2:21 PM, John M. Harris Jr wrote:
However, I believe there's a third option here. It could be as simple as providing a python3-static in addition, and NOT using `alternatives`.
Is that an option, though? From the discussion, I was under the impression that static vs. dynamic python affected whether or not python extensions need to be linked to libpython*.*m.so. I'm unclear on that, though, because I see some modules today that aren't linked to that library, though most of the ones I checked are.
Currently (python3.8 executable uses dynamic libpython3.8.so):
- extension modules are not linked to libpython3.8.so by default
- extension modules linked to libpython3.8.so (by cmake etc.) work just fine
After the change (python3.8 executable is "fat" and contains everything):
- extension modules are not linked to libpython3.8.so by default
- extension modules linked to libpython3.8.so (by cmake etc.) might blow up
The extra "python3-static" thing mimics the second behavior, so:
- extension modules are not linked to libpython3.8.so by default
- extension modules linked to libpython3.8.so (by cmake etc.):
  - work just fine with "default" python
  - might blow up with "python3-static" python
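The distinction above can be inspected from Python itself. A minimal sketch, using the standard `sysconfig` module: `Py_ENABLE_SHARED` reports whether the interpreter was configured with --enable-shared (i.e. dynamically linked against libpython), and `LDLIBRARY` names the libpython artifact that build produced. (On Linux, `ldd some_extension.so` would additionally show whether a given extension module was linked against libpython3.8.so.)

```python
import sysconfig

# Py_ENABLE_SHARED is 1 when CPython was configured with --enable-shared
# (the interpreter binary is dynamically linked against libpython3.X.so),
# and 0 when libpython is linked in statically (the "fat" binary case).
shared = sysconfig.get_config_var("Py_ENABLE_SHARED")
print("dynamic libpython" if shared else "static libpython")

# LDLIBRARY names the libpython this build produced,
# e.g. "libpython3.8.so" for a shared build or "libpython3.8.a" for a static one.
print(sysconfig.get_config_var("LDLIBRARY"))
```

Running this under the two proposed flavors would print different results, which is exactly why an extension module hard-linked to libpython3.8.so by cmake etc. could work under one interpreter and blow up under the other.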
On 05. 11. 19 16:03, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup
== Summary == Python 3 in Fedora has traditionally been built with a shared library libpython3.?.so, and the final binary was dynamically linked against that shared library. This change is about creating a static library and linking the final python3 binary against it, as that provides a significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
Ben, please postpone the FESCo ticket.
We are exploring some of the interesting speedup proposals in this thread first.
Should I change the page back to incomplete? We don't want to repeat the entire process once we are ready again.
On 14. 11. 19 14:47, Miro Hrončok wrote:
On 05. 11. 19 16:03, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup
== Summary == Python 3 in Fedora has traditionally been built with a shared library libpython3.?.so, and the final binary was dynamically linked against that shared library. This change is about creating a static library and linking the final python3 binary against it, as that provides a significant performance improvement, up to 27% depending on the workload. The static library will not be shipped. The shared library will continue to exist in a separate subpackage. In essence, python3 will no longer depend on libpython.
Ben, please postpone the FESCo ticket.
We are exploring some of the interesting speedup proposals in this thread first.
Should I change the page back to incomplete? We don't want to repeat the entire process once we are ready again.
OK, the alternative change is:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
The old one is back to incomplete:
https://fedoraproject.org/wiki/Changes/PythonStaticSpeedup