On Wed, Jan 15, 2020 at 06:05:42PM +0100, Miro Hrončok wrote:
### File types (and bytecode caches)
The orthogonal dimension is the file type. The Python standard library
contains directories with both "extension modules" (usually written in C
and compiled to a `*.cpython-38-x86_64-linux-gnu.so` shared object file)
and "pure Python" modules (written in Python and saved as a `*.py`
source file).
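To check which file names the interpreter actually accepts for each file type, you can query the standard `importlib.machinery` suffix lists and the `EXT_SUFFIX` build variable. This is a small sketch. The values depend on the interpreter version and platform, so the `cpython-38-x86_64-linux-gnu` tag above is only what CPython 3.8 on x86_64 Linux reports.

```python
import importlib.machinery
import sysconfig

# Suffix this CPython build gives its own compiled extension modules,
# e.g. ".cpython-38-x86_64-linux-gnu.so" for CPython 3.8 on x86_64 Linux.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

# Every suffix the import system will look for, per file type.
print("extension modules:", importlib.machinery.EXTENSION_SUFFIXES)
print("pure Python sources:", importlib.machinery.SOURCE_SUFFIXES)
print("bytecode caches:", importlib.machinery.BYTECODE_SUFFIXES)
print("EXT_SUFFIX:", ext_suffix)
```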
Each pure Python module comes in 4 files:
- `module.py` -- the source
- `__pycache__/module.cpython-38.pyc` -- regular (not optimized) bytecode cache
- `__pycache__/module.cpython-38.opt-1.pyc` -- optimized bytecode cache (level 1)
- `__pycache__/module.cpython-38.opt-2.pyc` -- optimized bytecode cache (level 2)
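The three cache names follow a fixed scheme. `importlib.util.cache_from_source()` computes them from the source path, the interpreter's cache tag, and an optimization level. This is a minimal sketch: `lib/module.py` is a hypothetical path, and the tag comes from whichever interpreter runs it (`cpython-38` on CPython 3.8, where the output matches the list above under `lib/`). If `sys.pycache_prefix` is set, the paths point into that tree instead.

```python
import importlib.util
import os
import sys

# Hypothetical source file, used only to derive the cache file names.
source = os.path.join("lib", "module.py")

# optimization="" -> regular bytecode; 1 and 2 -> the -O and -OO variants.
paths = [
    importlib.util.cache_from_source(source, optimization=level)
    for level in ("", 1, 2)
]

print("cache tag:", sys.implementation.cache_tag)
for path in paths:
    print(path)
```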
I suspect that the difference in speed between loading various .pyc
files is negligible. Do you have actual benchmarks for this?
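One rough way to start answering that is to time only the unmarshalling step on a synthetic module, since that is where the three `.pyc` variants differ. This sketch rests on two assumptions. First, `SOURCE` is a made-up module stuffed with docstrings and asserts, which are exactly what `-O` and `-OO` strip. Second, it ignores file I/O, `stat()` calls, and the rest of the import machinery. It isolates the part that differs between variants; it does not measure real import times.

```python
import marshal
import timeit

# Hypothetical sample module: many small functions, each with a docstring
# (removed at optimize=2) and an assert (removed at optimize>=1).
SOURCE = '"""Module docstring."""\n' + "\n".join(
    f'def f{i}(x):\n    """Docstring for f{i}."""\n    assert x >= 0\n    return x + {i}\n'
    for i in range(500)
)


def unmarshal_time(optimize, repeat=200):
    """Return (marshalled size in bytes, mean seconds per marshal.loads)."""
    code = compile(SOURCE, "<sample>", "exec", optimize=optimize)
    data = marshal.dumps(code)  # roughly the payload stored in a .pyc
    seconds = timeit.timeit(lambda: marshal.loads(data), number=repeat)
    return len(data), seconds / repeat


results = {level: unmarshal_time(level) for level in (0, 1, 2)}
for level, (size, secs) in results.items():
    print(f"optimize={level}: {size} bytes, {secs * 1e6:.1f} us per load")
```

A real answer would time full imports of actual stdlib modules from each variant. This snippet only shows whether the payload differences are large enough to matter.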
### Solution 5: Stop shipping mandatory bytecode cache
This solution sounds simple: we no longer ship the bytecode cache
mandatorily. Technically, we move the `.pyc` files to a subpackage
of `python3-libs` (or to three different subpackages; that is not
important here), and we only *Recommend* them from `python3-libs`.
By default, users get them, but the maintainers of space-critical
Fedora flavors (such as container images) can opt out, and so can
power users.
This would **save 18.6 MiB / 50%** -- quite a lot.
However, as said earlier, if the bytecode cache files are not there,
Python attempts to create them upon first import. That can cause
several problems; here we propose how to work around
them.
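You can observe the write-on-first-import behavior, and the existing interpreter-wide switch for it, directly. This is a sketch that assumes a writable temporary directory. `demo_mod_a` and `demo_mod_b` are throwaway module names invented here. `sys.dont_write_bytecode` is the global flag that `-B` and `PYTHONDONTWRITEBYTECODE` also set.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def import_and_check_cache(name, dont_write):
    """Import a fresh module from a temp dir; return True if a .pyc was written."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, name + ".py")
        with open(src, "w") as f:
            f.write("VALUE = 42\n")
        old_flag = sys.dont_write_bytecode
        sys.dont_write_bytecode = dont_write
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(name)  # the first import may write the cache
            return os.path.exists(importlib.util.cache_from_source(src))
        finally:
            # Undo every global change so later imports are unaffected.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
            sys.dont_write_bytecode = old_flag


written_default = import_and_check_cache("demo_mod_a", dont_write=False)
written_flag = import_and_check_cache("demo_mod_b", dont_write=True)
print("cache written by default:", written_default)
print("cache written with dont_write_bytecode:", written_flag)
```

If the directory is not writable, as with `/usr/lib/python3.8` for a non-root user, the loader silently skips the write. The cost lies in the repeated failed attempts, not in any visible error.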
Below, the proposal suggests using a flag file in each __pycache__ directory.
What about a different route: having a flag file for all descendants
of a directory?
For example, /usr/lib/python3.8/.dont_write_bytecode
would cover all modules under /usr/lib/python3.8/.
If a .pyc file is present, Python could still make use of it.
This would be a nicer solution because it wouldn't require modifying
individual packages, but would still avoid the SELinux issues and
slowdowns from failed attempts to write the optimized files.
The __pycache__ files wouldn't need to exist at all.
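The directory-wide flag could work roughly like this: before writing a cache file, look for the flag in the module's directory and in each ancestor directory. This is a purely hypothetical sketch of the idea. CPython has no such flag file, and `FLAG_NAME` and `bytecode_writes_disabled()` are names invented here for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical flag-file name from the proposal; CPython does not implement this.
FLAG_NAME = ".dont_write_bytecode"


def bytecode_writes_disabled(module_path):
    """Return True if the module's directory or any ancestor holds FLAG_NAME."""
    for directory in Path(module_path).resolve().parents:
        if (directory / FLAG_NAME).exists():
            return True
    return False


# Usage: a fake /usr/lib/python3.8 tree with the flag at its top level.
with tempfile.TemporaryDirectory() as tmp:
    stdlib = Path(tmp, "usr", "lib", "python3.8")
    (stdlib / "json").mkdir(parents=True)
    (stdlib / FLAG_NAME).touch()
    covered = bytecode_writes_disabled(stdlib / "json" / "decoder.py")
    uncovered = bytecode_writes_disabled(Path(tmp, "usr", "lib", "other.py"))
    print("json/decoder.py covered:", covered)
    print("lib/other.py covered:", uncovered)
```

A real implementation would probably cache the result per directory, because walking up to `/` costs a few `stat()` calls on every import.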
Zbyszek