Fedora pagure confusion wrt EPEL
by Michael Schwendt
I don't do EPEL packaging. I never signed up as an "owner" of EPEL packages.
I don't want to be the new default owner of EPEL bugzilla tickets.
Where can I stop this mess?
3 years, 7 months
python-pep8 is orphaned
by iliana weller
Hello,
I've orphaned python-pep8. pep8 was renamed to pycodestyle in 2016; it
received its last release in 2017. It should be removed from Fedora in a
future release.
I unfortunately don't have time to proceed with the full retirement
process myself. If somebody would like to pick it up:
https://fedoraproject.org/wiki/Orphaned_package_that_need_new_maintainers...
$ dnf repoquery --whatrequires python2-pep8
python2-autopep8-0:1.2.4-9.fc29.noarch
python2-pytest-pep8-0:1.0.6-15.fc29.noarch
python2-spyder-0:3.3.1-3.fc29.noarch
$ dnf repoquery --whatrequires python3-pep8
python3-autopep8-0:1.2.4-9.fc29.noarch
python3-hacking-0:1.1.0-3.fc29.noarch
python3-pytest-pep8-0:1.0.6-15.fc29.noarch
python3-spyder-0:3.3.1-3.fc29.noarch
See also https://bugzilla.redhat.com/show_bug.cgi?id=1667200's dependent
bugs.
(Please CC me on replies that need my attention.)
--
iliana weller <ilianaw(a)buttslol.net>
3 years, 7 months
Fedora 32 Self-Contained Change proposal: Build Python with -fno-semantic-interposition for better performance
by Ben Cotton
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
Simplified version of another change proposal: This change was
originally proposed for [[Releases/32|Fedora 32]] as
[[Changes/PythonStaticSpeedup]]; however, based on
[https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.o...
community feedback], it has been significantly reduced.
== Summary ==
We add the <code>-fno-semantic-interposition</code> compiler/linker
flag when building Python interpreters, as it provides a significant
performance improvement, up to 27% depending on the workload. Users
will no longer be able to use LD_PRELOAD to override a symbol from
libpython, which we consider a good trade-off for the speedup.
== Owner ==
* Name: [[User:Cstratak| Charalampos Stratakis]], [[User:Vstinner|
Victor Stinner]], [[User:Churchyard| Miro Hrončok]]
* Email: python-maint(a)redhat.com
* Shout-out: [[User:Jankratochvil|Jan Kratochvíl]] for first
suggesting this instead of the original proposal, followed by
[[User:Kkofler|Kevin Kofler]]. [[User:Fweimer|Florian Weimer]] for
providing answers to our questions. David Gray for originally
suggesting to link Python statically to gain performance.
== Detailed Description ==
When we build the Python interpreter with the
<code>-fno-semantic-interposition</code> compiler/linker flag, we can
achieve a performance gain of 5% to 27% depending on the workload.
Link-time optimization (LTO) and profile-guided optimization (PGO) also have a
greater impact when python3 is built this way.
As a negative side effect, it disables the LD_PRELOAD feature: it's no
longer possible to override symbols in libpython with LD_PRELOAD.
Interposition is enabled by default in compilers like GCC: function
calls to a library go through a "Procedure Linkage Table" (PLT).
This indirection is required to allow a library loaded via the LD_PRELOAD
environment variable to override a function. The indirection puts more
pressure on the CPU level 1 cache (instruction cache). In terms of
performance, the main drawback is that function calls from a library
to the same library cannot be inlined, to respect the interposition
semantics. Inlining is usually a big win in terms of performance.
Disabling interposition for libpython removes the overhead of function
calls by avoiding the PLT indirection, and allows more function calls to be
inlined. We're describing function calls from libpython to
libpython, something which is very common in Python: almost all
function calls are calls from libpython to libpython.
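The following is a rough analogy only. It uses Python objects, not ELF, and the names `plt`, `real_strlen` and `interposed_strlen` are hypothetical. It contrasts a call that looks its target up in an overridable table on every call with a call whose target is bound once:

```python
# Analogy only: a dict stands in for the PLT, and rebinding one of its
# entries stands in for LD_PRELOAD. This is not how ELF linking is coded.

def real_strlen(s):
    """The 'library' implementation of a function."""
    return len(s)


# A mutable table of function slots, consulted on every call.
plt = {"strlen": real_strlen}


def call_via_plt(s):
    # Looks the symbol up on every call, so an override is honored.
    return plt["strlen"](s)


def call_direct(s, _impl=real_strlen):
    # Target bound once, when the function is defined: no per-call lookup,
    # and later changes to the table are never seen.
    return _impl(s)


def interposed_strlen(s):
    """A 'preloaded' replacement."""
    return -1


plt["strlen"] = interposed_strlen  # the "LD_PRELOAD" step
print(call_via_plt("abc"))  # -1: the interposed function ran
print(call_direct("abc"))   # 3: the original binding is kept
```

With `-fno-semantic-interposition`, the compiler may treat libpython-internal calls like `call_direct`: bound directly and eligible for inlining. That is also why LD_PRELOAD can no longer replace them.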
If Fedora users need to use LD_PRELOAD to override symbols in
libpython, the recommended way is to build a custom Python without
<code>-fno-semantic-interposition</code>.
It is still possible to use LD_PRELOAD to override symbols in other
libraries (for example in glibc).
=== Affected Pythons ===
Primarily, we will change the interpreter in the {{package|python3}}
package, that is Python 3.8 in Fedora 32 and any later version of
Python in future Fedora releases.
Impact on other Python packages (and generally software using Python)
is not anticipated (other than the possible speedup).
We will also change the
[https://developer.fedoraproject.org/tech/languages/python/multiple-python...
alternate Python interpreters] where possible and useful, primarily
the upstream supported versions of CPython, such as
{{package|python39}} (if already packaged), {{package|python37}} and
{{package|python36}}.
=== Affected Fedora releases ===
This is a Fedora 32 change and it will be implemented in Rawhide
(Fedora 32) only. Any future versions of Fedora will inherit the
change until it is reverted for some reason.
If it turns out that there are absolutely no issues, we might consider
backporting the speedup to already released Fedora versions (for
example Fedora 31). Such action would be separately coordinated with
[https://docs.fedoraproject.org/en-US/fesco/ FESCo].
== Benefit to Fedora ==
Python's performance will increase significantly depending on the
workload. Since many core components of the OS also depend on Python
this could lead to an increase in their performance as well, however
individual benchmarks will need to be conducted to verify the
performance gain for those components.
[https://pyperformance.readthedocs.io/ pyperformance] results,
ignoring differences smaller than 5%:
(See change proposal)
== Scope ==
* Proposal owners:
** Review and merge the
[https://src.fedoraproject.org/rpms/python3/pull-request/151 pull
request with the implementation].
** Monitor Koschei for significant problems.
** Backport the change to alternate Python versions.
* Other developers are encouraged to check if their package works as expected
* Release engineering: N/A (not needed for this Change) -- this change
does not require a mass rebuild nor any other special releng work
* Policies and guidelines: N/A (not needed for this Change)
* Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
Python package maintainers should verify that their packages work as
expected and the only impact the end users should see is a performance
increase for workloads relying on Python.
== How To Test ==
Test that everything Python related in Fedora works as usual.
=== Was the flag applied test ===
You can test whether the <code>-fno-semantic-interposition</code> flag
was applied for your Python build:
<pre>
>>> import sysconfig
>>> '-fno-semantic-interposition' in (sysconfig.get_config_var('PY_CFLAGS') + sysconfig.get_config_var('PY_CFLAGS_NODIST'))
True
>>> '-fno-semantic-interposition' in (sysconfig.get_config_var('PY_LDFLAGS') + sysconfig.get_config_var('PY_LDFLAGS_NODIST'))
True
</pre>
Before the change, you would see <code>False</code>, <code>False</code>.
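For scripts (for example in CI), the same check can be wrapped in a small helper. This is a sketch, not an official tool, and the helper name `flag_applied` is made up. It splits the flags into whole options instead of doing a substring search. It also treats variables unknown to `sysconfig.get_config_var()`, which returns `None` for them, as empty:

```python
import sysconfig

FLAG = "-fno-semantic-interposition"


def flag_applied(var_names, flag=FLAG):
    """Return True if `flag` is one of the options in any of the config vars."""
    options = []
    for name in var_names:
        value = sysconfig.get_config_var(name)  # None if the variable is unknown
        if value:
            options.extend(str(value).split())
    return flag in options


cflags = flag_applied(["PY_CFLAGS", "PY_CFLAGS_NODIST"])
ldflags = flag_applied(["PY_LDFLAGS", "PY_LDFLAGS_NODIST"])
print(f"compiler flag applied: {cflags}, linker flag applied: {ldflags}")
```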
=== Performance test ===
The performance speedup can be measured using the official Python
benchmark suite [https://pyperformance.readthedocs.io/ pyperformance]:
see [https://pyperformance.readthedocs.io/usage.html#run-benchmarks
Run benchmarks].
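pyperformance is the right tool for real numbers. A quick, rough side-by-side check of two interpreter builds could instead use a stdlib-only `timeit` sketch like the one below. The timed snippet is an arbitrary assumption, and a single microbenchmark says little about real workloads:

```python
import subprocess
import sys

# Arbitrary micro-benchmark: many small calls that keep libpython busy.
SNIPPET = "sorted(str(i) for i in range(1000))"


def best_time(python, snippet=SNIPPET, number=200, repeat=5):
    """Best-of-`repeat` time per loop (seconds) of `snippet` under `python`."""
    code = (
        "import timeit; "
        f"print(min(timeit.repeat({snippet!r}, number={number}, repeat={repeat})) / {number})"
    )
    result = subprocess.run([python, "-c", code], check=True,
                            capture_output=True, text=True)
    return float(result.stdout)


if __name__ == "__main__":
    # Pass interpreter paths, e.g. a build with and one without the flag.
    for interpreter in sys.argv[1:] or [sys.executable]:
        print(interpreter, f"{best_time(interpreter) * 1e6:.1f} µs per loop")
```

Running each interpreter in its own subprocess keeps the comparison fair. Timing noise still applies, so repeat runs before drawing conclusions.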
== User Experience ==
Python-based workloads should see a performance gain of up to 27%.
== Dependencies ==
This change is not dependent on anything else.
== Contingency Plan ==
* Contingency mechanism: If issues appear that cannot be fixed in a
timely manner, the change can be easily reverted and will be considered
again for the next Fedora release.
* Contingency deadline: Before the beta freeze of Fedora 32 (2020-02-25)
* Blocks release? Yes
* Blocks product? None
== Documentation ==
This change proposal has all the documentation.
See the [[Changes/PythonStaticSpeedup|previous change proposal]] and
the [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.o...
thread about it on the devel mailing list] for more relevant
information about what we are not doing.
--
Ben Cotton
He / Him / His
Fedora Program Manager
Red Hat
TZ=America/Indiana/Indianapolis
3 years, 7 months
Announcing new anitya integration and de-orphaning process
by Pierre-Yves Chibon
Good Morning Everyone,
Tomorrow we are planning on deploying a new version of pagure and
pagure-dist-git on production.
These changes come with two changes to the packager workflow:
* Anitya integration in dist-git
Something we lost when losing pkgdb was the easy integration with anitya
(https://release-monitoring.org). With the coming changes we are getting it
back.
In the left-hand column, there will be a drop-down button allowing you to
change the anitya settings for the project.
Existing settings will be migrated from the fedora-scm-requests repo on pagure to
use this drop-down.
Using the fedora-scm-requests repo for the anitya integration will no longer be
supported.
* Change in the de-orphaning process
Currently if a package is orphaned, one has to open a ticket against the releng
project to adopt it. With these changes, anyone will be able to adopt orphaned
projects (not retired on master) directly from dist-git's UI.
If the project is retired or has been orphaned for too long, a ticket on the
releng project will still be required though.
Both of these changes can already be reviewed in staging at:
https://src.stg.fedoraproject.org
Looking forward to your feedback!
Pierre
_______________________________________________
devel-announce mailing list -- devel-announce(a)lists.fedoraproject.org
To unsubscribe send an email to devel-announce-leave(a)lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel-announce@lists.fedora...
3 years, 7 months
Orphaning owncloud and nextcloud
by James Hogarth
Hi all,
It's become clear that I haven't had the time I thought I'd have this past
year due to $life ...
These are in a bit of a broken state and right now I'd advise people that
need them to use upstream packages/containers.
I don't foresee sufficient time coming in the near future with family needs
in advance of hobbies like Fedora of course.
I'll give it a week or so for anyone to contact me who wants to pick them
up, otherwise I'll update pagure to assign them to "orphan"
James
3 years, 8 months
RFC: Python minimization in Fedora
by Miro Hrončok
Hello Fedora!
In Python Maint, we sat down and came up with several ideas on how to minimize
the filesystem footprint of Python. Unfortunately, the result is horribly long,
sorry about that.
Please, share your feedback, additional solutions, comments etc.
Version with formatting and pictures is available at:
https://github.com/hroncok/python-minimization/blob/master/document.md
Enclosing here for better in-line responses:
# Python minimization in Fedora
> While Fedora is well suited for traditional physical/virtual workstations and
servers, it is often overlooked for use cases beyond traditional installs.
>
> Some modern types of deployments — such as IoT and containers — are quite
sensitive to size. For IoT that's usually slow data connections (for
updates/management) and for cloud and containers it’s the massive scale.
-- the preamble of the [Fedora Minimization
Objective](https://docs.fedoraproject.org/en-US/minimization/)
One of the biggest things in Fedora is Python. Because [Fedora loves
Python](https://fedoralovespython.org/) and because the package manager for
Fedora packages -- dnf -- happens to be written in Python, the Python
interpreter and its standard library come pre-installed on many (if not all)
Fedora systems and it is often not possible to remove them without destroying the
system completely or making it unmanageable.
Python comes with [Batteries
Included](https://en.wikipedia.org/wiki/Batteries_Included) -- the standard
library is quite big. While pleasant for the programmers, this comes with a
large filesystem footprint not entirely desired in Fedora. In this document, we
will analyze the footprint and offer several minimization solutions/ideas with
their challenges, pros (MiB saved) and cons. It is a list of ideas; **we're not
promising to do any of this**.
**Goal:**
1. Significantly lower the filesystem footprint of the mandatory Python
installation in Fedora.
**Non-goals:**
1. We don't aim to lower the filesystem footprint of all Python installations
in Fedora -- the default may remain big, if there is an opt-out mechanism.
2. We don't aim to lower the filesystem footprint of all Fedora Python RPM
packages, just the `python3` package and its subpackages -- the interpreter and
the standard library.
However, if any non-goal becomes a side effect of the solution of our goal, good.
**Constraints:**
1. Do not break Python users' expectations. As an example, we don't strip
Python standard library to the bare minimum and still call it Python.
2. Do not break Fedora users' expectations. As an example, we don't break the
ability to hot patch Python files on a live system by default.
3. Do not break Fedora packagers' expectations. As an example, we don't
[require "system tools" to use a custom Python
entrypoint](https://fedoraproject.org/wiki/Changes/System_Python), such as
`/usr/libexec/platform-python` or `/usr/libexec/system-python`.
4. Do not significantly increase the filesystem footprint of the default
Python installation. As an example, we don't package [two separate versions (and
stacks) of Python](https://fedoraproject.org/wiki/Changes/Platform_Python_Stack)
-- one minimal for dnf (or Ansible) and another "normal" for the users.
5. Do not diverge from upstream significantly (but we can drive upstream
change). As an example, we don't reinvent the import machinery of Python
downstream only, but we might do it in upstream and even [use Fedora to pioneer
the change](https://fedoraproject.org/wiki/Changes/python3_c.utf-8_locale).
The listed constraints are not absolute. We will mention in each solution,
whether we feel that some constraints are violated, but that doesn't mean we
shall outright discard the solution.
## How large is Python, actually
tl;dr Python 3.8.1 in Fedora has 111 MiB (approximately 77 3.5" floppy disks),
but we only **install 37.5 MiB by default** (26 floppy disks).

In Fedora we split the Python interpreter into various RPM subpackages, some of
them are optional. This is what you get all the time:
- `python3` contains `/usr/bin/python3` and friends; has 21 KiB.
- `python3-libs` contains `/usr/lib64/libpython3.8.so.1.0` and the majority of
the standard library, is required by `python3`; has 37.5 MiB.
And this is what you get optionally:
- `python3-devel` contains the "development files" and makes it possible to
compile extension modules, or build RPM packages with Python modules; has 4.5 MiB.
- `python3-tkinter` contains the `tkinter` module and several others depending
on it (e.g. `turtle`), it is *Recommended* (not *Required*) by `python3` when
the *tk* framework is installed, to avoid an unnecessary dependency on *tk* and
*X*; has 2 MiB.
- `python3-idle` contains the [Python's Integrated Development and Learning
Environment](https://docs.python.org/3/library/idle.html), an application,
depends on `tkinter` and is not recommended nor required by anything; has 4.2 MiB.
- `python3-test` has the `test` module (the selftest suite of Python) and
tests contained in other modules (e.g. `lib2to3.tests`), most users don't need
this package, it is the biggest part of Python; has 62.8 MiB.
- `python-unversioned-command` contains the `/usr/bin/python` symbolic link;
has close to 0 Bytes.
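The totals quoted above follow directly from these subpackage sizes. The sketch below assumes 1.44 MiB per floppy disk, which reproduces the floppy counts in the text:

```python
# Installed sizes of the python3 subpackages, as listed above (MiB).
SIZES_MIB = {
    "python3": 21 / 1024,               # 21 KiB
    "python3-libs": 37.5,
    "python3-devel": 4.5,
    "python3-tkinter": 2.0,
    "python3-idle": 4.2,
    "python3-test": 62.8,
    "python-unversioned-command": 0.0,  # "close to 0 Bytes"
}
MANDATORY = {"python3", "python3-libs"}
FLOPPY_MIB = 1.44  # assumption: a 3.5" HD floppy counted as 1.44 MiB

total = sum(SIZES_MIB.values())
mandatory = sum(SIZES_MIB[name] for name in MANDATORY)
optional = total - mandatory

print(f"total:     {total:.1f} MiB = {total / FLOPPY_MIB:.0f} floppies")
print(f"mandatory: {mandatory:.1f} MiB = {mandatory / FLOPPY_MIB:.0f} floppies")
print(f"optional:  {optional:.1f} MiB = {100 * optional / total:.0f}% of the total")
```

The script gives 111 MiB (77 floppies) in total and 37.5 MiB (26 floppies) mandatory. The remaining 73.5 MiB (66%) is what the current split already leaves out of the default install.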
For the sake of this document, we will mostly focus on the `python3-libs`
package, as it contains the vast majority of the bytes we want to get rid of
from minimal Fedora installations. We will mostly focus on the standard library,
not `/usr/lib64/libpython3.8.so.1.0` -- that file has copious 3.7 MiB, but it
contains the Python interpreter itself and minimizing that is out of scope here
-- we have bigger fish to fry.
## 2-dimensional classification of the standard library files
When we look closely at the files in the standard library, we can classify them
by 2 important dimensions: Python modules and file types.
### Python modules
The Python 3.8 standard library has 276 different top-level modules, the biggest
two being `test` and `idlelib`, both already not part of `python3-libs`. If we
factor out modules and submodules removed from `python3-libs`, the ten largest
remaining modules are:
1. `encodings`: 2.5 MiB
1. `pydoc_data`: 1.8 MiB
1. `distutils`: 1.8 MiB
1. `asyncio`: 1.4 MiB
1. `email`: 1.1 MiB
1. `unicodedata`: 1.0 MiB
1. `xml`: 1010 KiB
1. `lib2to3`: 993 KiB
1. `multiprocessing`: 925 KiB
1. `unittest`: 750 KiB
Some modules here are interesting because they contain mostly data (`encodings`,
`pydoc_data`, `unicodedata`), or because they are obviously developer oriented
and very rarely used at runtime (`distutils`, `lib2to3`, `unittest`).
A special case is the `ensurepip` module -- it has only 34.4 KiB, but it
*Requires* unbundled `python-pip-wheel` (1.18 MiB) and `python-setuptools-wheel`
(348 KiB) - that puts it between (3) and (4) in the above statistics with 1.56
MiB in total.
### File types (and bytecode caches)
The orthogonal dimension is the file type. Python standard library contains
directories with both "extension modules" (written in C (usually) and compiled
to `*.cpython-38-x86_64-linux-gnu.so` shared object file) and "pure Python"
modules (written in Python and saved as `*.py` source file).
Each pure Python module comes in 4 files:
- `module.py` -- the source
- `__pycache__/module.cpython-38.pyc` -- regular (not optimized) bytecode cache
- `__pycache__/module.cpython-38.opt-1.pyc` -- optimized bytecode cache (level 1)
- `__pycache__/module.cpython-38.opt-2.pyc` -- optimized bytecode cache (level 2)
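The import system can compute these names itself. The sketch below uses `importlib.util.cache_from_source()` with a purely illustrative path; the file does not need to exist. The tag (`cpython-38`) comes from the running interpreter, so on a newer Python the names differ:

```python
import importlib.util


def cache_paths(source_path):
    """Map a .py path to its three bytecode cache paths (running interpreter)."""
    return {
        "regular": importlib.util.cache_from_source(source_path, optimization=""),
        "opt-1": importlib.util.cache_from_source(source_path, optimization=1),
        "opt-2": importlib.util.cache_from_source(source_path, optimization=2),
    }


for kind, path in cache_paths("/usr/lib64/python3.8/module.py").items():
    print(f"{kind:8} {path}")
```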
Each of these files has a different purpose (explained below) and each of the
files is wasting precious storage space.
In total, the different file types in `/usr/lib64/python3.8/` take (without 3rd
party packages):
- `.py`: 26.4 MiB
- `.pyc`: 22.0 MiB
- `.opt-1.pyc`: 22.0 MiB
- `.opt-2.pyc`: 19.8 MiB
- `.so`: 5.3 MiB
Files from `python3-libs` in `/usr/lib64/python3.8/` take:
- `.py`: 9.8 MiB
- `.pyc`: 6.7 MiB
- `.opt-1.pyc`: 6.7 MiB
- `.opt-2.pyc`: 5.2 MiB
- `.so`: 4.9 MiB
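Numbers like these can be roughly reproduced by walking the standard library directory and summing file sizes per suffix. The minimal sketch below makes two assumptions: the standard library lives at `sysconfig.get_paths()["stdlib"]`, and third-party packages sit in a `site-packages` directory below it. It measures apparent file sizes and does not separate files by RPM subpackage, so results may not match the figures above exactly:

```python
import os
import sysconfig
from collections import Counter


def classify(filename):
    """Return the file-type bucket used in the lists above, or None."""
    if filename.endswith(".opt-1.pyc"):
        return ".opt-1.pyc"
    if filename.endswith(".opt-2.pyc"):
        return ".opt-2.pyc"
    for suffix in (".pyc", ".py", ".so"):
        if filename.endswith(suffix):
            return suffix
    return None


def footprint_by_type(root, skip_dirs=("site-packages",)):
    """Sum file sizes (MiB) under `root` per file type, skipping 3rd-party dirs."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]  # prune walk
        for name in filenames:
            bucket = classify(name)
            path = os.path.join(dirpath, name)
            if bucket and not os.path.islink(path):
                totals[bucket] += os.path.getsize(path) / 2**20
    return totals


if __name__ == "__main__":
    stdlib = sysconfig.get_paths()["stdlib"]
    for kind, mib in sorted(footprint_by_type(stdlib).items()):
        print(f"{kind:11} {mib:6.1f} MiB")
```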
We see that the various filetypes of pure Python modules occupy significant
amount of space when combined. But what are they for?
#### .py source files
Python is an interpreted language. As such, when you `import` a pure Python
module, it is primarily loaded from the `.py` source. The source is parsed and
compiled to Python bytecode, which is stored in memory and executed. To speed
things up, the bytecode is cached in special files described below. When the
cached bytecode already exists (and is considered valid), the module is loaded from
there, bypassing the source code.
We currently package the source files and the bytecode cache files as well, but
the source files are still needed. They are used in the following ways:
- module discovery -- the bytecode cache files in `__pycache__` are not
importable without the source files;
- tracebacks -- when Python raises an uncaught exception, it is presented in a
form of a *traceback* containing the original source code, loaded from the
source files on demand;
- custom administrator changes and hotfixes -- when editing the source files
directly on disk, the bytecode cache is invalidated (at least by default) and
will not be used until re-cached;
- cache invalidation checks -- each time the bytecode is loaded from the
cache, the source file is checked for mtime, so it has to exist (there are
however [other optional cache invalidation
modes](https://docs.python.org/3/reference/import.html#pyc-invalidation) --
checking checksum of the source file or not checking anything);
- `__file__` -- some modules read the path of their own sources from the magic
`__file__` variable and some logic around that might fail if the path is
different (such as if the module is loaded directly from a bytecode cache file).
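The mtime check becomes concrete in the header of a timestamp-based `.pyc` file (Python 3.7+, PEP 552). The header records the source file's modification time and size, and the import system compares both against the `.py` file. The sketch below byte-compiles a throwaway module with `py_compile` and decodes the 16-byte header with `struct`:

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Decode the 16-byte header of a .pyc file (layout since Python 3.7)."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    (flags,) = struct.unpack("<I", header[4:8])
    if flags & 0b01:
        # Hash-based pyc: bytes 8-16 hold a hash of the source;
        # bit 1 says whether the import system re-checks it against the source.
        return {"magic": magic, "hash_based": True,
                "check_source": bool(flags & 0b10), "source_hash": header[8:16]}
    # Timestamp-based pyc (the default): source mtime and size, 32-bit little endian.
    mtime, size = struct.unpack("<II", header[8:16])
    return {"magic": magic, "hash_based": False,
            "source_mtime": mtime, "source_size": size}


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "module.py")
    with open(src, "w") as f:
        f.write("ANSWER = 42\n")
    pyc = py_compile.compile(
        src, invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
    header = read_pyc_header(pyc)
    st = os.stat(src)
    # These are the values the import system compares with the .py file.
    print(header["magic"] == importlib.util.MAGIC_NUMBER)             # True
    print(header["source_mtime"] == (int(st.st_mtime) & 0xFFFFFFFF))  # True
    print(header["source_size"] == st.st_size)                        # True
```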
#### .pyc regular (not optimized) bytecode cache
When a pure Python module gets imported for the first time after it has been
modified (or first time ever), the bytecode cache is created in
`__pycache__/<modulename>.cpython-38.pyc` to be later used on subsequent
imports. Why are the bytecode cache files created during the buildtime of the
RPM `python3` package and shipped with the corresponding `.py` file? This is
what would happen if the files were not shipped:
1. If a non-root user executes Python code, Python fails to save the cache
file: the bytecode cache will not be written and hence there will be no future
benefits from having the cache in the first place -- startup will be slower. On
each import, Python will attempt the write which might have further minor
negative impact on performance.
2. If a root user with restricted SELinux context executes Python code, then
the write operation will fail and the audit log will be flooded with AVC denials.
The result is (1) + lots of noise.
3. If a root user with unrestricted SELinux context runs Python code, Python
is able to regenerate and store the `.pyc` files. They will then stay on disk
after the package is removed (possibly updated to the next 3.X version) unless
proper RPM level trickery is done (such as listing it as `%ghost`).
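The sketch below shows the "first import writes the cache" behavior. The helper `import_and_list_cache` is hypothetical. It uses `sys.dont_write_bytecode` (the same switch as `python -B`) to mimic the outcome of an unwritable `__pycache__`: nothing gets cached. It assumes no `PYTHONPYCACHEPREFIX` is set, which would redirect the cache elsewhere:

```python
import importlib
import os
import sys
import tempfile


def import_and_list_cache(src_dir, name, write_bytecode=True):
    """Import module `name` from `src_dir`; return what ended up in __pycache__."""
    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = not write_bytecode
    sys.path.insert(0, src_dir)
    try:
        importlib.invalidate_caches()
        importlib.import_module(name)
    finally:
        sys.path.remove(src_dir)
        sys.modules.pop(name, None)  # allow a fresh import next time
        sys.dont_write_bytecode = saved_flag
    cache_dir = os.path.join(src_dir, "__pycache__")
    return sorted(os.listdir(cache_dir)) if os.path.isdir(cache_dir) else []


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo_mod.py"), "w") as f:
        f.write("VALUE = 1\n")
    print(import_and_list_cache(tmp, "demo_mod", write_bytecode=False))  # []
    print(import_and_list_cache(tmp, "demo_mod"))  # e.g. ['demo_mod.cpython-38.pyc']
```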
#### .opt-?.pyc (optimized) bytecode caches
Similarly to the previous point, the optimized bytecode cache files --
`__pycache__/<modulename>.cpython-38.opt-1.pyc` (or `...opt-2.pyc`) -- are
created when Python is invoked with the `-O` (or `-OO`) flag.
When run with the optimization flag,
[`-O`](https://docs.python.org/3/using/cmdline.html#cmdoption-o):
> Remove assert statements and any code conditional on the value of `__debug__`.
When run with [`-OO`](https://docs.python.org/3/using/cmdline.html#cmdoption-oo):
> Do `-O` and also discard docstrings.
To clarify: This *is* the optimization. There is nothing more. In most common
cases, you don't gain any significant performance boost, yet we must assume that
there are Fedora users out there invoking Python in this way -- either because
their code actually gains performance or because they were tempted by the word
"optimization".
The bytecode has asserts, `__debug__` conditionalized code and docstrings (with
level 2) optimized away and hence is different and needs a different cache.
If the cache files don't exist and the users invoke Python with `-O`/`-OO` (or
other means, such as the
[`PYTHONOPTIMIZE`](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONOPTIMIZE)
environment variable), everything bad from the previous section would happen.
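The builtin `compile()` accepts the same optimization levels, so the difference between the three caches is visible without starting new interpreters. The sketch below uses a made-up function `f`:

```python
SOURCE = '''
def f():
    """A docstring."""
    assert False, "asserts are stripped at optimization level >= 1"
    return "debug" if __debug__ else "optimized"
'''


def run_at(level):
    """Compile SOURCE as `python` (0), `python -O` (1) or `python -OO` (2) would."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=level), namespace)
    f = namespace["f"]
    try:
        result = f()
    except AssertionError:
        result = "AssertionError"
    return f.__doc__, result


for level in (0, 1, 2):
    print(level, run_at(level))
# 0 ('A docstring.', 'AssertionError')
# 1 ('A docstring.', 'optimized')
# 2 (None, 'optimized')
```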
### Biggest modules in python3-libs, breakdown by file type
module          | .py       | .pyc      | .opt-1.pyc | .opt-2.pyc | other       | total
----------------|-----------|-----------|------------|------------|-------------|---------
encodings       | 1.4 MiB   | 378.4 KiB | 377.9 KiB  | 362.4 KiB  | 24.0 KiB    | 2.5 MiB
pydoc_data      | 656.1 KiB | 408.3 KiB | 408.3 KiB  | 408.3 KiB  | 8.0 KiB     | 1.8 MiB
distutils       | 647.1 KiB | 421.3 KiB | 420.5 KiB  | 321.1 KiB  | 16.9 KiB    | 1.8 MiB
ensurepip       | 7.6 KiB   | 6.5 KiB   | 6.5 KiB    | 5.9 KiB    | 8.0 KiB     | 34.4 KiB<br>+ 1.52 MiB wheels
asyncio         | 441.2 KiB | 365.8 KiB | 363.6 KiB  | 291.2 KiB  | 8.0 KiB     | 1.4 MiB
email           | 364.8 KiB | 283.1 KiB | 282.8 KiB  | 188.7 KiB  | 16.0 KiB    | 1.1 MiB
unicodedata     |           |           |            |            | 1.0 MiB .so | 1.0 MiB
xml             | 288.5 KiB | 242.8 KiB | 241.8 KiB  | 196.4 KiB  | 40.0 KiB    | 1009.5 KiB
lib2to3         | 281.1 KiB | 237.3 KiB | 234.2 KiB  | 185.9 KiB  | 32.0 KiB    | 993.4 KiB
multiprocessing | 262.9 KiB | 222.7 KiB | 220.4 KiB  | 203.1 KiB  | 16.0 KiB    | 925.1 KiB
See the remaining lines in the [data source][source].
## Possible solutions
Now that we know what is on those 77 floppy disks, we can decide which ones need
to go.
### Solution 0: Status quo
We already don't install the entire Python by default. This
is achieved by splitting various test modules, IDLE and tkinter into separate
optional subpackages.
How does that stand? This solution technically discards 51 floppy disks, gets
rid of 73.5 MiB, saves 66% of space. That is pretty good. However, it is the
status quo and we will use it as base to compare other proposed solutions, hence
for the sake of our measurements, this **saves 0 MiB / 0%**. All further
percentage savings will be based on the current mandatory 37.5 MiB.
The status quo however already **violates constraint (1)**: it breaks Python
users' expectations. As a Python user, I expect the entire standard
library to be installed, which is not the case. While Python comes pre-installed
and ready to be used by developers and users alike, programs using the `tkinter`
module will simply fail with a confusing `ModuleNotFoundError`. This has been
the case forever and the situation is similar (or worse) with other Linux
distributors of Python, such as Debian or openSUSE. Always installing `tkinter`
would contradict the goal here, so we won't change that.
### Solution 1: Slim down the Python standard library
One solution is to stop having such a big standard library. Python has existed
for some time now and a lot of the standard library modules might no longer be
relevant to the general audience.
Our colleague Christian Heimes has proposed [PEP
594](https://www.python.org/dev/peps/pep-0594/) -- *Removing dead batteries from
the standard library* for Python upstream. So far, it has not been approved and
the discussion [turned out to be a heated
one](http://pyfound.blogspot.com/2019/05/amber-brown-batteries-included-b....
It proposes to remove 30 modules from the standard library for various reasons,
mostly because they have better replacements or because they are no longer as
useful as they once were.
If approved, this would **save 1.4 MiB / 3.7%** or a bit less (two removed
classes are parts of bigger files and the calculations were simplified to assume
the entire file is no longer there -- the difference is not significant).
To avoid violating constraint (5), this however **has to happen in upstream**,
that means not sooner than in Python 3.10 (around Fedora 35). This is not the kind of
change that would benefit from pioneering in Fedora.
We are not aware of a static analyzer that would recognize dependencies on
standard library modules and there is no existing metadata for this. Just
removing the modules in Fedora (or moving them to an optional subpackage) would
only cause breakage and break Python users' (1) and Fedora packagers'
expectations (3).
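A naive attempt shows why this is hard. The helper `imported_top_level_modules` below is hypothetical, not an existing tool. It scans `import` statements with the `ast` module, which finds static imports but misses dynamic ones. `sys.stdlib_module_names` only exists since Python 3.10, later than the Python 3.8 discussed here:

```python
import ast
import sys


def imported_top_level_modules(source):
    """Return top-level module names named in `import`/`from ... import` statements.

    Only static, absolute imports are found; importlib.import_module("x"),
    __import__ or names computed at runtime are invisible to this scan.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


code = '''
import os.path
from email.message import EmailMessage
import importlib
backend = importlib.import_module("sqlite3")  # dynamic: not detected
'''
found = imported_top_level_modules(code)
print(sorted(found))  # ['email', 'importlib', 'os'], no sqlite3
stdlib = getattr(sys, "stdlib_module_names", None)  # Python 3.10+ only
if stdlib is not None:
    print(sorted(name for name in found if name in stdlib))
```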
### Solution 2: Move developer oriented modules to python3-devel (or split the stdlib into pieces)
Quite a handful of modules are clearly targeted at developers who code in Python
and not at the users of the applications written in it.
Here they are, largest first:
1. `pydoc_data`: Contains data for the `pydoc` module described below.
2. `distutils`: Used when distributing and installing Python packages through
`setup.py` files. Predecessor of `setuptools`.
3. `ensurepip`: Used to install `pip`, mostly to virtual environments via the
`venv` module.
4. `lib2to3`: Used by the `2to3` tool to convert legacy code to Python 3. Also
used at install time through `setup.py` files.
5. `unittest`: A testing framework for unit tests.
6. `pydoc`: Generates developer documentation from docstrings.
7. `doctest`: Tests if documentation reflects the reality.
8. `venv`: Creates Python virtual environments.
Moving all those modules to `python3-devel` (or `python3-libs-devel` etc.) could
**save 6.1 MiB / 16%** and additional **1.5 MiB of wheels** (not calculated in
the total amount we count percentages from).
This would however **violate constraints (1) and (3)**. Python users expect
working `venv` and `unittest`. Fedora packagers would need to manually
(remember: no metadata, no static analyzer) track runtime dependencies on such
modules -- they actually happen, for example there are [modules depending on
lib2to3](https://pypi.org/project/modernize/).
Alternatively, such a thing would no longer be allowed to name itself Python. It
would merely be a "minimal Python" with a separate entrypoint - and that
**violates constraint (3) or (4)** (depending on the actual implementation).
Alternatively, this change would need to be driven upstream -- track
dependencies on standard library modules and allow it to be shipped in parts.
See also our draft [PEP 534](https://www.python.org/dev/peps/pep-0534/) --
*Improved Errors for Missing Standard Library Modules*.
If implemented, this would allow us to split the library into several parts
(either minimal + rest, or per module, or anything in between) and only make the
actually used modules mandatory, saving an unknown amount of space (arguably
quite large) and several external dependencies as well (such as `libsqlite3.so`,
`libgdbm.so` etc.). We could basically do the `python3-tkinter` split at scale,
via an upstream supported way.
### Solution 3: Compress large data-like modules
Some pure Python modules, like `encodings` or `pydoc_data` contain mostly data.
We could compress the data in the modules. For example `pydoc_data` is basically
a dictionary with very long strings. Those strings are repeated in the source as
well as in the various bytecode cache files.
We could store them as compressed bytestrings instead.
Alternatively, we could leverage the Python's ability to import from a zip file
and zip such modules. That prevents "hot patching" them on a live system
(constraint (2)), but if absolutely needed, they can be unzipped and edited. The
need to live patch `encodings` or `pydoc_data` should not be very common.
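Python's built-in zip import can be seen in a small sketch. The helper `import_from_zip` and the module names are hypothetical. It also shows a pitfall: inside an archive, `__file__` points to a path that is not a real file. zipimport does not write bytecode caches into the archive, so unless `.pyc` files are stored in the zip as well, the source is compiled on each import:

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(archive, module_name, source):
    """Store `source` as <module_name>.py in a zip archive and import it from there."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(module_name + ".py", source)
    sys.path.insert(0, archive)  # the zipimport path hook handles zip entries
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(archive)
        sys.modules.pop(module_name, None)


with tempfile.TemporaryDirectory() as tmp:
    mod = import_from_zip(os.path.join(tmp, "data.zip"), "bigdata",
                          "TABLE = " + repr(["x" * 50] * 100) + "\n")
    print(len(mod.TABLE))                # 100
    print(mod.__file__)                  # .../data.zip/bigdata.py
    print(os.path.exists(mod.__file__))  # False: open(__file__)-style code breaks
```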
Not all modules can be zipped, extra caution would be needed.
For example, `pydoc` currently reads a CSS file like this:
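```python
import os


def _get_css(url):
    # Enclosing function added for illustration; in pydoc this code sits
    # inside a larger function. It finds the CSS file next to pydoc's own
    # source via __file__, which fails when pydoc is imported from a zip.
    path_here = os.path.dirname(os.path.realpath(__file__))
    css_path = os.path.join(path_here, url)
    with open(css_path) as fp:
        return ''.join(fp.readlines())
```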
Similar code would need to be ported to
[importlib.resources](https://docs.python.org/3/library/importlib.html#mod...
-- changes like this would very likely be accepted upstream, but they still
need to be carefully found first.
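A ported version might look like the sketch below. It is not the actual upstream change. The helper `read_package_text` and the package `docpkg` are hypothetical, and `importlib.resources.files()` requires Python 3.9+, newer than the Python 3.8 discussed here. The read goes through the package's loader, so it works the same when the package is imported from a zip file:

```python
import importlib
import importlib.resources
import os
import sys
import tempfile
import zipfile


def read_package_text(package, name):
    """Read the text resource `name` shipped inside `package` (Python 3.9+)."""
    return importlib.resources.files(package).joinpath(name).read_text(encoding="utf-8")


# Demo: a hypothetical package "docpkg" with a CSS file, imported from a zip.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "docs.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("docpkg/__init__.py", "")
        zf.writestr("docpkg/style.css", "body { margin: 0 }\n")
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        print(read_package_text("docpkg", "style.css"), end="")
    finally:
        sys.path.remove(archive)
        sys.modules.pop("docpkg", None)
```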
Either way, when carefully only zipping `encodings` and `pydoc_data`, we could
**save 3.4 MiB / 9%**. When compressing the strings inline, we anticipate
similar or worse result.
### Solution 4: ZIP the entire standard library
Stretching the previous solution a bit further, we might want to zip the entire
standard library (at least the pure Python parts). However, we are not sure
whether this was anticipated by upstream and whether this does not in fact
**violate constraint (5)**. Extra care would be needed.
This would require a great deal of testing and thorough analysis of half a
million lines of code.
Not only will this most likely break things, it will probably also **violate
constraints (1) and (2)** (Python and Fedora users' expectations). It can also
increase the startup time.
To mitigate that, we might want to ship 2 RPM packages with the standard library
-- one uncompressed and one zipped:
- The `python3-libs` package would *Require* any of them (via virtual provides
or boolean requires: `Requires: (python3-libs-modules or
python3-libs-modules-zip)`).
- The `python3-libs` package would **Recommend** the uncompressed package.
- To avoid increasing the total filesystem footprint when both packages are
installed, the packages might conflict with each other -- however that might be
a bad user experience.
Nevertheless, this might (in theory) **save 17.8 MiB / 47%**.
### Solution 5: Stop shipping mandatory bytecode cache
This solution sounds simple: We no longer ship the bytecode cache
mandatorily. Technically, we move the `.pyc` files to a subpackage of
`python3-libs` (or three different subpackages, that is not important here). And
we only *Recommend* them from `python3-libs` -- by default, the users get them,
but for space critical Fedora flavors (such as container images) the maintainers
can opt-out and so can the powerusers.
This would **save 18.6 MiB / 50%** -- quite a lot.
However, as said earlier, if the bytecode cache files are not there, Python
attempts to create them upon first import. That can result in several problems;
below we propose workarounds for them.
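To make this on-demand behavior concrete, here is a minimal sketch using only the standard `importlib` API; the module name `cache_demo_mod` and the helper `import_and_check_cache` are hypothetical:

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def import_and_check_cache(module_name, directory):
    """Import module_name from directory; return (cache_path, cache_exists)."""
    source = os.path.join(directory, module_name + ".py")
    # PEP 3147 location, e.g. __pycache__/<name>.cpython-38.pyc;
    # it also reflects -O/-OO (opt-1/opt-2) for the running interpreter.
    cache = importlib.util.cache_from_source(source)
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)  # may write the cache as a side effect
    finally:
        sys.path.remove(directory)
        sys.modules.pop(module_name, None)
    return cache, os.path.exists(cache)
```

In a writable directory (and without `PYTHONDONTWRITEBYTECODE`), the second value becomes `True` after the import. In a directory the user cannot write to, it stays `False` and the module is simply compiled in memory on every run.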
#### Problem 5.1: Slower starts without bytecode cache
When a non-root user runs Python code, the bytecode cache is never created,
because the user cannot write to the system `__pycache__` directories.
This can result in slower startup of Python apps. However, that might
be OK: the vast majority of Fedora users will get the *Recommended* bytecode
cache, and the rest will see a small slowdown. This does not violate users'
expectations **if documented properly**: most users get the old behavior (the
default remains fast, but big).
Optionally, we might patch Python to warn in that case and suggest installing
the appropriate subpackage. That would of course be a downstream-only patch and
would **violate constraint (5)**. Alternatively, the warning might suggest
running a specific command as root to populate the cache; that might (or might
not) be acceptable upstream. Arguably it is not a very nice user experience, and
it only helps with limited bandwidth, not limited storage space.
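Such a warning would first need to detect the situation. A minimal sketch of that check follows; the helper name `modules_without_bytecode_cache` is hypothetical, nothing like it exists in CPython, and wording or triggering an actual warning is left out:

```python
import importlib.util
import os
import sys


def modules_without_bytecode_cache(modules=None):
    """Return sorted names of imported source modules whose cache file is missing."""
    if modules is None:
        modules = sys.modules
    missing = []
    for name, module in list(modules.items()):
        path = getattr(module, "__file__", None)
        if not path or not path.endswith(".py"):
            continue  # built-ins, extension modules, sourceless .pyc files, ...
        try:
            cache = importlib.util.cache_from_source(path)
        except NotImplementedError:  # implementation without a cache tag
            continue
        if not os.path.exists(cache):
            missing.append(name)
    return sorted(missing)
```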
#### Problem 5.2: Leftover bytecode cache files
When a root user with an unrestricted SELinux context runs Python code, the
bytecode cache is created.
As such, the cache files would need to be marked as `%ghost` in the RPM package
with the Python source, while they would exist as real files in the RPM package
with the bytecode cache.
Example pseudo-specfile snippet:
```spec
%files libs
# this package Recommends the 3 packages below
.../module.py
%dir .../__pycache__/
%ghost .../__pycache__/module.cpython-38.pyc
%ghost .../__pycache__/module.cpython-38.opt-1.pyc
%ghost .../__pycache__/module.cpython-38.opt-2.pyc
%files libs-bytecode-cache
# this package Requires the libs subpackage
.../__pycache__/module.cpython-38.pyc
%files libs-bytecode-cache-opt-1
# this package Requires the libs subpackage
.../__pycache__/module.cpython-38.opt-1.pyc
%files libs-bytecode-cache-opt-2
# this package Requires the libs subpackage
.../__pycache__/module.cpython-38.opt-2.pyc
```
Our experiments show that if two packages co-own a file and one of them marks
it as `%ghost`, everything works as expected:
- a manually created `.pyc` file is overwritten by the packaged one without a
conflict/error/warning/problem
- a manually created `.pyc` file is removed on package removal
Hence, we expect this point to be unproblematic; however, real testing with the
`python3` package has not yet been done.
#### Problem 5.3: SELinux denials
When a root user with a restricted SELinux context runs Python code, the bytecode
cache is not created and the audit log is flooded with AVC denials. The result
is the same as in 5.1, plus noise.
As a workaround, we might work with the SELinux experts to allow the Python
process to write the bytecode cache even in a restricted context.
This could be a **potential security problem**: any malicious code written in
Python would be able to store malicious bytecode in the cache, and all subsequent
invocations of Python would execute that bytecode instead of the proper one.
As such, we *think* this **violates constraint (2)**: Fedora users expect that
SELinux keeps them safe. However, we don't really know what level of protection
is expected here; this might require further discussion.
As a solution to this problem, we might stop Python from attempting to write the
bytecode cache in the first place. That would still leave problem 5.1 in place
(that is fine), but it would also solve 5.2. However, we cannot just patch Python
to stop writing the bytecode cache, as that would violate constraints (1) and (5).
We might, however, pioneer an upstream change that skips writing the bytecode
cache if a certain marker is present in the `__pycache__` directory:
```spec
%files libs
.../module.py
%dir .../__pycache__/
.../__pycache__/cpython-38.nowrite
%files libs-bytecode-cache
.../__pycache__/module.cpython-38.pyc
... other opt levels in this or other subpackages ...
```
(The name of the marker is just an example.) If the marker is present, any
existing bytecode cache would be read, but there would be no attempts to write
it. As a result, users would gain the cache benefit when they install the
bytecode cache package(s) (recommended by the `python3-libs` package), but Python
would not attempt to create the files. This is a reasonable compromise:
- Default remains big and fast (cached).
- Minimal is small and a bit slower.
- No SELinux problems.
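The marker idea could be prototyped in pure Python before touching CPython's import machinery. The following is a minimal sketch, not an upstream API: the marker file name `<cache_tag>.nowrite` mirrors the example above, and `NoWriteMarkerLoader` and `load_with_marker_check` are hypothetical names. It relies on `SourceFileLoader` reading a valid existing cache in `get_code()` and calling `set_data()` only when it wants to store a freshly compiled one:

```python
import importlib.machinery
import importlib.util
import os
import sys

# Hypothetical marker name, following the example above ("cpython-38.nowrite").
MARKER = f"{sys.implementation.cache_tag}.nowrite"


class NoWriteMarkerLoader(importlib.machinery.SourceFileLoader):
    """Reads existing bytecode cache, but never writes new cache files
    into a __pycache__ directory that contains the marker."""

    def set_data(self, path, data, **kwargs):
        # SourceFileLoader only calls set_data() to store bytecode cache files.
        if os.path.exists(os.path.join(os.path.dirname(path), MARKER)):
            return  # marker present: skip the write, keep running from source
        super().set_data(path, data, **kwargs)


def load_with_marker_check(name, source_path):
    """Load one module from source_path using the marker-aware loader."""
    loader = NoWriteMarkerLoader(name, source_path)
    spec = importlib.util.spec_from_file_location(name, source_path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

Python already consults `sys.dont_write_bytecode` before attempting any write; the marker would merely add a per-directory opt-out. Here the loader is invoked explicitly; making it the default would require changing the path-based finder, which is why an upstream change is needed.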
*Note:* If we are to eventually adopt this solution (in either form) in all
Python RPM packages to gain even more space, this would certainly need more
RPM-level abstraction with macros and dark magic (like the debuginfo packages);
we cannot expect all Fedora Python package maintainers to do this manually.
However, for now, we would only do it in `python3-libs`, as stated in the
goal of this document.
### Solution 6: Stop shipping mandatory optimized bytecode cache
This is essentially the same as the previous solution, except we would keep the
non-optimized bytecode cache mandatory. That gives us several more options to
work around the caveats.
This would **save 11.9 MiB / 32%**.
#### Workaround 6.1: Fallback to less optimized bytecode cache
We can patch Python to fall back to a less optimized bytecode cache if the
properly optimized bytecode cache does not exist or cannot be created.
1. opt-2 would fall back to opt-1 or non-optimized (in this order)
1. opt-1 would fall back to non-optimized
1. non-optimized would always be present
This workaround would require a change of the current caching logic. Either
there would be no attempt to write the new bytecode cache files if the less
optimized bytecode cache exists, or Python would check if it can write the
bytecode cache and only fall back to the less optimized ones if it cannot write
to the destination.
This workaround, however, **violates Python users' expectations (1)**: it executes
less optimized bytecode than the user has elected to. At the same time, it
**violates (5)** if done downstream only. Both can be **solved by doing this
with upstream coordination**: designing a PEP that describes this behavior
in great detail, implementing the behavior in Fedora, and bringing it upstream
once ready. The impact on performance would need to be evaluated as well.
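A minimal sketch of the lookup order from the list above, assuming cache files are located with `importlib.util.cache_from_source()`. The helper name `find_cache_with_fallback` is hypothetical, and CPython contains no such fallback:

```python
import importlib.util
import os


def find_cache_with_fallback(source_path, requested_level):
    """Return the most optimized existing cache file not exceeding requested_level,
    trying opt-2 -> opt-1 -> non-optimized; return None if none exists."""
    for level in range(requested_level, -1, -1):
        # cache_from_source() names the non-optimized file with '' (not 0):
        # optimization=0 would produce a non-standard ".opt-0.pyc" name.
        optimization = "" if level == 0 else level
        candidate = importlib.util.cache_from_source(source_path, optimization=optimization)
        if os.path.exists(candidate):
            return candidate
    return None
```

Whether a found file is then actually *used* (and whether writing the properly optimized file should be attempted first) is exactly the design question the PEP would have to settle.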
#### Optimization level 2 is already broken
It is important to note that optimization level 2 bytecode cache in Fedora is
already partially "broken". In the times of Python 2 and 3.4 or less, both
non-zero optimization levels shared the same bytecode cache paths. Hence the
Fedora packages only shipped optimization level 1 `.pyo` files (`o` for optimized).
Python 3.5 has altered the paths to make optimization 1 and 2 cache coexistable
and the `python3` package was adapted to ship all 3 levels of optimization (0, 1
and 2), but all the other packages still only ship two (0 and 1) --
[`brp-python-bytecompile` and
`%py_byte_compile`](https://docs.fedoraproject.org/en-US/packaging-guidelines/Python_Appendix/#manual-bytecompilation)
both only compile for the two levels. That means all the problems with missing
bytecode cache files are already happening with all of Fedora's Python 3
RPM packages (except `python3-libs` and other `python3` subpackages themselves)
when Python is executed with `-OO` or when `PYTHONOPTIMIZE` is set to 2 or higher.
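For illustration, producing all three cache files for one module takes a single standard-library call per level. This is a minimal sketch using `py_compile`, not the actual `python3` spec file or `brp-python-bytecompile` code; the helper name `compile_all_levels` is hypothetical:

```python
import py_compile


def compile_all_levels(source_path, levels=(0, 1, 2)):
    """Byte-compile one module for each requested optimization level.

    Returns {level: cache_path}. Per the text above, the python3 package
    ships all three levels, while other packages only get (0, 1).
    """
    # With cfile=None, py_compile derives the PEP 488 name itself:
    # <name>.<tag>.pyc, <name>.<tag>.opt-1.pyc, <name>.<tag>.opt-2.pyc.
    # doraise=True turns syntax errors into py_compile.PyCompileError.
    return {level: py_compile.compile(source_path, optimize=level, doraise=True)
            for level in levels}
```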
This has been the case **since Fedora 24** and **nobody has ever reported it as
a problem**; hence we might just drop the optimization level 2 bytecode cache
and consider the problems an unsupported corner case. That would **save 5.2 MiB
/ 14%**. Technically this is wrong, but pragmatically it works just fine.
Alternatively, we might make a case upstream to deprecate and eventually remove
`-OO` because we don't use it; however, we are not sure if that is a good
enough reason.
With the marker file proposed in the previous solution, we can outright drop the
optimization level 2 bytecode cache for good (or move it to a package that is
not even *Recommended*, only *Suggested*).
### Solution 7: Stop shipping mandatory source files, ship .pyc instead
Since the `.py` source files are not what gets executed when a valid bytecode
cache exists, we might as well ship only the bytecode files mandatorily.
To allow module discovery, we would need to rename and move the `.pyc` files
from `__pycache__/module.cpython-38.pyc` to `../module.pyc`.
When such a file is located in a `sys.path` directory, this is what happens:
- When only `module.py` exists (status quo), everything works as described in
the first sections of this document.
- When both `module.py` and `module.pyc` exist, the `.pyc` is ignored and
everything works as if it was not there (including the bytecode cache files in
`__pycache__/*.pyc`).
- When only `module.pyc` exists, the module is imported from that bytecode
cache file regardless of the optimization level (bytecode cache files in
`__pycache__/*.pyc` are ignored).
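The sourceless case from the list above can be reproduced with the standard library alone. A minimal sketch, assuming a scratch directory, a hypothetical module named `sourceless_demo`, and hypothetical helper names:

```python
import importlib
import os
import py_compile
import sys


def make_sourceless(source_path):
    """Compile source_path into a sibling module.pyc and delete the .py file."""
    legacy_pyc = os.path.splitext(source_path)[0] + ".pyc"
    py_compile.compile(source_path, cfile=legacy_pyc, doraise=True)
    os.remove(source_path)
    return legacy_pyc


def import_from(directory, module_name):
    """Import module_name with directory temporarily prepended to sys.path."""
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()  # the directory listing has changed
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(directory)
```

The resulting module's `__file__` points at the `.pyc` file, and a traceback raised from it has no source lines to display, which is the first expectation problem listed below.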
When doing it this way (shipping only the non-optimized `.pyc` files, with neither
the sources nor the additional optimized bytecode caches), we would
**save 21.7 MiB / 57.9 %**.
Several things would **violate Python/Fedora users' expectations (1)(2)**:
- Tracebacks would not contain lines of sources.
- The source files would be gone: not only could users not edit them, they
could not even read them.
To mitigate that, we could have 2 RPM packages with the standard library
(similarly to *Solution 4: ZIP the entire standard library*):
1. One with moved `.pyc` files only.
2. One with source `.py` files and `__pycache__` (possibly only recommended if
combined with other solutions).
To avoid shipping two conflicting subpackages, we might do it this way:
1. The moved `.pyc` files package is mandatory.
2. The other RPM package is recommended.
This, however, **violates constraint (4)**: users of the default installation
would get two files with the same non-optimized bytecode. Unfortunately the files are in different
directories, and hence we cannot hardlink them on the RPM level -- RPM only
allows hardlinking files in the same directory to avoid cross filesystem
hardlinks. If we symlink the files, Python currently does not follow them.
If we get upstream support for following symbolic links, we might do something
like this:
```spec
%files libs
# Recommends libs-source
.../module.pyc
%files libs-source
# Requires libs
.../module.py
%dir .../__pycache__/
# symbolic link to ../module.pyc
.../__pycache__/module.cpython-38.pyc
.../__pycache__/module.cpython-38.opt-1.pyc
.../__pycache__/module.cpython-38.opt-2.pyc
```
The two optimized caches could optionally be `%ghost`ed if this is combined with
other solutions.
If we don't get upstream support for following symbolic links, we might ship the
duplicate bytecode cache files and replace them with hardlinks in an RPM
scriptlet or trigger (if they are on the same filesystem, which is very likely).
However, that only helps with limited storage space (and the space is still
required during installation), not with limited bandwidth.
Alternatively, with upstream coordination, we might change how the source and
bytecode caches are prioritized at import time, to allow having the
non-optimized `.pyc` file in just one location without losing the benefits of
having the source files. For example, an (optionally compressed) source file
could live in a `__pysource__` directory and be loaded only when showing
tracebacks.
We could also explore this solution with only some modules (e.g. the big
data-like modules described in *Solution 3: Compress large data-like modules*,
such as `encodings` and `pydoc_data`). For such a limited scope, we could simply
ship only one `.pyc` file (one optimization level, without sources).
### Solution 8: Compress .pyc files
We might propose an upstream change (pioneered in Fedora) to add an option to
[compress the `.pyc` files](https://bugs.python.org/issue22789). We would add a
"compressed" flag to the `.pyc` header, and we would change `importlib` to
decompress the payload before unmarshalling (deserializing) the bytecode.
This would potentially save **10.2 MiB / 27.2%**, but it might have a negative
impact on performance. The number is based on actually zipping each individual
`.pyc` file, not on compressing only the marshalled payload.
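To make the proposal concrete, the following sketch writes and reads such a file. It is **not** a format CPython understands: using bit 2 of the PEP 552 flags field as the "compressed" flag, choosing `zlib`, and the helper names are all our assumptions. Only the 16-byte header layout (magic number, flags, source mtime, source size) matches timestamp-based `.pyc` files since Python 3.7:

```python
import importlib.util
import marshal
import struct
import zlib

# Hypothetical: PEP 552 defines flag bits 0 and 1; bit 2 is assumed free here.
COMPRESSED_FLAG = 0b100


def dump_compressed_pyc(code, source_mtime, source_size):
    """Serialize a code object into a .pyc-like blob with a zlib payload."""
    header = importlib.util.MAGIC_NUMBER + struct.pack(
        "<III", COMPRESSED_FLAG, source_mtime & 0xFFFFFFFF, source_size & 0xFFFFFFFF)
    return header + zlib.compress(marshal.dumps(code))


def load_pyc_code(data):
    """Return the code object, decompressing first if the flag is set."""
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ImportError("bad magic number")
    (flags,) = struct.unpack("<I", data[4:8])
    payload = data[16:]  # skip magic, flags, mtime and size
    if flags & COMPRESSED_FLAG:
        payload = zlib.decompress(payload)
    return marshal.loads(payload)
```

Uncompressed blobs (flag bit clear) pass through unchanged, so the reader would stay compatible with existing files. The extra `zlib.decompress()` call on every cache load is the performance cost mentioned above.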
### Solution 9: Deduplicate bytecode cache
Depending on the module's content, the non-optimized, optimization level 1
and optimization level 2 `.pyc` files may or may not be identical.
Consider the following Python module:
```python
1
```
All three bytecode cache files would be identical.
While with:
```python
assert 1
```
Only the two optimized cache files would be identical with each other.
And this:
```python
"""Dummy module docstring"""
1
```
Would produce two identical bytecode cache files but the opt-2 file would differ.
Only modules like this would produce 3 different files:
```python
"""Dummy module docstring"""
assert 1
```
When we examine all the bytecode cache files currently shipped with
`python3-libs` and compare them between the optimization levels, we get:
- 607 modules have bytecode files
- 454 identical optimization 0 and 1 pairs
- 68 identical optimization 1 and 2 pairs
- 62 identical optimization 0, 1 and 2 triads (already counted in both of the
above)
Since all of the bytecode caches are kept within the same directory, we can in
fact hardlink them to each other and **save 4.0 MiB / 10.7 %**. Even where this
could be [done automagically by the
filesystem](https://btrfs.wiki.kernel.org/index.php/Deduplication), doing it
explicitly also saves bandwidth: the RPM packages are smaller.
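The deduplication step itself is simple. Here is a minimal sketch, assuming it runs after byte-compilation on the cache files of a single `__pycache__` directory (hardlinks cannot cross filesystems, and RPM only allows hardlinking files in the same directory). The helper name `hardlink_identical` is hypothetical, and this is not the compileall2 implementation:

```python
import filecmp
import os


def hardlink_identical(paths):
    """Replace each file in `paths` that is byte-identical to an earlier one
    with a hardlink to it; return the number of new links created."""
    linked = 0
    kept = []  # distinct files seen so far
    for path in paths:
        for original in kept:
            if os.path.samefile(original, path):
                break  # already hardlinked (e.g. on a second run)
            if filecmp.cmp(original, path, shallow=False):
                tmp = path + ".tmp-link"
                os.link(original, tmp)
                os.replace(tmp, path)  # atomically swap the copy for the link
                linked += 1
                break
        else:
            kept.append(path)
    return linked
```

Calling it as, for example, `hardlink_identical([opt0, opt1, opt2])` on one module's three cache paths links whichever levels happen to be identical, which covers all the cases shown above.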
It is also important to realize that most of the standard library modules have
docstrings (except empty `__init__.py` files), but only about one in four has
`__debug__` conditionals or asserts. If we also go with a solution that removes
the optimization level 2 bytecode cache and combine it with this one, we
can deduplicate the optimization level 1 bytecode cache for three quarters of
the modules.
When the bytecode cache is updated for some reason, e.g. because the source file
was updated by an administrator, the cache file is recreated, effectively
breaking the hardlink. As more files get updated this way, the size naturally
increases, but this does not break users' expectations.
As a nice benefit, we can do this automatically with all Fedora Python RPM
packages without any downsides (except for an insignificant slowdown when
comparing the files during the build), potentially saving large amounts of space.
That's a lot of saved money in the cloud world.
As a single data point for that general slim down: On my workstation I have 360
MiB of various Python 3.7 bytecode files in `/usr` and I can save 108 MiB.
### Solution 10: Stop shipping mandatory Python, rewrite dnf to Rust
The main reason we need to ship Python everywhere is the package manager -- dnf.
If we rewrite dnf in some non-Python, possibly compiled language such as Rust
(or C if we are more traditional), we don't need to ship Python at all. This
might sound crazy, but see for example
[microdnf](https://github.com/rpm-software-management/microdnf) -- a minimal dnf
for (mostly) Docker containers that uses libdnf and hence doesn't require Python.
This solution **saves 37.5 MiB / 100%** of mandatory Python. It possibly also
saves more space by reducing the number of installed Python packages, but
increases the size of dnf itself. We can most likely assume a compiled
executable would have a smaller footprint than the handful of Python modules used
by dnf, so this doesn't violate constraint (4): the combined footprint of
(micro)dnf + Python won't be significantly larger than now.
However, most importantly, this solution **violates constraint (2)**: Fedora
users expect Python to be available, always. Missing Python could break things
like Ansible-based deployments.
## Conclusion
You can see that some of the solutions offer significant slim-down with very
little struggle, while other solutions may turn out to be too disruptive. At the
same time, various solutions can be combined.
It is important to note that the solutions can contradict each other and the
storage savings cannot be generally summed when combining them. As an example,
we cannot deduplicate different optimization level bytecode cache files and ship
them from different optional subpackages at the same time.
For now, we plan to [start with bytecode cache
deduplication](https://github.com/fedora-python/compileall2/issues/16), and we
will let the Fedora community discuss our proposals. After all, there might be
holes in them and the list is certainly not complete.
## Copyright
This document is placed in the public domain or under the [CC0 1.0 Universal
license](https://creativecommons.org/publicdomain/zero/1.0/), whichever is more
permissive.
The photos are [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
[source]:
https://github.com/hroncok/python-minimization/blob/master/python-minimiz...
--
Miro Hrončok
--
Phone: +420777974800
IRC: mhroncok
3 years, 8 months
Release rpkg-1.58 fedpkg-1.37
by Ondrej Nosek
Hi all,
new versions rpkg-1.58 and fedpkg-1.37 have been released.
Currently, the Fedora 30 packages are in the stable repository; feel free to
try the builds for the other distributions waiting in Bodhi.
Numerous features and improvements (as well as bugfixes) include:
(For "rpkg")
- Improvements for scratch module builds
- Allow passing arguments to “mbs-manager build_module_locally”
- Remove the ability to parse a module’s branch
- Permit setting arbitrary rpm macros during build
- Ignore specific files in a cloned repository
- Pass specific arguments to “mock”
- Added “depth” argument to "git clone"
- Watch multiple module builds
- Show module build links in output from command module-build
- Add the ability to configure multiple regex expressions
- Add “retire” command supporting both packages and modules
- Import srpm without uploading sources
- Ignore any specified profile when finding the Flatpak build target
- Added update-docs script
- And other fixes and small improvements
(For "fedpkg")
- Ignore files in a cloned repository
- Enable shell completion for module scratch builds
- Show hint when Pagure token expires
- Include possible distprefix in “--define dist” for Forge-based packages
- Other small fixes
More specific changelog (web documentation):
https://docs.pagure.org/rpkg/releases/1.58.html
https://docs.pagure.org/fedpkg/releases/1.37.html
Updates:
https://bodhi.fedoraproject.org/updates/?builds=rpkg-1.58-1.el6&builds=rp...
Alternative link:
https://bodhi.fedoraproject.org/updates/?packages=rpkg&page=1
rpkg is available from PyPI.
Thanks to all contributors.
Regards
3 years, 8 months
Fonts packaging policy rewrite proposal
by Nicolas Mailhot
Hi,
A fonts packaging policy rewrite proposal has been pushed to FPC today:
https://pagure.io/packaging-committee/pull-request/934
It should be clearer, more opinionated, and take into account:
– updates of the OpenType standard
– variable fonts
– web fonts
– upstream deprecation of non-OpenType formats: final stages of the
Harfbuzz consolidation decided at the 2006 Text Layout summit
https://www.freedesktop.org/wiki/TextLayout/
– appstream & fonts
– weak dependencies
– and probably more I forget here
It is based on the new fonts-rpm-macros project for automation:
This project builds on tooling enhancements in redhat-rpm-config and rpm
itself, done during the past two years for the Forge and Go sets of
packaging macros. It started 2 years ago as a fork of fontpackages,
which is the core of our current fonts packaging guidelines.
It will require putting the fonts-srpm-macros package in the default
build root, as is done for other domain-specific packaging macro
sets.
Major additions:
– better documentation (clearer and more complete)
– better automation (less packager hassle for better and more complete
results)
Major removals:
– tools and scripts
– fixing metadata with ttname
Mostly because no one seems willing to maintain those scripts, or port
ttname to Python 3.
https://copr.fedorainfracloud.org/coprs/nim/fonts-rpm-macros/builds/
showcases the new policy on 62 real-world source packages, generating
139 installation packages. Some of those are badly delayed updates to
Fedora packages, others are brand-new packages ready for Fedora
inclusion. They include major font packages such as Stix, DejaVu, Droid,
IBM Plex.
Existing Fedora packages will continue to build, the old fontpackages
macros are grandfathered in fonts-rpm-macros for now. They will be
removed in a few years to give packagers time to apply the new
guidelines.
Regards,
--
Nicolas Mailhot
3 years, 8 months