Trying out a new way of packaging Python modules

David Malcolm dmalcolm at redhat.com
Mon Mar 15 02:53:29 UTC 2010


= BACKGROUND : Multiple python stacks =
For Fedora 13 we've added a parallel-installable Python 3 stack,
doubling the number of Python runtimes from one to two.

I've been mulling over things we could do for Python in Fedora 14. An
idea I'd like to borrow from Debian is to add debug builds of the
runtimes, which would give the developer the option of installing a
python stack built with full reference tracking information.  This is a
great boon when developing a C extension, and for analysing memory
usage; you can e.g. iterate over all live objects.  In my experience
this roughly halves the speed of the interpreter, and increases memory
usage.  This would double the number of our runtimes from 2 to 4.  (As
I understand it, Debian has shipped a standard vs debug pair of python
stacks for a couple of years now.)

There's also the possibility of packaging pre-releases of python27 and
python32 within Fedora, and of shipping experimental stacks using the
other runtime implementations: Unladen Swallow, PyPy, Jython, Pynie,
etc.

In EPEL5 I've recently proposed a parallel-installable "python26"
alternate stack (see
https://www.redhat.com/archives/epel-devel-list/2010-March/msg00064.html ).
Although I want a python26 stack, I suspect EPEL5 will also want
python27 and python32 stacks in the future.

Given that Python add-ons aren't compatible between major or minor
releases [1], we need to somehow repackage add-on python modules for the
python stacks.

We updated Fedora's python packaging guidelines for Fedora 13 in order
to cover the "python3" runtime:
http://fedoraproject.org/wiki/Packaging:Python

I like that we have a single src.rpm per upstream tarball (for those
cases where both python 2 and python 3 can be built), and have a single
build per-arch within Koji that emits a subpackage per python runtime.

However I dislike the copy&paste approach we're using within .spec
files.  For me, .spec files are source code, and I passionately hate
copy&paste of source code in programming: it's too easy for things to
get out of sync, it tends to lead to bugs, and we end up having to work
harder than we should need to.  Computers give us many ways of sharing
and reusing information, and we should use these to make our lives
easier (the computer should serve us, not the other way around!).

The way I see this:  for every OS release, there would be a set of
supported Python runtimes, but I want us to be able to change it from
release to release, and I don't want us to be tied down in manual
specfile rewriting to achieve this; if we have to rewrite all our
specfiles to add a new runtime, we will never want to add a new runtime.

For Fedora 12, this was just "python".  For Fedora 13 this is "python"
and "python3".  Similarly for EPEL5: the system "python" is 2.4, but I
want to add a python26, and I suspect we will want more in the future.
For Fedora 14 I don't know what we should do yet, but I'd love to be
able to support more of the Python runtimes, and having a debug stack a
"yum install" away would be handy.


So I want to move from a "static" model, where we hardcode information
about the available python runtimes into all of our .spec files, to a
more dynamic model, where the .spec file for a python module queries a
tool about which python runtimes it ought to support on this OS, and
acts accordingly.


= THE EXPERIMENT =
I've implemented a tool, currently named "rpm-pyconfig" which can be
used to query information about Python runtimes.  Currently I'm
hardcoding this data within the tool, but the idea is that every runtime
would drop a .conf file (perhaps in .ini format) into a config dir
"/etc/rpm-pyconfig.d/", and the tool would query this.  (somewhat like
pkg-config).
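
To make the config-dir idea concrete, here is a minimal sketch of how
such a lookup might work.  It is not the code in the repository (which
currently hardcodes the data); the function name, the section layout and
the key names are all made up for illustration, and it only assumes that
each runtime drops one .ini-style file into the directory.

```python
import configparser
import os
import tempfile


def load_runtime_configs(config_dir):
    """Return {section_name: {key: value}} for every *.conf file in config_dir.

    Hypothetical layout: one .ini-style file per runtime, one section each.
    """
    configs = {}
    for filename in sorted(os.listdir(config_dir)):
        if not filename.endswith(".conf"):
            continue  # ignore anything that is not a runtime description
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(os.path.join(config_dir, filename))
        for section in parser.sections():
            configs[section] = dict(parser[section])
    return configs


# Demo against a throwaway directory standing in for /etc/rpm-pyconfig.d/
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "python3.conf"), "w") as f:
        f.write("[python3]\nconfbin = python3\nmajor_version = 3\n")
    print(load_runtime_configs(demo_dir))
```

Adding a runtime then means installing one more file, which is what
would let the set of runtimes vary per OS release without touching the
module specfiles.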

One idea is that we would have a meta-package "python-meta-config" or
somesuch, which "Requires" all of the rpms holding the .conf files, thus
defining the expected set of runtimes we're targeting for this OS
release.  We'd change this early during the development cycle for a
Fedora release, and then keep it stable.

I've used this new tool to rewrite "python-coverage.spec" so that it
dynamically queries what python runtimes it should support.  You can see
the specfile I wrote here:
http://fedorapeople.org/gitweb?p=dmalcolm/public_git/rpm-pyconfig.git;a=blob;f=python-coverage.spec;hb=HEAD

This specfile _does_ build, and emits subpackages.

The tool has its own expression-evaluation language.  For example
"@confbin" means "the python binary for the current configuration", and
is expanded to "python" for the default python2 build, and to "python3"
for the default python3 build.
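
As a rough illustration of that kind of expansion (a sketch only, not
how the real tool is written), the following replaces each "@name" with
a per-configuration value.  The CONFIGS table and the expand() helper
are hypothetical; only "@confbin" and "@confpkg" themselves come from
the tool.

```python
import re

# Hypothetical per-runtime data, for illustration only.
CONFIGS = {
    "python2": {"confbin": "python", "confpkg": "python"},
    "python3": {"confbin": "python3", "confpkg": "python3"},
}

# An "@" followed by an identifier; "-" and "/" end the name, so
# "@confbin-coverage" is read as "@confbin" followed by "-coverage".
_VAR = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")


def expand(expression, config):
    """Replace each @name with config[name]; leave unknown names untouched."""
    def substitute(match):
        return config.get(match.group(1), match.group(0))
    return _VAR.sub(substitute, expression)


print(expand("@confbin setup.py build", CONFIGS["python3"]))
```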

The main commands are:
  --foreach  (and --foreach-2 and --foreach-3) which iterates over
             every python configuration (or just those that are
             Python 2 / Python 3)
  --eval  which evaluates an expression then prints the result
  --exe   which evaluates an expression then executes the result
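
The iterate-then-evaluate behaviour of these commands can be sketched
roughly as below.  This is an assumption-laden model rather than the
tool's implementation: the CONFIGS list, the naive expand() and the
for_each() signature are invented for the example, and "execute" simply
hands the expanded string to the shell.

```python
import subprocess

# Hypothetical runtime table, for illustration only.
CONFIGS = [
    {"name": "python2", "major_version": "2", "confbin": "python"},
    {"name": "python3", "major_version": "3", "confbin": "python3"},
]


def expand(expression, config):
    # Naive expansion: substitute longer names first so that one name
    # which is a prefix of another cannot clobber it.
    for key in sorted(config, key=len, reverse=True):
        expression = expression.replace("@" + key, config[key])
    return expression


def for_each(expression, major=None, execute=False, configs=CONFIGS):
    """Expand expression once per configuration, optionally filtered by major.

    With execute=True each expanded string is also run as a shell command.
    The expanded strings are returned either way.
    """
    results = []
    for config in configs:
        if major is not None and config["major_version"] != major:
            continue
        expanded = expand(expression, config)
        if execute:
            subprocess.run(expanded, shell=True, check=True)
        results.append(expanded)
    return results


for line in for_each("@confbin setup.py build"):
    print(line)
```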

You can then use this within a specfile in various ways.

For example, you can write:
   rpm-pyconfig --foreach-2 --exe \
       "cp -a ../pristine @confsrcdir"
   rpm-pyconfig --foreach-3 --exe \
       "cp -a %{py3dir} @confsrcdir"
and have each subbuild get its own copy of the source tree
("@confsrcdir" expands differently for each one), the python2 subbuilds
getting a pristine copy, the python3 subbuilds getting one that's had
2to3 run on it.

You can also use it with rpm's subshell capture syntax: %()
This is a little-known rpm construct in which the commands within the
parentheses are executed, and the stdout is captured, and then processed
further.

For example, the following uses it to iterate over all python runtimes
and express that there should be a -coverage subpackage for each one,
leading to "python-coverage" and "python3-coverage":
# Define the metadata for each built package:
%(rpm-pyconfig --foreach --eval "
%package -n @confpkg-coverage
Summary: Code coverage testing module for @confpkg
Group: System Environment/Libraries
Requires: @confpkg-setuptools
%description -n @confpkg-coverage
%SHARED_DESC

@confdescline
")


Similarly we can programmatically generate multiple %files stanzas as
desired:

%(rpm-pyconfig --foreach --eval "
%files -n @confpkg-coverage
%defattr(-,root,root,-)
%doc README.txt
%{_bindir}/@confbin-coverage
@conf_sitearch/coverage/
@conf_sitearch/coverage*.egg-info/
")

Note how in this world the "%python_sitearch" rpm macro goes away, and
we instead use the "@conf_sitearch" rpm-pyconfig macro, which will vary
per configuration.
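
To show what rpm would see after the %() capture, here is a small
sketch that renders the %files template once per configuration.  It
uses Python's string.Template with "@" as the delimiter purely as a
stand-in for the tool's own evaluator; the CONFIGS values (including the
site-packages paths) are example data, not output from the real tool.

```python
import string


class AtTemplate(string.Template):
    """string.Template variant that uses '@name' instead of '$name'."""
    delimiter = "@"


FILES_STANZA = """\
%files -n @confpkg-coverage
%defattr(-,root,root,-)
%doc README.txt
%{_bindir}/@confbin-coverage
@conf_sitearch/coverage/
@conf_sitearch/coverage*.egg-info/
"""

# Example values only; a real runtime would supply its own.
CONFIGS = [
    {"confpkg": "python", "confbin": "python",
     "conf_sitearch": "/usr/lib64/python2.6/site-packages"},
    {"confpkg": "python3", "confbin": "python3",
     "conf_sitearch": "/usr/lib64/python3.1/site-packages"},
]


def render_all(template_text, configs):
    """Render the template once per configuration, separated by blank lines."""
    template = AtTemplate(template_text)
    return "\n".join(template.substitute(config) for config in configs)


print(render_all(FILES_STANZA, CONFIGS))
```

The rpm macros such as %{_bindir} pass through untouched, since only
"@" is special to the template; rpm expands them afterwards.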

The current implementation of the tool is here:
http://fedorapeople.org/gitweb?p=dmalcolm/public_git/rpm-pyconfig.git;a=blob;f=rpm-pyconfig;hb=HEAD
(it's just an experiment at this stage, so is somewhat hackish)

This is what I've got so far.  It's a very different way of doing python
packaging to what we've done in the past.  I like it, in that it
expresses the stacks programmatically, rather than requiring copy&paste
of specfile fragments.

I'm sure there are issues with this approach, but I suspect they are
fixable, and I think this approach is more sustainable than manually
handcoding information about the supported Python runtimes into every
specfile: it gives us the freedom to add runtimes.  Adding a runtime
would still be a major event, but this at least reduces the effort
required.

An example of an issue: currently it emits "python2-coverage" for the
core python2 package, rather than "python-coverage" (that's fixable).

Some more thoughts on the brainstorm that led to this can be seen at:
https://fedoraproject.org/wiki/DaveMalcolm/PythonIdeas

My biggest concern with this is that %() is little-known, and I don't
know how well-supported it is e.g. in RHEL5's rpm, and going forward.
Oh, and it's a total change in how we do Python packaging :)

The syntax is obviously up for grabs, and is somewhat clunky in places.
I used "@" to avoid having to deal with "%" and "$" already having
meaning in the rpm and shell worlds (since we have to pass around
strings that have functional content for all three languages).

I suspect we'd need an exclusion syntax (e.g. "this module isn't
buildable on PyPy", "this module isn't buildable on 2.7 with ppc"); you
can perhaps express this using things like:

if [ "$(rpm-pyconfig --eval @major_version.@minor_version)" = "2.7" ]
  %ifarch ppc
  %endif
elif

or somesuch
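
Another way to think about exclusions is as data rather than shell
conditionals.  The sketch below is purely hypothetical (nothing like it
exists in the tool yet): each rule is a set of key/value pairs, and a
runtime is excluded when some rule matches on every key it names.  The
rule keys and runtime entries are invented for the example.

```python
# Hypothetical exclusion rules a module might declare.
EXCLUSIONS = [
    {"implementation": "pypy"},          # "this module isn't buildable on PyPy"
    {"version": "2.7", "arch": "ppc"},   # "... isn't buildable on 2.7 with ppc"
]


def is_excluded(runtime, exclusions):
    """True if any rule matches; a rule matches when all its keys agree."""
    return any(
        all(runtime.get(key) == value for key, value in rule.items())
        for rule in exclusions
    )


def buildable(runtimes, arch, exclusions=EXCLUSIONS):
    """Names of the runtimes this module should be built for on arch."""
    return [
        runtime["name"]
        for runtime in runtimes
        if not is_excluded(dict(runtime, arch=arch), exclusions)
    ]


# Made-up runtime descriptions, for illustration only.
RUNTIMES = [
    {"name": "python27", "implementation": "cpython", "version": "2.7"},
    {"name": "pypy", "implementation": "pypy", "version": "2.5"},
    {"name": "python3", "implementation": "cpython", "version": "3.1"},
]

print(buildable(RUNTIMES, "ppc"))
print(buildable(RUNTIMES, "x86_64"))
```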

Thoughts?
Dave

[1] both compiled C extensions and compiled .pyc/.pyo bytecode files are
incompatible (but in both cases this is detectable, the former via the
"NEEDED" field in the ELF metadata, the latter via the magic number in
their header); see PEP 384 and PEP 3147 for future ideas towards being
able to share more, but they're not done yet.
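
As an illustration of the bytecode half of that detection, here is a
small sketch that compares the magic number at the start of a .pyc file
with that of the running interpreter.  It uses the modern Python 3
standard library (importlib.util.MAGIC_NUMBER, py_compile) rather than
anything available in the runtimes discussed above, and the helper name
is made up.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_matches_running_interpreter(pyc_path):
    """Compare the first four bytes of a .pyc with this interpreter's magic."""
    with open(pyc_path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER


# Demo: compile a trivial module and check the resulting bytecode file.
with tempfile.TemporaryDirectory() as demo_dir:
    source = os.path.join(demo_dir, "example.py")
    with open(source, "w") as f:
        f.write("x = 1\n")
    pyc = py_compile.compile(source, cfile=os.path.join(demo_dir, "example.pyc"))
    print(pyc_matches_running_interpreter(pyc))
```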



