Hi Martin,
I have been quite busy lately, so I did not have time to reply sooner.
(Below I will mostly speak for the Rust ecosystem.)
On Fri, Feb 21, 2020 at 4:06 PM Martin Sehnoutka <msehnout(a)redhat.com> wrote:
Hi,
before I write the proposal itself I just want to stress the fact that
it isn’t my intention to change the current packaging workflow and
definitely not the user experience. Also if you have C or Python
packages it would not affect your work at all (I’m not familiar with all
interpreted languages in Fedora, but I guess it is similar to Python and
therefore it is not affected by the problems I am going to describe).
I disagree. Python has the same kind of registry (PyPI) as Rust does
(crates.io). Of course, there are different problems in different
ecosystems, but the approach should be the same for all of them.
First of all, let me describe the problem I see in our Fedora ecosystem
with relation to Go and Rust language ecosystems. More specifically in
the relation between RPM buildroot and packages in these ecosystems.
Both of these languages follow the idea that packages should be small
and only have a limited set of features. Developers then use a lot of
these packages to write the final executable that is meant for end-users
[1]. Also both of these languages use static linking of the final
binaries meaning that Fedora users don’t install RPM packages of these
libraries as they are already baked inside of the binary [2].
Technically, we can do dynamic linking. The problem is that there is
no stable ABI, so even a rebuild of the same sources with a different
version of the compiler can produce a different result.
I can't speak for Go, but in Rust a crate (aka library) can be
compiled with different features enabled. So we would either need to
produce libraries for every combination of those features or ship one
big library with all of them included. That would mean the dependency
set grows and most likely we won't save any space, or even worse, we
will waste it. The exception is a certain number of packages that are
common to all systems, of course.
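To put a rough number on "all combinations of those features", here is
a minimal sketch. The feature names are made up, and it assumes the
features are independent of each other, so it gives an upper bound
rather than what any real crate would need.

```python
from itertools import combinations


def feature_sets(features):
    """Return every subset of the given feature names, smallest first."""
    names = sorted(features)
    return [
        set(combo)
        for size in range(len(names) + 1)
        for combo in combinations(names, size)
    ]


# Hypothetical feature names, not taken from any real crate.
features = ["serde", "std", "unicode"]
variants = feature_sets(features)

# Three independent features already mean 2 ** 3 = 8 distinct builds.
print(len(variants))
```

Each extra independent feature doubles the count, which is why shipping
one prebuilt library per combination does not scale.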
By the way, Haskell has a very similar problem; they just put hashes
all over the place to ensure that all packages are rebuilt.
The 1st problem is that if we want to build RPMs of the final
executables the way we do now, we need to package all these small
libraries into RPM even though they are just build dependencies and
users never install RPMs of these libraries. Many of these RPMs are
automatically generated from the upstream packages meaning that we don’t
do anything except for unpacking the upstream package (e.g. plain
tarball in case of crates.io) and then we package the same into RPM.
This process is unfortunately not fully automated and therefore requires
a certain amount of human effort.
As others pointed out, we do audit licenses. But apart from that, we also:
1) Run cargo build and cargo test to ensure that the crate actually works
2) Try to patch crates to rely on the latest versions of their dependencies
3) Add patches to use system versions of libraries (libcurl,
oniguruma, libssh2, …)
We have found many issues related to 32-bit architectures and
endianness, and even when bumping the version of a dependency you can
find an actual bug in the crate's code
(https://github.com/budziq/rust-skeptic/pull/116#issuecomment-590009805).
To sum up the previous paragraph, I don’t think it is necessary to
repackage upstream tarball into a downstream RPM.
If it is automated, why do you think so? When you have different
ecosystems, it is valuable to be able to check something across all of
them in the same way (aka using just rpm instead of pip, cargo, go, …).
The 2nd problem is present only in the Rust ecosystem (as far as my
knowledge goes). Cargo, the official package manager for Rust, can
handle the scenario where an executable depends on a single library in
two different versions [3]. Dnf, on the other hand, will not install two
versions of the same RPM package. What we do now is, that we patch the
affected executables and libraries to only use the newest versions
everywhere. This is again an additional maintenance cost and we create
differences from upstream, because these patches are not necessarily
merged. See this as an example:
https://src.fedoraproject.org/rpms/rust-bstr/blob/master/f/rust-bstr.spec...
https://github.com/BurntSushi/bstr/pull/23
As others pointed out, we can (and we do) create compat packages.
What do you propose to do if there is, let's say, a vulnerability in
the version some crate depends on and upstream is not responsive?
95+% of our patches are merged upstream sooner or later (excluding
dead upstreams).
To sum up the 2nd problem: we are using dnf instead of the upstream
package manager to install dependencies. These two approaches can be
incompatible (and they are in case of Rust).
That's not exactly true: we mirror what Cargo does in RPM, and that
works pretty well.
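As a rough illustration of the rule being mirrored (a sketch, not the
actual RPM macros): with Cargo's default caret requirements, two
releases are interchangeable when they agree up to the left-most
non-zero version component. Grouping the versions requested across a
dependency tree by that key shows how many parallel copies (the main
package plus compat packages) would be needed. The version list is made
up, and pre-release or build suffixes are not handled.

```python
def compat_key(version):
    """Return the part of a version that must match for Cargo's default
    (caret) requirements to treat two releases as interchangeable:
    everything up to and including the left-most non-zero component."""
    major, minor, patch = (int(part) for part in version.split("."))
    if major > 0:
        return (major,)
    if minor > 0:
        return (0, minor)
    return (0, 0, patch)


def group_by_compat(versions):
    """Bucket version strings by their compatibility key."""
    groups = {}
    for version in versions:
        groups.setdefault(compat_key(version), []).append(version)
    return groups


# Hypothetical versions of one crate requested by different dependents.
needed = ["1.2.0", "1.4.3", "0.3.1", "0.4.0", "0.4.2"]
groups = group_by_compat(needed)

for key, members in sorted(groups.items()):
    print(key, members)
```

Here 1.2.0 and 1.4.3 collapse into one package, while 0.3.x and 0.4.x
each need their own, so three copies cover all five requests.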
The 3rd problem I see is that issues like this are not going away, they
are only going to get worse as other ecosystems emerge. e.g. if the
Swift language became popular (for Linux development) it would again
have its own package manager and probably its own set of issues in
relation to our build system. (I mention Swift specifically because it
is a compiled language that has a stable ABI only for std, other
libraries are statically linked)
So we would have the same problem if we wanted a mirrored/patched
registry for language X in Fedora that complies with our policies /
requirements.
Now, I’d like to point out that Go and Rust packaging works well in
Fedora due to the enormous effort of certain people and I very much
admire the work they do. But on the other hand I’m afraid where the
ecosystem would go without them. This is where we get to the situation
in the enterprise derivatives, specifically RHEL and CentOS. Their
solution to this problem is not to package all libraries separately but
to bundle all libraries directly into source RPMs of each executable. So
the bundling is not present only on the binary file level, but also on
the source RPM level. Go went even further in this case and it is common
to bundle all the dependencies as a source code directly in the upstream
repository. See this repo as an example:
https://github.com/containers/libpod/tree/master/vendor
Bundling is bad, that's it. I'm not going to write why it is bad and
what disadvantages it has.
I think the real problem here is that our Fedora build system just
sucks. The hours I spent on building Rust packages in Fedora are
because of this, nothing more.
Another part of that time is spent on the things you've described
above (like patching crates to use the latest versions of deps).
Technically I could have created a bunch of compat packages, but I
call that just laziness: if you port to the latest versions of deps,
you have to care about just one copy of a library and that's it.
It is fair to say, that my first motivation was the current state of
packaging in RHEL but I’d prefer to discuss this in Fedora first.
The proposal itself is fairly simple: Let’s stop packaging all Go and
Rust libraries into RPM and install them to the buildroot in the
upstream format instead.
No?
The specific implementation is up for a discussion but I think it is
logical to start by asking what features do we want from the build
system? My answers are:
* We want to know what exactly was used to build the RPM to ensure
integrity. This is possible with upstream tarballs as well as with RPMs.
Just store a hash of the tarball.
That's not enough; you also need to store all the sources.
* There should be no maintenance cost. If we would avoid modifying the
upstream package it would be the ideal case.
Of course we can't. If you automated 90% of importing the packages
into Fedora in RPM format, that would be much more beneficial.
* It should be possible to patch the library in case of severe issues
like CVEs. This is a contradiction to the previous point, but it could
be solved by unpacking the upstream package into a git repository and
creating a new, modified package in upstream format from it.
* It should be possible to run the build locally in exactly the same
way Koji does it. This is just about exposing our “registry” of packages
to the public Internet.
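On the integrity point in the list above, recording a hash is the easy
part. A minimal sketch using only the standard library (the tarball
contents here are a stand-in, not a real upstream package):

```python
import hashlib
import io


def sha256_of(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large tarballs fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for an upstream tarball; a real one would be opened in "rb" mode.
tarball = io.BytesIO(b"pretend this is an upstream .crate tarball")
recorded = sha256_of(tarball)

# Later, verify that the bytes used for the build match the recorded hash.
tarball.seek(0)
matches = sha256_of(tarball) == recorded
print(recorded, matches)
```

The hash only proves which bytes were used. It cannot reproduce them if
upstream disappears, which is why the sources themselves still have to
be kept.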
Finally, the implementation could be something like this:
A service like release-monitoring could monitor the upstream projects
for new releases (e.g. NPM has a RSS feed). Every time there is a new
release it would automatically synchronize our own Fedora-specific
registry, which would in turn be accessible in buildroot. Then the RPM
macros like “cargo_build” could be modified to use this registry.
If we wanted to have the possibility to patch these libraries we could
synchronize the upstream release into our git repository and generate
packages in upstream format from it.
You forgot one very important part here: you need to run the tests of
every package in the registry every time its dependencies get updated
and/or the buildroot changes.
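To give an idea of what that retesting involves, here is a minimal
sketch that walks the reverse dependency graph to find everything that
needs a test run after one package changes. The package names and the
graph are hypothetical, and this is not how any existing Fedora tooling
is implemented.

```python
from collections import deque


def packages_to_retest(depends_on, updated):
    """Return every package that directly or transitively depends on
    `updated`, given a mapping of package -> list of its dependencies."""
    # Invert the graph: dependency -> set of packages that use it.
    used_by = {}
    for package, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(package)

    # Breadth-first walk over the inverted graph.
    affected = set()
    queue = deque([updated])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


# Hypothetical registry contents.
depends_on = {
    "app": ["parser", "net"],
    "parser": ["core"],
    "net": ["core"],
    "core": [],
}

print(sorted(packages_to_retest(depends_on, "core")))
```

An update to a low-level package fans out to most of the registry, so
this step is where much of the build and test time would go.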
What do you think about the way we build RPM packages for Go and Rust?
Do you see other solutions or is it ok the way it is? (Please do not
suggest to “fix” the upstream communities)
I think we:
* build them fine
* should continue improving RPM to hide all non-interesting details
from maintainers
* should improve our build system
* should automate most things
Thanks for your opinions!
Martin
FAQ:
* Does it mean you want to get rid of -devel subpackages for C libraries?
No. First of all C language does not have a single package manager and
it is still common to install dependencies using the distro specific
package manager. But packaging into RPM also comes with the advantage of
having shared libraries so in this case it has a positive effect on the
end-user experience. Rust and Go libraries on the other hand are mostly
useless for end-users.
* I prefer to install development libraries using the distro package
manager instead of the upstream package manager. Why do you want to
change it?
This is the case for C development, but it is different in most of the
modern languages. For example we ship Python packages, but we discourage
people from using them for their own development. Python developers are
instead encouraged to use pip to create a virtual environment and use
upstream packages. In the case of Rust, using the system libraries would
be much harder than using the upstream package manager.
That's not true. It is pretty easy to use them in the Rust ecosystem:
just drop in one configuration file. It is just that no one has yet
implemented support for multiple registries in Cargo (in the sense
that you can use system crates together with crates.io ones).
>
> * Does it mean you want to get rid of Rust packages like ripgrep?
>
> No. The end user experience stays unchanged. Packages that we ship to
> Fedora users must be in RPM format, it is only the buildroot that changes.
>
> Notes:
> [1] Please don’t discuss if it is a good idea or not. This is a
> discussion for upstream of those languages.
> [2] Again, please don’t discuss static vs. dynamic linking here.
> [3] https://stephencoakley.com/2019/04/24/how-rust-solved-dependency-hell