On Wed, Oct 19, 2022 at 01:04:39PM +0200, Fabio Valentini wrote:
> I'll respond inline.
Me too -- and apologies for the delay.
> > I fundamentally disagree with Kevin on a deep level about "entirely
> > useless", but ... find myself kind of agreeing about the "unpackagable"
> > part. I mean: clearly we've found a way, but I'm really not sure we're
> > providing a lot of _value_ in this approach, and I'm also not sure it's
> > as successful as it could be.
> We *do* provide value to both users *and* developers by doing things
> the way we do, but the benefits might not be obvious to people who
> don't know how (Rust) packaging works, and what we as package
> maintainers do.
Let me rephrase: I absolutely think you've provided value and are providing
value (and I appreciate it). I am not convinced that the value is in the
RPM-izing part, though.
[...]
> This is due to a limitation of how cargo handles target-specific
> dependencies - all dependencies that are *mentioned in any way* need
> to be *present* for it to resolve dependencies / enable optional
> features / update its lockfile etc. But since we don't want to package
> bindings for Windows and macOS system APIs, we need to actually patch
> them out, otherwise builds will fail.
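(For readers unfamiliar with the mechanics here: this is a sketch of the kind of manifest section involved, with illustrative crate names; the point is that cargo resolves even dependencies that can never be compiled on Linux.)

```toml
# Hypothetical excerpt from a crate's Cargo.toml. Even though the
# Windows- and macOS-only dependencies can never be built on Linux,
# cargo still insists on *resolving* them, so they must be available
# as packages -- or be patched out of the metadata entirely.
[dependencies]
libc = "0.2"

[target.'cfg(windows)'.dependencies]
winapi = { version = "0.3", features = ["winuser"] }

[target.'cfg(target_os = "macos")'.dependencies]
core-foundation = "0.9"
```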
Theoretically, if we had our own crate repository, we could either make
those changes there (possibly using something like packit to carry the
patches) -- or, just, not make the changes and not worry because we know
those won't end up used anyway?
> You must realize that this is an extreme case. For many Rust
> applications that people want to package for Fedora, the number of
> dependencies that are missing is rather small, *because* most popular
> libraries are already packaged.
It may be that I just hear about the difficult cases.
> We might need to reconsider how to package projects like this. I'm
> pretty sure we could find a way to package them in a way that's
> compatible with how we're currently doing things but would be much
> less busywork.
Okay, I'm open to that.
> Sure, but isn't that the case for most projects that a newcomer wants
> to package, regardless of programming language? Say, somebody wants to
> package some cool new Python project for machine learning, then
> there's probably also some linear algebra package or SIMD math library
> in the dependency tree that's missing from Fedora. How is that
> different?
Rust tends to be more fine-grained. I don't think this is necessarily
Rust-specific, _really_ — I think it's a trend as people get more used to
this way of doing things. With Python, there are some big packages
(including "batteries included" standard Python itself) which tend to group
big related sets of functionality (notably: numpy, scipy, pandas...).
> For intra-project dependencies (i.e. bevy components depending on
> exact versions of bevy components), this is kind of expected, and we
> have tools to deal with this kind of situation (though bevy is on a
> different scale). For dependencies on third-party libraries, this is
> kind of unexpected, and I wonder why they do things like that? Locking
> some dependencies to exact versions is usually handled by relying on
> the lockfile, instead.
I was wrong about this. I actually didn't realize that the ^ was optional. I
was, um, cargo-culting that around. Ah well. Anyway, that's less of a
problem than I worried.
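For anyone else who also missed this, the cargo version-requirement rules work
roughly like this (crate names are just examples):

```toml
[dependencies]
serde = "1.0.100"      # caret is implied: same as "^1.0.100",
                       # i.e. matches >=1.0.100, <2.0.0
rand  = "=0.8.5"       # exact pin: matches only 0.8.5
libc  = ">=0.2, <0.3"  # explicit range
```

So writing `"^1.0.100"` explicitly is redundant; the bare version string already
means "semver-compatible with this version".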
> > The packaging guidelines say that I SHOULD create patches to update to
> > latest versions of dependencies, and that I should further convince the
> > upstream to take them. Candidly, that seems like a waste of everyone's
> > time.
> This is *not* a waste of time. If we don't invest time to do that, many
> projects' dependencies grow stale, and actually *increase* the need for us
> to maintain compat packages.
I have not tried this with any Rust package. My experience in the past is
that many upstreams find this the kind of thing that makes them go on long
blog rants about distro packaging -- they picked a version, it works fine,
they don't need the distraction of being told they must update it.
But even when this doesn't happen, it gets into the matter of expertise. If
I need to update a dependency for a newer version of a sub-dependency, and
I don't know enough about either code base to do anything other than file a
"please update" bug, then everything is blocked on that.
I don't dispute that helping projects keep up to the latest is valuable
work. It even seems like it might be in-scope work for Fedora. But couldn't
we do that as something _separate_ from blocking ourselves (either literally
or through the extra overhead of compat packages) from packaging the
dependent app?
> > The guidelines provide for creating compat packages, but that means 1) the
> > existing shared work is less useful, 2) requires even more extra steps, and
> > 3) even without reviews for compat has extra administrative overhead.
> We only maintain compat packages where porting to the new version (and
> submitting the changes upstream) is not feasible. Again, isn't that
> how Fedora is supposed to work?
I guess it depends on how broadly one reads "feasible". :)
> The barrier for participation is too high in some cases, I agree.
> However, in my experience, that's for a different reason:
> The "shiny new things that happen to be written in Rust" that new
> contributors want to have in Fedora are often very complicated
> projects that even experienced Rust packagers would need to spend a
> lot of time on.
> Examples of that might be:
> - wasmtime: I ultimately abandoned the attempt to package it "because
>   Fedora Legal", but the packages themselves worked fine
An aside, but: did I miss something with this on the Legal list? The only
thing I'm finding is a question about how to phrase `Apache-2.0 WITH
LLVM-exception`.
> - deno: requires dozens of new packages, some of which also have
>   unclear / questionable licenses as well, but the packages themselves
>   worked
I'm not sure this example isn't agreeing with me. :)
> On the other hand, many "nice" CLI tools that people want to package
> often require minimal knowledge of Rust packaging (our tools are
> pretty nice for "standard" projects), and often only need very few new
> dependencies to be packaged.
> Just as an example, I just today started reviewing a "simple" Rust
> application here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1990713
> The spec file is very simple and almost entirely automatically
> generated (with the exception of the missing License breakdown for the
> statically linked binary), no dependencies were missing from Fedora.
> Even Rust newbies would not have trouble packaging this, and that
> would be a way better entry point than packaging stuff like Bevy.
How many lines of that are unique to that package?
I guess my impression of this is a little bit colored by a Python packaging
adventure from a little bit ago:
https://lists.fedoraproject.org/archives/list/python-devel@lists.fedorapr...
... where it kind of turns out that what looks like it could be automated
ends up with a hand-tuned specfile with a lot of exceptions.
I'm hopeful that maybe the Rust version of this could be more streamlined.
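For concreteness, a rust2rpm-generated spec for a simple CLI tool has roughly
this shape (a simplified from-memory sketch, not the actual spec from that
review; the crate name is just an example, and details vary by rust2rpm
version):

```spec
%global crate hexyl

Name:           rust-%{crate}
Version:        0.12.0
Release:        %autorelease
Summary:        Command-line hex viewer
License:        MIT OR Apache-2.0
URL:            https://crates.io/crates/%{crate}
Source:         %{crates_source}

BuildRequires:  cargo-rpm-macros >= 24

%description
%{summary}.

%prep
%autosetup -n %{crate}-%{version} -p1
%cargo_prep

%generate_buildrequires
%cargo_generate_buildrequires

%build
%cargo_build

%install
%cargo_install

%check
%cargo_test

%files
%license LICENSE-MIT LICENSE-APACHE
%{_bindir}/%{crate}
```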
What if, instead of the specfile + boilerplate, there was a toml (or yaml,
or whatever) file that just listed whatever is unique to the package?
For this package, maybe it is... nothing? Just need an indicator that this
is a Rust package.
I know it looks simple to an experienced packager, but that specfile has a
_lot_ of complicated domain knowledge -- both general Fedora RPM packaging
and the rust packaging macros.
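To make the idea concrete, here's an entirely hypothetical sketch of what such
a file could look like -- this format does not exist today, and every field
name here is invented:

```toml
# Hypothetical "only the unique parts" package description:
[package]
type = "rust-application"
crate = "some-cli-tool"            # hypothetical crate name

# Everything below is optional -- only needed when defaults are wrong:
[overrides]
license = "MIT AND Apache-2.0"     # e.g. corrected after manual review
skip-tests = ["network"]           # e.g. tests that need network access
```

For the "simple" package above, the overrides section might be empty, which is
exactly the point.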
> > And, I led with: I appreciate all the work you've all done to make this
> > work. That's definitely true — I think it was super-valuable to pilot this
> > approach. But I think that the Rust ecosystem would be a great place to
> > pilot a different way. Something lightweight where we cache crates and use
> > them _directly_ in the build process for _application_ RPMs.
> We have talked about this multiple times, but it won't work.
> I think this was tried with first-class maven artifact support in
> koji, but we all know how the Java packaging fiasco ended.
I would rather see it as: we learned some lessons from that approach and can
do it better.
> Or even if making Rust crates first-class deliverables *did work*, it
> wouldn't give us the benefits of the current approach:
> - we ensure that all crates in Fedora *build* on all architectures
> - we ensure that most crates in Fedora pass their test suites on all
>   architectures
But those things aren't attached to making them into RPMs.
> - we check all crates for objectionable content, licensing problems,
>   etc.
Nor is this. And I don't think we _should_ skip this part. It is clear
value.
> - we change build flags to default to dynamically linking to system
>   libraries instead of statically linking against vendored copies
This too.
Mostly, at least. Assuming this isn't _prebuilt binaries_ or similar,
upstream may or may not have a good reason or strong opinion. Like any
bundling, we need a system which can track and react to security problems
with those libraries, though. (And we don't meaningfully have that for RPMs
now either.)
> This would mean that we basically stop contributing things to the
> upstream Rust ecosystem:
> - we diagnose / report / fix architecture support issues
> - we port projects to new versions of dependencies
> - etc.
Why? I think it would give us _more_ time to do those things.
> I see this work in the upstream ecosystem as an important part of the
> work we do in packaging Rust crates for Fedora, and I would not want
> to endorse an approach that meant we no longer do these things.
Sure!
> > Rust packages include a lot of machine-readable metadata. We should be
> > able to watch for CVEs, RustSec, and other security notices even without
> > encoding the metadata in RPMs. License review could also be automated —
> > the field in Cargo.toml is supposed to be SPDX, so that's convenient.
> > [3]
> I already monitor RustSec advisories and check *all of them* against
> Fedora packages. This takes up a minuscule amount of the time I spend
> on Rust packaging (because there are so few Rust security advisories).
> If I remember correctly, there were only 2-3 CVE issues in the Rust
> stack that actually affected our packages, and dealing with those was
> very simple:
> 1) Push the patched version of the library, 2) rebuild dependent
> applications, 3) submit to bodhi.
> There's some amount of automation that *could* be done (mostly in
> figuring out which applications need to be rebuilt for a given library
> change), but that's also pretty easily done with a "dnf repoquery" or
> two.
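For anyone curious what that query looks like, something along these lines
works because Rust library packages carry `crate(...)` virtual Provides (the
crate name here is just an example, and the exact repo names depend on your
setup):

```
# Which binary packages pull in a given crate:
dnf repoquery --whatrequires 'crate(openssl)'

# Source packages that would need a rebuild after a library update
# (BuildRequires live in the source repo):
dnf repoquery --disablerepo='*' --enablerepo=fedora-source \
    --whatrequires 'crate(openssl)'
```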
I appreciate that you do this (very much!). But it seems like this could be
_entirely_ automated (with alerts for dependency or license changes, etc).
> On the other hand, license review is still important, even if it's
> already available in SPDX format in the upstream metadata.
> Just because sometimes, that metadata is either wrong or incomplete.
> And even more often, package review flags other problems (like missing
> LICENSE files for licenses that *require* redistributed sources to
> contain a copy of the license text). Relying on SPDX metadata alone is
> *not* safe.
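Agreed that the metadata alone can lie. Even so, a lot of the *mechanical* part
of that check seems automatable. Here's a minimal sketch (not Fedora's actual
tooling, and `audit_crate` is a made-up name) of the specific failure mode
described above: a crate declares MIT, but MIT requires shipping the license
text, and the tarball doesn't contain it.

```python
import re
from pathlib import Path

def audit_crate(crate_dir: str) -> list[str]:
    """Return a list of human-readable problems found in a crate directory."""
    problems = []
    crate = Path(crate_dir)
    manifest = crate / "Cargo.toml"
    if not manifest.exists():
        return ["no Cargo.toml found"]

    # Naive extraction of the SPDX expression; a real tool would use a
    # TOML parser and a proper SPDX expression parser.
    match = re.search(r'^license\s*=\s*"([^"]+)"', manifest.read_text(), re.M)
    if match is None:
        problems.append("no SPDX license expression in Cargo.toml")
        return problems

    spdx = match.group(1)
    # Licenses that require the license text to accompany redistributed
    # sources -- a tiny illustrative subset, not a complete list:
    needs_text = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}
    mentioned = set(re.split(r"\s+(?:OR|AND|WITH)\s+", spdx))
    has_license_file = any(crate.glob("LICENSE*")) or any(crate.glob("LICENCE*"))
    if (mentioned & needs_text) and not has_license_file:
        problems.append(f"'{spdx}' declared, but no LICENSE file in the sources")
    return problems
```

This obviously can't replace a human review (it can't spot a *wrong* license
claim), but it could flag the incomplete-tarball case automatically on every
crate update.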
Again, yes -- and we should talk to Jilayne and others on the Legal list,
because we can do this better generally. For most packages, there's a
one-time review gate, applied with varying diligence depending on the
packager and reviewer. Then it's maybe never looked at again. For packages that
go into RHEL, there's another review by RH that _hopefully_ (and by policy!)
should go back to the Fedora packages.
> > We could also attach other metadata to the packages in the cache. Maybe some
> > popularity, update frequency from crates.io, but also package review flags:
> > checked license against source, and whatever other auditing we think should
> > be done. This moves the focus from specfile-correctness to the package
> > itself, and the effort from packaging to reviewing. (I'd suggest that for
> > the experiment, we not make any deep auditing mandatory, but instead
> > encouraged.) And these flags should be able to be added by anyone in the
> > Rust SIG, not necessarily just at import.
> This is already the case, though?
> Writing a spec file for a new crate is already automated to the point
> where "standard" crates can be 100% automatically generated and need
> zero manual edits.
See my comment above -- there are a lot of steps, and a lot of domain knowledge involved.
> If manual changes *are* required, then these changes would also be
> required in the "first-class crate artifact" scenario, so you don't
> gain anything.
> And if there are other problems that are caught during package review,
> the distribution mechanism doesn't matter, either.
But our mechanism is really complicated -- a barrier to entry, lots of
places for mistakes, and even with collaboration with other distros, very
Fedora-specific. So I think there is something to gain.
> In my experience, changing the distribution mechanism or packaging
> paradigm will often make things *worse* instead of better. For
> example, the implosion of the NodeJS package ecosystem in Fedora was
> not only caused by the horrid state of NPM, but also because the new
> packaging guidelines, which prefer bundling, essentially made it
> impossible for packagers to verify that no objectionable content is
> present in vendored dependencies. For Java, Modularity was seen as a
> "solution", but the result was that basically everybody - except for
> the Red Hat maintainers who maintained the modules - just stopped
> doing Java packaging because of the hostile environment.
I really hope we can look at these and learn how to do it better, instead of
deciding that better isn't possible. And — while I'm not really up on node —
I have pretty good hindsight on what went wrong with modularity. (Not enough
to try modularity _again_ just yet... but that's a different thing. A whole
talk for next year's Nest/Flock, maybe....)
> > Rust packaging seems like a great place to lead the way — and then we can
> > maybe expand to Go, which has similar issues, and then Java (where, you
> > know, things have already collapsed despite heroic effort.)
> Oh, actually, I don't think Rust packaging is a good place to start
> here at all. :)
> The way cargo works already maps very neatly onto how RPM packages
> work, which is definitely *not* the case for other language
> ecosystems. I also think we could even massively improve handling of
> "large" projects with many sub-components (like bevy, zola, wasmtime,
> deno, etc.) - which are currently the only projects that are "painful"
> to package - *without* completely changing the underlying packaging
> paradigm or distribution mechanism. (I've been wanting to actually
> write better tooling for this use case, but alas, Bachelor thesis is
> more important for now.)
I think we can both be right here: the simple mapping seems like exactly what
makes this a good place to experiment.
> Compared to the alternatives, all attempts at trying different approaches
> (maven artifacts in koji, vendoring NodeJS dependencies, Java Modules, etc.)
> have *failed* and ultimately made things worse instead of improving
> the situation - the only thing that has proven to be sustainable (for
> now) is ... maybe surprisingly, plain RPM packages.
I'll take "for now". :)
--
Matthew Miller
<mattdm(a)fedoraproject.org>
Fedora Project Leader