On Mon, May 4, 2020 at 11:06 AM Tomas Tomecek <ttomecek@redhat.com> wrote:
Let’s talk about dist-git, as a place where we work. For us, packagers, it’s a well-known place. Yet for newcomers, it may take a while to learn all the details. Even though we operate with projects in a dist-git repository, the layout doesn’t resemble the respective upstream project.
There is a multitude of tasks we tend to perform in a dist-git repo:
- Bumping a release field for the sake of a rebuild.
- Updating to the latest upstream release.
- Resolving CVEs.
- Fixing bugs by…
  - Changing a spec file.
  - Pulling a commit from upstream.
  - Or even backporting a commit.
- And more...
For some tasks, the workflow is just fine and pretty straightforward. But for others, it’s gruesome - the moment you need to touch patch files, the horror comes in. The fact that we operate with patch files, in a git repository, is just mind-boggling to me.
Luckily, we have tooling which supports the repository layout - `fedpkg prep`, `srpm` or `mockbuild` are such handy commands - you can easily inspect the source tree or make sure your local change builds.
Where am I going with this?
Over the years, multiple tools have been created to improve the development experience: rdopkg [r], rpkg-util [ru], tito [t] and probably many more (e.g. the way Fedora kernel developers work on the kernel [k]).
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files. An example: I recently added a project called nyancat [n] to Fedora. I have worked [w] on packaging the project in the GitHub repo and then just pushed the changes to dist-git using packit tooling. These source-git repositories can live anywhere: we have support for GitHub right now and are working on supporting pagure.
Would there be interest within the community, as opt-in, in having such source-git repositories created for the respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with the respective dist-git repo. Our aim is to provide the contribution experience you have on GitHub when working on your packages. Dist-git would still be the authoritative source and the place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
The main reason I am sending this is to gather feedback from all of you on whether there is interest in such a workflow. We don’t have concrete plans for Fedora right now, but based on your feedback we could make some.
Hello Tomas,
I have a fair bit of experience with operating in both so-called "source-git" and "dist-git" workflows. I've known them by the names of "merged-source" and "split-source" trees respectively, so forgive me if I use that terminology, since it makes conveying the point a bit easier.
In the merged-source world, the packaging is an aspect of managing the software codebase. This is common in Debian and ALT Linux, where the standard practice with their tooling is to fork the codebase and integrate the packaging files into the tree. Changes then are managed as part of evolving the sources, and packaging is mainly touched when preparing to push to build. And for $DAYJOB, I've implemented this model for software that $DAYJOB makes (we use the split-source model for stuff we didn't write).
Obviously, you understand the advantages of this approach (managing patches is easier as Git commits, you have access to rebase and merge logic for code, etc.). However, in my experience seeing these in use at a large scale, the major downside is that it removes the incentive to work with the project's developers to contribute improvements. Sometimes carrying a large fork is unavoidable (the RHEL ipa, kernel, rpm, samba, and systemd packages come to mind here), but most of the time, I don't see these large fork trees being necessary in RHEL or Fedora. In general, where I've seen this implemented on a distro-wide scale, the contribution levels from the distribution drop by a large margin. There is also the added issue that it becomes a lot more difficult to tell upstream changes apart from downstream ones. They all look the same in the merged-source model, which makes it hard for others to discover Fedora-only changes and potentially help to bring those changes upstream.
In the split-source model, it is very clear what changes are downstream in Fedora only. The downstream changes are all patches, and moving to new versions often requires dealing with that patch set, evaluating what is still needed and doing the required technical work to support moving forward. This minor wrinkle is often enough to get packagers to get in touch with upstream projects and communicate with them. Most people need that tiny bit of extra friction to be pushed to contribute upstream, especially some of those who work on Fedora because they have to, not because they want to. It's also easy to tell at a glance whether a package is "messy" or not, because you can easily tell how much downstream work is required to make it suitable for Fedora. At least from my perspective, the patch load is a factor in judging how difficult a package is to maintain. The split-source model ultimately makes it clear who is responsible for behavioral changes to a package. If it's the result of a downstream patch, it is our fault. If it isn't, it's upstream's. The merged-source model makes this determination much harder. Not impossible, but harder.
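To make the "patch load at a glance" point concrete, here is a rough sketch of how one could list the downstream patches a spec file declares. It assumes patches are declared with plain `PatchN:` tags at the start of a line; the spec contents and patch names below are made up for illustration, and things like `%patchlist` sections or macro-generated patches are not handled.

```python
import re

# Matches patch tags such as "Patch0: fix.patch" or "Patch: fix.patch".
# Assumption: the tag starts at the beginning of the line, so commented-out
# tags ("#Patch0: ...") are ignored.
PATCH_TAG = re.compile(r"^Patch\d*:\s*(\S+)", re.MULTILINE)


def list_patches(spec_text: str) -> list[str]:
    """Return the patch file names declared in a spec file, in order."""
    return PATCH_TAG.findall(spec_text)


# Hypothetical spec fragment, not taken from any real package.
EXAMPLE_SPEC = """\
Name:    example
Version: 1.0
Release: 1%{?dist}
Source0: example-1.0.tar.gz
Patch0:  0001-fix-build-flags.patch
Patch1:  0002-backport-upstream-fix.patch
#Patch2: 0003-dropped-after-rebase.patch
"""

if __name__ == "__main__":
    patches = list_patches(EXAMPLE_SPEC)
    print(f"{len(patches)} downstream patch(es):")
    for name in patches:
        print(f"  {name}")
```

In a merged-source tree there is no equivalent one-file answer; you would have to compare the branch against an upstream tag and trust that the packaging commits are clearly marked.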
Am I completely against the idea of optionally offering merged-source trees for packaging in Fedora? No. But merged-source requires a lot more discipline than split-source, and I'd like for us to figure out technical and social solutions for ensuring that we clearly identify upstream/downstream changes in merged-source package trees and for encouraging people to continue to stay close to upstream projects[1] as part of using a merged-source/source-git model.
There is also the fact that any source-git/merged-source model would require forking into Fedora's server (src.fedoraproject.org) in a new namespace (sources) and would have the same restrictions that the split-source/dist-git model has (no rebasing, no branch deletion, no tag updating, etc.). Not doing so would cause major problems in terms of reproducible builds, but these restrictions also make working with the source tree a lot more painful. Perhaps if we never directly built from it and exported released sources as tarballs, then it wouldn't be necessary, but those are details to figure out if we move forward with this idea.
[1]: https://fedoraproject.org/wiki/Staying_close_to_upstream_projects
-- 真実はいつも一つ!/ Always, there's only one truth!