On Mon, May 4, 2020 at 11:06 AM Tomas Tomecek <ttomecek(a)redhat.com> wrote:
Let’s talk about dist-git, as a place where we work. For us,
packagers, it’s a well-known place. Yet for newcomers, it may take a
while to learn all the details. Even though we operate with projects
in a dist-git repository, the layout doesn’t resemble the respective
There is a multitude of tasks we tend to perform in a dist-git repo:
* Bumping a release field for the sake of a rebuild.
* Updating to the latest upstream release.
* Resolving CVEs.
* Fixing bugs by…
  * Changing a spec file.
  * Pulling a commit from upstream.
  * Or even backporting a commit.
* And more...
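
As an illustration of the first task in the list above, here is a
minimal sketch in Python of what bumping the release field amounts to.
The function name `bump_release` and the sample spec text are made up
for this example, and it assumes a simple `Release: <integer>%{?dist}`
line. In practice `rpmdev-bumpspec` from rpmdevtools is the usual tool,
and it also adds a changelog entry, which this sketch does not.

```python
import re

# Assumption: the Release tag starts with a plain integer, e.g. "Release: 1%{?dist}".
RELEASE_TAG = re.compile(r"^(Release:[ \t]*)(\d+)(.*)$", re.MULTILINE)


def bump_release(spec_text):
    """Return spec_text with the leading integer of the first Release tag incremented."""

    def _bump(match):
        prefix, number, rest = match.groups()
        return f"{prefix}{int(number) + 1}{rest}"

    new_text, count = RELEASE_TAG.subn(_bump, spec_text, count=1)
    if count == 0:
        raise ValueError("no numeric Release tag found")
    return new_text


# Hypothetical spec fragment, for illustration only.
SAMPLE_SPEC = "Name:     example\nVersion:  1.0\nRelease:  1%{?dist}\n"

if __name__ == "__main__":
    print(bump_release(SAMPLE_SPEC))
```

Anything fancier (macros in the release number, conditionals) is out
of scope for this sketch.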
For some tasks, the workflow is just fine and pretty straightforward.
But for others, it’s very gruesome - the moment you need to touch
patch files, the horror comes in. The fact that we operate with patch
files, in a git repository, is just mind-boggling to me.
Luckily, we have tooling which supports the repository layout -
`fedpkg prep`, `srpm` or `mockbuild` are such handy commands - you can
easily inspect the source tree or make sure your local change builds.
Where am I going with this?
Over the years there have been multiple tools created to improve the
rdopkg [r], rpkg-util [ru], tito [t] and probably many more (e.g.
the way Fedora kernel developers work on kernel [k]).
In the packit project, we work in source-git repositories. These are
pretty much upstream repositories combined with Fedora downstream
packaging files. An example: I recently added a project called nyancat
[n] to Fedora. I have worked [w] on packaging the project in the
GitHub repo and then just pushed the changes to dist-git using packit
tooling. These source-git repositories can live anywhere: we have
support for GitHub right now and are working on supporting pagure.
Would there be an interest within the community, as opt-in, to have
such source-git repositories created for respective dist-git
repositories? The idea is that you would work in the source-git repo
and then let packit handle synchronization with a respective dist-git
repo. Our aim is to provide the contribution experience you have in
GitHub when working on your packages. Dist-git would still be the
authoritative source and a place where official builds are done - the
source-git repo would work as a way to collaborate. We also don’t have
plans right now to integrate packit into fedpkg.
The main reason I am sending this is to gather feedback from all of
you whether there is an interest in such a workflow. We don’t have
concrete plans for Fedora right now but based on your feedback we
I have a fair bit of experience with operating in both so-called
"source-git" and "dist-git" workflows. I've known them by the
"merged-source" and "split-source" trees respectively, so forgive me
if I use that terminology, since it makes conveying the point a bit
easier.
In the merged-source world, the packaging is an aspect of managing the
software codebase. This is common in Debian and ALT Linux, where the
standard practice with their tooling is to fork the codebase and
integrate the packaging files into the tree. Changes then are managed
as part of evolving the sources, and packaging is mainly touched when
preparing to push to build. And for $DAYJOB, I've implemented this
model for software that $DAYJOB makes (we use the split-source model
for stuff we didn't write).
Obviously, you understand the advantages of this approach (managing
patches is easier as Git commits, you have access to rebase and merge
logic for code, etc.). However, in my experience seeing these in use
at a large scale, the major downside is that it reduces the incentive to
work with the software developers of the project to contribute
improvements. Sometimes this is unavoidable (the RHEL ipa, kernel,
rpm, samba, and systemd packages come to mind here), but most of the
time, I don't see these large fork trees being necessary in RHEL or
Fedora. In general, where I've seen this implemented on a distro-wide
scale, the contribution levels from the distribution drop by a large
margin. There is also the added issue of it becoming a lot more
difficult to sort through the differences between upstream and
downstream changes. They all look the same in the merged-source model,
which makes it hard for others to discover Fedora-only changes and
potentially help to bring those changes upstream.
In the split-source model, it is very clear what changes are
downstream in Fedora only. The downstream changes are all patches, and
moving to new versions often requires dealing with that patch set,
evaluating what is still needed and doing the required technical work
to support moving forward. This minor wrinkle is often enough to get
packagers to get in touch with upstream projects and communicate with
them. Most people need that tiny bit of extra friction to be pushed to
contribute upstream, especially some of those who work on Fedora
because they have to, not because they want to. It's also easy to tell
at a glance whether a package is "messy" or not, because you can
easily tell how much downstream work is required to make it suitable
for Fedora. At least from my perspective, the patch load is a factor
in judging how difficult a package is to maintain. The split-source
model ultimately makes it clear who is responsible for behavioral
changes to a package. If it's the result of a downstream patch, it is
our fault. If it isn't, it's upstream's. The merged-source model makes
this determination much harder. Not impossible, but harder.
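
To make the "patch load" point concrete, here is a minimal sketch in
Python that lists the patches declared in a spec file. The function
name `list_patches` and the sample spec are made up for illustration,
and it only looks at literal `PatchN:` tags (no macro expansion, no
conditionals), so treat it as a rough gauge rather than how any Fedora
tool actually works.

```python
import re

# Matches tags such as "Patch0:", "Patch12:" or a bare "Patch:" at the start of a line.
# Commented-out tags ("# Patch2: ...") do not match because of the leading "#".
PATCH_TAG = re.compile(r"^Patch\d*[ \t]*:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)


def list_patches(spec_text):
    """Return the file names declared by Patch tags, in order of appearance."""
    return [match.group(1) for match in PATCH_TAG.finditer(spec_text)]


# Hypothetical spec fragment, for illustration only.
SAMPLE_SPEC = """\
Name:     example
Version:  1.0
Release:  1%{?dist}
Source0:  example-1.0.tar.gz
Patch0:   0001-fix-build-on-new-gcc.patch
Patch1:   0002-fedora-default-paths.patch
# Patch2: dropped-upstreamed.patch
"""

if __name__ == "__main__":
    patches = list_patches(SAMPLE_SPEC)
    print(f"{len(patches)} downstream patch(es)")
    for name in patches:
        print(" ", name)
```

A merged-source tree has no equivalent list to scan; you would have to
compare the tree against an upstream ref instead.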
Am I completely against the idea of optionally offering merged-source
trees for packaging in Fedora? No. But merged-source requires a lot
more discipline than split-source, and I'd like for us to figure out
technical and social solutions for ensuring that we clearly
identify upstream/downstream changes in merged-source package trees
and provide a means to encourage people to continue to stay close to
upstream projects as part of using a merged-source/source-git
There is also the issue that any source-git/merged-source model would
require forking into Fedora's server (src.fedoraproject.org) in a new
namespace (sources) and would have the same restrictions that the
split-source/dist-git model has (no rebasing, no branch deletion, no
tag updating, etc.). Not doing so would cause major problems in terms
of reproducible builds, but this also makes working with the source
tree a lot more painful. Perhaps if we never directly built from it
and exported released sources as tarballs, then it wouldn't be
necessary, but those are details to figure out if we move forward with
真実はいつも一つ！/ Always, there's only one truth!