Let’s talk about dist-git, as a place where we work. For us, packagers, it’s a well-known place. Yet for newcomers, it may take a while to learn all the details. Even though we operate with projects in a dist-git repository, the layout doesn’t resemble the respective upstream project.
There is a multitude of tasks we tend to perform in a dist-git repo:
* Bumping a release field for the sake of a rebuild.
* Updating to the latest upstream release.
* Resolving CVEs.
* Fixing bugs by…
  * Changing a spec file.
  * Pulling a commit from upstream.
  * Or even backporting a commit.
* And more...
For some tasks, the workflow is just fine and pretty straightforward. But for others, it’s gruesome: the moment you need to touch patch files, the horror comes in. The fact that we operate with patch files, in a git repository, is just mind-boggling to me.
Luckily, we have tooling which supports the repository layout - `fedpkg prep`, `srpm` or `mockbuild` are such handy commands - you can easily inspect the source tree or make sure your local change builds.
Where am I going with this?
Over the years, multiple tools have been created to improve the development experience: rdopkg [r], rpkg-util [ru], tito [t] and probably many more (e.g. the way Fedora kernel developers work on the kernel [k]).
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files. An example: I recently added a project called nyancat [n] to Fedora. I have worked [w] on packaging the project in the GitHub repo and then just pushed the changes to dist-git using packit tooling. These source-git repositories can live anywhere: we have support for GitHub right now and are working on supporting pagure.
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo. Our aim is to provide the contribution experience you have in GitHub when working on your packages. Dist-git would still be the authoritative source and a place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
The main reason I am sending this is to gather feedback from all of you whether there is an interest in such a workflow. We don’t have concrete plans for Fedora right now but based on your feedback we could.
[r] https://github.com/softwarefactory-project/rdopkg
[ru] https://pagure.io/rpkg-util
[t] https://github.com/rpm-software-management/tito
[k] https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro...
[n] https://github.com/TomasTomecek/nyancat
[w] https://github.com/TomasTomecek/nyancat/pull/2
Cheers, Tomas
This may not be related enough to discuss here so I'll take it to another thread if needed, but...
One thing that really bugs me is there's still a catch-22. When you're working on a new package there is no "git" to work with. I used to just install all the -devel packages and work with rpmbuild directly but you have to override the default:
rpmbuild/{BUILD,BUILDROOT,SPECS,SOURCES,RPMS,SRPMS} mess
For the longest time I at least overrode it so it wouldn't mix everything together by putting the package name in the mix: rpmbuild/<pkg>/...
But that's still not ideal, so I started creating pagure projects to get a fedpkg-like experience. rpmbuild still pollutes the directory, as I haven't found the perfect .rpmmacros setup, but it's nothing that "fedpkg clean -x" doesn't clean up.
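One way to get the rpmbuild/<pkg>/... layout without a .rpmmacros entry is to pass _topdir on the rpmbuild command line. The following is only a minimal sketch: the helper name rpmbuild_command and the base directory are hypothetical, and it assumes the spec file is named after the package. It builds the command without running it.

```python
import shlex
from pathlib import Path


def rpmbuild_command(spec, base="~/rpmbuild"):
    """Build an rpmbuild command that keeps all output under <base>/<pkg>/.

    Assumption: the spec file is named <pkg>.spec, so the package name can be
    taken from the file name instead of from a macro.
    """
    spec_path = Path(spec)
    pkg = spec_path.stem  # "nyancat.spec" -> "nyancat"
    topdir = Path(base).expanduser() / pkg
    # --define overrides the macro for this invocation only.
    return ["rpmbuild", "--define", f"_topdir {topdir}", "-ba", str(spec_path)]


if __name__ == "__main__":
    # Print the command for inspection; nothing is executed here.
    print(shlex.join(rpmbuild_command("nyancat.spec", base="/tmp/rpmbuild")))
```

With the default macros, rpmbuild then expects the sources under <base>/<pkg>/SOURCES, so they have to be placed there first.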
Once a package is accepted I import it into dist-git and delete the pagure project. I know some people just initialize a git repo manually...
All that to say it would be nice to have a well documented workflow from start to finish.
Thanks, Richard
* Tomas Tomecek:
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files. An example: I recently added a project called nyancat [n] to Fedora. I have worked [w] on packaging the project in the GitHub repo and then just pushed the changes to dist-git using packit tooling. These source-git repositories can live anywhere: we have support for GitHub right now and are working on supporting pagure.
I think Fedora has decided to adopt Gitlab, not Github (or Pagure). Are there plans to port Packit over to Gitlab?
Thanks, Florian
On 04. 05. 20 17:05, Tomas Tomecek wrote:
Dist-git would still be the authoritative source and a place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
So if packager A decides to use this fancy new thing you propose for package P, and a packager B does some change in the dist-git repo of package P, what exactly happens next time packager A wants to do some changes?
On Monday, 4 May 2020 16.41.26 WEST Richard Shaw wrote:
One thing that really bugs me is there's still a catch-22. When you're working on a new package there is no "git" to work with. I used to just install all the -devel packages and work with rpmbuild directly but you have to override the default:
rpmbuild/{BUILD,BUILDROOT,SPECS,SOURCES,RPMS,SRPMS} mess
For the longest time I at least overrode it so it wouldn't mix everything together by putting the package name in the mix: rpmbuild/<pkg>/...
So did I. For the last 16 years I have had this in ~/.rpmmacros:
%_topdir /home/jamatos/rpm/
%_rpmtopdir %{_topdir}/build/%{?name:%name}
%_sourcedir %{_rpmtopdir}
%_specdir %{_rpmtopdir}
%_rpmdir %{_rpmtopdir}
%_srcrpmdir %{_rpmtopdir}
On Monday, 4 May 2020 at 17:31 +0100, José Abílio Matos wrote:
On Monday, 4 May 2020 16.41.26 WEST Richard Shaw wrote:
For the longest time I at least overrode it so it wouldn't mix everything together by putting the package name in the mix: rpmbuild/<pkg>/...
So did I. For the last 16 years I have had this in ~/.rpmmacros:
%_topdir /home/jamatos/rpm/
%_rpmtopdir %{_topdir}/build/%{?name:%name}
%_sourcedir %{_rpmtopdir}
%_specdir %{_rpmtopdir}
%_rpmdir %{_rpmtopdir}
%_srcrpmdir %{_rpmtopdir}
That is broken nowadays: rpm 4.15 evaluates %{_sourcedir} before reading the spec file that sets %{name}. rpm 4.16 will warn loudly about the use of an uninitialised variable in %{_sourcedir} (things still work, for now), and a future rpm release will probably raise an error rather than a warning.
warning: undefined macro(s) in %{_sourcedir}: …/rpmbuild/SOURCES/%{name}
You may fool things for a while with the %_rpmtopdir + %{?name:%name} but I doubt that will survive the purge long.
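To make the difference between the two macro forms concrete, here is a toy model in Python. It is not rpm's real expander and handles only the forms used in this thread: an undefined %{name} is left unexpanded (as in the warning above), while %{?name:%name} expands to nothing when name is undefined. The function name expand and the sample paths are made up for illustration.

```python
import re

# Matches %{name}, %{?name} and %{?name:body}; a toy subset of rpm's syntax.
BRACED = re.compile(r"%\{(\??)(\w+)(?::([^{}]*))?\}")
# Matches the bare %name form.
BARE = re.compile(r"%(\w+)")


def expand(text, macros):
    """Expand a small subset of rpm macro forms using a dict of definitions."""

    def braced(match):
        conditional, name, body = match.group(1), match.group(2), match.group(3)
        if name not in macros:
            # %{?name...} disappears; plain %{name} stays as literal text.
            return "" if conditional else match.group(0)
        if conditional and body is not None:
            return expand(body, macros)  # %{?name:body} -> expanded body
        return expand(macros[name], macros)

    def bare(match):
        name = match.group(1)
        return expand(macros[name], macros) if name in macros else match.group(0)

    return BARE.sub(bare, BRACED.sub(braced, text))


if __name__ == "__main__":
    before_spec = {"_topdir": "/home/user/rpm"}       # %{name} not known yet
    after_spec = dict(before_spec, name="nyancat")    # spec file has been read
    for macros in (before_spec, after_spec):
        print(expand("%{_topdir}/SOURCES/%{name}", macros))
        print(expand("%{_topdir}/build/%{?name:%name}", macros))
```

In this model the conditional form silently yields a different directory before and after the spec file is read, which is the kind of inconsistency the warning is meant to expose.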
Regards,
So what is the workflow? How do you update to the latest upstream release? Or how do you apply a custom patch?
Vít
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Mon, May 04, 2020 at 05:05:02PM +0200, Tomas Tomecek wrote:
Over the years there have been multiple tools created to improve the development experience: rdopkg [r], rpkg-util [ru], tito [t] and probably much much more (e.g. the way Fedora kernel developers work on kernel [k]).
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files.
And then comes another distribution with a request to combine its dist-git into the upstream. Fedora is not the only distribution. Do you know how many distributions exist? From my point of view as an upstream, it's one big NO.
From the point of view of a Fedora packager, it's just moving Fedora bits into another repository, with the burden of synchronizing that repository with dist-git (and back, given which of the two is the authoritative source for Fedora).
If you want to introduce an intermediary third repository between the upstream and the distribution, a repository that would normalize (read: git-ify) the upstream and overlay downstream patches and metadata, then, ehm, it's a nice project for exploring how far we can go with unification among the distributions. But I'm quite sceptical regarding its adoption. But don't take my prognosis seriously. I can be mistaken. There is some positive prior art, like release-monitoring.org.
-- Petr
Florian, a very good point. Yes, we are planning to support GitLab - we have a GSoC project for it: https://pagure.io/mentored-projects/issue/69
* Tomas Tomecek:
Florian, a very good point. Yes, we are planning to support GitLab - we have a GSoC project for it:
Is a GSoC project really the appropriate vehicle for this?
Gitlab has one major advantage over Github: it is possible to restrict merge requests. This makes it much simpler to experiment with it. With Github, once you are on that platform, you have to deal with pull requests in some way, even if you don't want to.
Thanks, Florian
Richard, thanks for describing your experience.
I actually agree with you that the way of adding new packages is not ideal.
Personally, I always begin in a git repo which, hopefully, turns into the dist-git repo in the end. More recently, I create a source-git repo, host it on GitHub and just let packit sync the content to dist-git.
As we can see from all the responses, we all do it differently. It would be awesome to have an efficient default which would work for all of us.
Tomas
Thank you all for raising all the questions and concerns.
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we are mostly experimenting with the workflow.
Luckily, force-pushes are not allowed in dist-git, which makes the update/sync process easier (knowing that history cannot be changed). Therefore, when a new commit lands in dist-git, we'd just "transform" it to source-git and push it to the source-git repo. We could even ask all the contributors to rebase their PRs when this happens. On the other hand, when a new commit lands in the source-git repo, we could either transform it and push to dist-git directly, or open a PR. The maintainer should be in control of this process. I understand that the synchronisation adds friction to the overall architecture and may be the cause of many problems in the future - hence we are starting this discussion and using the technology ourselves to catch these issues asap. Víťo, does this answer your question?
Miro, you are talking about conflicts: I'd say that conflicts on the git level are normal and git has solid tools to resolve them. For the case where two different people change the same thing, we would treat dist-git as the authoritative place and let the person in source-git know about the conflict. But this can easily happen nowadays as well: two different people can open the same PR or even push to dist-git directly, and only one would succeed.
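The two sync directions described above can be sketched as a small decision function. This is purely illustrative and not packit code: the names Repo, SyncPolicy and plan_sync, and the wording of the actions, are hypothetical. The push_directly flag stands for the maintainer's choice between pushing to dist-git and opening a PR.

```python
from dataclasses import dataclass
from enum import Enum


class Repo(Enum):
    DIST_GIT = "dist-git"
    SOURCE_GIT = "source-git"


@dataclass(frozen=True)
class SyncPolicy:
    # Hypothetical maintainer setting: push straight to dist-git or open a PR.
    push_directly: bool = False
    # Hypothetical switch for the optional "please rebase" notification.
    ask_for_rebase: bool = False


def plan_sync(landed_in, policy):
    """Return the ordered actions for one new commit, as plain descriptions."""
    if landed_in is Repo.DIST_GIT:
        # dist-git history cannot be rewritten, so the commit is simply replayed.
        actions = ["transform commit to source-git", "push to source-git"]
        if policy.ask_for_rebase:
            actions.append("ask contributors to rebase open PRs")
        return actions
    # The commit landed in source-git: the maintainer decides how it reaches dist-git.
    last_step = "push to dist-git" if policy.push_directly else "open dist-git PR"
    return ["transform commit to dist-git", last_step]


if __name__ == "__main__":
    for repo in Repo:
        print(repo.value, "->", plan_sync(repo, SyncPolicy()))
```

Keeping dist-git as the only place that is replayed unconditionally mirrors the point that it stays the authoritative source.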
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
Tomas
On 05. 05. 20 12:41, Tomas Tomecek wrote:
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we experiment with the workflow mostly.
Experimenting is cool. Go ahead. As long as this is totally opt-in and does not affect me as a contributor even if the main package maintainer chooses to use it, all is good.
Luckily, force-pushes are not allowed in dist-git, which makes the update/sync process easier (knowing that history cannot be changed). Therefore, when a new commit lands in dist-git, we'd just "transform" it to source-git and push it to the source-git repo. We could even ask all the contributors to rebase their PRs when this happens. On the other hand, when a new commit lands in the source-git repo, we could either transform it and push to dist-git directly, or open a PR. The maintainer should be in control of this process.
That sounds overly complicated. I thought the idea was to make things easier; what am I missing?
I understand the synchronisation adds friction to the overall architecture and may be the cause of many problems in the future - hence we are starting this discussion and using the technology ourselves to catch these issues asap.
Good!
Miro, you are talking about conflicts: I'd say that conflicts on the git level are normal and git has solid tools to resolve them. For the case where two different people change the same thing, we would treat dist-git as the authoritative place and let the person in source-git know about the conflict. But this can easily happen nowadays as well: two different people can open the same PR or even push to dist-git directly, and only one would succeed.
No, actually, that is a misunderstanding. I was simply talking about synchronization. But you have basically already answered my question above: When changes are done in dist-git, they will be somehow replicated in the source-git thing.
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
Do I understand correctly that you no longer aim for the "put Fedora spec files upstream" thing, but rather for a "create an intermediate upstream sources / Fedora spec file" git repo hybrid? What's the benefit?
----
See for example with Python. We have some patches (although we are trying to get rid of them) and rebasing those has proven to be quite tedious with patch files only.
Hence, we keep this fork: https://github.com/fedora-python/cpython
It has our patches on top. When a new version is released, we rebase our branch with git. We format-patch the commits and put them into dist-git.
If I had time, I'd create automation that does this for me. Unfortunately I don't and I follow https://xkcd.com/1205/
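One small piece of such automation might look like the sketch below: turning the files produced by git format-patch into PatchN: lines for the spec file. It assumes the default format-patch naming (0001-subject.patch, 0002-...); the helper name patch_lines and the sample file names are hypothetical, and the rebase and format-patch steps themselves are left to git.

```python
import re

# Default git format-patch output is named NNNN-subject.patch.
FORMAT_PATCH_NAME = re.compile(r"\d{4}-.*\.patch$")


def patch_lines(filenames, start=1):
    """Return 'PatchN: <file>' spec lines, ordered like the patch series."""
    series = sorted(name for name in filenames if FORMAT_PATCH_NAME.match(name))
    return [f"Patch{start + index}: {name}" for index, name in enumerate(series)]


if __name__ == "__main__":
    # Hypothetical directory listing after running format-patch on the fork.
    listing = ["0002-fix-b.patch", "README", "0001-fix-a.patch"]
    print("\n".join(patch_lines(listing)))
```

The generated lines would still have to be written into the spec file and the patch files copied into the dist-git checkout.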
In what way does keeping the spec file in our fork help us?
On Tue, 5 May 2020 at 13:06, Miro Hrončok mhroncok@redhat.com wrote:
See for example with Python. We have some patches (although we are trying to get rid of them) and rebasing those has proven to be quite tedious with patch files only.
Hence, we keep this fork: https://github.com/fedora-python/cpython
Imho, it would be nice if this could live on src.fp.o in a separate dedicated namespace for source repos.
On Tue, May 05, 2020 at 12:41:06PM +0200, Tomas Tomecek wrote:
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
If you only target Fedora, then it means that the same number of Fedora maintainers will maintain twice as many repositories. Does it really save work? What's the benefit of maintaining more repositories?
Therefore I assumed you had targeted more distributions, in order to share and externalize the maintenance.
Maybe my problem is that I don't buy your argument that if Fedora dist-git looks like GitHub, then Fedora will attract new packagers.
-- Petr
On Mon, May 4, 2020, at 11:05 AM, Tomas Tomecek wrote:
In the packit project, we work in source-git repositories
There's really 3 options:
- dist-git as it exists today (what we thought made sense in the days of CVS converted into git without much re-engineering)
- source-git (used in Debian at least in some places)
- OpenEmbedded-style "converged" git repos
I personally much prefer the OpenEmbedded style as a default; see: https://github.com/openembedded/openembedded-core/tree/master/meta
For most language packaging it'd be way saner to have e.g. a specs-$lang repo, like github.com/fedora/specs-rust and github.com/fedora/specs-golang, than to move from what we have now to source-git.
Among other things, adding e.g. 3 packages at once can be done as a single pull request and tested as a unit.
Plus, they could share code directly in the git repo (as happens with OE/bitbake) rather than sharing code via packaging RPM macros.
Source git I could see for things where we basically need to fork and the upstream is complex; kernel/systemd perhaps.
Tomas Tomecek ttomecek@redhat.com writes:
Thank you all for raising all the questions and concerns.
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we experiment with the workflow mostly.
Luckily, force-pushes are not allowed in dist-git, which makes the update/sync process easier (knowing that history cannot be changed). Therefore, when a new commit lands in dist-git, we'd just "transform" it to source-git and push it to the source-git repo. We could even ask all the contributors to rebase their PRs when this happens.
This "rebase all PRs" thing seems to be a recurring theme... What is the reason to ask contributors to rebase? (I mean, are we trying to go back to the days of centralized version control systems?)
In my experience, there is rarely a good reason to rebase (rebasing because the CI system tests the contributor's branch rather than the resulting merge is not a good reason; the CI system should be fixed instead) and asking everyone to rebase just slows things down. Please let's not go down that road, if possible.
Other than that, I like the idea of source git and I'm looking forward to using it, once the synchronization issues are resolved. Thanks for working on it.
Ondřej Lysoněk
Sorry, a little rant ahead:
In general, merge commits are hideous, they can conceal merge conflict resolution that can a) introduce bugs, b) make it hard to figure out when bugs were introduced by preventing use of git bisect and c) generally make it very hard to understand the history of changes.
In general, rebase and squash requirements tend to produce much cleaner changes and catch errors before they are merged.
Merge commits are useful for gigantic projects like the Linux kernel where A) the maintainers are extremely knowledgeable and can properly resolve merge conflicts B) the people proposing the merge may not actually know how to properly resolve a merge conflict as it is happening outside of their area of expertise C) the feedback loop would be excessive and dwarf the history linearity benefits.
Those projects are rare. Using merge commits on small projects is just laziness that you pay for later. It's accruing technical debt unnecessarily.
That said, for non-code projects history may not matter that much, as you do not need to resolve "bugs", and introducing "bugs" via merge-commit conflict resolution is rare. For those it may not matter what you use to merge in stuff: you are using the SCM just as a central repository and "merge tool", and sometimes as a way to apply release tags.
Whenever using the history is useful I find I *really* hate merge commits as they muddy everything, and make the work of figuring out what happened after the fact (and sometimes *during the fact*) very hard.
Simo.
On Tue, 2020-05-05 at 15:29 +0200, Ondřej Lysoněk wrote:
Tomas Tomecek ttomecek@redhat.com writes:
Thank you all for raising all the questions and concerns.
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we experiment with the workflow mostly.
Luckily, force-pushes are not allowed in dist-git, which makes the update/sync process easier (knowing that history cannot be changed). Therefore when a new commit lands in dist-git, we'd just "transform" it to source-git and push it to the source-git repo. We could even ask all the contributors to rebase their PRs when this happens.
This "rebase all PRs" thing seems to be a recurring theme... What is the reason to ask contributors to rebase? (I mean, are we trying to go back to the days of centralized version control systems?)
In my experience, there is rarely a good reason to rebase (rebasing because the CI system tests the contributor's branch rather than the resulting merge is not a good reason; the CI system should be fixed instead) and asking everyone to rebase just slows things down. Please let's not go down that road, if possible.
Other than that, I like the idea of source git and I'm looking forward to using it, once the synchronization issues are resolved. Thanks for working on it.
Ondřej Lysoněk
On the other hand, when a new commit lands in the source-git repo, we could either transform and push to dist-git directly or open a PR. The maintainer should be in control of this process. I understand the synchronisation adds friction to the overall architecture and may be the cause of many problems in the future - hence we are starting this discussion and using the technology ourselves to catch these issues asap. Víťo, does this answer your question?
Miro, you are talking about conflicts: I'd say that conflicts on the git level are normal and git has solid tools to resolve them. For the use case of two different people changing the same thing, we would treat dist-git as the authoritative place and let the person in source-git know about the conflict. But this can easily happen nowadays as well: two different people can open the same PR or even push to dist-git directly, while only one would succeed.
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
Tomas
On Tue, May 5, 2020 at 8:56 AM Petr Pisar ppisar@redhat.com wrote:
On Mon, May 04, 2020 at 05:05:02PM +0200, Tomas Tomecek wrote:
Over the years there have been multiple tools created to improve the development experience: rdopkg [r], rpkg-util [ru], tito [t] and probably much much more (e.g. the way Fedora kernel developers work on kernel [k]).
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files.
And then comes another distribution with a request to combine its dist-git into the upstream. Fedora is not the only distribution. Do you know how many distributions exist? From my point of view as an upstream it's one big NO.
From the point of view of a Fedora packager, it's just moving Fedora bits into another repository with the burden of synchronizing that repository with dist-git (and back because of what an authoritative source for Fedora is).
If you want to introduce an intermediary third repository between the upstream and the distribution, a repository that would normalize (read git-ify) the upstream and overlay downstream patches and metadata, then, ehm, it's a nice project for exploring how far we can go with unification among the distributions. But I'm quite sceptical regarding its adoption. But don't take my prognosis seriously. I can be mistaken. There is some positive prior art, like release-monitoring.org.
-- Petr
On Tue, May 5, 2020 at 12:13 PM Florian Weimer fweimer@redhat.com wrote:
- Tomas Tomecek:
Florian, a very good point. Yes, we are planning to support GitLab - we have a GSoC project for it:
Is a GSoC project really the appropriate vehicle for this?
I cannot answer that. We just submitted it as an idea to the Fedora GSoC project list and it was accepted.
I'd like to clarify that it was proposed because we got requests [1] in packit upstream from people who are using GitLab and would love to use packit. The fact that Fedora decided on GitLab and the GSoC project proposal are unrelated.
[1] https://github.com/packit-service/packit-service/issues/249
On Tue, May 5, 2020 at 3:29 PM Ondřej Lysoněk olysonek@redhat.com wrote:
This "rebase all PRs" thing seems to be a recurring theme... What is the reason to ask contributors to rebase? (I mean, are we trying to go back to the days of centralized version control systems?)
In my experience, there is rarely a good reason to rebase (rebasing because the CI system tests the contributor's branch rather than the resulting merge is not a good reason; the CI system should be fixed instead) and asking everyone to rebase just slows things down. Please let's not go down that road, if possible.
Agreed on the CI part. That should either be available for configuration to the maintainers or be strictly set to rebase the PR against master before building and testing it. One thing to note here if you want to allow contributors to change CI and build system configuration.
One strong argument for rebasing PRs is to have a clean upstream git history.
Other than that, I like the idea of source git and I'm looking forward to using it, once the synchronization issues are resolved. Thanks for working on it.
Thank you for your interest!
Tomas
On Tue, May 5, 2020 at 1:41 PM Petr Pisar ppisar@redhat.com wrote:
On Tue, May 05, 2020 at 12:41:06PM +0200, Tomas Tomecek wrote:
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
If you only target Fedora, then it means that the same number of Fedora maintainers will maintain twice as many repositories. Does it indeed save work? What's the benefit of maintaining more repositories?
My personal expectation here would be that if I enabled source-git for my packages, I wouldn't want to touch dist-git and only work in the source-git repos. Yes, there would still be changes coming to dist-git, and I'd inspect those from source-git. I'd even ask contributors to use source-git for PR contributions if possible.
Therefore I assumed you had targeted more distributions to share and externalize the maintenance.
Maybe my problem is that I don't buy your argument that if Fedora dist-git looks like GitHub, then Fedora will attract new packagers.
As I said, it's a prototype and we'll see.
Thanks for your feedback, Tomas
On Tue, May 5, 2020 at 1:09 PM clime clime@fedoraproject.org wrote:
Imho, it would be nice if this could live on src.fp.o in a separate dedicated namespace for source repos.
Agreed, that would be ideal!
Tomas
On Tue, May 5, 2020 at 1:04 PM Miro Hrončok mhroncok@redhat.com wrote:
On 05. 05. 20 12:41, Tomas Tomecek wrote:
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we experiment with the workflow mostly.
Experimenting is cool. Go ahead. As long as this is totally opt-in and does not affect me as a contributor even if the main package maintainer chooses to use it, all is good.
That's how we want to approach this. Make it completely opt-in while keeping the present workflow intact.
Luckily, force-pushes are not allowed in dist-git, which makes the update/sync process easier (knowing that history cannot be changed). Therefore when a new commit lands in dist-git, we'd just "transform" it to source-git and push it to the source-git repo. We could even ask all the contributors to rebase their PRs when this happens. On the other hand, when a new commit lands in the source-git repo, we could either transform and push to dist-git directly or open a PR. The maintainer should be in control of this process.
That sounds overly complicated. I thought the idea is to make things easier, what am I missing?
I'm sorry but I'm unable to comment here.
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
Do I understand correctly that you no longer aim for the "put Fedora spec files upstream" thing, but rather, "create an intermediate upstream sources / Fedora spec file" git repo hybrid? What's the benefit?
Well, having spec file upstream is still the ideal case (or at least utilizing the respective Fedora spec during the upstream development). Sadly, it's not feasible for many projects, so yes, we propose to have this intermediate place.
Benefits?
- One works with real source files and not with `SHA512 (nyancat-1.5.2.tar.gz) = 8eee5da8afacdbe8b6b5f6...`
- I can easily pull commits from upstream if needed
- I can also easily propose patches upstream from such a repository
- Updating to latest upstream release is no longer such a pain - rebasing patch files is gone
- I can work with the package as I was working in the upstream repository
- When a contribution comes, I suddenly review real changes and not patch files
See for example with Python. We have some patches (although we are trying to get rid of them) and rebasing those has proven to be quite tedious with patch files only.
Hence, we keep this fork: https://github.com/fedora-python/cpython
It has our patches on top. When a new version is released, we rebase our branch with git. We format-patch the commits and put them into dist-git.
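The rebase-and-export cycle described here can be sketched roughly as follows. This is only an illustration in a throwaway directory, not the actual fedora-python automation: the repository layout, the `fedora` branch name, the `v1`/`v2` tags and the file names are all hypothetical stand-ins.

```shell
#!/bin/sh
# Sketch of "rebase the fork, then export patches"; all names are hypothetical.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
mkdir fork distgit
cd fork
git init -q .
git config user.name packager
git config user.email packager@example.com

# Fake upstream history with two releases, tagged v1 and v2.
printf 'one\n' > upstream.txt
git add upstream.txt
git commit -q -m 'Upstream release 1'
git tag v1
printf 'two\n' > upstream.txt
git commit -q -a -m 'Upstream release 2'
git tag v2

# Downstream branch carrying one patch on top of v1.
git checkout -q -b fedora v1
printf 'downstream tweak\n' > fedora.txt
git add fedora.txt
git commit -q -m 'Downstream-only tweak'

# New upstream release: replay the downstream commits on top of v2 ...
git rebase -q v2

# ... and export them as numbered patch files for the dist-git checkout.
git format-patch -q -o ../distgit v2
ls ../distgit
```

In a real package the rebase may stop on conflicts, which is where doing this with git rather than with bare patch files pays off.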
If I had time, I'd create automation that does this for me. Unfortunately I don't and I follow https://xkcd.com/1205/
In what way does keeping the spec file in our fork help us?
(speechless for like a minute)
Don't you wanna create (S)RPMs out of that repository? Don't you wanna be sure that when you add a change to that repository it builds fine on rawhide and the latest stable fedora?
Tomas
On 05. 05. 20 18:12, Tomas Tomecek wrote:
Benefits?
- One works with real source files and not with `SHA512 (nyancat-1.5.2.tar.gz) = 8eee5da8afacdbe8b6b5f6...`
- I can easily pull commits from upstream if needed
- I can also easily propose patches upstream from such a repository
- Updating to latest upstream release is no longer such a pain - rebasing patch files is gone
- I can work with the package as I was working in the upstream repository
- When a contribution comes, I suddenly review real changes and not patch files
See for example with Python. We have some patches (although we are trying to get rid of them) and rebasing those has proven to be quite tedious with patch files only.
Hence, we keep this fork: https://github.com/fedora-python/cpython
It has our patches on top. When a new version is released, we rebase our branch with git. We format-patch the commits and put them into dist-git.
If I had time, I'd create automation that does this for me. Unfortunately I don't and I follow https://xkcd.com/1205/
In what way does keeping the spec file in our fork help us?
(speechless for like a minute)
I don't really understand this comment. Speechless because our workflow is tedious?
Don't you wanna create (S)RPMs out of that repository? Don't you wanna be sure that when you add a change to that repository it builds fine on rawhide and the latest stable fedora?
That would be cool. I don't understand why I have to keep the spec file in there for that.
On Mon, May 4, 2020 at 5:06 PM Tomas Tomecek ttomecek@redhat.com wrote:
Hi Tomas,
I'll respond below with some of my experiences and opinions ...
Let’s talk about dist-git, as a place where we work. For us, packagers, it’s a well-known place. Yet for newcomers, it may take a while to learn all the details. Even though we operate with projects in a dist-git repository, the layout doesn’t resemble the respective upstream project.
There is a multitude of tasks we tend to perform in a dist-git repo:
- Bumping a release field for sake of a rebuild.
- Updating to the latest upstream release.
- Resolving CVEs.
- Fixing bugs by…
- Changing a spec file.
- Pulling a commit from upstream.
- Or even backporting a commit.
- And more...
For some tasks, the workflow is just fine and pretty straightforward. But for the other, it’s very gruesome - the moment you need to touch patch files, the horror comes in. The fact that we operate with patch files, in a git repository, is just mind-boggling to me.
Luckily, we have tooling which supports the repository layout - `fedpkg prep`, `srpm` or `mockbuild` are such handy commands - you can easily inspect the source tree or make sure your local change builds.
Where am I getting with this?
Over the years there have been multiple tools created to improve the development experience: rdopkg [r], rpkg-util [ru], tito [t] and probably much much more (e.g. the way Fedora kernel developers work on kernel [k]).
(snip)
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files. An example: I recently added a project called nyancat [n] to Fedora. I have worked [w] on packaging the project in the GitHub repo and then just pushed the changes to dist-git using packit tooling. These source-git repositories can live anywhere: we have support for GitHub right now and are working on supporting pagure.
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo. Our aim is to provide the contribution experience you have in GitHub when working on your packages. Dist-git would still be the authoritative source and a place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
So, in my experience, source-git might be a workable solution for packages with *big* downstream modifications. However, for everything else, I think it's a way to make it easy to accrue technical debt and to do cargo-culting with downstream patches.
The vast majority of packages have *no patches* (or at most, one or two of them), and use *unmodified upstream sources / tarballs*. I never want to deal with exploded upstream sources, unless I'm creating a patch for something.
When it's an upstream commit that applies cleanly to the latest sources, I'll just add it in the .spec file, and let the tooling handle the rest. It's pretty neat to directly link to upstream commits (it works with github and gitlab and pagure, as far as I know), and let our tooling (spectool, fedpkg) do everything else. I don't have to download, patch, or touch sources myself in any way for that.
When I need to make changes that I am able to push back upstream, I don't do that in packaging, but fork upstream, do my changes, create a pull request, and again point my .spec file to the patches from there. No need to touch dist-git there, instead I'm working closely with upstream.
In the rare occasion that I need to make downstream-only changes with patches, I usually just explode the upstream tarball, run "git init", then "git add .", "git commit -m import", apply my changes, and then do "git diff --patch > ../00-my-changes.patch" (if it's just one commit), or "git format-patch -o ../" if there are multiple commits, and then delete the exploded sources again.
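The manual steps above can be sketched as a small script. It is only an illustration under assumptions: the `foo-1.0` directory stands in for an exploded upstream tarball, and the file contents, commit messages and the `patches` output directory are hypothetical.

```shell
#!/bin/sh
# Sketch of the "explode, git init, diff" workflow; all names are hypothetical.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the exploded upstream tarball (normally: tar xf foo-1.0.tar.gz).
mkdir foo-1.0
printf 'hello\n' > foo-1.0/README
cd foo-1.0

# Import the pristine sources as a single commit.
git init -q .
git add .
git -c user.name=packager -c user.email=packager@example.com \
    commit -q -m import

# Apply the downstream-only change.
printf 'hello from Fedora\n' > README

# One change: a plain diff against the import commit is enough.
git diff --patch > ../00-my-changes.patch

# Several commits: commit the work, then export one patch file per commit.
git -c user.name=packager -c user.email=packager@example.com \
    commit -q -a -m 'Adjust README for Fedora'
git format-patch -q -o ../patches HEAD~1

# The exploded sources are no longer needed.
cd ..
rm -rf foo-1.0
ls 00-my-changes.patch patches
```

The resulting patch files are what would then be copied next to the spec file and listed there.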
I maintain ~400 packages in fedora, and the only one with substantial downstream modifications (about 10 patches on top of upstream) is Jekyll (rubygem-jekyll), where I primarily disable tests for features that are not enabled in fedora anyway (if I didn't want to run any tests, I could just drop the number of patches to 1 or 2, making the fork unnecessary - but I like running the tests).
So while I agree that for *some* packages with *huge*, non-upstreamable diffs between upstream and fedora the source-git approach might work, I doubt that it would help in 99% of cases, or even make it too easy for packagers to make more and more downstream-only changes.
Fabio
The main reason I am sending this is to gather feedback from all of you whether there is an interest in such a workflow. We don’t have concrete plans for Fedora right now but based on your feedback we could.
[r] https://github.com/softwarefactory-project/rdopkg
[ru] https://pagure.io/rpkg-util
[t] https://github.com/rpm-software-management/tito
[k] https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro...
[n] https://github.com/TomasTomecek/nyancat
[w] https://github.com/TomasTomecek/nyancat/pull/2
Cheers, Tomas
On Tue, 2020-05-05 at 17:45 +0200, Tomas Tomecek wrote:
On Tue, May 5, 2020 at 1:41 PM Petr Pisar ppisar@redhat.com wrote:
On Tue, May 05, 2020 at 12:41:06PM +0200, Tomas Tomecek wrote:
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
If you only target Fedora, then it means that the same number of Fedora maintainers will maintain twice as many repositories. Does it indeed save work? What's the benefit of maintaining more repositories?
My personal expectation here would be that if I enabled source-git for my packages, I wouldn't want to touch dist-git and only work in the source-git repos. Yes, there would still be changes coming to dist-git, and I'd inspect those from source-git. I'd even ask contributors to use source-git for PR contributions if possible.
To give a provenpackager perspective on this - it rarely turns out to be possible. Usually when we need to touch someone else's package, it's to deal with an urgent problem - say an unannounced soname bump that requires a bunch of packages to be rebuilt, a bug preventing a nightly compose from running or causing a serious problem in it, something like that.
In those situations we usually want to fix the problem *now*, not "whenever someone has time to review the 'upstream' PR and merge it and do whatever they have to do to trigger a build 'downstream'".
So when I'm trying to fix an urgent issue in a package that tries to keep its spec file elsewhere, I usually just fix it in dist-git and issue apologies later. I don't see a way this is ever going to not be the case unless you give all provenpackagers commit rights to the 'upstream' repo, or have a completely automated PR merging system that also triggers a downstream build, or something like that.
On Tue, May 05, 2020 at 09:42:22AM -0700, Adam Williamson wrote:
On Tue, 2020-05-05 at 17:45 +0200, Tomas Tomecek wrote:
On Tue, May 5, 2020 at 1:41 PM Petr Pisar ppisar@redhat.com wrote:
On Tue, May 05, 2020 at 12:41:06PM +0200, Tomas Tomecek wrote:
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
If you only target Fedora, then it means that the same number of Fedora maintainers will maintain twice as many repositories. Does it indeed save work? What's the benefit of maintaining more repositories?
My personal expectation here would be that if I enabled source-git for my packages, I wouldn't want to touch dist-git and only work in the source-git repos. Yes, there would still be changes coming to dist-git, and I'd inspect those from source-git. I'd even ask contributors to use source-git for PR contributions if possible.
To give a provenpackager perspective on this - it rarely turns out to be possible. Usually when we need to touch someone else's package, it's to deal with an urgent problem - say an unannounced soname bump that requires a bunch of packages to be rebuilt, a bug preventing a nightly compose from running or causing a serious problem in it, something like that.
In those situations we usually want to fix the problem *now*, not "whenever someone has time to review the 'upstream' PR and merge it and do whatever they have to do to trigger a build 'downstream'".
So when I'm trying to fix an urgent issue in a package that tries to keep its spec file elsewhere, I usually just fix it in dist-git and issue apologies later.
IME that isn't really a huge problem. We maintain master libvirt spec upstream, and when a provenpackager has had to make a critical change in Fedora it wasn't really a burden on us. We've just synced the change upstream ourselves after the fact, and thereafter everything was fine again. The kind of changes that provenpackagers are doing are usually pretty simple and easily understood & resolved.
Larger invasive changes (updating spec file to use new best practice for python macros last year was an example), are not things that are time critical. So in those cases it is more reasonable to require going to the master source-git repo.
I don't see a way this is ever going to not be the case unless you give all provenpackagers commit rights to the 'upstream' repo, or have a completely automated PR merging system that also triggers a downstream build, or something like that.
I think both of those options would be more trouble than the problem they're trying to solve. As long as the need for provenpackager emergency fixes is pretty infrequent, it is easier to just accept them making quick fixes to dist-git and sync back to source-git manually after the fact.
Regards, Daniel
On Mon, May 4, 2020 at 11:06 AM Tomas Tomecek ttomecek@redhat.com wrote:
Let’s talk about dist-git, as a place where we work. For us, packagers, it’s a well-known place. Yet for newcomers, it may take a while to learn all the details. Even though we operate with projects in a dist-git repository, the layout doesn’t resemble the respective upstream project.
There is a multitude of tasks we tend to perform in a dist-git repo:
- Bumping a release field for sake of a rebuild.
- Updating to the latest upstream release.
- Resolving CVEs.
- Fixing bugs by…
- Changing a spec file.
- Pulling a commit from upstream.
- Or even backporting a commit.
- And more...
For some tasks, the workflow is just fine and pretty straightforward. But for the other, it’s very gruesome - the moment you need to touch patch files, the horror comes in. The fact that we operate with patch files, in a git repository, is just mind-boggling to me.
Luckily, we have tooling which supports the repository layout - `fedpkg prep`, `srpm` or `mockbuild` are such handy commands - you can easily inspect the source tree or make sure your local change builds.
Where am I getting with this?
Over the years there have been multiple tools created to improve the development experience: rdopkg [r], rpkg-util [ru], tito [t] and probably much much more (e.g. the way Fedora kernel developers work on kernel [k]).
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files. An example: I recently added a project called nyancat [n] to Fedora. I have worked [w] on packaging the project in the GitHub repo and then just pushed the changes to dist-git using packit tooling. These source-git repositories can live anywhere: we have support for GitHub right now and are working on supporting pagure.
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo. Our aim is to provide the contribution experience you have in GitHub when working on your packages. Dist-git would still be the authoritative source and a place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
The main reason I am sending this is to gather feedback from all of you whether there is an interest in such a workflow. We don’t have concrete plans for Fedora right now but based on your feedback we could.
Hello Tomas,
I have a fair bit of experience with operating in both so-called "source-git" and "dist-git" workflows. I've known them by the names of "merged-source" and "split-source" trees respectively, so forgive me if I use that terminology, since it makes conveying the point a bit easier.
In the merged-source world, the packaging is an aspect of managing the software codebase. This is common in Debian and ALT Linux, where the standard practice with their tooling is to fork the codebase and integrate the packaging files into the tree. Changes then are managed as part of evolving the sources, and packaging is mainly touched when preparing to push to build. And for $DAYJOB, I've implemented this model for software that $DAYJOB makes (we use the split-source model for stuff we didn't write).
Obviously, you understand the advantages of this approach (managing patches is easier as Git commits, you have access to rebase and merge logic for code, etc.). However, in my experience seeing these in use at a large scale, the major downside is that it removes the incentive to work with the software developers of the project to contribute improvements. Sometimes this is unavoidable (the RHEL ipa, kernel, rpm, samba, and systemd packages come to mind here), but most of the time, I don't see these large fork trees being necessary in RHEL or Fedora. In general, where I've seen this implemented on a distro-wide scale, the contribution levels from the distribution drop by a large margin. There is also the added issue of it becoming a lot more difficult to sort through the differences between upstream and downstream changes. They all look the same in the merged-source model, which makes it hard for others to discover Fedora-only changes and potentially help to bring those changes upstream.
In the split-source model, it is very clear what changes are downstream in Fedora only. The downstream changes are all patches, and moving to new versions often requires dealing with that patch set, evaluating what is still needed and doing the required technical work to support moving forward. This minor wrinkle is often enough to get packagers to get in touch with upstream projects and communicate with them. Most people need that tiny bit of extra friction to be pushed to contribute upstream, especially some of those who work on Fedora because they have to, not because they want to. It's also easy to tell at a glance whether a package is "messy" or not, because you can easily tell how much downstream work is required to make it suitable for Fedora. At least from my perspective, the patch load is a factor in judging how difficult a package is to maintain. The split-source model ultimately makes it clear who is responsible for behavioral changes to a package. If it's the result of a downstream patch, it is our fault. If it isn't, it's upstream's. The merged-source model makes this determination much harder. Not impossible, but harder.
Am I completely against the idea of optionally offering merged-source trees for packaging in Fedora? No. But merged-source requires a lot more discipline than split-source, and I'd like for us to figure out technical and social solutions for encouraging that we clearly identify upstream/downstream changes in merged-source package trees and provide a means to encourage people to continue to stay close to upstream projects[1] as part of using a merged-source/source-git model.
There is also the fact that any source-git/merged-source model would require forking into Fedora's server (src.fedoraproject.org) in a new namespace (sources) and would need the same restrictions that the split-source/dist-git model has (no rebasing, no branch deletion, no tag updating, etc.). Not doing so would cause major problems in terms of reproducible builds, but this also makes working with the source tree a lot more painful. Perhaps if we never directly built from it and exported released sources as tarballs, then it wouldn't be necessary, but those are details to figure out if we move forward with this idea.
[1]: https://fedoraproject.org/wiki/Staying_close_to_upstream_projects
-- 真実はいつも一つ!/ Always, there's only one truth!
* Neal Gompa:
In the merged-source world, the packaging is an aspect of managing the software codebase. This is common in Debian and ALT Linux, where the standard practice with their tooling is to fork the codebase and integrate the packaging files into the tree. Changes then are managed as part of evolving the sources, and packaging is mainly touched when preparing to push to build. And for $DAYJOB, I've implemented this model for software that $DAYJOB makes (we use the split-source model for stuff we didn't write).
This is not an accurate representation of what Debian does. The guidelines and tools very much encourage broken-out patches. The representation is slightly different (via the “debian” subdirectory in a source tree), but this does not mean that you can just change files outside the “debian” directory (i.e., upstream sources), build the Debian SRPM equivalent, and have it built.
Thanks, Florian
On Tue, May 5, 2020 at 1:33 PM Florian Weimer fweimer@redhat.com wrote:
- Neal Gompa:
In the merged-source world, the packaging is an aspect of managing the software codebase. This is common in Debian and ALT Linux, where the standard practice with their tooling is to fork the codebase and integrate the packaging files into the tree. Changes then are managed as part of evolving the sources, and packaging is mainly touched when preparing to push to build. And for $DAYJOB, I've implemented this model for software that $DAYJOB makes (we use the split-source model for stuff we didn't write).
This is not an accurate representation of what Debian does. The guidelines and tools very much encourage broken-out patches. The representation is slightly different (via the “debian” subdirectory in a source tree), but this does not mean that you can just change files outside the “debian” directory (i.e., upstream sources), build the Debian SRPM equivalent, and have it built.
Debian *does* have this merged-source model. There are two variants of this model:
- merged source with patch trees (debian 2.0/3.0 formats)
- merged source with no patch trees (debian 1.0 format)
There is no singular SRPM equivalent, this differs across variants:
- singular source tarball (debian 1.0 format)
- source tarball + compressed super-patch (debian 2.0 format)
- source tarball + debian folder tarball (debian 3.0 format)
The 3.0 source format is the closest to our model.
-- 真実はいつも一つ!/ Always, there's only one truth!
Thanks for sharing that, Simo. I respect your opinion.
Believe me, I do like my git history clean and I go to great lengths to keep it clean. I certainly don't think I'm lazy when it comes to that. However, I don't think that maintaining a linear history helps with readability and understandability of the history. Quite the opposite - keeping the branching helps to see which changes are related and in what context a certain change was made.
Just to be clear, when I say "rebase", I literally mean the act of changing the merge base. On the other hand, squashing commits and similar things to have a nice series of logical changes is a must for me.
But I don't want to veer too much off topic here. I believe this was discussed at length in a LWN article (which just happens to be centered around the Linux kernel ;)). https://lwn.net/Articles/791284/
Ondřej
Simo Sorce simo@redhat.com writes:
Sorry, little rant ahead:
In general, merge commits are hideous, they can conceal merge conflict resolution that can a) introduce bugs, b) make it hard to figure out when bugs were introduced by preventing use of git bisect and c) generally make it very hard to understand the history of changes.
In general, rebase and squash requirements tend to generate much cleaner changes and catch errors before they are merged.
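For readers who have not used it, the bisect argument rests on a binary search over history. A minimal sketch of that idea on a strictly linear history follows. It is an illustration only, not how `git bisect` is implemented (git can also bisect histories that contain merges), and the names `first_bad` and `is_bad` are made up for this example.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear history, oldest first, for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return commits[bad]
```

With eight commits the search needs three checks instead of up to six, which is why a history that is easy to walk step by step matters to people who rely on bisecting.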
Merge commits are useful for gigantic projects like the Linux kernel where A) the maintainers are extremely knowledgeable and can properly resolve merge conflicts B) the people proposing the merge may not actually know how to properly resolve a merge conflict as it is happening outside of their area of expertise C) the feedback loop would be excessive and dwarf the history linearity benefits.
Those projects are rare; using merge commits on small projects is just laziness that you pay for later. It's accruing technical debt unnecessarily.
That said, for non-code projects history may not matter that much, as you do not need to resolve "bugs", and introducing "bugs" via merge-commit conflict resolution is rare. For those it may not matter what you use to merge in stuff: you are using the SCM just as a central repository and "merge tool", and sometimes as a way to apply release tags.
Whenever using the history is useful I find I *really* hate merge commits as they muddy everything, and make the work of figuring out what happened after the fact (and sometimes *during the fact*) very hard.
Simo.
On Tue, 2020-05-05 at 15:29 +0200, Ondřej Lysoněk wrote:
Tomas Tomecek ttomecek@redhat.com writes:
Thank you all for raising all the questions and concerns.
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we experiment with the workflow mostly.
Luckily, force-pushes are not allowed in dist-git, which makes the update/sync process easier (knowing that history cannot be changed). Therefore when a new commit lands in dist-git, we'd just "transform" it to source-git and push it to the source-git repo. We could even ask all the contributors to rebase their PRs when this happens.
This "rebase all PRs" thing seems to be a recurring theme... What is the reason to ask contributors to rebase? (I mean, are we trying to go back to the days of centralized version control systems?)
In my experience, there is rarely a good reason to rebase (rebasing because the CI system tests the contributor's branch rather than the resulting merge is not a good reason; the CI system should be fixed instead) and asking everyone to rebase just slows things down. Please let's not go down that road, if possible.
Other than that, I like the idea of source git and I'm looking forward to using it, once the synchronization issues are resolved. Thanks for working on it.
Ondřej Lysoněk
On the other hand, when a new commit lands in the source-git repo, we could either transform and push to dist-git directly or open a PR. The maintainer should be in control of this process. I understand the synchronisation adds friction to the overall architecture and may be the cause of many problems in the future - hence we are starting this discussion and using the technology ourselves to catch these issues asap. Víťo, does this answer your question?
Miro, you are talking about conflicts: I'd say that conflicts on the git level are normal and git has solid tools to resolve them. For the case where two different people change the same thing, we would treat dist-git as the authoritative place and let the person in source-git know about the conflict. But this can easily happen today as well: two different people can open the same PR or even push to dist-git directly while only one would succeed.
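The synchronization rules described in the last few paragraphs can be summarized as a small decision function. This is only a sketch of the rules as stated in this mail, assuming the maintainer's choice between a direct push and a PR is a simple flag; `Action` and `sync_action` are hypothetical names, not part of packit.

```python
from enum import Enum


class Action(Enum):
    PUSH_TO_SOURCE_GIT = "transform the dist-git commit and push it to source-git"
    PUSH_TO_DIST_GIT = "transform the source-git commit and push it to dist-git"
    OPEN_DIST_GIT_PR = "transform the source-git commit and open a dist-git PR"
    NOTIFY_SOURCE_GIT = "leave dist-git alone and report the conflict in source-git"


def sync_action(origin: str, conflicts: bool = False,
                direct_push: bool = False) -> Action:
    """Pick what to do when a new commit lands in `origin`.

    dist-git is authoritative, so its commits always flow to source-git.
    A source-git commit flows back only if it does not conflict, and the
    maintainer decides (via `direct_push`) between a push and a PR.
    """
    if origin == "dist-git":
        return Action.PUSH_TO_SOURCE_GIT
    if origin != "source-git":
        raise ValueError(f"unknown repository kind: {origin}")
    if conflicts:
        return Action.NOTIFY_SOURCE_GIT
    return Action.PUSH_TO_DIST_GIT if direct_push else Action.OPEN_DIST_GIT_PR
```

Keeping the default at "open a PR" reflects the point that the maintainer should stay in control of what reaches dist-git.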
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
Tomas
On Tue, May 5, 2020 at 8:56 AM Petr Pisar ppisar@redhat.com wrote:
On Mon, May 04, 2020 at 05:05:02PM +0200, Tomas Tomecek wrote:
Over the years there have been multiple tools created to improve the development experience: rdopkg [r], rpkg-util [ru], tito [t] and probably much much more (e.g. the way Fedora kernel developers work on kernel [k]).
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files.
And then comes another distribution with a request to combine its dist-git into the upstream. Fedora is not the only distribution. Do you know how many distributions exist? From my point of view as an upstream it's one big NO.
From the point of view of a Fedora packager, it's just moving Fedora bits into another repository with the burden of synchronizing that repository with dist-git (and back because of what an authoritative source for Fedora is).
If you want to introduce an intermediary third repository between the upstream and the distribution, a repository that would normalize (read git-ify) the upstream and overlay downstream patches and metadata, then, ehm, it's a nice project for exploring how far we can go with unification among the distributions. But I'm quite sceptical regarding its adoption. But don't take my prognosis seriously. I can be mistaken. There is some positive prior art like release-monitoring.org.
-- Petr
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
-- Simo Sorce RHEL Crypto Team Red Hat, Inc
On Tue, May 5, 2020 at 11:43 AM Adam Williamson adamwill@fedoraproject.org wrote:
On Tue, 2020-05-05 at 17:45 +0200, Tomas Tomecek wrote:
On Tue, May 5, 2020 at 1:41 PM Petr Pisar ppisar@redhat.com wrote:
On Tue, May 05, 2020 at 12:41:06PM +0200, Tomas Tomecek wrote:
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
If you only target Fedora, then it means that the same amount of Fedora maintainers will maintain twofold amount of repositories. Does it indeed save work? What's the benefit of maintaining more repositories?
My personal expectation here would be that if I enabled source-git for my packages, I wouldn't want to touch dist-git and only work in the source-git repos. Yes, there would still be changes coming to dist-git, and I'd inspect those from source-git. I'd even ask contributors to use source-git for PR contributions if possible.
To give a provenpackager perspective on this - it rarely turns out to be possible. Usually when we need to touch someone else's package, it's to deal with an urgent problem - say an unannounced soname bump that requires a bunch of packages to be rebuilt, a bug preventing a nightly compose from running or causing a serious problem in it, something like that.
In those situations we usually want to fix the problem *now*, not "whenever someone has time to review the 'upstream' PR and merge it and do whatever they have to do to trigger a build 'downstream'".
So when I'm trying to fix an urgent issue in a package that tries to keep its spec file elsewhere, I usually just fix it in dist-git and issue apologies later. I don't see a way this is ever going to not be the case unless you give all provenpackagers commit rights to the 'upstream' repo, or have a completely automated PR merging system that also triggers a downstream build, or something like that.
With the model that the kernel has switched to, this would technically still work. Of course by default, if we didn't pay attention, the next time we do a build it would literally overwrite anything you changed. I expect it is going to be a learning process to get the few people who do actually commit to the kernel to work within the new model, but in the meantime we can work with them to make sure that nothing gets lost. Of course this works better with the fact that there is someone dedicated full time to maintaining the kernel package, and might not work so well with other packages.
Ideally, at some point, dist-git just gets replaced with src-git as the mechanism for package maintenance.
On Tue, 2020-05-05 at 13:02 -0500, Justin Forbes wrote:
So when I'm trying to fix an urgent issue in a package that tries to keep its spec file elsewhere, I usually just fix it in dist-git and issue apologies later. I don't see a way this is ever going to not be the case unless you give all provenpackagers commit rights to the 'upstream' repo, or have a completely automated PR merging system that also triggers a downstream build, or something like that.
With the model that the kernel has switched to, this would technically still work. Of course by default, if we didn't pay attention, the next time we do a build it would literally overwrite anything you changed.
This is pretty standard in this situation, yeah. Often the fix we're putting in is literally a backport from upstream, so this isn't really an issue, because the next time someone does a build from upstream it'll likely be a new version that includes the change anyway. Otherwise if it's something that feels like it'd be "fragile" I try to send a PR or at least file an issue to get the change ported back upstream.
* Neal Gompa:
On Tue, May 5, 2020 at 1:33 PM Florian Weimer fweimer@redhat.com wrote:
- Neal Gompa:
In the merged-source world, the packaging is an aspect of managing the software codebase. This is common in Debian and ALT Linux, where the standard practice with their tooling is to fork the codebase and integrate the packaging files into the tree. Changes then are managed as part of evolving the sources, and packaging is mainly touched when preparing to push to build. And for $DAYJOB, I've implemented this model for software that $DAYJOB makes (we use the split-source model for stuff we didn't write).
This is not an accurate representation of what Debian does. The guidelines and tools very much encourage broken-out patches. The representation is slightly different (via the “debian” subdirectory in a source tree), but this does not mean that you can just change files outside the “debian” directory (i.e., upstream sources), build the Debian SRPM equivalent, and have it built.
Debian *does* have this merged-source model. There are two variants of this model:
- merged source with patch trees (debian 2.0/3.0 formats)
- merged source with no patch trees (debian 1.0 format)
There is no singular SRPM equivalent, this differs across variants:
- singular source tarball (debian 1.0 format)
1.0 covers both singular source tarball (so-called native package) and tarball plus .diff.gz file.
- source tarball + compressed super-patch (debian 2.0 format)
- source tarball + debian folder tarball (debian 3.0 format)
Actually, 2.0 and 3.0 (quilt) both have broken-out patches. 3.0 (native) is very similar to 1.0 without a separate patch file.
The 3.0 source format is the closest to our model.
And the tools (and not just them) push you gently towards using the 3.0 (quilt) format, so most packages do. And in that case, you cannot simply make changes to the source tree and build a new source package.
Even with the original .orig.tar.gz plus .diff.gz approach, *many* packages had broken-out patches in debian/patches and used various approaches to apply them during the build. The 3.0 (quilt) format just made that part declarative.
Thanks, Florian
Tomas Tomecek ttomecek@redhat.com writes:
Thank you all for raising all the questions and concerns.
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we experiment with the workflow mostly.
Luckily, force-pushes are not allowed in dist-git,
That's a "current state of affairs" statement, not an ideal, as I understand it. Assuming that force-pushes aren't allowed means we'll never be able to have, e.g., non-distro branches (for testing etc.) that we can force push.
This has been a pain point with RHEL dist-git; among other things, it means that branches can't be deleted.
Thanks, --Robbie
On Mon, May 04, 2020 at 05:05:02PM +0200, Tomas Tomecek wrote:
For some tasks, the workflow is just fine and pretty straightforward. But for the other, it’s very gruesome - the moment you need to touch patch files, the horror comes in. The fact that we operate with patch files, in a git repository, is just mind-boggling to me.
I wouldn't quite call it gruesome, but maybe I've got Stockholm syndrome a little bit. I just experienced this with a package I maintain where there was a build failure caused by changes in glibc headers. There's an upstream patch to fix that. The whole mechanism for pulling that patch from upstream git and then adding it as a separate patch and changing the spec file to apply it does seem somewhat Rube Goldbergian.
So....
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo. Our aim is to provide the contribution experience you have in
... yeah, I'd be very interested in trying this.
Dne 05. 05. 20 v 21:26 Robbie Harwood napsal(a):
Tomas Tomecek ttomecek@redhat.com writes:
Thank you all for raising all the questions and concerns.
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we experiment with the workflow mostly.
Luckily, force-pushes are not allowed in dist-git,
That's a "current state of affairs" statement, not an ideal, as I understand it. Assuming that force-pushes aren't allowed means we'll never be able to have, e.g., non-distro branches (for testing etc.) that we can force push.
This has been a pain point with RHEL dist-git; among other things, it means that branches can't be deleted.
This is a problem only when you cannot use PRs. If you can use PRs, pushing random branches into a remote git repo is the biggest sin IMO: while you might delete the branch in the remote repo once it is not needed, I very likely have this branch pulled into my repo, and the number of branches in my local repo that I have no clue about just rises. So if deleting branches was the point of RHEL dist-git, then this is sad news for me. Pushing branches was probably useful in the CVS days, but that should not be the case anymore.
Vít
Thanks, --Robbie
Dne 05. 05. 20 v 18:42 Adam Williamson napsal(a):
On Tue, 2020-05-05 at 17:45 +0200, Tomas Tomecek wrote:
On Tue, May 5, 2020 at 1:41 PM Petr Pisar ppisar@redhat.com wrote:
On Tue, May 05, 2020 at 12:41:06PM +0200, Tomas Tomecek wrote:
Petr, I should have probably stressed that our target is Fedora (or even all Red Hat operating systems). Yes, there are hundreds of distributions and we cannot solve their problems. We are open for collaboration though - we cannot drive changes in distributions which we don't know or use.
If you only target Fedora, then it means that the same amount of Fedora maintainers will maintain twofold amount of repositories. Does it indeed save work? What's the benefit of maintaining more repositories?
My personal expectation here would be that if I enabled source-git for my packages, I wouldn't want to touch dist-git and only work in the source-git repos. Yes, there would still be changes coming to dist-git, and I'd inspect those from source-git. I'd even ask contributors to use source-git for PR contributions if possible.
To give a provenpackager perspective on this - it rarely turns out to be possible. Usually when we need to touch someone else's package, it's to deal with an urgent problem - say an unannounced soname bump that requires a bunch of packages to be rebuilt, a bug preventing a nightly compose from running or causing a serious problem in it, something like that.
In those situations we usually want to fix the problem *now*, not "whenever someone has time to review the 'upstream' PR and merge it and do whatever they have to do to trigger a build 'downstream'".
So when I'm trying to fix an urgent issue in a package that tries to keep its spec file elsewhere, I usually just fix it in dist-git and issue apologies later. I don't see a way this is ever going to not be the case unless you give all provenpackagers commit rights to the 'upstream' repo, or have a completely automated PR merging system that also triggers a downstream build, or something like that.
At this point, I would like to point to this guideline:
https://docs.fedoraproject.org/en-US/packaging-guidelines/#_spec_maintenance...
And I don't think this is in place just because of one-off fixes like those Adam mentioned, but because of mass cleanups of Fedora .spec files. In recent history, I remember the removal of %clean sections, Group tags, and scriptlets.
Vít
Dne 05. 05. 20 v 18:37 Fabio Valentini napsal(a):
On Mon, May 4, 2020 at 5:06 PM Tomas Tomecek ttomecek@redhat.com wrote:
Hi Tomas,
I'll respond below with some of my experiences and opinions ...
Let’s talk about dist-git, as a place where we work. For us, packagers, it’s a well-known place. Yet for newcomers, it may take a while to learn all the details. Even though we operate with projects in a dist-git repository, the layout doesn’t resemble the respective upstream project.
There is a multitude of tasks we tend to perform in a dist-git repo:
- Bumping a release field for sake of a rebuild.
- Updating to the latest upstream release.
- Resolving CVEs.
- Fixing bugs by…
- Changing a spec file.
- Pulling a commit from upstream.
- Or even backporting a commit.
- And more...
For some tasks, the workflow is just fine and pretty straightforward. But for the other, it’s very gruesome - the moment you need to touch patch files, the horror comes in. The fact that we operate with patch files, in a git repository, is just mind-boggling to me.
Luckily, we have tooling which supports the repository layout - `fedpkg prep`, `srpm` or `mockbuild` are such handy commands - you can easily inspect the source tree or make sure your local change builds.
Where am I getting with this?
Over the years there have been multiple tools created to improve the development experience: rdopkg [r], rpkg-util [ru], tito [t] and probably much much more (e.g. the way Fedora kernel developers work on kernel [k]).
(snip)
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files. An example: I recently added a project called nyancat [n] to Fedora. I have worked [w] on packaging the project in the GitHub repo and then just pushed the changes to dist-git using packit tooling. These source-git repositories can live anywhere: we have support for GitHub right now and are working on supporting pagure.
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo. Our aim is to provide the contribution experience you have in GitHub when working on your packages. Dist-git would still be the authoritative source and a place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
So, in my experience, source-git might be a workable solution for packages with *big* downstream modifications. However, for everything else, I think it's a way to make it easy to accrue technical debt and to do cargo-culting with downstream patches.
The vast majority of packages has *no patches* (or at most, one or two of them)
I don't really want to argue with this point, I tend to agree. Just out of interest, do we have some statistics to support this? O:-)
, and uses *unmodified upstream sources / tarballs*. I never want to deal with exploded upstream sources, unless I'm creating a patch for something.
When it's an upstream commit that applies cleanly to the latest sources, I'll just add it in the .spec file, and let the tooling handle the rest. It's pretty neat to directly link to upstream commits (it works with github and gitlab and pagure, as far as I know), and let our tooling (spectool, fedpkg) do everything else. I don't have to download, patch, or touch sources myself in any way for that.
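As a small illustration of the "link directly to the upstream commit" approach, the sketch below builds the `PatchN:` line for a commit hosted on GitHub, relying on GitHub serving a commit as a patch when `.patch` is appended to the commit URL. The helper names and the `example/project` repository are made up; other forges use different URL layouts.

```python
def github_patch_url(owner: str, repo: str, sha: str) -> str:
    """URL at which GitHub serves a single commit in patch form."""
    return f"https://github.com/{owner}/{repo}/commit/{sha}.patch"


def patch_tag(index: int, url: str) -> str:
    """Spec file tag that references the patch by URL."""
    return f"Patch{index}: {url}"


# Hypothetical repository and commit, for illustration only.
line = patch_tag(0, github_patch_url("example", "project", "0123abc"))
```

The resulting line can be pasted into the spec file; the tooling mentioned above then takes care of fetching the file.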
Unfortunately, in the Ruby world this works less and less, because released packages do not contain the test suite these days. So if there is a fix for some feature with an associated test, the patch has to be modified (the test part has to be stripped or split out). Otherwise I like this approach as well.
When I need to make changes that I am able to push back upstream, I don't do that in packaging, but fork upstream, do my changes, create a pull request, and again point my .spec file to the patches from there. No need to touch dist-git there, instead I'm working closely with upstream.
In the rare occasion that I need to make downstream-only changes with patches, I usually just explode the upstream tarball, run "git init", then "git add .", "git commit -m import", apply my changes, and then do "git diff --patch > ../00-my-changes.patch" (if it's just one commit), or "git format-patch -o ../" if there are multiple commits, and then delete the exploded sources again.
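The throwaway-repository workflow just described can be written down as a short script. This is a sketch under assumptions: the tarball is already unpacked, git is installed, and for the multi-commit case the import commit is passed explicitly so that `git format-patch` knows where to start. All function names are made up for this example.

```python
import subprocess
from typing import List, Optional

Command = List[str]


def import_commands() -> List[Command]:
    """Commands that turn an exploded tarball into a throwaway git repo."""
    return [["git", "init"], ["git", "add", "."], ["git", "commit", "-m", "import"]]


def export_command(import_commit: Optional[str] = None, out_dir: str = "..") -> Command:
    """Command that writes the downstream change(s) in patch form.

    Without `import_commit` the change is assumed to be uncommitted and
    `git diff` prints it to stdout (redirect it to a .patch file). With it,
    every commit made after the import becomes one file in `out_dir`.
    """
    if import_commit is None:
        return ["git", "diff", "--patch"]
    return ["git", "format-patch", "-o", out_dir, import_commit]


def run(commands: List[Command], tree: str) -> None:
    """Run the commands inside the exploded source tree; needs git installed."""
    for command in commands:
        subprocess.run(command, cwd=tree, check=True)
```

A typical session would call `run(import_commands(), tree)`, make the changes, and then run `export_command(...)` before deleting the tree again.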
Having infrastructure for exploding sources from the package would be very interesting.
Vít
I maintain ~400 packages in fedora, and the only one with substantial downstream modifications (about 10 patches on top of upstream) is Jekyll (rubygem-jekyll), where I primarily disable tests for features that are not enabled in fedora anyway (if I didn't want to run any tests, I could just drop the number of patches to 1 or 2, making the fork unnecessary - but I like running the tests).
So while I agree that for *some* packages with *huge*, non-upstreamable diffs between upstream and fedora the source-git approach might work, I doubt that it would help in 99% of cases, or even make it too easy for packagers to make more and more downstream-only changes.
Fabio
The main reason I am sending this is to gather feedback from all of you whether there is an interest in such a workflow. We don’t have concrete plans for Fedora right now but based on your feedback we could.
[r] https://github.com/softwarefactory-project/rdopkg
[ru] https://pagure.io/rpkg-util
[t] https://github.com/rpm-software-management/tito
[k] https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro...
[n] https://github.com/TomasTomecek/nyancat
[w] https://github.com/TomasTomecek/nyancat/pull/2
Cheers, Tomas
On Tue, May 5, 2020 at 6:16 PM Miro Hrončok mhroncok@redhat.com wrote:
In what way does keeping the spec file in our fork help us?
(speechless for like a minute)
I don't really understand this comment. Speechless because our workflow is tedious?
I just couldn't understand why you are asking me about source-git when you already track your downstream patches as git commits :D
Don't you wanna create (S)RPMs out of that repository? Don't you wanna be sure that when you add a change to that repository it builds fine on rawhide and the latest stable fedora?
That would be cool. I don't understand why I have to keep the spec file in there for that.
You don't.
With these ~20 lines you can get RPM builds for every PR in chroots of your choice:
https://github.com/TomasTomecek/cpython/pull/1
(the build is failing, even SRPM can't be created, seems like there is a file in the repo which can't be processed by tar & gzip, need to take a closer look)
The spec file is being fetched from rawhide's dist-git for every build. Your use case is a little bit more complex since you patch conditionally so we'd probably need a mechanism in packit:
1. not to add 'Patch: 0001...' lines into spec
2. not touch %setup line
3. and map respective git commits to Patch lines within a spec
so that all would work well for your use case - there are also different ways how to solve it but that would be a lot of typing
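One conceivable way to do the third step, mapping commits to the Patch lines already in a spec, is to pair them in order. The sketch below does only that; it is not an existing packit feature, the function names are made up, and it deliberately fails when the counts differ rather than guessing.

```python
import re
from typing import Dict, List, Tuple

# Matches "Patch: file" and "PatchN: file" tags at the start of a line.
PATCH_TAG = re.compile(r"^Patch(\d*):\s*(\S+)", re.MULTILINE)


def existing_patches(spec_text: str) -> List[Tuple[int, str]]:
    """Return (number, file name) for every Patch tag, in spec order."""
    return [(int(num or 0), name) for num, name in PATCH_TAG.findall(spec_text)]


def map_commits(commits: List[str], spec_text: str) -> Dict[str, str]:
    """Pair downstream commits, oldest first, with the spec's Patch tags."""
    patches = existing_patches(spec_text)
    if len(commits) != len(patches):
        raise ValueError("commit count and Patch tag count differ")
    return {commit: f"Patch{num}" for commit, (num, _name) in zip(commits, patches)}
```

Because the spec keeps its own numbering, conditional `%patch` applications that refer to those numbers would keep working.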
Tomas
Hi,
Also Fedora is driving a lot of spec syntax enhancements, both at the rpm and the macro layer. Pushing spec files upstream is a sure way to freeze spec syntax in stone and have everything behave in rpm 3.x mode (with rpm 3.x limitations) 20 years from now.
The whole thing is just a variation of the bundling/vendoring approach: relocate everything to a single private place to avoid the hassle of interacting with the upstreams the project depends on. The usual result is that the apex of the vendored pyramid is well maintained, everything below suffers, and the project becomes impossible to contribute to independently without cloning its complex closed garden environment.
Every Fedora package has a dual upstream, the source project for the project code, and Fedora rpm/macro enhancements for the spec code.
Regards,
Dne 06. 05. 20 v 11:19 Tomas Tomecek napsal(a):
On Tue, May 5, 2020 at 6:16 PM Miro Hrončok mhroncok@redhat.com wrote:
In what way does keeping the spec file in our fork help us?
(speechless for like a minute)
I don't really understand this comment. Speechless because our workflow is tedious?
I just couldn't understand why you are asking me about source-git when you already track your downstream patches as git commits :D
Don't you wanna create (S)RPMs out of that repository? Don't you wanna be sure that when you add a change to that repository it builds fine on rawhide and the latest stable fedora?
That would be cool. I don't understand why I have to keep the spec file in there for that.
You don't.
With these ~20 lines you can get RPM builds for every PR in chroots of your choice:
This is a bit of irony:
~~~
post-upstream-clone:
- curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/python3.spec
- curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/idle3.appdata.xml
- curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/idle3.desktop
- curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/check-pyc-timestamps...
# packit will apply the patches itself
- sed '/^Patch/d' -i python3.spec
# packit uses %autosetup - and yes, the line below doesn't make sense given
# how python3's spec looks, this is just a demonstration of packit's functionality
- sed '/^%patch/d' -i python3.spec
~~~
So why doesn't Packit do this by itself? Just the `curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/python3.spec` should be kept in some form ....
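If a tool were to do this itself, the two `sed` calls from the quoted configuration amount to a simple line filter. A minimal sketch of that filter as a pure function follows; `strip_patch_lines` is a made-up name and nothing here is taken from packit's code.

```python
def strip_patch_lines(spec_text: str) -> str:
    """Drop lines starting with 'Patch' or '%patch', like the two sed calls."""
    kept = [
        line
        for line in spec_text.splitlines(keepends=True)
        if not line.startswith(("Patch", "%patch"))
    ]
    return "".join(kept)
```

Like the `sed` expressions, it only looks at the start of each line, so indented or conditional `%patch` lines would survive.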
Vít
(the build is failing, even SRPM can't be created, seems like there is a file in the repo which can't be processed by tar & gzip, need to take a closer look)
The spec file is being fetched from rawhide's dist-git for every build. Your use case is a little bit more complex since you patch conditionally so we'd probably need a mechanism in packit:
- not to add 'Patch: 0001...' lines into spec
- not touch %setup line
- and map respective git commits to Patch lines within a spec
so that all would work well for your use case - there are also different ways how to solve it but that would be a lot of typing
Tomas
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Tue, May 5, 2020 at 7:25 PM Neal Gompa ngompa13@gmail.com wrote:
Hello Tomas,
I have a fair bit of experience with operating in both so-called "source-git" and "dist-git" workflows. I've known them by the names of "merged-source" and "split-source" trees respectively, so forgive me if I use that terminology, since it makes conveying the point a bit easier.
Thank you for the description and context.
I'm actually not a fan of the term "source-git" honestly - I'd love to call it "upstream git" since that's what we are trying to do - use the repository layout which is well-known in the upstream community.
Obviously, you understand the advantages of this approach (managing patches is easier as Git commits, you have access to rebase and merge logic for code, etc.). However, in my experience seeing these in use at a large scale, the major downside is that it inhibits the need to work with the software developers of the project to contribute improvements. Sometimes this is unavoidable (the RHEL ipa, kernel, rpm, samba, and systemd packages come to mind here), but most of the time, I don't see these large fork trees being necessary in RHEL or Fedora. In general, where I've seen this implemented on a distro-wide scale, the contribution levels from the distribution drop by a large margin. There is also the added issue of it becoming a lot more difficult to sort through the differences between upstream and downstream changes. They all look the same in the merged-source model, which makes it hard for others to discover Fedora-only changes and potentially help to bring those changes upstream.
I'd say this is a good point, and I still recall that we discussed this in person at Summit last year.
What I really love about Fedora (and Red Hat) is the upstream first principle - when a downstream bug (or just a problem) appears, the maintainers are focused on bringing the solution upstream as the first thing to do. I can still see how people do this and I'm just so proud.
Neal, you're right that with the source-git model, maintainers may be tempted not to bring the changes upstream - I'd say that should be a point where we should introduce changes to the system to motivate people to follow upstream first and help them achieve it.
There is also that any source-git/merged-source model would require forking into Fedora's server (src.fedoraproject.org) in a new namespace (sources) and have the same restrictions that the split-source/dist-git model has (no rebasing, no branch deletion, no tag updating, etc.). Not doing so would cause major problems in terms of reproducible builds, but this also makes working with the source tree a lot more painful. Perhaps if we never directly built from it and exported released sources as tarballs, then it wouldn't be necessary, but those are details to figure out if we move forward with this idea.
+1
Neal, thank you for the great feedback and thorough insights, Tomas
On Wed, May 6, 2020 at 11:31 AM Vít Ondruch vondruch@redhat.com wrote:
This is a bit of irony:
post-upstream-clone:
- curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/python3.spec
- curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/idle3.appdata.xml
- curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/idle3.desktop
- curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/check-pyc-timestamps.py
# packit will apply the patches itself
- sed '/^Patch/d' -i python3.spec
# packit uses %autosetup - and yes, the line below doesn't make sense given
# how python3's spec looks, this is just a demonstration of packit's functionality
- sed '/^%patch/d' -i python3.spec
So why doesn't Packit do this by itself? Only the `curl -O https://src.fedoraproject.org/rpms/python3/raw/master/f/python3.spec` step should have to be kept in some form ....
Vít
Víťo, you are getting off track here. I'd love to focus the discussion on the dist-git and source-git workflows, not on the internals of how packit works. If you want to discuss such a topic, please start a new thread, or even better, open an upstream issue [1] and we can have the discussion there.
[1] https://github.com/packit-service/packit/issues/new
Thank you, Tomas
On Wed, May 6, 2020 at 10:37 AM Vít Ondruch vondruch@redhat.com wrote:
On 05. 05. 20 at 18:37, Fabio Valentini wrote:
On Mon, May 4, 2020 at 5:06 PM Tomas Tomecek ttomecek@redhat.com wrote:
Hi Tomas,
I'll respond below with some of my experiences and opinions ...
Let’s talk about dist-git, as a place where we work. For us, packagers, it’s a well-known place. Yet for newcomers, it may take a while to learn all the details. Even though we operate with projects in a dist-git repository, the layout doesn’t resemble the respective upstream project.
There is a multitude of tasks we tend to perform in a dist-git repo:
- Bumping a release field for sake of a rebuild.
- Updating to the latest upstream release.
- Resolving CVEs.
- Fixing bugs by…
- Changing a spec file.
- Pulling a commit from upstream.
- Or even backporting a commit.
- And more...
For some tasks, the workflow is just fine and pretty straightforward. But for the other, it’s very gruesome - the moment you need to touch patch files, the horror comes in. The fact that we operate with patch files, in a git repository, is just mind-boggling to me.
Luckily, we have tooling which supports the repository layout - `fedpkg prep`, `srpm` or `mockbuild` are such handy commands - you can easily inspect the source tree or make sure your local change builds.
Where am I getting with this?
Over the years there have been multiple tools created to improve the development experience: rdopkg [r], rpkg-util [ru], tito [t] and probably much much more (e.g. the way Fedora kernel developers work on kernel [k]).
(snip)
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files. An example: I recently added a project called nyancat [n] to Fedora. I have worked [w] on packaging the project in the GitHub repo and then just pushed the changes to dist-git using packit tooling. These source-git repositories can live anywhere: we have support for GitHub right now and are working on supporting pagure.
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo. Our aim is to provide the contribution experience you have in GitHub when working on your packages. Dist-git would still be the authoritative source and a place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
So, in my experience, source-git might be a workable solution for packages with *big* downstream modifications. However, for everything else, I think it's a way to make it easy to accrue technical debt and to do cargo-culting with downstream patches.
The vast majority of packages has *no patches* (or at most, one or two of them)
(snip)
I don't really want to argue with this point, I tend to agree. Just out of interest, do we have some statistics to support this? O:-)
I did not have any stats when I wrote this, but now I do. Parsing the rawhide spec files from [0] for lines matching "^Patch[0-9]*:[ \t]*.*$", I get the following distribution:
number of patches: number of packages
total: 21970
0: 15638
1: 3287
2: 1232
3: 598
4: 325
5: 221
6: 154
7: 97
8: 83
9: 57
10: 41
11: 27
12: 26
13: 25
14: 13
15: 13
16: 14
17: 15
18: 5
19: 8
20: 2
21: 11
22: 2
23: 4
24: 4
25: 5
26: 3
27: 4
28: 5
29: 5
30: 2
31: 6
32: 4
33: 3
34: 1
35: 4
37: 2
38: 1
41: 1
42: 2
46: 1
47: 1
48: 3
49: 1
50: 2
51: 1
53: 1
54: 1
66: 1
68: 1
71: 1
75: 1
78: 1
79: 1
85: 1
127: 1
170: 1
In relative terms:
- 71% of packages have ZERO patches
- 15% have ONE patch
- 5% have TWO patches
- 3% have THREE patches
- 5% have MORE than THREE patches
Most packages have none (71%) - or at most two - patches (91%, my original "guess" for "vast majority"), some have 3-5 patches (5%), and a minority (4%) has six patches or more. So it seems this backs up my claim :)
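A minimal sketch of how such a count could be reproduced, assuming the tarball from [0] has been unpacked into a local directory. The function names and the `rpm-specs/` path are hypothetical, and the regular expression is shortened to the part that decides whether a line matches:

```shell
#!/bin/sh
# Count "PatchN:" tags in a single spec file.
count_patches() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -cE '^Patch[0-9]*:' "$1" || true
}

# Print "<number of patches>: <number of packages>" for all *.spec files
# found under the given directory.
patch_distribution() {
    find "$1" -name '*.spec' -type f | while IFS= read -r spec; do
        count_patches "$spec"
    done | sort -n | uniq -c | awk '{ print $2 ": " $1 }'
}
```

Something like `patch_distribution rpm-specs/` would then print one line per patch count. Patches declared in other ways (for example through macros or included files) are not seen by such a simple match, so the numbers are an approximation.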
Fabio
[0]: https://pkgs.fedoraproject.org/repo/rpm-specs-latest.tar.xz
, and uses *unmodified upstream sources / tarballs*. I never want to deal with exploded upstream sources, unless I'm creating a patch for something.
When it's an upstream commit that applies cleanly to the latest sources, I'll just add it in the .spec file, and let the tooling handle the rest. It's pretty neat to directly link to upstream commits (it works with github and gitlab and pagure, as far as I know), and let our tooling (spectool, fedpkg) do everything else. I don't have to download, patch, or touch sources myself in any way for that.
Unfortunately, in the Ruby world this works less and less, because the released packages do not contain the test suite these days. So if there is a fix for some feature together with an associated test, the patch has to be modified (the test part has to be stripped or split out). Otherwise I like this approach as well.
When I need to make changes that I am able to push back upstream, I don't do that in packaging, but fork upstream, do my changes, create a pull request, and again point my .spec file to the patches from there. No need to touch dist-git there, instead I'm working closely with upstream.
In the rare occasion that I need to make downstream-only changes with patches, I usually just explode the upstream tarball, run "git init", then "git add .", "git commit -m import", apply my changes, and then do "git diff --patch > ../00-my-changes.patch" (if it's just one commit), or "git format-patch -o ../" if there are multiple commits, and then delete the exploded sources again.
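The single-commit variant of the workflow described above can be sketched roughly as follows. This is only an illustration: `make_downstream_patch` is a hypothetical helper, it assumes the tarball has exactly one top-level directory, and the git identity passed with `-c` is a placeholder.

```shell
#!/bin/sh
# make_downstream_patch TARBALL PATCH_FILE COMMAND [ARGS...]
# Unpack TARBALL, record it as an "import" commit, run COMMAND inside the
# tree, and write the resulting diff to PATCH_FILE.
make_downstream_patch() {
    tarball=$1
    # Resolve the output path now, because the directory changes below.
    patch_file=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    shift 2
    workdir=$(mktemp -d)

    # Assumption: the tarball has a single top-level directory (foo-1.0/).
    tar -xf "$tarball" -C "$workdir" --strip-components=1
    (
        cd "$workdir" || exit 1
        git init -q .
        git add .
        git -c user.name=packager -c user.email=packager@example.com \
            -c commit.gpgsign=false commit -q -m import
        "$@" >/dev/null                  # apply the downstream change
        # Only changes to tracked files are included; untracked files are not.
        git diff --patch > "$patch_file"
    )
    rm -rf "$workdir"                    # delete the exploded sources again
}
```

A call could look like `make_downstream_patch foo-1.0.tar.gz 00-my-changes.patch sh -c 'echo fix >> README'`. For several commits, `git format-patch -o` would take the place of `git diff --patch`, as the message says.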
Having infrastructure for exploding sources from the package would be very interesting.
Vít
I maintain ~400 packages in fedora, and the only one with substantial downstream modifications (about 10 patches on top of upstream) is Jekyll (rubygem-jekyll), where I primarily disable tests for features that are not enabled in fedora anyway (if I didn't want to run any tests, I could just drop the number of patches to 1 or 2, making the fork unnecessary - but I like running the tests).
So while I agree that for *some* packages with *huge*, non-upstreamable diffs between upstream and fedora the source-git approach might work, I doubt that it would help in 99% of cases, or even make it too easy for packagers to make more and more downstream-only changes.
Fabio
The main reason I am sending this is to gather feedback from all of you whether there is an interest in such a workflow. We don’t have concrete plans for Fedora right now but based on your feedback we could.
[r] https://github.com/softwarefactory-project/rdopkg
[ru] https://pagure.io/rpkg-util
[t] https://github.com/rpm-software-management/tito
[k] https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro...
[n] https://github.com/TomasTomecek/nyancat
[w] https://github.com/TomasTomecek/nyancat/pull/2
Cheers, Tomas
On 05. 05. 20 at 18:37, Fabio Valentini wrote:
So, in my experience, source-git might be a workable solution for packages with *big* downstream modifications.
Big +1. Been there, done that (with Tito).
In the rare occasion that I need to make downstream-only changes with patches, I usually just explode the upstream tarball, run "git init", then "git add .", "git commit -m import", apply my changes, and then do "git diff --patch > ../00-my-changes.patch" (if it's just one commit), or "git format-patch -o ../" if there are multiple commits, and then delete the exploded sources again.
When I am forced to do this, I quite often spend a lot of time resolving conflicts. I can easily spend half a day on it. Whereas when I am working with source-git, I spend something like 30 seconds on the whole release process.
On 04. 05. 20 at 17:05, Tomas Tomecek wrote:
The main reason I am sending this is to gather feedback from all of you whether there is an interest in such a workflow.
I am +1 as long as:
a) this is opt-in (cannot imagine anything else)
b) you resolve the Gordian knot of easy sync of changes from dist-git back to source-git. It must be easy both for the maintainer and for a proven packager making changes directly in dist-git.
On 06. 05. 20 at 13:20, Fabio Valentini wrote:
Thank you very much indeed!
Vít
Vít Ondruch vondruch@redhat.com writes:
On 05. 05. 20 at 21:26, Robbie Harwood wrote:
Tomas Tomecek ttomecek@redhat.com writes:
Thank you all for raising all the questions and concerns.
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we experiment with the workflow mostly.
Luckily, force-pushes are not allowed in dist-git,
That's a "current state of affairs" statement, not an ideal, as I understand it. Assuming that force-pushes aren't allowed means we'll never be able to have, e.g., non-distro branches (for testing etc.) that we can force push.
This has been a pain point with RHEL dist-git; among other things, it means that branches can't be deleted.
That is a problem only when you cannot use PRs. If you can use PRs, pushing some random branches into the remote git repo is the biggest sin IMO, because while you might delete the branch in the remote repo once it is no longer needed, I very likely have this branch pulled into my repo, and the number of branches in my local repo that I have no clue about just rises. So if deleting branches was a point of RHEL dist-git, then this is sad news for me. Pushing branches was probably useful in the CVS days, but that should not be the case anymore.
Well, your workflow is not my workflow.
I very often have to ship test builds (bugfixes, new features, compatibility testing, ...). Yes, the build itself goes in COPR most of the time (or scratch on brewkoji), but the source needs to live somewhere - and I'd prefer it be "not just my laptop".
A branch disappearing on the remote doesn't break anything. You don't lose your local copy. Even a force push is pretty easy to adjust to (git reset or git rebase). This happens all the time for development branches and I honestly doubt you notice. Force pushes are only a problem if you're basing work on the branch.
But sure, maybe I'm sinning by doing my job. More pull requests won't help either way.
Thanks, --Robbie
On 06. 05. 20 at 16:15, Robbie Harwood wrote:
Well, your workflow is not my workflow.
Probably. But applying a CVS workflow to a Git workflow with PRs is probably not the best idea.
I very often have to ship test builds (bugfixes, new features, compatibility testing, ...). Yes, the build itself goes in COPR most of the time (or scratch on brewkoji), but the source needs to live somewhere - and I'd prefer it be "not just my laptop".
Right, then it can live in your fork, and that does not necessarily mean your local copy. More likely it is a remote, as that is commonly understood.
A branch disappearing on the remote doesn't break anything. You don't lose your local copy. Even a force push is pretty easy to adjust to (git reset or git rebase). This happens all the time for development branches and I honestly doubt you notice. Force pushes are only a problem if you're basing work on the branch.
I am not concerned about remote branches disappearing. I am concerned about the complete opposite: remote branches appearing in my local copy and not disappearing once the remote copies are gone.
But sure, maybe I'm sinning by doing my job. More pull requests won't help either way.
Honestly, this is not necessarily about PRs, but about work organization. I would argue that pushing into your fork or into the origin makes no difference for the workflow. In the end, the changes have to appear somehow in origin/master, and a PR is just one of the mechanisms.
But pushing into origin/somebranch might have a negative impact on my workflow, which is not what I like.
It also has a negative impact on you, because then you want to be able to force push to delete or update the branch, while in my fork it is not a concern at all, because I am free to do whatever I want there, including force pushes.
Vít
On Wednesday, May 6, 2020 4:35:11 PM CEST Vít Ondruch wrote:
I am not concerned about remote branches disappearing. I am concerned about the complete opposite: remote branches appearing in my local copy and not disappearing once the remote copies are gone.
Isn't this exactly what `git remote prune` and `git fetch --prune` are for?
You can also use `git config` to make it the default behavior if you like.
Kamil
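To illustrate the pruning behavior mentioned above, here is a small self-contained sketch that builds a throwaway "remote" and a clone in a temporary directory. The repository layout, branch name and identity are made up for the demonstration; `fetch.prune` is the git configuration key that makes pruning the default.

```shell
#!/bin/sh
work=$(mktemp -d)
remote="$work/remote"
clone="$work/clone"

# A throwaway "remote" with one commit and a short-lived topic branch.
git init -q "$remote"
git -C "$remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$remote" branch topic

# The clone now has a remote-tracking branch origin/topic.
git clone -q "$remote" "$clone"

# The topic branch is deleted on the remote.
git -C "$remote" branch -D topic >/dev/null

has_topic() {
    git -C "$clone" rev-parse -q --verify refs/remotes/origin/topic >/dev/null
}

# A plain fetch keeps the stale origin/topic around.
git -C "$clone" fetch -q
if has_topic; then after_fetch=present; else after_fetch=gone; fi

# fetch --prune removes it ("git remote prune origin" does the same without fetching).
git -C "$clone" fetch -q --prune
if has_topic; then after_prune=present; else after_prune=gone; fi

# Make pruning the default for every future fetch in this clone.
git -C "$clone" config fetch.prune true

echo "after plain fetch: $after_fetch, after fetch --prune: $after_prune"
rm -rf "$work"
```

Setting the same key with `git config --global fetch.prune true` would apply it to all repositories of the user.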
On Wed, 6 May 2020 at 13:21, Fabio Valentini decathorpe@gmail.com wrote:
These are some great stats!
But I would like to note that exploded repos (or source-git repos) have at least two other advantages.
1) they consume less space than tarballs for each version because objects in git repo are deduplicated
2) instead of downloading/uploading tarballs, you can just do something like: git pull --rebase upstream master; git push
So they are imho more convenient to work with even if you don't have any patches.
Continued communication with upstream should not be enforced by a crappy experience when maintaining patches. It should be a question of work ethics and of avoiding future conflicts with upstream.
clime
On Wed, May 06, 2020 at 08:39:19PM +0200, clime wrote:
But I would like to note that exploded repos (or source-git repos) have at least two other advantages.
1) they consume less space than tarballs for each version because objects in git repo are deduplicated
2) instead of downloading/uploading tarballs, you can just do something like: git pull --rebase upstream master; git push
Just a note that this is not something you can do today, since a rebase rewrites history, so you would have to do `git push --force`, which isn't allowed currently. So if we were to move forward with this model, we would need to find a solution for the question that has led us to forbid force pushes until now.
Pierre
On Wed, 6 May 2020 at 21:00, Pierre-Yves Chibon pingou@pingoured.fr wrote:
Just a note that this is not something you can do today, since a rebase rewrites history, so you would have to do `git push --force`, which isn't allowed currently.
Good point.
On Wed, 2020-05-06 at 20:59 +0200, Pierre-Yves Chibon wrote:
Just a note that this is not something you can do today, since a rebase rewrites history, so you would have to do `git push --force`, which isn't allowed currently. So if we were to move forward with this model, we would need to find a solution for the question that has led us to forbid force pushes until now.
Well, a way to allow force pushes would be to have a git hook that branches the tree before the force push (creating a branch named something like audit-force-push-<timestamp>). That way you can retain data for legal/auditing reasons, while allowing everyday history to be rewritten. Not sure how "nice" that would be for an auditor who has to reconstruct what happened over multiple force pushes that way. It will also generate quite an amount of noisy metadata (branches), but it could work.
From the point of view of space used: if the only differences are going to be a bunch of "patch-commits" on top of an upstream tree, then you have the same level of churn you have in dist-git, plus the upstream object count, plus the metadata noise of these new branches.
* Pierre-Yves Chibon:
Just a note that this is not something you can do today, since a rebase rewrites history, so you would have to do `git push --force`, which isn't allowed currently. So if we were to move forward with this model, we would need to find a solution for the question that has led us to forbid force pushes until now.
You could do a fake merge (git merge -s ours) to include the old master branch in the history, so that from a Git perspective, it's again a fast-forward push.
I'm more concerned that a standard git rebase will not produce great results, producing a history that contains the new upstream version with a bit of cruft on top of it, only some of it actually needed. But it is worth a try.
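To make the fast-forward argument concrete, here is a minimal sketch in Python. It is only a toy model of a commit graph, not real git, and all commit names are made up for illustration. It assumes the usual rule that a push is a fast-forward when the old branch tip is an ancestor of the new one.

```python
"""Toy model of a commit graph, to show why `git merge -s ours` can turn a
rebased branch back into a fast-forward push. No real git is involved."""

# Each commit maps to the list of its parents. All names are hypothetical.
commits = {
    "up1": [],            # old upstream release
    "fed1": ["up1"],      # downstream patch on top of up1 (old master tip)
    "up2": ["up1"],       # new upstream release
    "fed1'": ["up2"],     # the same patch after `git pull --rebase`
}


def is_ancestor(graph, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parents."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return False


def is_fast_forward(graph, old_tip, new_tip):
    """A push is a fast-forward when the old tip is an ancestor of the new one."""
    return is_ancestor(graph, old_tip, new_tip)


# Pushing the rebased branch over the old master would need --force.
rebased_ok = is_fast_forward(commits, "fed1", "fed1'")

# A merge made with `-s ours` on the rebased branch keeps the rebased tree but
# records the old tip as a second parent, so the old history stays reachable.
commits["merge"] = ["fed1'", "fed1"]
merged_ok = is_fast_forward(commits, "fed1", "merge")

print(rebased_ok, merged_ok)
```

The model says nothing about the quality of the resulting history, which is the concern raised above. It only shows why the server would accept the push.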
Thanks, Florian
* Fabio Valentini:
In the rare occasion that I need to make downstream-only changes with patches, I usually just explode the upstream tarball, run "git init", then "git add .", "git commit -m import", apply my changes, and then do "git diff --patch > ../00-my-changes.patch" (if it's just one commit), or "git format-patch -o ../" if there are multiple commits, and then delete the exploded sources again.
That's what we do for glibc, too. In Fedora, we do not have that many patches, but we do in downstream.
Like so many of us, I wrote some fairly elaborate scripts for that:
https://pagure.io/glibc-maintainer-scripts
Carlos wrote some documentation for them:
https://fedoraproject.org/wiki/GlibcRawhideSync
However, they still require training in an unusual process, and there are corner cases which are difficult to fix. Most of that would just go away if the generated Git tree were the primary artifact developers worked with.
There would still be corner cases (especially if dist-git were kept in the background), but I expect that they would not interrupt developer workflow (unlike with the non-authoritative, on the fly repository). They could still produce their patches, submit merge requests, and get reviews and (CI) test results that way. The dist-git update (with a potential merge first in the other direction) would have to be handled by people experienced with that process, though. But for large packages with a lot of backporting activity, that might be fine.
Thanks, Florian
On 06. 05. 20 at 20:39, clime wrote:
On Wed, 6 May 2020 at 13:21, Fabio Valentini decathorpe@gmail.com wrote:
On Wed, May 6, 2020 at 10:37 AM Vít Ondruch vondruch@redhat.com wrote:
On 05. 05. 20 at 18:37, Fabio Valentini wrote:
On Mon, May 4, 2020 at 5:06 PM Tomas Tomecek ttomecek@redhat.com wrote:
Hi Tomas,
I'll respond below with some of my experiences and opinions ...
So, in my experience, source-git might be a workable solution for packages with *big* downstream modifications. However, for everything else, I think it's a way to make it easy to accrue technical debt and to do cargo-culting with downstream patches.
The vast majority of packages has *no patches* (or at most, one or two of them)
(snip)
I don't really want to argue with this point, I tend to agree. Just out of interest, do we have some statistics to support this? O:-)
I did not have any stats when I wrote this, but now I do. Parsing the rawhide spec files from [0] for lines matching "^Patch[0-9]*:[ \t]*.*$", I get the following distribution:
number of patches: number of packages
total: 21970
0: 15638
1: 3287
2: 1232
3: 598
4: 325
5: 221
6: 154
7: 97
8: 83
9: 57
10: 41
11: 27
12: 26
13: 25
14: 13
15: 13
16: 14
17: 15
18: 5
19: 8
20: 2
21: 11
22: 2
23: 4
24: 4
25: 5
26: 3
27: 4
28: 5
29: 5
30: 2
31: 6
32: 4
33: 3
34: 1
35: 4
37: 2
38: 1
41: 1
42: 2
46: 1
47: 1
48: 3
49: 1
50: 2
51: 1
53: 1
54: 1
66: 1
68: 1
71: 1
75: 1
78: 1
79: 1
85: 1
127: 1
170: 1
In relative terms:
- 71% of packages have ZERO patches
- 15% have ONE patch
- 5% have TWO patches
- 3% have THREE patches
- 5% have MORE than THREE patches
Most packages have none (71%) - or at most two - patches (91%, my original "guess" for "vast majority"), some have 3-5 patches (5%), and a minority (4%) has six patches or more. So it seems this backs up my claim :)
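For anyone who wants to reproduce numbers like these, here is a minimal Python sketch of the counting step. It uses the regular expression quoted above. The function names and the tiny spec fragments are made up for illustration, and how the rawhide spec files are obtained from [0] is left out.

```python
import re
from collections import Counter

# The pattern quoted in the message above, applied line by line.
PATCH_LINE = re.compile(r"^Patch[0-9]*:[ \t]*.*$", re.MULTILINE)


def count_patches(spec_text):
    """Count the Patch tags in the text of one spec file."""
    return len(PATCH_LINE.findall(spec_text))


def distribution(spec_texts):
    """Map 'number of patches' to 'number of packages with that many'."""
    return Counter(count_patches(text) for text in spec_texts)


def share(dist, predicate):
    """Percentage of packages whose patch count satisfies `predicate`."""
    total = sum(dist.values())
    matching = sum(n for patches, n in dist.items() if predicate(patches))
    return 100.0 * matching / total


# Tiny made-up spec fragments, only to exercise the functions.
specs = [
    "Name: a\nSource0: a.tar.gz\n",
    "Name: b\nPatch0: fix-build.patch\n",
    "Name: c\nPatch1: one.patch\nPatch2: two.patch\n# Patch3: disabled\n",
]
dist = distribution(specs)
print(sorted(dist.items()))
```

Note that a pattern like this only sees literal Patch tags. Patches hidden behind macros or pulled in through other mechanisms would not be counted, so the result is an approximation.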
These are some great stats!
But I would like to note that exploded repos (or source-git repos) have at least two other advantages.
1) they consume less space than tarballs for each version because objects in git repo are deduplicated
2) instead of downloading/uploading tarballs, you can just do something like: git pull --rebase upstream master; git push
You still need to have tarballs for SRPM. Therefore the exploded sources actually consume more space, because you still have tarballs and on top of that you have the git repo.
Vít
So they are imho more convenient to work with even if you don't have any patches.
Continued communication with upstream should not be forced by a crappy experience of maintaining patches. It should be a question of work ethics and of avoiding future conflicts with upstream.
clime
Fabio
, and uses *unmodified upstream sources / tarballs*. I never want to deal with exploded upstream sources, unless I'm creating a patch for something.
When it's an upstream commit that applies cleanly to the latest sources, I'll just add it in the .spec file, and let the tooling handle the rest. It's pretty neat to directly link to upstream commits (it works with github and gitlab and pagure, as far as I know), and let our tooling (spectool, fedpkg) do everything else. I don't have to download, patch, or touch sources myself in any way for that.
Unfortunately, in the Ruby world, this works less and less, because the released packages do not contain the test suite these days. So if there is a fix for some feature and an associated test, then the patch has to be modified (the test part has to be stripped or split out). Otherwise I like this approach as well.
When I need to make changes that I am able to push back upstream, I don't do that in packaging, but fork upstream, do my changes, create a pull request, and again point my .spec file to the patches from there. No need to touch dist-git there, instead I'm working closely with upstream.
In the rare occasion that I need to make downstream-only changes with patches, I usually just explode the upstream tarball, run "git init", then "git add .", "git commit -m import", apply my changes, and then do "git diff --patch > ../00-my-changes.patch" (if it's just one commit), or "git format-patch -o ../" if there are multiple commits, and then delete the exploded sources again.
Having infrastructure for exploding sources from the package would be very interesting.
Vít
I maintain ~400 packages in fedora, and the only one with substantial downstream modifications (about 10 patches on top of upstream) is Jekyll (rubygem-jekyll), where I primarily disable tests for features that are not enabled in fedora anyway (if I didn't want to run any tests, I could just drop the number of patches to 1 or 2, making the fork unnecessary - but I like running the tests).
So while I agree that for *some* packages with *huge*, non-upstreamable diffs between upstream and fedora the source-git approach might work, I doubt that it would help in 99% of cases, or even make it too easy for packagers to make more and more downstream-only changes.
Fabio
The main reason I am sending this is to gather feedback from all of you whether there is an interest in such a workflow. We don’t have concrete plans for Fedora right now but based on your feedback we could.
[r] https://github.com/softwarefactory-project/rdopkg [ru] https://pagure.io/rpkg-util [t] https://github.com/rpm-software-management/tito [k] https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro... [n] https://github.com/TomasTomecek/nyancat [w] https://github.com/TomasTomecek/nyancat/pull/2
Cheers, Tomas
On 06. 05. 20 11:19, Tomas Tomecek wrote:
On Tue, May 5, 2020 at 6:16 PM Miro Hrončok mhroncok@redhat.com wrote:
In what way does keeping the spec file in our fork help us?
(speechless for like a minute)
I don't really understand this comment. Speechless because our workflow is tedious?
I just couldn't understand why you are asking me about source-git when you already track your downstream patches as git commits :D
Notable difference: the github.com/fedora-python/cpython repo is an implementation detail to the maintainer(s). E.g. I use it when I need to rebase the patches and there the "officiality" ends. I.e.:
1) I push --force in there with each rebase. (and yes, I tag the old HEADs, but that's just me being a nerd)
2) Any non-patch spec changes don't require this repo, nor the knowledge of it.
3) Drive-by contributors and provenpackagers don't require this repo, nor the knowledge of it.
4) No synchronization back and forth is needed. I use the repo to generate patches -> it's a tool, not a canonical source. If deleted/lost, it can be recreated from dist-git at any time.
5) In an unlikely event of a drive-by contributor needing to touch the patches, the repo is in broken state and I've made a conscious decision that this is an acceptable trade off, considering the (extremely sparse) history of drive-by contributors changing patches in Fedora Python package.
For me, this resembles the "explode sources, git init, git add ." approach on steroids, nothing more.
On Thu, 7 May 2020 at 12:19, Vít Ondruch vondruch@redhat.com wrote:
You still need to have tarballs for SRPM. Therefore the exploded sources actually consume more space, because you still have tarballs and on top of that you have the git repo.
Can the tarballs for srpms be generated dynamically in build system from the exploded sources?
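As a rough illustration of what "generated dynamically" could mean, here is a minimal Python sketch that packs a source tree into a tar archive in a reproducible way, so the same tree always yields the same bytes. It is a sketch under stated assumptions and not how any Fedora build system works today. The function names and the `prefix` parameter are hypothetical, and git itself offers `git archive` for producing an archive from a tree.

```python
import io
import os
import tarfile


def _normalize(info):
    """Drop metadata that varies between machines and checkouts."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def make_tarball(source_dir, prefix):
    """Return the bytes of an uncompressed tar of `source_dir`.

    Every file is stored under `prefix/`, entries are added in sorted order
    and metadata is normalized, so the same tree yields the same bytes.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for root, dirs, files in os.walk(source_dir):
            # Sort for a stable walk order and skip git metadata.
            dirs[:] = sorted(d for d in dirs if d != ".git")
            for name in sorted(files):
                path = os.path.join(root, name)
                arcname = os.path.join(prefix, os.path.relpath(path, source_dir))
                tar.add(path, arcname=arcname, recursive=False, filter=_normalize)
    return buffer.getvalue()
```

Calling `make_tarball("path/to/checkout", "pkg-1.0")` twice on an unchanged tree should give identical bytes, which is what would let a build system check a generated tarball against a recorded checksum.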
On Thu, 7 May 2020 at 06:54, clime clime@fedoraproject.org wrote:
On Thu, 7 May 2020 at 12:19, Vít Ondruch vondruch@redhat.com wrote:
You still need to have tarballs for SRPM. Therefore the exploded sources actually consume more space, because you still have tarballs and on top of that you have the git repo.
Can the tarballs for srpms be generated dynamically in build system from the exploded sources?
If there is one thing I have learned from our build system, it is that anything is possible with enough time, manpower and duct tape. Just realize that once it is built, dozens of extra features will be needed to debug all the transient problems that come from it. Also realize that it will NEVER get removed from the system, because some other parts will be glued on top which needed one little thing from it.
Such a program will need a lot of debugging to deal with the many transient errors that happen because we do a lot of builds. The heaviest users of the builders are MBS and Koschei, which both build transient packages all the time. I expect it will turn up various cases where the build mostly worked but:
a) the checkout partially failed
b) there was a permission problem
c) the builder ran out of memory/disk space/something else
d) fill in a hundred different transient ghosts we run into regularly with our current Rube Goldberg build system.
Each of those will need various levels of debugging to figure out why it didn't work, and it will need staff to track it down and fix it.
On Wed, May 6, 2020 at 10:24 PM Simo Sorce simo@redhat.com wrote:
Well, a way to allow force pushes would be to have a git hook that branches the tree before the force push. (creating a branch named something like audit-force-push-<timestamp>) That way you can retain data for legal/auditing reasons, while allowing every day history to be rewritten.
Wouldn't it be easier to approach this from a build system perspective and let, for example, the build system (or tools) mark the commits that builds were made from with permanent tags? This would still ensure a complete audit trail for whatever was built and shipped, but could eliminate the need for a complete lockdown of dist/source-git.
Not sure how "nice" that would be for an auditor that has to reconstruct what happened over multiple force pushes that way, it also will generate quite an amount of noisy metadata (branches), but it could work.
Refs created for auditing purposes could be kept in a separate git namespace so they don't create noise in everyday workflows.
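The hook idea and the separate namespace could be combined roughly as follows. This is a minimal Python sketch of the decision logic only, not a tested hook. It assumes a server-side pre-receive hook, which git feeds one `<old-sha> <new-sha> <ref-name>` line per updated ref on stdin. The `refs/audit/` namespace, the function name and the timestamp format are hypothetical.

```python
"""Sketch of the decision logic for a server-side pre-receive hook that keeps
an audit ref for every history-rewriting update."""

ZERO = "0" * 40  # git uses an all-zero id for "ref does not exist"


def audit_refs(stdin_lines, is_ancestor, timestamp):
    """Return (audit_ref_name, sha_to_keep) pairs for history-rewriting pushes.

    `is_ancestor(old, new)` is injected so the logic can be tried without a
    repository; a real hook could back it with
    `git merge-base --is-ancestor <old> <new>`.
    """
    keep = []
    for line in stdin_lines:
        old, new, ref = line.split()
        if old == ZERO:
            continue  # newly created ref: nothing is being overwritten
        if new == ZERO or not is_ancestor(old, new):
            # Deletion or non-fast-forward: the old tip would become unreachable.
            prefix = "refs/heads/"
            branch = ref[len(prefix):] if ref.startswith(prefix) else ref
            keep.append((f"refs/audit/{branch}/force-push-{timestamp}", old))
    return keep


# Made-up example input: one rewritten branch and one plain fast-forward.
lines = [
    "a" * 40 + " " + "b" * 40 + " refs/heads/master",
    "c" * 40 + " " + "d" * 40 + " refs/heads/f32",
]
fast_forwards = {("c" * 40, "d" * 40)}
result = audit_refs(
    lines, lambda old, new: (old, new) in fast_forwards, "20200507T101500Z"
)
print(result)
```

A real hook would then create each returned ref (for example with `git update-ref`) before accepting the push. Because the refs live outside `refs/heads/`, they would not show up as branches in everyday work.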
Vít Ondruch vondruch@redhat.com writes:
On 06. 05. 20 at 20:39, clime wrote:
On Wed, 6 May 2020 at 13:21, Fabio Valentini decathorpe@gmail.com wrote:
On Wed, May 6, 2020 at 10:37 AM Vít Ondruch vondruch@redhat.com wrote:
On 05. 05. 20 at 18:37, Fabio Valentini wrote:
On Mon, May 4, 2020 at 5:06 PM Tomas Tomecek ttomecek@redhat.com wrote:
Hi Tomas,
I'll respond below with some of my experiences and opinions ...
So, in my experience, source-git might be a workable solution for packages with *big* downstream modifications. However, for everything else, I think it's a way to make it easy to accrue technical debt and to do cargo-culting with downstream patches.
The vast majority of packages has *no patches* (or at most, one or two of them)
(snip)
I don't really want to argue with this point, I tend to agree. Just out of interest, do we have some statistics to support this? O:-)
I did not have any stats when I wrote this, but now I do. Parsing the rawhide spec files from [0] for lines matching "^Patch[0-9]*:[ \t]*.*$", I get the following distribution:
number of patches: number of packages
total: 21970
0: 15638
1: 3287
2: 1232
3: 598
4: 325
5: 221
6: 154
7: 97
8: 83
9: 57
10: 41
11: 27
12: 26
13: 25
14: 13
15: 13
16: 14
17: 15
18: 5
19: 8
20: 2
21: 11
22: 2
23: 4
24: 4
25: 5
26: 3
27: 4
28: 5
29: 5
30: 2
31: 6
32: 4
33: 3
34: 1
35: 4
37: 2
38: 1
41: 1
42: 2
46: 1
47: 1
48: 3
49: 1
50: 2
51: 1
53: 1
54: 1
66: 1
68: 1
71: 1
75: 1
78: 1
79: 1
85: 1
127: 1
170: 1
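For anyone who wants to reproduce a count like this, here is a small sketch. The regular expression is the one quoted in the message; how the spec files from [0] are unpacked and read is not shown in the thread, so the functions below simply take spec file contents as strings, and their names are made up for this example.

```python
import re
from collections import Counter

# The pattern quoted in the message, applied line by line via MULTILINE.
PATCH_LINE = re.compile(r"^Patch[0-9]*:[ \t]*.*$", re.MULTILINE)


def count_patches(spec_text):
    """Number of PatchN: lines in one spec file."""
    return len(PATCH_LINE.findall(spec_text))


def patch_distribution(spec_texts):
    """Map 'number of patches' to 'number of packages' over many spec files."""
    return Counter(count_patches(text) for text in spec_texts)
```

Note that this counts declared patches only: a commented-out line such as `# Patch9: ...` is skipped because the pattern is anchored at the start of the line, while patches declared inside conditionals are counted regardless of whether they apply.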
In relative terms:
- 71% of packages have ZERO patches
- 15% have ONE patch
- 5% have TWO patches
- 3% have THREE patches
- 5% have MORE than THREE patches
Most packages have none (71%) - or at most two - patches (91%, my original "guess" for "vast majority"), some have 3-5 patches (5%), and a minority (3%) has six patches or more. So it seems this backs up my claim :)
These are some great stats!
But I would like to note that exploded repos (or source-git repos) have at least two other advantages.
1) they consume less space than tarballs for each version because objects in git repo are deduplicated
2) instead of downloading/uploading tarballs, you can just do something like: git pull --rebase upstream master; git push
You still need to have tarballs for SRPM. Therefore the exploded sources actually consume more space, because you still have tarballs and on top of that you have a git repo.
That's the case now, but doesn't have to be the case. We could decompress and dedup if we wanted, but it's not worth the effort to do that.
Like many things, it's down to how much effort it's worth.
Thanks, --Robbie
On Tue, May 05, 2020 at 03:26:59PM -0400, Robbie Harwood wrote:
Tomas Tomecek ttomecek@redhat.com writes:
Thank you all for raising all the questions and concerns.
Before I reply, I'd like to stress that we are still in a prototype phase - not everything is solved (clearly) and at this point, we experiment with the workflow mostly.
Luckily, force-pushes are not allowed in dist-git,
That's a "current state of affairs" statement, not an ideal, as I understand it. Assuming that force-pushes aren't allowed means we'll never be able to have, e.g., non-distro branches (for testing etc.) that we can force push.
This has been a pain point with RHEL dist-git; among other things, it means that branches can't be deleted.
Can't forks be used for this?
Push all your changes to the fork, force push your day away until you get it ready, test, force push some more, submit PR, force push changes, finally merge and no more force pushing.
kevin
On Wed, May 06, 2020 at 08:39:19PM +0200, clime wrote: ...snip... please folks... please trim your posts? :)
These are some great stats!
But I would like to note that exploded repos (or source-git repos) have at least two other advantages.
- they consume less space than tarballs for each version because objects in git repo are deduplicated
But they consume tons more inodes which makes them painful to backup/restore/mirror.
kevin
<snip>
In the rare occasion that I need to make downstream-only changes with patches, I usually just explode the upstream tarball, run "git init", then "git add .", "git commit -m import", apply my changes, and then do "git diff --patch > ../00-my-changes.patch" (if it's just one commit), or "git format-patch -o ../" if there are multiple commits, and then delete the exploded sources again.
In any case, I think this functionality could be included in rpkg/fedpkg...?
If there are no objections, I will open a ticket for this.
Maybe also something from https://pagure.io/glibc-maintainer-scripts/blob/master/f/glibc-sync-upstream... could be included too?
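The manual workflow described above (import the exploded tarball, commit, change, export patches) can be sketched as follows. This is only an illustration of the steps, not what rpkg or fedpkg do; the `import` tag, the throwaway commit identity, and the function names are assumptions. The sketch uses `git format-patch` with an explicit base for both the single-commit and multi-commit case, since `format-patch` needs to know where the downstream changes start.

```python
import subprocess
from pathlib import Path

# Throwaway identity so commits work even without a configured user (assumption).
IDENTITY = ["-c", "user.name=packager", "-c", "user.email=packager@example.invalid"]


def git(workdir, *args):
    """Run git inside workdir and return its stdout."""
    cmd = ["git", "-C", str(workdir), *IDENTITY, *args]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def import_sources(srcdir):
    """Turn an exploded tarball into a one-commit repository tagged 'import'."""
    git(srcdir, "init", "--quiet")
    git(srcdir, "add", ".")
    git(srcdir, "commit", "--quiet", "-m", "import")
    git(srcdir, "tag", "import")


def export_patches(srcdir, outdir):
    """Write one patch file per commit made since the import; return their paths."""
    target = str(Path(outdir).resolve())
    out = git(srcdir, "format-patch", "-o", target, "import")
    return [Path(line) for line in out.splitlines() if line]
```

After `import_sources()`, downstream changes are committed as usual, and `export_patches(srcdir, "..")` drops the numbered patch files next to the exploded tree, ready to be referenced from the spec file.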
On Thu, 7 May 2020 at 20:58, Kevin Fenzi kevin@scrye.com wrote:
On Wed, May 06, 2020 at 08:39:19PM +0200, clime wrote: ...snip... please folks... please trim your posts? :)
These are some great stats!
But I would like to note that exploded repos (or source-git repos) have at least two other advantages.
- they consume less space than tarballs for each version because objects in git repo are deduplicated
But they consume tons more inodes which makes them painful to backup/restore/mirror.
But maybe still less painful than to do this with upstream tarballs? :) I guess it depends on the average number of upstream releases per package. If the number is 1, then for sure tarballs will win. If the number is, let's say, closer to 10, the storage size difference might already be significant enough to make the above operations harder.
kevin _______________________________________________ devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Thu, 7 May 2020 at 22:53, clime clime@fedoraproject.org wrote:
<snip>
In the rare occasion that I need to make downstream-only changes with patches, I usually just explode the upstream tarball, run "git init", then "git add .", "git commit -m import", apply my changes, and then do "git diff --patch > ../00-my-changes.patch" (if it's just one commit), or "git format-patch -o ../" if there are multiple commits, and then delete the exploded sources again.
In any case, I think this functionality could be included in rpkg/fedpkg...?
If there are no objections, I will open a ticket for this.
Maybe also something from https://pagure.io/glibc-maintainer-scripts/blob/master/f/glibc-sync-upstream... could be included too?
I have opened https://pagure.io/rpkg/issue/502.
But I think the source-git (or exploded sources) repos would be great to pursue as well.
Out of the mentioned models to avoid history overwriting by force pushes (i.e. tags mentioned by Miro Hroncok and Hunor Csomortáni, branches mentioned by Simo Sorce, git merge -s ours mentioned by Florian Weimer), I like the branches approach the most with a slight tweak that branches are named according to upstream versions. I.e. for each "rebase", we create a new branch containing the new updated upstream sources and place our possibly updated patches on top. Effectively, a branch is just a single file in .git/refs/heads so it should be cheap and there will be no force pushes. Just the dist-git interface for working with branches should be ready for the fact that there may be a lot of them :). I think it is a small price for enabling this potentially very exciting, new approach.
Best regards clime
On Fri, 8 May 2020 at 09:59, clime clime@fedoraproject.org wrote:
On Thu, 7 May 2020 at 20:58, Kevin Fenzi kevin@scrye.com wrote:
On Wed, May 06, 2020 at 08:39:19PM +0200, clime wrote: ...snip... please folks... please trim your posts? :)
These are some great stats!
But I would like to note that exploded repos (or source-git repos) have at least two other advantages.
- they consume less space than tarballs for each version because objects in git repo are deduplicated
But they consume tons more inodes which makes them painful to backup/restore/mirror.
But maybe still less painful than to do this with upstream tarballs?
No, because the way backups and rsync work is slow for this. We can back up the look-aside cache with tarballs in a couple of hours. We can also rsync that in the same amount of time. It takes that long or longer to do that with a couple of git trees which are much smaller in size but larger in file numbers. Every file in a git tree is stat'd, and while there is some deduplication, there are a lot of files.
Could this be solved by moving to some other sort of file system model? Possibly, but we a) have no time to pursue that investigation at a large enough scale to prove or disprove it, and b) have no money to purchase the equipment that these file systems work on.
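The trade-off described above (fewer bytes, but many more files to stat) is easy to measure for any two directories, for example a look-aside directory of tarballs and a set of git trees. A rough sketch, with a function name invented for this example:

```python
import os


def tree_cost(root):
    """Return (file_count, total_bytes) for everything under root."""
    files = 0
    size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # One stat call per file: the same per-file work that backup
            # and rsync-style tools have to do, regardless of file size.
            st = os.lstat(path)
            files += 1
            size += st.st_size
    return files, size
```

Comparing the two numbers for both layouts shows which cost dominates: total bytes for a handful of large tarballs, per-file overhead for repositories made of many small objects.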
On Fri, 8 May 2020 at 16:22, Stephen John Smoogen smooge@gmail.com wrote:
On Fri, 8 May 2020 at 09:59, clime clime@fedoraproject.org wrote:
On Thu, 7 May 2020 at 20:58, Kevin Fenzi kevin@scrye.com wrote:
On Wed, May 06, 2020 at 08:39:19PM +0200, clime wrote: ...snip... please folks... please trim your posts? :)
These are some great stats!
But I would like to note that exploded repos (or source-git repos) have at least two other advantages.
- they consume less space than tarballs for each version because objects in git repo are deduplicated
But they consume tons more inodes which makes them painful to backup/restore/mirror.
But maybe still less painful than to do this with upstream tarballs?
No, because the way backups and rsync work is slow for this. We can back up the look-aside cache with tarballs in a couple of hours. We can also rsync that in the same amount of time. It takes that long or longer to do that with a couple of git trees which are much smaller in size but larger in file numbers. Every file in a git tree is stat'd, and while there is some deduplication, there are a lot of files.
Well from scratch, it will be much harder to rsync git repos but imho that changes once the initial rsync is done because then just new objects are transferred. + we could possibly use repospanner to mirror lookaside cache too? The deduplication in git is quite significant when compared to several tarballs from separate upstream releases.
Anyway, I don't have the operative experience that you guys have so it's quite pointless for me to argue here.
Could this be solved by moving to some other sort of file system model? Possibly, but we a) have no time to pursue that investigation at a large enough scale to prove or disprove it, and b) have no money to purchase the equipment that these file systems work on.
-- Stephen J Smoogen.
On Wed, May 06, 2020 at 11:41:37AM +0200, Tomas Tomecek wrote:
I'm actually not a fan of the term "source-git" honestly - I'd love to call it "upstream git" since that's what we are trying to do - use the repository layout which is well-known in the upstream community.
The problem with that is that it's too easily confused with the actual upstream project's source control. In most (or all?) cases here, we're talking about a layer in between, right?
On Wed, May 6, 2020 at 2:24 PM Simo Sorce simo@redhat.com wrote:
Well, a way to allow force pushes would be to have a git hook that branches the tree before the force push. (creating a branch named something like audit-force-push-<timestamp>)
In Ceph we do this at a slightly different point of time. We use "rdopkg tag-patches" to save each of the "patches" refs that we've translated into patch series in dist-git. Each Git tag is the NVR of the package.
We rebase and force-push our "patches" branches frequently, so a patches branch is one "history", and dist-git becomes a "history of histories".
It's critical to have this flexibility + auditability so we can move fast and still go back and reproduce everything.
- Ken
Hi, I'm a bit late to the party, but here's my 2¢.
On Mon, May 04, 2020 at 05:05:02PM +0200, Tomas Tomecek wrote:
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files.
I think source-git would be an interesting avenue to explore... There are some hairy issues to figure out wrt. rebasing of downstream-only patches, but if that is solved, there would be great potential to make certain styles of packaging much nicer.
For more complicated projects, rebasing of patches would require some git wizardry, but we'd reap the benefit of how good git is with rebasing patches. From the workflows people described, it is clear that many of us are doing some variant of a custom git branch to make rebasing easier, building custom tooling around that.
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo.
I agree with what Miro and others said about this: this brings a lot of complication. I expect the requirement to have synchronization both ways to be a constant source of problems. We lose the invariant that dist-git is the canonical source of truth. (Automatic synchronization is OK if it's just one way, but here it clearly needs to be both ways because some maintainers would modify source-git and other maintainers would modify dist-git.)
IMO, source-git as a third repo in between the project and dist-git is not useful. Instead, it would make sense when integrated with dist-git. Tools like fedpkg and koji would need to learn to build a project directly from this git repo, building a tarball on the fly. (What smooge said about reliability: I wouldn't worry too much. 'git archive' is reliable, and we'd be doing this locally, so this wouldn't be too different from copying a tarball.) We then have a dist/source-git repo that is very similar to upstream, and we don't have yet another component in the system, but simply change how patches are represented in dist-git.
(Hybrid approaches like Debian's quilt model don't make sense to me: let's either use git or not use git, but doing both, and requiring people to muck with patch files, is no better than our current dist-git.)
Our aim is to provide the contribution experience you have in GitHub when working on your packages. Dist-git would still be the authoritative source and a place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
I don't get this. We need fewer (and better, more closely integrated) tools, not yet another layer of helpers for other helpers.
Zbyszek
On Mon, May 04, 2020 at 05:05:02PM +0200, Tomas Tomecek wrote:
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files. An example: I recently added a project called nyancat [n] to Fedora. I have worked [w] on packaging the project in the GitHub repo and then just pushed the changes to dist-git using packit tooling. These source-git repositories can live anywhere: we have support for GitHub right now and are working on supporting pagure.
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo. Our aim is to provide the contribution experience you have in GitHub when working on your packages. Dist-git would still be the authoritative source and a place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
The main reason I am sending this is to gather feedback from all of you whether there is an interest in such a workflow. We don’t have concrete plans for Fedora right now but based on your feedback we could.
Tomas,
This is an interesting idea and it is a direction I would like to see dist-git move. I do not think it's possible to find a one size fits all approach since every package has and needs varying workflows. And we should be flexible to let teams and developers do what they need to do. For me, moving spec files upstream does not seem that appealing from a package maintenance standpoint. I still like the clear distinction between the upstream project and the 'Fedora bits' that make it a package we ship. But that might not be the case for every package.
I have read through this thread as of 3pm Boston time on 08-May and there's a lot of great feedback. I wanted to offer my own thoughts on what I'd like to see related to this topic:
WHAT I WANT TO BE ABLE TO DO:
* View Fedora's dist-git repos as authoritative for packages built for Fedora. That is, I want to see a package on my Fedora system and be able to visit its dist-git repo to see how it's packaged.
* Make the lookaside cache optional. For SourceX lines, I want to be able to specify a git URL to a specific tag. fedpkg should use git archive to include that in the SRPM. e.g.:
Source0: https://github.com/rpminspect/rpminspect/archive/v0.12
* If we offer the above, honor signed git tags for verification at build time.
* Make PatchX lines optional. In dist-git, I should be able to set a remote pointing to the upstream repo. Then do the Fedora work on the appropriate Fedora branch. SourceX should still become a tarball using git archive and the tag. Patches should be automatically generated for SRPM construction using git format-patch or something comparing the Fedora dist-git branch with the remote branch. Multiple remotes should be possible should new and old versions of the upstream project need to be supported. Fedora dist-git branches should know their remote.
* I still want to be able to do 'fedpkg srpm' and get a standalone ready-to-build SRPM file that I can carry around.
* Possibly extend fedpkg to help package maintainers submit patches from the package to the upstream project.
PRs in dist-git would be more meaningful to me if we were able to have the upstream repo as a remote in dist-git and our branches just an extension of that.
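To make the wish list above a bit more concrete: the tarball and the patches could both be derived from git with two standard commands. The sketch below only builds the command lines; it is not how fedpkg works today, and the helper names and the `rawhide` branch used in the usage note are assumptions for illustration.

```python
import subprocess


def archive_command(name, version, tag):
    """git archive invocation producing name-version.tar.gz from an upstream tag."""
    stem = f"{name}-{version}"
    return ["git", "archive", "--format=tar.gz", f"--prefix={stem}/",
            "-o", f"{stem}.tar.gz", tag]


def patch_command(upstream_tag, fedora_branch, outdir="."):
    """git format-patch invocation for commits on the Fedora branch but not upstream."""
    return ["git", "format-patch", "-o", outdir, f"{upstream_tag}..{fedora_branch}"]


def run_in(repo, command):
    """Execute one of the commands above inside a local clone."""
    subprocess.run(command, check=True, cwd=repo)
```

For example, `run_in(clone, archive_command("rpminspect", "0.12", "v0.12"))` followed by `run_in(clone, patch_command("v0.12", "rawhide"))` would leave a tarball and a numbered patch series in the clone, which is the input an SRPM build needs.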
Thanks,
On Fri, May 08, 2020 at 03:12:15PM -0400, David Cantrell wrote:
WHAT I WANT TO BE ABLE TO DO:
- View Fedora's dist-git repos as authoritative for packages built for Fedora. That is, I want to see a package on my Fedora system and be able to visit its dist-git repo to see how it's packaged.
Well said.
Make the lookaside cache optional. For SourceX lines, I want to be able to specify a git URL to a specific tag. fedpkg should use git archive to include that in the SRPM. e.g.:
Source0: https://github.com/rpminspect/rpminspect/archive/v0.12
Yes. This is somewhat orthogonal to the dist-git / source-git question. It would be absolutely great to have this right now on top of dist-git, so we don't need to do the step of 'amend Source0, spectool -g, fedpkg new-sources, git commit'.
If we offer the above, honor signed git tags for verification at build time.
Make PatchX lines optional. In dist-git, I should be able to set a remote pointing to the upstream repo. Then do the Fedora work on the appropriate Fedora branch. SourceX should still become a tarball using git archive and the tag. Patches should be automatically generated for SRPM construction using git format-patch or something comparing the Fedora dist-git branch with the remote branch.
Hmm, but if we specify a git ref as source, why bother with patches at all? The step of generation and application of patches is error-prone, and if we have a git ref, we have the tree object linked to it, and we should unpack that as the source dir without any further ado.
Multiple remotes should be possible should new and old versions of the upstream project need to be supported. Fedora dist-git branches should know their remote.
I still want to be able to do 'fedpkg srpm' and get a standalone ready-to-build SRPM file that I can carry around.
Possibly extend fedpkg to help package maintainers submit patches from the package to the upstream project.
Is 'fedpkg' the best place for this? Submitting PRs from a git branch is a very generic thing, and there's plenty of tools to do that already. And those tools might even be forge-specific. E.g. github has hub and now an official gh tool, and it's unlikely that fedpkg will ever do github PRs as well as gh. And when fedora patches are just a branch, then the generic tool can be used.
PRs in dist-git would be more meaningful to me if we were able to have the upstream repo as a remote in dist-git and our branches just an extension of that.
Me likes. This would solve the perennial problem of "should I abuse proven-packager privs to do 'fedpkg new-sources' before submitting a PR?", which has two bad answers: "yes, and annoy the maintainer by polluting the cache if the PR is rejected", and "no, and have all CI fail".
Zbyszek
On Fri, 8 May 2020 at 20:05, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
Hi, I'm a bit late to the party, but here's my 2¢.
On Mon, May 04, 2020 at 05:05:02PM +0200, Tomas Tomecek wrote:
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files.
I think source-git would be an interesting avenue to explore... There are some hairy issues to figure out wrt. rebasing of downstream-only patches, but if that is solved, there would be great potential to make certain styles of packaging much nicer.
For more complicated projects, rebasing of patches would require some git wizardry, but we'd reap the benefit of how good git is with rebasing patches. From the workflows people described, it is clear that many of us are doing some variant of a custom git branch to make rebasing easier, building custom tooling around that.
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo.
I agree with what Miro and others said about this: this brings a lot of complication. I expect the requirement to have synchronization both ways to be a constant source of problems. We lose the invariant that dist-git is the canonical source of truth. (Automatic synchronization is OK if it's just one way, but here it clearly needs to be both ways because some maintainers would modify source-git and other maintainers would modify dist-git.)
IMO, source-git as a third repo in between the project and dist-git is not useful. Instead, it would make sense when integrated with dist-git.
I am curious. Zbyszek, what do you mean by "integrated with dist-git", here?
Tools like fedpkg and koji would need to learn to build a project directly from this git repo, building a tarball on the fly. (What smooge said about reliability: I wouldn't worry too much. 'git archive' is reliable, and we'd be doing this locally, so this wouldn't be too different from copying a tarball.) We then have a dist/source-git repo that is very similar to upstream, and we don't have yet another component in the system, but simply change how patches are represented in dist-git.
(Hybrid approaches like Debian's quilt model don't make sense to me: let's either use git or not use git, but doing both, and requiring people to muck with patch files, is no better than our current dist-git.)
Our aim is to provide the contribution experience you have in GitHub when working on your packages. Dist-git would still be the authoritative source and a place where official builds are done - the source-git repo would work as a way to collaborate. We also don’t have plans right now to integrate packit into fedpkg.
I don't get this. We need fewer (and better, more closely integrated) tools, not yet another layer of helpers for other helpers.
Zbyszek
On Fri, May 8, 2020 at 9:55 PM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 08, 2020 at 03:12:15PM -0400, David Cantrell wrote:
WHAT I WANT TO BE ABLE TO DO:
- View Fedora's dist-git repos as authoritative for packages built for Fedora. That is, I want to see a package on my Fedora system and be able to visit its dist-git repo to see how it's packaged.
Well said.
Make the lookaside cache optional. For SourceX lines, I want to be able to specify a git URL to a specific tag. fedpkg should use git archive to include that in the SRPM. e.g.:
Source0: https://github.com/rpminspect/rpminspect/archive/v0.12
Yes. This is somewhat orthogonal to the dist-git / source-git question. It would be absolutely great to have this right now on top of dist-git, so we don't need to do the step of 'amend Source0, spectool -g, fedpkg new-sources, git commit'.
Huh? You mean have koji download sources from upstream directly? I don't think that's a good idea, and it doesn't have external network access anyway ...
And why do you need to amend Source0? I need never touch this Source0 tag since it's always pointing to the version I need: https://src.fedoraproject.org/rpms/granite/blob/master/f/granite.spec#_12
If we offer the above, honor signed git tags for verification at build time.
Make PatchX lines optional. In dist-git, I should be able to set a remote pointing to the upstream repo. Then do the Fedora work on the appropriate Fedora branch. SourceX should still become a tarball using git archive and the tag. Patches should be automatically generated for SRPM construction using git format-patch or something comparing the Fedora dist-git branch with the remote branch.
Hmm, but if we specify a git ref as source, why bother with patches at all? The step of generation and application of patches is error-prone, and if we have a git ref, we have the tree object linked to it, and we should unpack that as the source dir without any further ado.
You know that you can specify patches remotely as well, right? It works with github URLs and with pagure URLs (I haven't tested gitlab yet):
pagure commit - Patch0: https://src.fedoraproject.org/rpms/granite/c/5f706c7.patch
pagure PR - Patch0: https://src.fedoraproject.org/rpms/jackson-bom/pull-request/6.patch
github commit - Patch0: https://github.com/elementary/granite/commit/2fe8b69.patch
github PR - Patch0: https://patch-diff.githubusercontent.com/raw/fedora-java/javapackages/pull/7...
It's probably not a good idea to point .spec files to PR diffs since they can change over time, but remote commits work just fine ... spectool -g fetches them just like it fetches sources.
Fabio
Multiple remotes should be possible should new and old versions of the upstream project need to be supported. Fedora dist-git branches should know their remote.
I still want to be able to do 'fedpkg srpm' and get a standalone ready-to-build SRPM file that I can carry around.
Possibly extend fedpkg to help package maintainers submit patches from the package to the upstream project.
Is 'fedpkg' the best place for this? Submitting PRs from a git branch is a very generic thing, and there's plenty of tools to do that already. And those tools might even be forge-specific. E.g. github has hub and now an official gh tool, and it's unlikely that fedpkg will ever do github PRs as well as gh. And when fedora patches are just a branch, then the generic tool can be used.
PRs in dist-git would be more meaningful to me if we were able to have the upstream repo as a remote in dist-git and our branches just an extension of that.
Me likes. This would solve the perennial problem of "should I abuse proven-packager privs to do 'fedpkg new-sources' before submitting a PR?", which has two bad answers: "yes, and annoy the maintainer by polluting the cache if the PR is rejected", and "no, and have all CI fail".
Zbyszek
On Fri, May 8, 2020 at 4:25 PM Fabio Valentini decathorpe@gmail.com wrote:
On Fri, May 8, 2020 at 9:55 PM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 08, 2020 at 03:12:15PM -0400, David Cantrell wrote:
WHAT I WANT TO BE ABLE TO DO:
- View Fedora's dist-git repos as authoritative for packages built for Fedora. That is, I want to see a package on my Fedora system and be able to visit its dist-git repo to see how it's packaged.
Well said.
Make the lookaside cache optional. For SourceX lines, I want to be able to specify a git URL to a specific tag. fedpkg should use git archive to include that in the SRPM. e.g.:
Source0: https://github.com/rpminspect/rpminspect/archive/v0.12
Yes. This is somewhat orthogonal to the dist-git / source-git question. It would be absolutely great to have this right now on top of dist-git, so we don't need to do the step of 'amend Source0, spectool -g, fedpkg new-sources, git commit'.
Huh? You mean have koji download sources from upstream directly? I don't think that's a good idea, and it doesn't have external network access anyway ...
Having autofetching by Koji would require the ability to specify the checksum for the file in the spec, IMO: https://github.com/rpm-software-management/rpm/issues/463
A central way to validate the source is "valid" that is portable across systems (koji, copr, obs, etc.) would make this a lot easier to trust.
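To make the checksum idea concrete, here is a minimal sketch of what a portable "is this source valid" check could look like on the client side. The function names and the choice of SHA-512 are assumptions for illustration only; nothing here is an existing rpm, koji, or fedpkg interface.

```python
import hashlib


def sha512_of(data: bytes) -> str:
    """Return the hex SHA-512 digest of the given bytes."""
    return hashlib.sha512(data).hexdigest()


def verify_source(data: bytes, expected_sha512: str) -> bool:
    """Check downloaded source bytes against a checksum recorded by the packager."""
    return sha512_of(data) == expected_sha512.lower()


# Hypothetical stand-in for the bytes of a downloaded upstream archive.
tarball = b"pretend this is an upstream release archive"
recorded = sha512_of(tarball)  # what the packager would record once

print(verify_source(tarball, recorded))          # True
print(verify_source(tarball + b"x", recorded))   # False
```

The point of the sketch is only that the check needs nothing but the file and a recorded digest, so any build system could run it the same way.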
On Fri, May 08, 2020 at 07:54:11PM +0000, Zbigniew Jędrzejewski-Szmek wrote:
On Fri, May 08, 2020 at 03:12:15PM -0400, David Cantrell wrote:
WHAT I WANT TO BE ABLE TO DO:
- View Fedora's dist-git repos as authoritative for packages built for Fedora. That is, I want to see a package on my Fedora system and be able to visit its dist-git repo to see how it's packaged.
Well said.
Make the lookaside cache optional. For SourceX lines, I want to be able to specify a git URL to a specific tag. fedpkg should use git archive to include that in the SRPM. e.g.:
Source0: https://github.com/rpminspect/rpminspect/archive/v0.12
Yes. This is somewhat orthogonal to the dist-git / source-git question. It would be absolutely great to have this right now on top of dist-git, so we don't need to do the step of 'amend Source0, spectool -g, fedpkg new-sources, git commit'.
I think it's related to the concept of integration with upstream. My desire here is to skip the parts of fetching a release archive, putting it in the lookaside cache, and updating the Source0 line.
If we offer the above, honor signed git tags for verification at build time.
Make PatchX lines optional. In dist-git, I should be able to set a remote pointing to the upstream repo. Then do the Fedora work on the appropriate Fedora branch. SourceX should still become a tarball using git archive and the tag. Patches should be automatically generated for SRPM construction using git format-patch or something comparing the Fedora dist-git branch with the remote branch.
Hmm, but if we specify a git ref as source, why bother with patches at all? The step of generation and application of patches is error-prone, and if we have a git ref, we have the tree object linked to it, and we should unpack that as the source dir without any further ado.
My thinking here is a project that exists in git upstream and we package in Fedora but with a few patches of our own. You specify a remote in dist-git for the package, fetch that, and then merge it on to the Fedora branch in the dist-git repo. Rather than storing patch files there, you just make the edits and commit them. The Source0 line would specify a git tag URL and part of the SRPM process would be to do something like 'git format-patch' against the remote to get the changes we've made locally. This is only necessary to construct an SRPM file, but it's kind of nice and could be automated. Working on the package wouldn't have you dealing with patches like we currently do, you just commit changes in the dist-git repo which ends up being on the project.
Multiple remotes should be possible should new and old versions of the upstream project need to be supported. Fedora dist-git branches should know their remote.
I still want to be able to do 'fedpkg srpm' and get a standalone ready-to-build SRPM file that I can carry around.
Possibly extend fedpkg to help package maintainers submit patches from the package to the upstream project.
Is 'fedpkg' the best place for this? Submitting PRs from a git branch is a very generic thing, and there's plenty of tools to do that already. And those tools might even be forge-specific. E.g. github has hub and now an official gh tool, and it's unlikely that fedpkg will ever do github PRs as well as gh. And when fedora patches are just a branch, then the generic tool can be used.
s/fedpkg/anything-else/
This would be non-trivial, but I would like to have an integration that makes it easier for package maintainers to submit PRs to their upstream repos. Maybe it's not possible.
PRs in dist-git would be more meaningful to me if we were able to have the upstream repo as a remote in dist-git and our branches just an extension of that.
Me likes. This would solve the perennial problem of "should I abuse proven-packager privs to do 'fedpkg new-sources' before submitting a PR?", which has two bad answers: "yes, and annoy the maintainer by polluting the cache if the PR is rejected", and "no, and have all CI fail".
Right. Still track Fedora specifics in git, but eliminate information we can get programmatically or from other sources.
Thanks,
On Fri, May 08, 2020 at 10:23:45PM +0200, Fabio Valentini wrote:
On Fri, May 8, 2020 at 9:55 PM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 08, 2020 at 03:12:15PM -0400, David Cantrell wrote:
WHAT I WANT TO BE ABLE TO DO:
- View Fedora's dist-git repos as authoritative for packages built for Fedora. That is, I want to see a package on my Fedora system and be able to visit its dist-git repo to see how it's packaged.
Well said.
Make the lookaside cache optional. For SourceX lines, I want to be able to specify a git URL to a specific tag. fedpkg should use git archive to include that in the SRPM. e.g.:
Source0: https://github.com/rpminspect/rpminspect/archive/v0.12
Yes. This is somewhat orthogonal to the dist-git / source-git question. It would be absolutely great to have this right now on top of dist-git, so we don't need to do the step of 'amend Source0, spectool -g, fedpkg new-sources, git commit'.
Huh? You mean have koji download sources from upstream directly? I don't think that's a good idea, and it doesn't have external network access anyway ...
And why do you need to amend Source0? I need never touch this Source0 tag since it's always pointing to the version I need: https://src.fedoraproject.org/rpms/granite/blob/master/f/granite.spec#_12
Yes, I know. I guess what I was more interested in is eliminating the need for me to download the package and do 'fedpkg new-sources'.
Koji doesn't need to download, that can be done client side as you submit the build.
If we offer the above, honor signed git tags for verification at build time.
Make PatchX lines optional. In dist-git, I should be able to set a remote pointing to the upstream repo. Then do the Fedora work on the appropriate Fedora branch. SourceX should still become a tarball using git archive and the tag. Patches should be automatically generated for SRPM construction using git format-patch or something comparing the Fedora dist-git branch with the remote branch.
Hmm, but if we specify a git ref as source, why bother with patches at all? The step of generation and application of patches is error-prone, and if we have a git ref, we have the tree object linked to it, and we should unpack that as the source dir without any further ado.
You know that you can specify patches remotely as well, right? It works with github URLs and with pagure URLs (I haven't tested gitlab yet):
pagure commit - Patch0: https://src.fedoraproject.org/rpms/granite/c/5f706c7.patch
pagure PR - Patch0: https://src.fedoraproject.org/rpms/jackson-bom/pull-request/6.patch
github commit - Patch0: https://github.com/elementary/granite/commit/2fe8b69.patch
github PR - Patch0: https://patch-diff.githubusercontent.com/raw/fedora-java/javapackages/pull/7...
It's probably not a good idea to point .spec files to PR diffs since they can change over time, but remote commits work just fine ... spectool -g fetches them just like it fetches sources.
Yes, I know I can do this as well, but I don't want to. I just want to do the work directly in a git repo. At build time I want the tools to construct an SRPM that contains whatever patches I've overlaid on the repo and the necessary commands to apply them to the source--that it also fetches and adds to the spec file. We can improve the package maintainer process while also still outputting complete SRPM files.
Thanks,
On Fri, May 08, 2020 at 04:28:37PM -0400, Neal Gompa wrote:
On Fri, May 8, 2020 at 4:25 PM Fabio Valentini decathorpe@gmail.com wrote:
On Fri, May 8, 2020 at 9:55 PM Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
On Fri, May 08, 2020 at 03:12:15PM -0400, David Cantrell wrote:
WHAT I WANT TO BE ABLE TO DO:
- View Fedora's dist-git repos as authoritative for packages built for Fedora. That is, I want to see a package on my Fedora system and be able to visit its dist-git repo to see how it's packaged.
Well said.
Make the lookaside cache optional. For SourceX lines, I want to be able to specify a git URL to a specific tag. fedpkg should use git archive to include that in the SRPM. e.g.:
Source0: https://github.com/rpminspect/rpminspect/archive/v0.12
Yes. This is somewhat orthogonal to the dist-git / source-git question. It would be absolutely great to have this right now on top of dist-git, so we don't need to do the step of 'amend Source0, spectool -g, fedpkg new-sources, git commit'.
Huh? You mean have koji download sources from upstream directly? I don't think that's a good idea, and it doesn't have external network access anyway ...
Having autofetching by Koji would require the ability to specify the checksum for the file in the spec, IMO: https://github.com/rpm-software-management/rpm/issues/463
A central way to validate the source is "valid" that is portable across systems (koji, copr, obs, etc.) would make this a lot easier to trust.
Agreed, though I would also add that checking GPG signatures on tags, when the tag is signed, is valuable too. Those would be complementary.
Thanks,
On Fri, May 08, 2020 at 10:13:25PM +0200, clime wrote:
On Fri, 8 May 2020 at 20:05, Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl wrote:
Hi, I'm a bit late to the party, but here's my 2¢.
On Mon, May 04, 2020 at 05:05:02PM +0200, Tomas Tomecek wrote:
In the packit project, we work in source-git repositories. These are pretty much upstream repositories combined with Fedora downstream packaging files.
I think source-git would be an interesting avenue to explore... There are some hairy issues to figure out with respect to rebasing of downstream-only patches, but if that is solved, there would be great potential to make certain styles of packaging much nicer.
For more complicated projects, rebasing of patches would require some git wizardry, but we'd reap the benefit of how good git is with rebasing patches. From the workflows people described, it is clear that many of us are doing some variant of a custom git branch to make rebasing easier, building custom tooling around that.
Would there be an interest within the community, as opt-in, to have such source-git repositories created for respective dist-git repositories? The idea is that you would work in the source-git repo and then let packit handle synchronization with a respective dist-git repo.
I agree with what Miro and others said about this: this brings a lot of complication. I expect the requirement to have synchronization both ways is going to be a constant source of problems. We lose the invariant that dist-git is the canonical source of truth. (Automatic synchronization is OK if it's just one way, but here it clearly needs to be both ways because some maintainers would modify source-git and other maintainers would modify dist-git.)
IMO, source-git as a third repo in between the project and dist-git is not useful. Instead, it would make sense when integrated with dist-git.
I am curious. Zbyszek, what do you mean by "integrated with dist-git", here?
Clone the upstream repo, add a fedora-specific branch, in that branch add the .spec file and whatever else we now carry in dist-git. I.e. use a single repo for both the source and the packaging in native git form.
Zbyszek
Hi,
Well, *my* packaging workflow is pretty simple:
1. point to the upstream git repo in my spec file with %{forgeurl} and the rest of the forge macros
2. point to the target upstream tag or commit with the associated variable
3. spectool (or co from lookaside if already there)
4. build
If I need changes in upstream code, I fork the upstream git repo, make the changes there and point %{forgeurl} to my fork until the changes get merged upstream (sometimes I am the upstream).
If upstreaming stalls I export the corresponding git patches and apply them manually. This is painless enough I’ve not bothered yet to code a per-forgeurl %{patchlist} in forge macros. Someday I will spend the time here and I’ll have a safe %autoforgesetup that does not assume all the patches belong to archive 0.
Step 3 is a bit annoying in practice *but* absolutely necessary. WIP fixing can rely a lot on ephemeral multi-rebased topic branches. I *need* to cut the link to upstream git in my spec files by exporting the state the spec packages, without polluting other git repos with transient topic branches.
I don’t waste time modifying my Sources, %forgemeta produces a %{forgesourceX} that I declare as SourceX once and for all. I will, eventually, code a %forgesources that can be dropped in %sourcelist so it just works in multi-source specs, but those are few and frowned upon in Fedora.
Sometimes upstream is itself a galaxy of forked repos. Again, being able to point the spec file to one of those (and change the pointer as needed) without assuming upstream is a monolithic, monotonic, eternal repo is a godsend.
So, I *definitely* do not think making the Fedora git repo a clone or extension of some upstream git repo is a good idea, nor that it is needed.
Better autobumping and auto-changelogging would be appreciated, but not if it introduces dependencies on specific Fedora git objects.
Like others wrote, the correct data model is to make the buildsys record real official Fedora build events in Fedora git, not to invent esoteric rules to work around the fact that git cannot guess accurately what was actually built in what order. That can never work reliably.
Just bite the bullet, builds are controlled by the build system, no one *but* the build system can record them accurately in Fedora git.
Regards
On Fri, May 08, 2020 at 09:39:58AM -0600, Ken Dreyer wrote:
In Ceph we do this at a slightly different point of time. We use "rdopkg tag-patches" to save each of the "patches" refs that we've translated into patch series in dist-git. Each Git tag is the NVR of the package.
We rebase and force-push our "patches" branches frequently, so a patches branch is one "history", and dist-git becomes a "history of histories".
It's critical to have this flexibility + auditability so we can move fast and still go back and reproduce everything.
How do you backport fixes? Do you apply the fixes directly to dist-git? Or do you apply them to a corresponding patches branch that you happen to keep around for as long as needed (e.g. while the historical code is supported) for the purpose of backporting?
-- Petr
On Sun, May 10, 2020 at 11:51 PM Petr Pisar ppisar@redhat.com wrote:
How do you backport fixes? Do you apply the fixes directly to dist-git? Or do you apply them to a corresponding patches branch that you happen to keep around for as long as needed (e.g. while the historical code is supported) for the purpose of backporting?
It's the latter. We use "git cherry-pick" to pick changes to our "patches" branch, and then "rdopkg patch" writes those as .patch files and PatchXXX entries in our .spec file in the corresponding dist-git branch.
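As a rough illustration of the last step (turning an ordered patch series into spec entries), here is a small sketch. It is not rdopkg's implementation; the helper name, the four-digit numbering, and the file names are assumptions made up for the example.

```python
def patch_entries(patch_files, start=1):
    """Build 'PatchNNNN: file' spec lines for an ordered list of .patch files."""
    return [f"Patch{n:04d}: {name}" for n, name in enumerate(patch_files, start)]


# Hypothetical file names, in the style git format-patch produces.
files = [
    "0001-Fix-the-first-bug.patch",
    "0002-Fix-the-second-bug.patch",
]

for line in patch_entries(files):
    print(line)
```

Because the entries are derived from the branch every time, the spec file never has to be edited by hand when a patch is added, dropped, or reordered.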
At a general level, a typical Fedora packager performs three kinds of operations in dist-git:
1) Rebasing to a new upstream version (eg. bumping the "Version" field in httpd.spec from 2.4.43 to 2.4.44)
2) Fixing something in RPM packaging itself (eg. removing "Groups" from httpd.spec file, fixing %check, etc)
3) Patching the source code (eg. cherry-picking a patch from upstream).
The current implementation of dist-git allows everyone and anyone to very clearly audit all three of these actions. This kind of transparency is really important to Fedora's goal of building a trusted operating system.
Upstream projects do ninja edits all the time. It's just too convenient to force-push or move Git tags, etc. Sometimes upstream authors have valid reasons for doing that kind of thing, but downstream we have different incentives. The fact that we have strong history preservation guarantees is one of the reasons I use Fedora.
rdopkg has sub-commands to automate each of the three categories above. For #3 (patching), in RH Ceph Storage we run the "rdopkg patch" operations in Jenkins, because that is the most common operation by far.
I'm watching packit, and I am interested to try it out one day to understand more about how it compares. I'm still not clear from this thread what source-git is, or how it compares technically to what we're doing with Ceph and OpenStack.
- Ken
On Mon, 11 May 2020 at 18:59, Ken Dreyer ktdreyer@ktdreyer.com wrote:
On Sun, May 10, 2020 at 11:51 PM Petr Pisar ppisar@redhat.com wrote:
How do you backport fixes? Do you apply the fixes directly to dist-git? Or do you apply them to a corresponding patches branch that you happen to keep around for as long as needed (e.g. while the historical code is supported) for the purpose of backporting?
It's the latter. We use "git cherry-pick" to pick changes to our "patches" branch, and then "rdopkg patch" writes those as .patch files and PatchXXX entries in our .spec file in the corresponding dist-git branch.
Ken, would it please be possible to provide links to the patch branches and the dist-git repos you mentioned? I would like to have a closer look.
Thanks clime
At a general level, a typical Fedora packager performs three kinds of operations in dist-git:
Rebasing to a new upstream version (eg. bumping the "Version" field in httpd.spec from 2.4.43 to 2.4.44)
Fixing something in RPM packaging itself (eg. removing "Groups" from httpd.spec file, fixing %check, etc)
Patching the source code (eg. cherry-picking a patch from upstream).
The current implementation of dist-git allows everyone and anyone to very clearly audit all three of these actions. This kind of transparency is really important to Fedora's goal of building a trusted operating system.
Upstream projects do ninja edits all the time. It's just too convenient to force-push or move Git tags, etc. Sometimes upstream authors have valid reasons for doing that kind of thing, but downstream we have different incentives. The fact that we have strong history preservation guarantees is one of the reasons I use Fedora.
rdopkg has sub-commands to automate each of the three categories above. For #3 (patching), in RH Ceph Storage we run the "rdopkg patch" operations in Jenkins, because that is the most common operation by far.
I'm watching packit, and I am interested to try it out one day to understand more about how it compares. I'm still not clear from this thread what source-git is, or how it compares technically to what we're doing with Ceph and OpenStack.
- Ken
On Thu, May 7, 2020 at 3:54 PM clime clime@fedoraproject.org wrote:
<snip>
In the rare occasion that I need to make downstream-only changes with patches, I usually just explode the upstream tarball, run "git init", then "git add .", "git commit -m import", apply my changes, and then do "git diff --patch > ../00-my-changes.patch" (if it's just one commit), or "git format-patch -o ../" if there are multiple commits, and then delete the exploded sources again.
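The core of that workflow is "snapshot the pristine sources, edit, diff". A minimal sketch of the diff step follows, using Python's difflib as a stand-in for "git diff --patch"; the function name and the sample file contents are hypothetical, and real use would of course run git on the unpacked tree.

```python
import difflib


def make_patch(original: str, modified: str, path: str) -> str:
    """Return a unified diff between two versions of one file, with a/ and b/ prefixes."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical file contents before and after a downstream-only change.
original = "int main(void) {\n    return 1;\n}\n"
modified = "int main(void) {\n    return 0;\n}\n"

print(make_patch(original, modified, "src/main.c"), end="")
```

The a/ and b/ prefixes mirror what git emits, so a patch produced this way can be applied with the usual -p1 strip level.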
In any case, I think this functionality could be included in rpkg/fedpkg...?
If there are no objections, I will open a ticket for this.
It took me a bit to figure it out but I use quilt instead. It doesn't perfectly integrate with rpmbuild/dist-git but it works well enough.
There are two big nits I have...
1. If one patch fails to apply it stops there, so I have to go in and fix/refresh it, then back out, rm -rf the source directory, and re-quilt setup <spec>.
2. If you don't have any patches yet they are only generated in <src>/patches and not created in the dist-git directory.
Thanks, Richard
I haven't responded here for a few days - that doesn't mean I stopped caring, quite the opposite, I've read every single response but the thread grew so big that I wasn't able to keep up replying.
Given all your valuable feedback, we are aiming to come up with a plan for how to provide repositories with unpacked sources ( = source-git) within Fedora. The next steps for our team are to write the plan down and then submit it as a Fedora 33 change. While we work on the plan, we will review this thread in depth and respond to specific comments ad hoc.
Thank you for taking your time, Tomas
On Tue, 12 May 2020 at 09:47, Tomas Tomecek ttomecek@redhat.com wrote:
I haven't responded here for a few days - that doesn't mean I stopped caring, quite the opposite, I've read every single response but the thread grew so big that I wasn't able to keep up replying.
Given all your valuable feedback, we are aiming to come up with a plan for how to provide repositories with unpacked sources ( = source-git) within Fedora. The next steps for our team are to write the plan down and then submit it as a Fedora 33 change. While we work on the plan, we will review this thread in depth and respond to specific comments ad hoc.
I would like to put some constraints on trying to make this a Fedora 33 change. You are going to need to retool parts of koji and other build tools to work with these repositories. You are going to need additional disk space, and it needs to be spec'd out how much has to exist and how it is going to be used. You are possibly going to need other changes to PDC (dead software), bodhi, builders and such.
The majority of Fedora infrastructure will be moving from one datacenter to another in June and I do not expect us to be back to 'normal' until mid to late August to add new items. This means the work to do this has to either land in the next 30 days or in September before the checkmark that all changes are done. Infrastructure is already going to be implementing items for the ELN and upgrading various other dead software to newer versions to try and get past the EL6 deadline.
With this in mind, I think this would make a better Fedora 34 change versus something you can get done in 2-3 months.
On Tue, May 12, 2020 at 1:45 AM clime clime@fedoraproject.org wrote:
Ken, would it please be possible to provide links to the patch branches and the dist-git repos you mentioned? I would like to have a closer look.
Sure. I can't share the links to the RH Ceph Storage dist-git repos, so I will give one example where I used rdopkg in Fedora recently.
Here is an example where I bumped the version of a Python package and included some cherry-picked patches:
https://src.fedoraproject.org/rpms/python-jenkins-job-builder/c/78b70d24cf65...
At first glance, the two new patches I included there look like the output from "git-format-patch", and that is because rdopkg wraps git-format-patch for some operations. rdopkg automatically inserted those into the .spec file, and it also formats them with some compatibility options to preserve the .patch file formats between RHEL 7's Git 1.8.3.1 + RHEL 8's Git 2.18.2 + Fedora's Git, so that it does not matter what OS the packager is running.
So that's the change in "master" (dist-git's rawhide branch), and there is a corresponding "master-patches" branch to go along with that:
https://fedorapeople.org/cgit/ktdreyer/public_git/python-jenkins-job-builder...
In my dist-git clone on my laptop, I have three remotes, set up like this:
$ git remote -v
origin ssh://ktdreyer@pkgs.fedoraproject.org/rpms/python-jenkins-job-builder (fetch)
origin ssh://ktdreyer@pkgs.fedoraproject.org/rpms/python-jenkins-job-builder (push)
patches ssh://fedorapeople.org/home/fedora/ktdreyer/public_git/python-jenkins-job-builder.git (fetch)
patches ssh://fedorapeople.org/home/fedora/ktdreyer/public_git/python-jenkins-job-builder.git (push)
upstream https://opendev.org/jjb/jenkins-job-builder.git (fetch)
upstream https://opendev.org/jjb/jenkins-job-builder.git (push)
"rdopkg new-version" will update to the latest upstream version for me. Specifically it looks at the upstream repo, finds the latest Git tag, parses that tag string into a number, writes that number into the .spec file, downloads and uploads the new upstream tarball, etc. It will also rebase my "patches" branch for me and edit the Patch entries as necessary.
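The "find the latest tag and parse it into a number" step can be sketched roughly as follows. This is only an illustration of the idea, not rdopkg's code; the function names, the accepted tag format (optional "v" prefix, dotted integers), and the tag list are assumptions.

```python
import re


def parse_version(tag):
    """Turn a tag such as 'v3.2.0' or '3.3.0' into a tuple of ints, else None."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)", tag)
    if match is None:
        return None  # not a plain release tag, e.g. '3.3.0rc1'
    return tuple(int(part) for part in match.group(1).split("."))


def latest_version(tags):
    """Return the highest release version among the tags as a dotted string."""
    versions = [v for v in map(parse_version, tags) if v is not None]
    if not versions:
        raise ValueError("no release tags found")
    return ".".join(str(part) for part in max(versions))


# Hypothetical tag names, as 'git tag --list' might print for an upstream repo.
tags = ["3.1.0", "3.2.0", "3.10.0", "3.3.0rc1", "v3.3.0"]
print(latest_version(tags))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "3.10.0" would sort below "3.2.0".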
I haven't done that today for the sake of this example, but at some point in the future I will run "rdopkg new-version", and it will pull in 3.3.0 and eliminate those two patches, since they're both included in version 3.3.0 upstream.
In fact, you can try it on your computer if you set up the Git clones like I've done above. If you run "rdopkg new-version", then rdopkg will rewrite the "master-patches" branch, and then prompt you to force-push this to the "patches" remote. You won't have SSH access to push to my fedorapeople.org repo, so just imagine that is a team repo where many people on my team can push :) This is just a really simple example with two patches in one small package.
- Ken
On Tue, 12 May 2020 at 23:06, Ken Dreyer ktdreyer@ktdreyer.com wrote:
On Tue, May 12, 2020 at 1:45 AM clime clime@fedoraproject.org wrote:
Ken, would it be possible to provide links to the patch branches and the dist-git repos you mentioned? I would like to have a closer look.
Sure. I can't share the links to the RH Ceph Storage dist-git repos, so I will give one example where I used rdopkg in Fedora recently.
Thanks a lot!
Ceph we do this at a slightly different point of time. We use "rdopkg tag-patches" to save each of the "patches" refs that we've translated into patch series in dist-git. Each Git tag is the NVR of the package.
When you do rdopkg new-version and you are asked to force push, is the current master-patches HEAD also tagged with the current package NVR?
Somewhere I was expecting to see a lot of NVR tags for past states of master-patches (I assume you could also have f32-patches, f31-patches, epel8-patches, ... ?) but I don't see those tags :) That would form the mentioned "history of histories".
Thanks again clime
devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Fri, 8 May 2020 at 21:13, David Cantrell dcantrell@redhat.com wrote:
On Mon, May 04, 2020 at 05:05:02PM +0200, Tomas Tomecek wrote:
The main reason I am sending this is to gather feedback from all of you whether there is an interest in such a workflow. We don’t have concrete plans for Fedora right now but based on your feedback we could.
Tomas,
This is an interesting idea and it is a direction I would like to see dist-git move. I do not think it's possible to find a one size fits all approach since every package has and needs varying workflows. And we should be flexible to let teams and developers do what they need to do. For me, moving spec files upstream does not seem that appealing from a package maintenance standpoint. I still like the clear distinction between the upstream project and the 'Fedora bits' that make it a package we ship. But that might not be the case for every package.
I have read through this thread as of 3pm Boston time on 08-May and there's a lot of great feedback. I wanted to offer my own thoughts on what I'd like to see related to this topic:
WHAT I WANT TO BE ABLE TO DO:
- View Fedora's dist-git repos as authoritative for packages built for Fedora. That is, I want to see a package on my Fedora system and be able to visit its dist-git repo to see how it's packaged.
- Make the lookaside cache optional. For SourceX lines, I want to be able to specify a git URL to a specific tag. fedpkg should use git archive to include that in the SRPM. e.g.:
  Source0: https://github.com/rpminspect/rpminspect/archive/v0.12
- If we offer the above, honor signed git tags for verification at build time.
- Make PatchX lines optional. In dist-git, I should be able to set a remote pointing to the upstream repo. Then do the Fedora work on the appropriate Fedora branch. SourceX should still become a tarball using git archive and the tag. Patches should be automatically generated for SRPM construction using git format-patch or something comparing the Fedora dist-git branch with the remote branch. Multiple remotes should be possible should new and old versions of the upstream project need to be supported. Fedora dist-git branches should know their remote.
- I still want to be able to do 'fedpkg srpm' and get a standalone ready-to-build SRPM file that I can carry around.
- Possibly extend fedpkg to help package maintainers submit patches from the package to the upstream project.
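The "tarball from git archive, patches from git format-patch" part of this wish list could be sketched roughly as follows. This is only an illustration in Python: fedpkg does not work this way today, the helper name and the "f32" branch are hypothetical, and the code only builds the command lines instead of running them.

```python
import shlex

def srpm_input_commands(name, version, upstream_tag, fedora_branch, outdir="."):
    """Build the two git commands that would produce SRPM inputs (hypothetical helper)."""
    tarball = f"{outdir}/{name}-{version}.tar.gz"
    # Source tarball cut from the upstream tag.
    archive = ["git", "archive", "--format=tar.gz",
               f"--prefix={name}-{version}/", "-o", tarball, upstream_tag]
    # One .patch file per commit that the Fedora branch adds on top of the tag.
    patches = ["git", "format-patch", "-o", outdir,
               f"{upstream_tag}..{fedora_branch}"]
    return archive, patches

# rpminspect and v0.12 come from the Source0 example; "f32" is an assumed branch name.
for command in srpm_input_commands("rpminspect", "0.12", "v0.12", "f32"):
    print(shlex.join(command))
```

A real implementation would also have to decide how to keep packaging-only commits (spec file changes) out of the generated patch series.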
I very much like the ideas that you described although I have a slightly different view on implementation.
- I would be very happy if the original spec files in dist-git described the building process, i.e. that I can open any spec file anywhere on src.fp.o and know what happens when that spec file gets built. I think this is a useful feature that lets us navigate quickly without relying too much on context and implicit knowledge. As a concrete example, if something is going to dynamically generate patch files from git commits and also the respective "Patch:" lines, I think there should be something in the spec file telling me that this will happen. The advantage is explicitness, and I could imagine that this something could also take parameters that would e.g. allow omitting certain commits from patch generation, so it has imho better flexibility than a tool in our build pipeline that just blindly pastes Patch lines at a certain place in the spec file. I think such behavior would be unexpected and unpleasant because the spec file itself doesn't give you the full picture of the build process and you need some extra knowledge that newcomers will not have. You can say: "Not a big deal" but these things add up. Also, the order of declarations sometimes plays a role in an rpm spec file, so pasting some lines somewhere might not always be a valid operation.
- I think we should have an "importer" service that would download sources directly from upstream and place them into dist-git's lookaside on demand, based on their checksum. It could also automatically check signatures when importing. When the import is done, the "sources" file would be updated. In the future, it could also import git tags instead of just tarballs (i.e. it would basically just clone the git repo and reset it to a specific tag) and these would be the "source-git" repos (of course, populating them more manually by git push should also be possible).
- I think we could try to have workflows that require no force pushes. They are imho not really good for cooperation among more than one person :)
clime
PRs in dist-git would be more meaningful to me if we were able to have the upstream repo as a remote in dist-git and our branches just an extension of that.
Thanks,
-- David Cantrell dcantrell@redhat.com Red Hat, Inc. | Boston, MA | EST5EDT
* Stephen John Smoogen:
No, because the things that backups and rsync do work in a slow way. We can back up the look-aside cache with tarballs in a couple of hours. We can also rsync that in the same amount of time. It takes that long or longer to do that with a couple of git trees which are much smaller in size but larger in file numbers. Every file in a git tree is stat'd and while there is some deduplication, there are a lot of files.
I think there's a logic bug somewhere. 8-)
The number of files in the lookaside cache is small only because we check patch files into dist-git. Upstream glibc.git isn't too bad, despite not having a particularly clean repository due to frequently rebased user branches (and tons of loose objects as a result):
$ find glibc.git/ | wc -l
1725
That's not too far off from the number of files we have in downstream dist-git at the tip of each release branch:
$ for x in {7..32} ; do git ls-tree origin/f$x: ; done | awk '{print $3}' | wc -l
1232
Admittedly, the deduplicated number is somewhat lower:
$ for x in {7..32} ; do git ls-tree origin/f$x: ; done | awk '{print $3}' | sort -u | wc -l
711
(I don't know how many files end up on the dist-git server for that.)
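The difference between the two counts comes from identical blobs being shared across release branches. A small Python sketch of the same counting, run on made-up `git ls-tree` lines (the object ids and file names below are placeholders, not real glibc data):

```python
def count_objects(ls_tree_lines):
    """Return (total entries, unique object ids) for `git ls-tree` output lines."""
    # Each line is "<mode> <type> <object id>\t<path>"; field 3 is the object id,
    # which is what awk '{print $3}' extracts in the shell pipeline.
    object_ids = [line.split()[2] for line in ls_tree_lines if line.strip()]
    return len(object_ids), len(set(object_ids))

sample = [
    "100644 blob " + "a" * 40 + "\tglibc.spec",  # as seen on one branch
    "100644 blob " + "b" * 40 + "\tsources",
    "100644 blob " + "a" * 40 + "\tglibc.spec",  # same blob on another branch
    "100644 blob " + "c" * 40 + "\tsources",
]
total, unique = count_objects(sample)
print(total, unique)
```

A file that is byte-for-byte identical on several branches is stored once, so only the unique count reflects what the object store has to hold.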
There must be hundreds of glibc tarballs in the lookaside cache by now, too, but I don't have insight into that. (Clearly, we aren't model citizens.) The file count would likely be way lower if you had to back up only one or two Git repositories.
Thanks, Florian
On Wed, 13 May 2020 at 03:19, Florian Weimer fweimer@redhat.com wrote:
- Stephen John Smoogen:
No, because the things that backups and rsync do work in a slow way. We can back up the look-aside cache with tarballs in a couple of hours. We can also rsync that in the same amount of time. It takes that long or longer to do that with a couple of git trees which are much smaller in size but larger in file numbers. Every file in a git tree is stat'd and while there is some deduplication, there are a lot of files.
I think there's a logic bug somewhere. 8-)
I think the logic bug is assuming people will regularly repack and garbage collect the git repositories. I have found that this is a rarity and trying to enforce it ends up with maintainer complaints that you messed with THEIR way of doing things. Assume that no one will do so until we have a crisis which forces various people to finally give in to it.
Then also assume that developers will not come up with multiple ways to branch/sidebranch/fork (sometimes in their same project) which will end up with tons of loose objects which are not deduplicated. That is the reality of what I have seen on all our source repositories in the past. Then also realize that for copyright and other legal reasons we cannot delete code once it has been committed (or at least been built against, which packit will do for you right away) .. so this is always going to grow.
On Wed, May 13, 2020 at 08:16:06AM -0400, Stephen John Smoogen wrote:
Some random facts:
There are currently 2668 glibc files in the lookaside.
There are currently 6733 files in the glibc.git git repo on src.fedoraproject.org
Perhaps we should run regular 'git gc' over the repos there to move objects into pack files, but we don't currently.
In any case the usual problem is scale here. ~10k files isn't that much, but when you multiply by 30,000 packages it's a lot.
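To put a number on that multiplication, here is the arithmetic as a short Python snippet. It assumes, purely for illustration, that every package has as many files as glibc, which probably overstates the real total.

```python
# Figures quoted above for the glibc package.
lookaside_files = 2668
git_repo_files = 6733
packages = 30_000  # rough package count used above

per_package = lookaside_files + git_repo_files
total = per_package * packages

print(f"{per_package} files for glibc")
print(f"{total:,} files if every package looked like glibc")
```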
kevin
On Tue, May 12, 2020 at 6:20 PM clime clime@fedoraproject.org wrote:
When you do rdopkg new-version and you are asked to force push, is the current master-patches HEAD also tagged with the current package NVR?
It's something that I have to do before I run "new-version". Here's the command I ran today:
$ rdopkg tag-patches --push
And rdopkg performed the following commands for me:
git tag python-jenkins-job-builder-3.2.0-1 master-patches
git push patches python-jenkins-job-builder-3.2.0-1
You can see the new tag here: https://fedorapeople.org/cgit/ktdreyer/public_git/python-jenkins-job-builder...
rdopkg read the current NVR, tagged the tip of the master-patches branch for me, and pushed that tag to the patches remote.
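The tagging step can be sketched in a few lines of Python. This is not rdopkg's implementation, and the function names are made up; it only shows how the name-version-release string maps to the two git commands shown above.

```python
def nvr(name, version, release):
    """Join name, version and release into an NVR string."""
    return f"{name}-{version}-{release}"

def tag_patches_commands(name, version, release,
                         branch="master-patches", remote="patches"):
    """Return the git commands that tag the patches branch with the NVR and push the tag."""
    tag = nvr(name, version, release)
    return [
        ["git", "tag", tag, branch],   # tag the tip of the patches branch
        ["git", "push", remote, tag],  # publish the tag on the patches remote
    ]

for command in tag_patches_commands("python-jenkins-job-builder", "3.2.0", "1"):
    print(" ".join(command))
```

Because the tag name is the NVR, anyone holding a built package can look up the exact patch series it was built from.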
Somewhere I was expecting to see a lot of NVR tags for past states of master-patches (I assume you could also have f32-patches, f31-patches, epel8-patches, ... ?) but I don't see those tags :) That would form the mentioned "history of histories".
You're correct, rdopkg supports branch names like "f32-patches", "f31-patches", and "epel8-patches". In my case I only needed to patch Rawhide, so I created "master-patches" there.
You're right that I didn't create a ton of NVR tags. This package is a super trivial example where I only started using this model to fix a FTBFS, so I did not tag every NVR. The reason I did this was because there are only a few patches and I did not expect to keep them in Fedora very long, because I can easily rebase Rawhide to 3.3.0 soon, and the upstream authors were going to ship 3.3.0 soon. When we ship an unpatched upstream release, there's less utility to tagging the NVR like that.
For the ceph package in RH Ceph Storage, we've tagged over three hundred NVRs with this system. We could probably go back and check which old builds koji-gc has deleted, and then delete those Git tags as well, if we want to clean up the ones that we never shipped.
- Ken
On Wed, 13 May 2020 at 22:32, Ken Dreyer ktdreyer@ktdreyer.com wrote:
Ken, thank you very much for the detailed explanation. I understand now.
I can see three variations on this approach for a per-package setup:
1) source-git repo with branches per upstream release + dist-git repo with the classic branching (f31, f32, ...) where the branch from source-git is included as a git submodule. You can have patches in source-git which are generic across all distribution releases and you can also have distro-specific patches in dist-git as classic patch files; the spec file would be in the dist-git repo.
2) The same thing, but the source-git repo has not only branches for upstream releases but can also have a branch <upstream-release>-f32 when I need to patch something specifically for f32. Again there is a dist-git repo with a spec file and the submodule for the respective branch (either generic or patched specifically for a certain distro release).
3) There is no dist-git repo with the classic branching structure and there is only source-git with a spec file included. This would require us to define what e.g. f32 means somewhere else, by pointing to specific refs in the source-git from an f32 definition file. The best starting point might be to actually still have some kind of dist-git repo with the classic branches, but these would only point to some refs in source-git. I am not sure whether git offers a way to have a branch act as a pointer to another branch in another repo, but at worst we can use a specific file or a submodule again.
Probably there are more variants but I see these three right now. I think variants 1 and 2 where the spec file is kept in dist-git but patches can be in source-git are more within our reach right now (but I might be wrong, variant 3 is also interesting).
clime
On Wed, May 13, 2020 at 4:29 PM clime clime@fedoraproject.org wrote:
Probably there are more variants but I see these three right now. I think variants 1 and 2 where the spec file is kept in dist-git but patches can be in source-git are more within our reach right now (but I might be wrong, variant 3 is also interesting).
I think the best approach is to try your ideas on many different real Fedora packages from many different upstreams, and refine the tools as you go, documenting what works and what doesn't. Tools like tito and rdopkg have the advantage of having been tested and hardened across many different packages, Fedora releases, and RHEL versions.
In relation to rdopkg, I forgot to mention that Debian's git-buildpackage tool uses a patch management model that is almost identical to rdopkg for RPMs. Debian packagers create "patch-queue" branches from the upstream project, and git-buildpackage can write out a quilt-formatted series of .patch files into the debian packaging. It is pretty fast to manage a large series of downstream patches this way, rebase to new versions, etc.
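To make the patch-queue idea concrete, here is a small Python sketch that turns an ordered list of commit subjects into patch file names and a quilt-style series file (one patch name per line). It is not git-buildpackage or rdopkg code: the file-name rule only approximates what git format-patch produces, and the commit subjects are invented.

```python
import re

def patch_filename(index, subject):
    """Approximate a git format-patch style file name for one commit."""
    # Replace runs of characters other than letters, digits, '.' and '_' with '-'.
    slug = re.sub(r"[^A-Za-z0-9._]+", "-", subject).strip("-.")
    return f"{index:04d}-{slug[:52]}.patch"

def series_file(subjects):
    """Return the text of a series file listing the patches in apply order."""
    names = [patch_filename(i, s) for i, s in enumerate(subjects, start=1)]
    return "\n".join(names) + "\n"

# Invented commit subjects standing in for a patch-queue branch.
queue = ["Fix FTBFS with Python 3.9", "tests: drop nose dependency"]
print(series_file(queue), end="")
```

Rebasing the patch-queue branch and regenerating the series is what makes large downstream patch sets cheap to carry forward.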
- Ken
On Thu, 14 May 2020 at 01:03, Ken Dreyer ktdreyer@ktdreyer.com wrote:
On Wed, May 13, 2020 at 4:29 PM clime clime@fedoraproject.org wrote:
Probably there are more variants but I see these three right now. I think variants 1 and 2 where the spec file is kept in dist-git but patches can be in source-git are more within our reach right now (but I might be wrong, variant 3 is also interesting).
I think the best approach is to try your ideas on many different real Fedora packages from many different upstreams, and refine the tools as you go, documenting what works and what doesn't. Tools like tito and rdopkg have the advantage of having been tested and hardened across many different packages, Fedora releases, and RHEL versions.
Yes, that's true, but I think these tools currently assume the unpacked repo is outside of Fedora dist-git and that the exported tarball needs to be imported into Fedora's lookaside cache. I think we can optimize this once the unpacked/source-git repos are inside Fedora dist-git because we can use git submodules to link to them directly. Maybe at some point git submodules didn't have the best reputation? I remember something like this, but nowadays I think they are heavily tried and tested and something we could include in our workflow. So that's something worth looking at too :).
But I think it's great that we have all these options like packit, rdopkg, tito, rpkg.
Hunor Csomortáni csomh@redhat.com writes:
On Wed, May 6, 2020 at 10:24 PM Simo Sorce simo@redhat.com wrote:
Well, a way to allow force pushes would be to have a git hook that branches the tree before the force push. (creating a branch named something like audit-force-push-<timestamp>) That way you can retain data for legal/auditing reasons, while allowing every day history to be rewritten.
Wouldn't it be easier to approach this from a build system perspective and let for example the build system (or tools) tag the commits which were built from with some for-ever-living tags? This would still ensure a complete audit trail for whatever was built and shipped, but could eliminate the need for a complete lock down of dist/source-git.
Not sure how "nice" that would be for an auditor that has to reconstruct what happened over multiple force pushes that way, it also will generate quite an amount of noisy metadata (branches), but it could work.
Refs created for auditing purposes could be kept in a separate git namespace so they don't create noise in everyday workflows.
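A server-side hook along these lines could be sketched as below. This is an illustration only: the "refs/audit/force-push" namespace, the function names, and the 40-character zero id (SHA-1 repositories) are assumptions, and the ancestry check is passed in as a function so the decision logic stays independent of git.

```python
from datetime import datetime, timezone

AUDIT_NAMESPACE = "refs/audit/force-push"  # hypothetical namespace, not an existing convention
ZERO = "0" * 40  # git passes an all-zero id when a ref is created or deleted

def audit_ref(ref, when):
    """Name of the ref that keeps the old history reachable."""
    prefix = "refs/heads/"
    branch = ref[len(prefix):] if ref.startswith(prefix) else ref
    return f"{AUDIT_NAMESPACE}/{branch}/{when.strftime('%Y%m%dT%H%M%SZ')}"

def preserve_command(old, new, ref, is_ancestor, when):
    """Return the git command that saves `old` if the update rewrites history, else None."""
    if old == ZERO:
        return None  # new branch: nothing can be lost
    if new != ZERO and is_ancestor(old, new):
        return None  # fast-forward: old history stays reachable
    return ["git", "update-ref", audit_ref(ref, when), old]

when = datetime(2020, 5, 6, 22, 24, tzinfo=timezone.utc)
# Pretend the push is not a fast-forward.
print(preserve_command("a" * 40, "b" * 40, "refs/heads/master", lambda o, n: False, when))
```

In a real pre-receive hook, `is_ancestor` could be backed by `git merge-base --is-ancestor <old> <new>`, and refs outside `refs/heads/` and `refs/tags/` would not show up in everyday branch listings.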
As someone who works with git history all the time, I cannot imagine how I would work in such an environment. Consider the simple task of finding the commit that first introduced a downstream change. Currently with dist-git, I can just do 'git log patch-file', scroll to the very end and be done with it.
If what you're proposing was implemented, then I'd have to manually try all the tags until I found the "right history" where the change was first introduced.
In an email in this thread clime suggested a similar approach, only instead of tags there would be a separate branch for each upstream release. While that eliminates the need to allow force-pushes, for the purposes of digging through the history it's the same thing. The only difference is that instead of iterating over tags, I'd be iterating over branches.
The only other approach to source-git that I can think of is merging new upstream releases instead of git-rebasing on top of them. That is the approach that I originally thought would work, but one of Neal's responses made me realize that this approach also has a significant drawback - it makes distinguishing between downstream changes and cherry-picked upstream changes hard.
I was originally excited about source-git, however currently I don't see an approach to source-git that would work for me and I don't think I'd use it if it became available. And frankly, I think I wouldn't want other people using it either because it would make understanding their packages hard.
I completely agree that dist-git is difficult to work with, but perhaps instead of inventing something completely new, we could focus on making working with dist-git easier by dropping the changelog and Release from the specfiles and on creating tools for ourselves to make working with patches easier? I'm currently looking into Quilt, mentioned by several people here, to see if it could make my life easier at all. Just a suggestion.
Thanks everyone for this enlightening thread.
Ondřej Lysoněk
On Thu, 14 May 2020 at 14:31, Ondřej Lysoněk olysonek@redhat.com wrote:
Hunor Csomortáni csomh@redhat.com writes:
On Wed, May 6, 2020 at 10:24 PM Simo Sorce simo@redhat.com wrote:
Well, a way to allow force pushes would be to have a git hook that branches the tree before the force push. (creating a branch named something like audit-force-push-<timestamp>) That way you can retain data for legal/auditing reasons, while allowing every day history to be rewritten.
Wouldn't it be easier to approach this from a build system perspective and let for example the build system (or tools) tag the commits which were built from with some for-ever-living tags? This would still ensure a complete audit trail for whatever was built and shipped, but could eliminate the need for a complete lock down of dist/source-git.
Not sure how "nice" that would be for an auditor that has to reconstruct what happened over multiple force pushes that way, it also will generate quite an amount of noisy metadata (branches), but it could work.
Refs created for auditing purposes could be kept in a separate git namespace so they don't create noise in everyday workflows.
As someone who works with git history all the time, I cannot imagine how I would work in such an environment. Consider the simple task of finding the commit that first introduced a downstream change. Currently with dist-git, I can just do 'git log patch-file', scroll to the very end and be done with it.
It's a good point that this operation would be harder, but I think it could be solved.
I mean it could have beneficial features, maybe not for all packages but at least for some.
I suspect that at the scale Fedora operates at, it might be quite hard to do something that improves things for everybody.
If what you're proposing was implemented, then I'd have to manually try all the tags until I found the "right history" where the change was first introduced.
In an email in this thread clime suggested a similar approach, only instead of tags there would be a separate branch for each upstream release. While that eliminates the need to allow force-pushes, for the purposes of digging through the history it's the same thing. The only difference is that instead of iterating over tags, I'd be iterating over branches.
The only other approach to source-git that I can think of is merging new upstream releases instead of git-rebasing on top of them. That is the approach that I originally thought would work, but one of Neal's responses made me realize that this approach also has a significant drawback - it makes distinguishing between downstream changes and cherry-picked upstream changes hard.
I was originally excited about source-git, however currently I don't see an approach to source-git that would work for me and I don't think I'd use it if it became available. And frankly, I think I wouldn't want other people using it either because it would make understanding their packages hard.
I completely agree that dist-git is difficult to work with, but perhaps instead of inventing something completely new, we could focus on making working with dist-git easier by dropping the changelog and Release from the specfiles and on creating tools for ourselves to make working with patches easier? I'm currently looking into Quilt, mentioned by several people here, to see if it could make my life easier at all. Just a suggestion.
Thanks everyone for this enlightening thread.
Ondřej Lysoněk
clime clime@fedoraproject.org writes:
On Thu, 14 May 2020 at 14:31, Ondřej Lysoněk olysonek@redhat.com wrote:
Hunor Csomortáni csomh@redhat.com writes:
On Wed, May 6, 2020 at 10:24 PM Simo Sorce simo@redhat.com wrote:
Well, a way to allow force pushes would be to have a git hook that branches the tree before the force push (creating a branch named something like audit-force-push-<timestamp>). That way you can retain data for legal/auditing reasons, while allowing everyday history to be rewritten.
Wouldn't it be easier to approach this from a build system perspective and let, for example, the build system (or tools) tag the commits that were built with some permanent tags? This would still ensure a complete audit trail for whatever was built and shipped, but could eliminate the need for a complete lockdown of dist/source-git.
Not sure how "nice" that would be for an auditor that has to reconstruct what happened over multiple force pushes that way, it also will generate quite an amount of noisy metadata (branches), but it could work.
Refs created for auditing purposes could be kept in a separate git namespace so they don't create noise in everyday workflows.
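A rough sketch of how the two ideas above could combine: a server-side hook that, on a non-fast-forward update, saves the old tip under a ref outside refs/heads. Everything specific here is an assumption made for illustration: the `refs/audit/force-push/` namespace, the timestamp format, the function names, and the choice of a post-receive hook (which receives "old new refname" lines on standard input).

```python
import subprocess
from datetime import datetime, timezone

ZERO = "0" * 40  # git passes an all-zero id when a ref is created or deleted


def audit_ref_for(refname, when):
    """Name of the audit ref for a rewritten branch (namespace is an assumption)."""
    branch = refname[len("refs/heads/"):]
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return "refs/audit/force-push/{}/{}".format(branch, stamp)


def is_fast_forward(old, new):
    """True if `old` is an ancestor of `new`, i.e. no history was rewritten."""
    return subprocess.run(
        ["git", "merge-base", "--is-ancestor", old, new]).returncode == 0


def save_ref(ref, commit):
    """Point `ref` at `commit` so the old history stays reachable."""
    subprocess.run(["git", "update-ref", ref, commit], check=True)


def post_receive(lines, is_ff=is_fast_forward, save=save_ref, now=None):
    """Process post-receive input lines; return the audit refs that were created."""
    created = []
    for line in lines:
        old, new, refname = line.split()
        if not refname.startswith("refs/heads/") or old == ZERO:
            continue  # not a branch, or a newly created branch: nothing to keep
        if new != ZERO and is_ff(old, new):
            continue  # ordinary push, history only grew
        ref = audit_ref_for(refname, now or datetime.now(timezone.utc))
        save(ref, old)
        created.append(ref)
    return created
```

A hook script would call `post_receive(sys.stdin)`. Because the saved refs are not under refs/heads or refs/tags, a default clone does not fetch them, so they stay out of everyday workflows while remaining available to an auditor who fetches that namespace explicitly.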
As someone who works with git history all the time, I cannot imagine how I would work in such an environment. Consider the simple task of finding the commit that first introduced a downstream change. Currently with dist-git, I can just do 'git log patch-file', scroll to the very end and be done with it.
It's a good point that this operation would be harder but it could be solved, I think.
I mean it could have beneficial features for maybe not all packages but at least some.
I think that source-git could make sense for packages that have historically *not* had many patches - be that downstream or cherry-picked patches. (And I realize that is quite the opposite of what others have said here.) With such packages, I could just git-pull new upstream releases, review the changes and git-push them to Fedora, instead of having to juggle with tarballs. That's a very appealing proposition to me. And with such packages, you wouldn't run into the downside of accumulating a history that is hard to understand.
So my opinion is that for simple packages that have no patches and where the conversion between source-git and dist-git is relatively straightforward, source-git could be a great option. For other packages, not really.
I suspect that at the scale Fedora operates on, it might be quite hard to do something that improves things for everybody.
Agreed.
Ondřej
Ondřej Lysoněk
On Thu, 2020-05-14 at 14:30 +0200, Ondřej Lysoněk wrote:
I was originally excited about source-git, however currently I don't see an approach to source-git that would work for me and I don't think I'd use it if it became available. And frankly, I think I wouldn't want other people using it either because it would make understanding their packages hard.
So, another way that could work, with minimal tooling, is that we keep the master branch strictly mirroring whatever upstream branch we follow, and then for each Fedora release you have a fedora branch where you add the downstream stuff (spec file, patches, etc.). Whenever you want to bring in a new upstream release, you would update the master tree to the release as tagged in the upstream master branch, then branch off a new fedora branch, say fedora33, and cherry-pick on top of it whatever downstream patches you had in the fedora32 branch.
The downside of this is that you cannot rebase mid-release. But then you can always cherry-pick patches from a rebased master or from upstream if you want.
And the branching, including cherry-picking of downstream patches can be done automatically for the most part.
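To make the automation concrete, here is a minimal sketch that only produces the list of git commands for one upstream update; it does not run them. The remote name `upstream`, the function name `new_release_plan`, and the assumption that the old fedora branch was created directly from the old upstream tag are illustrative choices, not part of the proposal.

```python
def new_release_plan(old_tag, new_tag, old_branch, new_branch, remote="upstream"):
    """Return the git commands for moving to a new upstream release.

    Assumes master only ever fast-forwards along upstream history and that
    `old_branch` was branched from master at `old_tag`, so that
    `old_tag..old_branch` is exactly the set of downstream commits.
    """
    return [
        ["git", "fetch", remote, "--tags"],
        ["git", "checkout", "master"],
        # Refuse anything but a fast-forward so master stays a strict mirror.
        ["git", "merge", "--ff-only", new_tag],
        ["git", "checkout", "-b", new_branch, "master"],
        # Replay the downstream commits (spec file, patches) on the new base.
        ["git", "cherry-pick", "{}..{}".format(old_tag, old_branch)],
    ]


if __name__ == "__main__":
    for command in new_release_plan("v1.0", "v1.1", "fedora32", "fedora33"):
        print(" ".join(command))
```

The cherry-pick step is where the "for the most part" caveat applies: a conflict stops the sequence and needs a human.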
Some issues we normally have can also be handled with some discipline:
- upstream has code we cannot ship
- upstream does not use git
- other issues preventing mirroring
For all of these what we do is just use tarballs to create a diff from current master and then do megacommits with the diff on the master branch, and use the fedora branches just for the downstream patches and auto-cherry-picking.
Ideally the tarballs are referenced somehow in the master commit via sha256 ids so that you can reconstruct exactly what was layered on top. Those tarballs could also still be stored in the lookaside as an audit trail if preferred.
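One possible way to reference the tarball from the commit is a pair of trailer-style lines in the commit message. The sketch below computes the digest and formats such a message; the trailer names `Source-Tarball` and `Source-SHA256` and the helper names are invented for this example and are not an existing convention.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Hex SHA-256 digest of an iterable of byte strings."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def import_message(version, tarball_name, digest):
    """Commit message for a tarball import on master (trailer names are hypothetical)."""
    return ("Import upstream {}\n\n"
            "Source-Tarball: {}\n"
            "Source-SHA256: {}\n").format(version, tarball_name, digest)
```

An auditor could then fetch the named tarball from the lookaside, recompute the digest, and compare it with the value recorded in the commit.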
The critical thing is how to ensure the master branch is sane, or alternatively how to enable tooling that can switch around the master branch should surgery be required such that the previous one is preserved as a historical branch. (for example if upstream force pushes, we still should have an audited command that saves the previous master as a branch and then allows a rebase).
Simo.
On Fri, 2020-05-15 at 11:11 -0400, Simo Sorce wrote:
So, another way that could work, with minimal tooling is that we keep the master branch strictly mirroring whatever upstream branch we follow,
For some projects we are not hopping between branches of the same upstream git, we are hopping between branches in different forked repos of the same upstream
Regards,
On Sat, 2020-05-16 at 11:09 +0200, Nicolas Mailhot wrote:
On Fri, 2020-05-15 at 11:11 -0400, Simo Sorce wrote:
So, another way that could work, with minimal tooling is that we keep the master branch strictly mirroring whatever upstream branch we follow,
For some projects we are not hopping between branches of the same upstream git, we are hopping between branches in different forked repos of the same upstream
To expand a little: when you are creating a Fedora package, you are packaging a fixed code state (and the state must stay fixed for trivial reproducibility, auditing, and security reasons). In git speak that means you are packaging a specific commit reference, that may (if upstream is careful and serious) be tagged with a clean fixed version string.
That means packaging a branch is a no-go. It’s not a fixed git reference, it’s a moving reference.
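The distinction between a fixed and a moving reference can be checked mechanically. The sketch below treats only a full commit hash as pinned and builds the `git rev-parse` call that resolves any other name to the commit it currently points at; the function names are illustrative.

```python
import re

# Full object ids: 40 hex digits (SHA-1) or 64 hex digits (SHA-256 repositories).
_FULL_HASH = re.compile(r"[0-9a-f]{40}(?:[0-9a-f]{24})?")


def is_pinned(ref):
    """True only for a full commit hash; branch names and short hashes can move."""
    return _FULL_HASH.fullmatch(ref) is not None


def pin_command(ref):
    """git command that resolves a tag or branch to the commit it points at now."""
    return ["git", "rev-parse", "--verify", "{}^{{commit}}".format(ref)]
```

Recording the output of `pin_command("x.y.z")` next to the tag name gives a reference that stays valid even if upstream later moves the tag.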
*How* upstream arrived at this fixed state from the last packaged fixed state is deeply uninteresting from the srpm POV.
It may not even exist as clean git history upstream (assuming upstream uses git). For example, you package the foo project, it gets into governance trouble and the original repository is no longer updated, so you bet on fork foo1, which seems to have picked up the development. Six months later you have lost your bet: the devs have consolidated on fork foo2. There is no clean upstream git history from foo1 to foo2; foo1 is a dead evolutionary path.
Also, there may even be a complete state tracking hole somewhere in the middle, because the creator of foo2 did not bother importing foo's history into their own repo, and did some private development starting from a partial copy of some (reformatted/reprocessed) foo files. Having needed to reconstitute fragmentary dev history in a new consolidated upstream git, I can tell you that splicing past repo history fragments is non-trivial. I can totally understand why some upstreams do not bother with it after a governance accident.
Therefore, all the systems that try to base a Fedora package history on the mirroring of a unified, unchanging, monotonic upstream repo are broken by construction. The only things you can reliably import into Fedora land are specific hashes or tags. And the upstream repo where you can find those hashes or tags may change over time.
I suppose you *could* ask a downstream Fedora scm “mirror” to compute a git path from the last packaged state to the new one, faking a merge of the new state over the last state, and faking continuous regular upstream history.
But why bother? You'd be creating artificial git history complexity that will exist Fedora-side only, and that upstream will neither understand nor agree with, just to avoid cloning the upstream repo of the day separately to make your changes and PRs there.
Also, rpm is able to package multiple source archives in a single spec, and we have packagers who make use of this capability. If you wanted a fully working scm mirroring system, not only would you need to fake continuity between upstream scm repositories that do not provide this continuity, but you would also need to merge multiple upstream scm repositories into a single downstream git (good luck producing patches that upstream will accept from this unified repository).
In the meanwhile, you could just dump in your spec file
%global forgeurl0 https://repo0
%global commit0 hash0
# repo0 time handling is broken
%global time0 2020-05-16T12:25:43+00:00

%global forgeurl1 https://repo1
%global tag1 x.y.z
%global forgepatchlist1 %{expand:
foo1X.patch
foo2X.patch
}

%forgemeta

%sourcelist
%forgesources

%patchlist
%forgepatches

…
%setup
%forgesetup
and be done. All existing Fedora tools like spectool will work just fine on that. (Actually, the forge macros are not quite there yet in the version included in redhat-rpm-config; I still need to upstream multipatch handling once I finish QAing it.)
Regards,