Hi folks,
The Fedora kernel is moving to maintaining the package in a source (sometimes people refer to it as an "exploded") tree. Basically just a fork of upstream. This makes a lot of packager tasks easier, but has introduced a minor issue with respect to the lookaside cache.
Right now, the kernel tooling is configured to create a tarball from the git tree and upload it to the lookaside cache for each build. We build the Rawhide kernel every weekday (give or take) and the xz-compressed source tarball is ~110MB. This works out to about 28GB per year for Rawhide alone (if this is a drop in the bucket and no one cares, please let me know and we'll just do this). The old approach uploaded a release tarball and then incremental tarballs on top of that.
If, however, Fedora allowed packagers to optionally generate tarballs from a git repository we could just push the linux git repository. The entire repository with history going back 15 years is under 4GB total, which is pretty good when compared to ~419GB which is the space required for the equivalent time using the lookaside cache.
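As a rough check of the numbers above, the arithmetic can be sketched in Python. The figure of 255 builds per year is an assumption standing in for "every weekday (give or take)"; the 110MB tarball size and the 15-year span come from the message. The function name is made up for illustration.

```python
def lookaside_growth_gb(tarball_mb: float, builds_per_year: int, years: int = 1) -> float:
    """Estimate lookaside cache growth in GB (1 GB = 1000 MB) when a
    full source tarball is uploaded for every build."""
    return tarball_mb * builds_per_year * years / 1000


# Assumed: ~255 weekday builds per year, 110 MB per xz-compressed tarball.
per_year = lookaside_growth_gb(110, 255)
fifteen_years = lookaside_growth_gb(110, 255, years=15)
print(f"{per_year:.1f} GB per year, {fifteen_years:.0f} GB over 15 years")
```

Under that assumption the result lands within a few GB of the ~28GB per year and ~419GB over 15 years quoted above, against under 4GB for the whole git history.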
What would need to change:
* Fedora offers a git repository to push source trees to.
* A new, optional file called "source-repos" could be added to the dist-git repository. It contains a git URL and a commit identifier. For example, an entry might look like "https://src.fedoraproject.org/sources/kernel.git v5.6", where v5.6 is a tag in the repository. We can require that the git repository be hosted by Fedora so we keep all the sources forever.
* fedpkg and fedpkg-minimal would need to be updated to pull the source tree if the "source-repos" file is found and run "git archive". Fortunately this work is actually already done since Red Hat's version of fedpkg already supports this.
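The steps above can be sketched roughly as follows. This is not fedpkg code: the one-pair-per-line file format, the "#" comment syntax, and the function names are assumptions made for illustration. The sketch only builds the git command lines (using the real `git clone --bare` and `git archive` options) and does not run them.

```python
from urllib.parse import urlparse

# Assumption: only repositories hosted by Fedora are accepted.
ALLOWED_HOST = "src.fedoraproject.org"


def parse_source_repos(text: str) -> list[tuple[str, str]]:
    """Parse hypothetical "source-repos" content: one "<git url> <ref>" pair per line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (comment syntax is assumed)
        url, ref = line.split()  # raises ValueError unless there are exactly two fields
        if urlparse(url).hostname != ALLOWED_HOST:
            raise ValueError(f"repository not hosted by Fedora: {url}")
        entries.append((url, ref))
    return entries


def archive_commands(url: str, ref: str, name: str) -> list[list[str]]:
    """Build (but do not run) the git commands that would produce a tarball for one entry."""
    return [
        ["git", "clone", "--bare", url, f"{name}.git"],
        ["git", f"--git-dir={name}.git", "archive", "--format=tar.gz",
         f"--prefix={name}-{ref}/", "-o", f"{name}-{ref}.tar.gz", ref],
    ]


entries = parse_source_repos("https://src.fedoraproject.org/sources/kernel.git v5.6\n")
for url, ref in entries:
    for command in archive_commands(url, ref, "kernel"):
        print(" ".join(command))
```

Rejecting any host other than src.fedoraproject.org at parse time is one way to enforce the "hosted by Fedora" restriction before anything is fetched.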
I'm happy to do all the work for fedpkg/fedpkg-minimal to make this possible because the other option is to add a bunch of hacks to the kernel tooling to spit out a bunch of incremental tarballs to reduce what we have to upload.
I assume this is something that will need to go through the packaging SIG, but from an infra side of things are there any thoughts/concerns?
Regards, Jeremy
Hello Jeremy,
On Wed, 8 Apr 2020 at 22:33, Jeremy Cline jeremy@jcline.org wrote:
Have you considered employing rpkg-util for that?
Basically, if this is merged https://github.com/rpm-software-management/mock/pull/526 and the plugin is enabled in Fedora, you will be able to use lines in spec files like:
Source0: {{{ git_dir_archive }}}
to automatically archive the content of the git repository surrounding the spec file. Hence the Fedora kernel repo can be exploded in the individual branches.
If you don't want to maintain a spec file alongside the exploded tree, you could also keep the spec file alongside a git submodule that points to a certain commit in the mirrored kernel repo on src.fp.o. The git_dir_archive macro presented above supports submodules as well and is able to archive them into an rpm source.
What do you think?
clime
_______________________________________________
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedoraproject.org
On Wed, Apr 08, 2020 at 08:27:08PM +0000, Jeremy Cline wrote:
Right now, it's configured to create a tarball from the git tree and upload it to the lookaside cache for each build. We build the rawhide kernel every weekday (give or take) and the xz compressed source tarball is ~110MB. This works out to about 28GB per year for Rawhide alone (if this is a drop in the bucket and no one cares please let me know and we'll just do this). The old approach uploaded a release tarball and then incremental tarballs on top of that.
Well, the netapp will dedupe some of those blocks, but it would be nice to save some room I would think.
So, this was actually discussed already at some length:
https://pagure.io/releng/issue/7498
Most of my concerns were answered, but then nim suggested this could already be done with the macros we have today and no one said "no, that won't work for us" so I stopped doing anything until someone answered and then it dropped off my radar. ;)
Anyhow, I'd say look at that and see if there's any way to do what you want with existing macros, if not, we can revisit.
I guess FPC and FESCo would want to sign off on it, and it would probably need to get some devel list discussion :)
kevin
On Wed, 2020-04-08 at 15:19 -0700, Kevin Fenzi wrote:
Well, the netapp will dedupe some of those blocks, but it would be nice to save some room I would think.
So the main concern with our current approach is it's wasting a bunch of space, but if you're not concerned I'm happy to not do any work and leave things as they are.
So, this was actually discussed already at some length:
https://pagure.io/releng/issue/7498
Most of my concerns were answered, but then nim suggested this could already be done with the macros we have today and no one said "no, that won't work for us" so I stopped doing anything until someone answered and then it dropped off my radar. ;)
The macros would, I think, work for us, but we've actually got everything in a single repository and do a bunch of stuff (build valid configs for each arch, expand a templated specfile, etc) so we're just running git-archive locally rather than downloading it from the forge.
So yeah, if no one's bothered about wasting a bunch of space with tarballs...
- Jeremy
On Wed, Apr 8, 2020 at 4:27 PM Jeremy Cline jeremy@jcline.org wrote:
At least with this _specific_ proposal, I don't see too many issues. Adding a "sources" namespace to Pagure and setting up a workflow for that isn't a horrible idea.
I still feel like my general concerns about the original proposal from two years ago[1] haven't been sufficiently addressed. But, given that you seem to have a specific idea in mind here, these are my questions about this for the kernel (and others that would opt into this workflow):
* Are you okay with imposing the same restrictions we have on rpms/*, modules/*, flatpaks/*, and containers/* for sources/*? That is, no rewriting history, no branch deletion, no tag deletion, etc.
* Are you okay with blocking the usage of submodules, Git LFS, Git-Annex, or any other mechanism that allows bypassing our protections or cannot be replicated from an upstream repo locally?
[1]: https://pagure.io/releng/issue/7498
-- 真実はいつも一つ!/ Always, there's only one truth!
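To make the first restriction concrete, here is a minimal sketch of how a server-side check might classify ref updates. It is not Fedora's actual hook. It relies only on the git convention that a pre-receive hook receives "<old> <new> <ref>" lines on stdin, with an all-zero object name marking a created or deleted ref. The function names, the 40-character SHA-1 assumption, and the "tags never move" rule are assumptions. Whether an update is a fast-forward would have to be determined separately, for example with `git merge-base --is-ancestor <old> <new>`.

```python
ZERO_SHA = "0" * 40  # assumes SHA-1 object names; all zeros marks a created or deleted ref


def classify_ref_update(old: str, new: str, ref: str) -> str:
    """Classify one "<old> <new> <ref>" line as a pre-receive hook receives it."""
    if new == ZERO_SHA:
        return "delete"
    if old == ZERO_SHA:
        return "create"
    return "update"


def is_allowed(old: str, new: str, ref: str, is_fast_forward: bool = True) -> bool:
    """Apply the restrictions listed above: no deletions, no rewritten history."""
    kind = classify_ref_update(old, new, ref)
    if kind == "delete":
        return False  # no branch or tag deletion
    if kind == "update" and ref.startswith("refs/tags/"):
        return False  # assumption: an existing tag must never move
    if kind == "update" and not is_fast_forward:
        return False  # a non-fast-forward update rewrites history
    return True
```

Keeping the fast-forward test as an input keeps the policy itself free of any git calls, so it can be exercised without a repository.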
On Thu, 9 Apr 2020 at 00:48, Neal Gompa ngompa13@gmail.com wrote:
- Are you okay with blocking the usage of submodules, Git LFS, Git-Annex, or any other mechanism that allows bypassing our protections or cannot be replicated from an upstream repo locally?
I would just like to note that this point is not precise. Usage of git submodules (and other technologies) is completely alright if they still point to src.fp.o. Is there a source for the point so that I can open a PR to fix it?
Thank you
On Wed, Apr 8, 2020 at 6:55 PM clime clime@fedoraproject.org wrote:
I would just like to note that this point is not precise. Usage of git submodules (and other technologies) is completely alright if they still point to src.fp.o. Is there a source for the point so that I can open a PR to fix it?
Making foreign repositories do that isn't straightforward. You would have to edit the repositories and change all the submodules, download and reimport all the LFS/Annex objects, etc. And that all tampers with the repository itself in ways that break the concept of having pristine trees mirrored to build from.
(Source: have done no less than two major SCM migrations and had to deal with all of these problems)
On Thu, 9 Apr 2020 at 01:08, Neal Gompa ngompa13@gmail.com wrote:
Making foreign repositories do that isn't straightforward. You would have to edit the repositories and change all the submodules, download and reimport all the LFS/Annex objects, etc. And that all tampers with the repository itself in ways that break the concept of having pristine trees mirrored to build from.
Sorry, I don't understand:
- You make a git submodule in a src.fp.o repo that points to another src.fp.o repo.
- When you push, a hook on src.fp.o checks whether there are any submodules and verifies that each has the same origin as the repo it is in (i.e. src.fp.o).
- Then, during the build, you can clone with `--recurse-submodules`.
I don't really understand what you meant with "foreign repositories", downloading LFS/Annex objects etc.
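The push-time check described in the steps above might look roughly like this. It is a sketch only: no such hook is confirmed to exist on src.fp.o, the function names are hypothetical, and a real hook would first have to read `.gitmodules` from the pushed commit (for instance with `git show <sha>:.gitmodules`) rather than receive its text directly.

```python
import re
from urllib.parse import urlparse

# Matches the "url = ..." lines of a .gitmodules file.
URL_LINE = re.compile(r"^\s*url\s*=\s*(\S+)\s*$")


def submodule_urls(gitmodules_text: str) -> list[str]:
    """Collect the "url = ..." values from the text of a .gitmodules file."""
    urls = []
    for line in gitmodules_text.splitlines():
        match = URL_LINE.match(line)
        if match:
            urls.append(match.group(1))
    return urls


def foreign_submodules(gitmodules_text: str,
                       allowed_host: str = "src.fedoraproject.org") -> list[str]:
    """Return submodule URLs that do not point at the allowed host."""
    foreign = []
    for url in submodule_urls(gitmodules_text):
        if url.startswith(("./", "../")):
            continue  # relative URLs resolve against the superproject's own remote
        # scp-style URLs (user@host:path) have no parsed hostname and are
        # therefore reported as foreign, which errs on the strict side.
        if urlparse(url).hostname != allowed_host:
            foreign.append(url)
    return foreign
```

A hook built on this would reject the push whenever `foreign_submodules` returns a non-empty list.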
On Wed, Apr 8, 2020 at 7:17 PM clime clime@fedoraproject.org wrote:
If you're doing mirrors and building from mirrored Git repos (which is essentially what Jeremy is talking about), what you're suggesting is simply not possible or scalable.
-- 真実はいつも一つ!/ Always, there's only one truth!
On Thu, 9 Apr 2020 at 01:28, Neal Gompa ngompa13@gmail.com wrote:
If you're doing mirrors and building from mirrored Git repos (as essentially what Jeremy is talking about), what you're suggesting is simply not possible or scalable.
Sorry, I still don't get your point.
Jeremy's solution was about introducing a new namespace on src.fp.o where the mirrored upstream repo would live.
Then it was about introducing the "source-repos" file, which contains a git URL and commit identifier and points to that mirrored repository.
This file essentially tries to emulate functionality that git already includes: git submodules.
So I am suggesting we use git submodules instead of inventing a new custom solution to the same problem.
I was also talking about how those submodules can be archived with a dedicated macro (an rpkg macro in that case).
Can you explain what is not possible or scalable in that?
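For reference, the "source-repos" idea discussed here can be sketched in a few lines of Python. This is only an illustration under assumptions: the one-line "<git url> <commit identifier>" format comes from Jeremy's example, while the function names, the host restriction constant, and the archive prefix and file name are hypothetical and do not reflect how fedpkg or Red Hat's tooling actually implements it.

```python
from urllib.parse import urlparse

# Assumption: sources must be hosted by Fedora, as proposed in the thread.
ALLOWED_HOST = "src.fedoraproject.org"


def parse_source_repos_line(line, allowed_host=ALLOWED_HOST):
    """Split a '<git url> <commit identifier>' entry into (url, ref).

    Raises ValueError for malformed entries or URLs on another host.
    """
    fields = line.split()
    if len(fields) != 2:
        raise ValueError(f"expected '<url> <ref>', got: {line!r}")
    url, ref = fields
    if urlparse(url).hostname != allowed_host:
        raise ValueError(f"repository must be hosted on {allowed_host}: {url}")
    return url, ref


def git_archive_command(ref, prefix, output):
    """Build (but do not run) a `git archive` call for a local clone."""
    return [
        "git", "archive",
        "--format=tar",
        f"--prefix={prefix}/",
        f"--output={output}",
        ref,
    ]
```

With the entry from the original mail, `parse_source_repos_line("https://src.fedoraproject.org/sources/kernel.git v5.6")` yields the URL and the tag `v5.6`, and the returned argument list could be passed to `subprocess.run` inside a clone of that repository.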
_______________________________________________
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro...
On Wed, 2020-04-08 at 18:38 -0400, Neal Gompa wrote:
- Are you okay with imposing the same restrictions we have on rpms/*, modules/*, flatpaks/*, and containers/* for sources/*? That is, no rewriting history, no branch deletion, no tag deletion, etc.
- Are you okay with blocking the usage of submodules, Git LFS, Git-Annex, or any other mechanism that allows bypassing our protections or cannot be replicated from an upstream repo locally?
I'll dig into[1] tomorrow to see if the existing stuff Kevin mentioned would work for us, but I can say that I'm fine with all those restrictions.
- Jeremy
Jeremy, have you also considered storing everything in a single repository? Instead of having two.
On Wed, Apr 8, 2020 at 10:27 PM Jeremy Cline jeremy@jcline.org wrote:
I assume this is something that will need to go through the packaging SIG, but from an infra side of things are there any thoughts/concerns?
Regards,
Jeremy
On Thu, 2020-04-09 at 14:47 +0200, Tomas Tomecek wrote:
Jeremy, have you also considered storing everything in a single repository? Instead of having two.
We actually are. I assume you're asking about why we're not using packit. We're not on GitHub so the service isn't (as far as I can tell) useful to us. There's huge piles of existing bash scripts and makefiles that achieve about what I think packit-the-cli would give us so it would be some amount of work with no obvious benefit to move at the moment. Generally speaking, though, I'm not against the idea.
- Jeremy
On Thu, Apr 9, 2020 at 4:34 PM Jeremy Cline jeremy@jcline.org wrote:
We actually are. I assume you're asking about why we're not using packit. We're not on GitHub so the service isn't (as far as I can tell) useful to us. There's huge piles of existing bash scripts and makefiles that achieve about what I think packit-the-cli would give us so it would be some amount of work with no obvious benefit to move at the moment. Generally speaking, though, I'm not against the idea.
When we started packit, we played with the scripts and makefiles in your kernel repo and I hope it's not too bold of me to say that it shouldn't be that hard to integrate the two now.
I'm assuming you're using pagure.io - at this point, we're not going to integrate packit with pagure (for obvious reasons). What are your future plans for the git forge?
The benefits would be that you'd get all the features of packit which we have right now and those we implement in future.
Tomas
On Thu, Apr 9, 2020 at 10:41 AM Tomas Tomecek ttomecek@redhat.com wrote:
When we started packit, we played with the scripts and makefiles in your kernel repo and I hope it's not too bold of me to say that it shouldn't be that hard to integrate the two now.
I'm assuming you're using pagure.io - at this point, we're not going to integrate packit with pagure (for obvious reasons). What are your future plans for the git forge?
The benefits would be that you'd get all the features of packit which we have right now and those we implement in future.
While the scripts have changed a bit, I don't see any actual benefit to packit with what we are doing. The script to create a release builds the dist-git. This was more about managing the lookaside so that we weren't uploading quite as much with the daily kernel builds in rawhide.
Justin
On Thu, 9 Apr 2020 at 17:58, Justin Forbes jmforbes@linuxtx.org wrote:
While the scripts have changed a bit, I don't see any actual benefit to packit with what we are doing. The script to create a release builds the dist-git. This was more about managing the lookaside so that we weren't uploading quite as much with the daily kernel builds in rawhide.
I still recommend the rpkg-preprocessor approach that I suggested here, or at least looking into whether you could use it and finding out what would need to be done to make it useful for you.
I can cooperate on this with you. I have been looking for some cooperation for quite a long time.
The preprocessor has support for custom macros that you could use from your spec file. These macros could hook into the scripts you have already created. I admit I don't know your exact workflow, but I firmly believe you could find the tool valuable.
Best regards,
clime