https://fedoraproject.org/wiki/Changes/ReproducibleBuildsClampMtimes
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.
== Summary ==
The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`. When an RPM package is built, mtimes of packaged files will be clamped to `$SOURCE_DATE_EPOCH` which is already set to the date of the latest `%changelog` entry. As a result, more RPM packages will be reproducible: The actual modification time of files that are e.g. modified in the `%prep` section or built in the `%build` section will not be reflected in the resulting RPM packages. Files in RPM packages will have mtimes that are independent of the time of the actual build.
== Owner ==
* Name: [[User:Churchyard|Miro Hrončok]], [[User:Zbyszek|Zbigniew Jędrzejewski-Szmek]]
* Email: mhroncok at redhat.com, zbyszek at in.waw.pl
== Detailed Description ==
This change exists to make RPM package builds more reproducible. A common problem that prevents [https://reproducible-builds.org/ build reproducibility] is the mtime (modification times) of the packaged files.
Suppose we package an RPM package of software called `skynet` in version `1.0`. Upstream released this version at datetime A. A Fedora packager creates the RPM package at datetime B. Unfortunately, the packager needs to patch the sources in the RPM `%prep` section. When the build runs at datetime C, the modification datetime of the patched file is set to C. When the build runs again in an otherwise identical environment at datetime D, the modification datetime of the patched file is set to D. As a result, the build is not bit-by-bit reproducible, because the datetime of the build is saved in the resulting package. Patching is not necessary to make this happen. When a source file is compiled into a binary file, the modification datetime is also set to the datetime of the build. In practice, the modification datetime of many files packaged in RPM packages is dependent on when the package was actually built.
To eliminate this problem, we propose to clamp build mtimes to `$SOURCE_DATE_EPOCH`. RPM build in Fedora already sets the `$SOURCE_DATE_EPOCH` environment variable based on the latest `%changelog` entry because the `%source_date_epoch_from_changelog` macro is set to `1`. We will also set the `%clamp_mtime_to_source_date_epoch` macro to `1`. As a result, when files are packaged to the RPM package, their modification datetimes will be clamped to `$SOURCE_DATE_EPOCH` (to the latest changelog entry datetime). Clamping means that all files which would otherwise have a modification datetime higher than `$SOURCE_DATE_EPOCH` will have the modification datetime changed to `$SOURCE_DATE_EPOCH`; files with mtime lower (or equal) to `$SOURCE_DATE_EPOCH` will retain the original mtimes.
This functionality is already implemented in RPM. We will enable it by setting `%clamp_mtime_to_source_date_epoch` to `1`.
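As a rough illustration of the clamping rule described above, here is a minimal Python sketch. It is not RPM's implementation; the function name `clamp_mtime` and all timestamp values are hypothetical and only model the rule "cap at `$SOURCE_DATE_EPOCH`, never raise".

```python
def clamp_mtime(mtime: int, source_date_epoch: int) -> int:
    """Return the mtime that would be recorded after clamping.

    Files newer than source_date_epoch are capped to it;
    older (or equal) files keep their original mtime.
    """
    return min(mtime, source_date_epoch)


# Hypothetical values (seconds since the Unix epoch), for illustration only.
SOURCE_DATE_EPOCH = 1_668_038_400  # stands in for the latest %changelog entry
upstream_file = 1_600_000_000      # older than the changelog entry: kept as-is
patched_file = 1_668_200_000       # touched during the build: clamped

clamped = [clamp_mtime(m, SOURCE_DATE_EPOCH) for m in (upstream_file, patched_file)]
```

With these inputs the upstream file keeps its own mtime and only the file touched during the build is moved back to the changelog date, so files in one package do not all end up with the same mtime.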
=== Non-goal ===
We do not aim to make all Fedora packages reproducible (at least not as part of this change proposal). We just eliminate one problem that we consider the biggest blocker for reproducible builds.
=== Python bytecode ===
When Python bytecode cache (a `.pyc` file) is built, the mtime of the corresponding Python source file (`.py`) is included in it for invalidation purposes. Since the `.pyc` file is created before RPM clamps the mtime of the `.py` file, the mtime stored in the `.pyc` file might be higher than the corresponding mtime of the `.py` file.
With the previous example, if `skynet` is written in Python:
# `skynet.py` is modified in `%prep` and hence has mtime set to the time of the build
# `skynet.pyc` is generated in `%install` and the mtime of `skynet.py` is saved in it
# RPM clamps the mtime of `skynet.py`
# `skynet.pyc` is considered invalid by Python at runtime, as the stored and actual mtime of `skynet.py` don't match
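The mismatch in the steps above can be reproduced outside of RPM with a small Python sketch. It assumes CPython 3.7 or newer (where the `.pyc` header is magic, flags, source mtime, source size), forces timestamp-based invalidation explicitly, and simulates RPM's clamping with `os.utime`; the helper name `stored_source_mtime` and both timestamps are hypothetical.

```python
import os
import py_compile
import struct
import tempfile


def stored_source_mtime(pyc_path: str) -> int:
    """Read the source mtime recorded in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Header layout (Python 3.7+): magic (4 bytes), flags (4), mtime (4), size (4)
    flags, mtime = struct.unpack("<II", header[4:12])
    if flags != 0:
        raise ValueError("not a timestamp-based .pyc")
    return mtime


build_time = 1_668_200_000         # hypothetical time of the build
source_date_epoch = 1_668_038_400  # hypothetical latest %changelog entry

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "skynet.py")
    with open(src, "w") as f:
        f.write("print('hello')\n")
    # Step 1: the source is modified during the build.
    os.utime(src, (build_time, build_time))
    # Step 2: the bytecode cache is generated and records the source mtime.
    pyc = py_compile.compile(
        src,
        cfile=os.path.join(tmp, "skynet.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    # Step 3: simulate RPM clamping the mtime of the packaged source file.
    os.utime(src, (source_date_epoch, source_date_epoch))
    # Step 4: the recorded and the actual mtime no longer match.
    recorded = stored_source_mtime(pyc)
    actual = int(os.stat(src).st_mtime)
    stale = recorded != actual
```

Here `stale` ends up true, which is the condition under which Python would treat the cached bytecode as out of date.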
To solve this, we will modify Python to clamp the stored mtime to `$SOURCE_DATE_EPOCH` as well (when building RPM packages). Upstream Python chooses to invalidate bytecode cache based on hashes instead of mtimes when `$SOURCE_DATE_EPOCH` is set, but that could cause performance issues for big files, so Fedora's Python already deviates from upstream behavior when building RPM packages. To avoid accidentally breaking the behavior when `%clamp_mtime_to_source_date_epoch` is not set to `1`, RPM macros and buildroot policy scripts for creating the Python bytecode cache will be modified to unset `$SOURCE_DATE_EPOCH` when `%clamp_mtime_to_source_date_epoch` is not set to `1`.
This behavior might be proposed upstream if it turns out to be superior to the current upstream choice, in case we [https://discuss.python.org/t/14594 won't redesign the bytecode-source relationship entirely] instead.
=== Opting out ===
Packages broken by this new behavior can unset `%clamp_mtime_to_source_date_epoch` but packagers are encouraged to fix the problem instead.
== Feedback ==
Enabling this RPM feature was [https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/126 proposed as a pull request] to {{package|redhat-rpm-config}} in April 2021. It received good feedback with the exception of the following:
* it was said the change needs to be coordinated with the Python maintainers
* it was said the change should be done via a change process for better coordination and exposure
We believe that by proposing this via the change process and planning for the changes needed in Python, both issues are addressed.
== Benefit to Fedora ==
We believe that many RPM packages will become reproducible and others will be more reproducible than before. The benefits of reproducible builds are better explained at https://reproducible-builds.org/
== Scope ==
* Proposal owners:
** Propose a PR for {{package|redhat-rpm-config}} (set `%clamp_mtime_to_source_date_epoch` to `1`, possibly only when `%source_date_epoch_from_changelog` is set)
** Propose a PR for {{package|python-rpm-macros}} (unset `$SOURCE_DATE_EPOCH` while creating `.pyc` files iff `%clamp_mtime_to_source_date_epoch` is not `1`)
** Propose a PR for [https://src.fedoraproject.org/rpms/python3.11/blob/b2d80045f9/f/00328-pyc-ti... the Python's bytecode invalidation mode patch] for all Python versions that have it
** Backport (the new portion of) the patch to older Pythons ({{package|python2.7}}, {{package|python3.6}} and PyPys)
** Test everything together in Copr and deploy it if it works.
** Optional: Run some reproducibility tests before and after this change and produce some statistics.
* Other developers:
** Test their packages with the new behavior, report problems, and opt-out if really needed.
* Release engineering: N/A (not needed for this Change)
* Policies and guidelines: N/A (not needed for this Change)
* Trademark approval: N/A (not needed for this Change)
* Alignment with Objectives: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
Nothing anticipated.
== How To Test ==
The change owners plan to perform a mass rebuild in Copr to see if this breaks anything significantly. If it actually works as anticipated, they also plan to run some reproducibility tests and hopefully produce some statistics before and after this change.
Other packagers can test by building their packages and verifying that they still work as expected and that no packaged files have mtimes higher than the last `%changelog` entry.
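One way to script the mtime check is sketched below in Python. It assumes the per-file listing was produced beforehand, for example with `rpm -qp --queryformat '[%{FILEMTIMES} %{FILENAMES}\n]' package.rpm` (this exact invocation is an assumption, not something the proposal prescribes), and that the epoch of the last `%changelog` entry is known. The function name `files_newer_than` and the sample data are hypothetical.

```python
def files_newer_than(listing: str, source_date_epoch: int) -> list[str]:
    """Return paths whose mtime is higher than source_date_epoch.

    `listing` holds one "<mtime> <path>" pair per line.
    """
    offenders = []
    for line in listing.splitlines():
        if not line.strip():
            continue  # skip blank lines
        mtime, path = line.split(" ", 1)
        if int(mtime) > source_date_epoch:
            offenders.append(path)
    return offenders


# Hypothetical listing and changelog epoch, for illustration only.
sample = """\
1600000000 /usr/bin/skynet
1668200000 /usr/lib/python3.11/site-packages/skynet.py
"""
offenders = files_newer_than(sample, 1_668_038_400)
```

An empty result means that no packaged file is newer than the last changelog entry, which is what is expected once clamping is enabled.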
To verify if this change has landed, run: `rpm --eval '%clamp_mtime_to_source_date_epoch'` on Fedora 38. The result should be `1`.
== User Experience ==
Users of Fedora Linux on their machines should not be impacted at all. Users who build RPM packages atop Fedora will be impacted by this change the same way Fedora is.
== Dependencies ==
* RPM needs to support this (it already does)
* RPM needs to set `$SOURCE_DATE_EPOCH` (it already does)
== Contingency Plan ==
* Contingency mechanism: The change owners or {{package|redhat-rpm-config}} maintainers or proven packagers will revert the change in {{package|redhat-rpm-config}}. That should be enough to undo anything as the changes in Python should be dependent on that. If not enough, revert everything.
* Contingency deadline: Ideally, we should do this before the Mass Rebuild. Technically, we can land it any time before the Beta Freeze, but it would not change all the packages, which is a bit messy.
* Blocks release? No
== Documentation ==
This page is the documentation.
== Release Notes ==
On Thu, Nov 10, 2022 at 03:23:49PM -0500, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/ReproducibleBuildsClampMtimes
== Summary ==
The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`. When an RPM package is built, mtimes of packaged files will be clamped to `$SOURCE_DATE_EPOCH`
Clamp as capping maximal mtime, or resetting mtime for all files? I.e. If I had a source file dated 1970-01-01 and installed it with "install -p", will the packaged file retain that 1970-01-01 date, or will it be set to the date of the latest changelog, e.g. 2022-11-11? In other words, will all files in a package have the same mtime, or there won't be an mtime newer than the changelog entry?
which is already set to the date of the latest `%changelog` entry.
What's a changelog entry date in case of rpmautospec changelog? Is it a git AuthorDate or CommitDate?
As a result, more RPM packages will be reproducible:
Where will this reproducibility stop? An RPM package itself carries a build time in its RPM header. Are we also going to fake this time in the name of reproducibility?
What value do these faked timestamps have? E.g. a compiled file is a function not only of its source, but also of the compiler. This proposed change removes the compiler part from the timestamp. Will timestamps like this be helpful?
Wouldn't it be easier to admit that timestamps are nonsense and simply eradicate all of them from the various data formats rather than trying to fake them? Simply changing rpmbuild to set timestamp to 0 for all contained files, or removing the time attribute from the RPM format completely?
-- Petr
On Fri, Nov 11, 2022 at 11:53 AM Petr Pisar ppisar@redhat.com wrote:
On Thu, Nov 10, 2022 at 03:23:49PM -0500, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/ReproducibleBuildsClampMtimes
== Summary ==
The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`. When an RPM package is built, mtimes of packaged files will be clamped to `$SOURCE_DATE_EPOCH`
Clamp as capping maximal mtime, or resetting mtime for all files? I.e. If I had a source file dated 1970-01-01 and installed it with "install -p", will the packaged file retain that 1970-01-01 date, or will it be set to the date of the latest changelog, e.g. 2022-11-11? In other words, will all files in a package have the same mtime, or there won't be an mtime newer than the changelog entry?
The second. From the original message:
Clamping means that all files which would otherwise have a modification datetime higher than `$SOURCE_DATE_EPOCH` will have the modification datetime changed to `$SOURCE_DATE_EPOCH`; files with mtime lower (or equal) to `$SOURCE_DATE_EPOCH` will retain the original mtimes.
which is already set to the date of the latest `%changelog` entry.
What's a changelog entry date in case of rpmautospec changelog? Is it a git AuthorDate or CommitDate?
As a result, more RPM packages will be reproducible:
Where will this reproducibility stop?
Ideally, when it's achieved, and 100% of Fedora will be reproducible under reprotest =P
An RPM package itself carries a build time in its RPM header. Are we also going to fake this time in the name of reproducibility?
My opinion: yes, please do (%use_source_date_epoch_as_buildtime). And fake the builder hostname (%_buildhost). And enable back --enable-deterministic-archives in binutils: (https://bugzilla.redhat.com/show_bug.cgi?id=1195883). And do whatever else is necessary to stop shipping binary packages that users can't reproduce bit-to-bit.
What value do these faked timestamps have?
None.
E.g. a compiled file is a function not only of its source, but also of the compiler.
Nods in NixOS.
This proposed change removes the compiler part from the timestamp. Will timestamps like this be helpful? Wouldn't it be easier to admit that timestamps are nonsense and simply eradicate all of them from the various data formats rather than trying to fake them? Simply changing rpmbuild to set timestamp to 0 for all contained files, or removing the time attribute from the RPM format completely?
Would be wonderful. Mixing metadata with data has always been a mistake.
Reproducibility is at stakes with auditability, and the second must be driven off or given up on. The metainformation of which host has built the artifact and when has no place within the artifact itself.
* Alexander Sosedkin:
On Fri, Nov 11, 2022 at 11:53 AM Petr Pisar ppisar@redhat.com wrote:
An RPM package itself carries a build time in its RPM header. Are we also going to fake this time in the name of reproducibility?
My opinion: yes, please do (%use_source_date_epoch_as_buildtime). And fake the builder hostname (%_buildhost). And enable back --enable-deterministic-archives in binutils: (https://bugzilla.redhat.com/show_bug.cgi?id=1195883). And do whatever else is necessary to stop shipping binary packages that users can't reproduce bit-to-bit.
The downside of doing this is that it's no longer possible to check whether a build happened against a buildroot with a particular fix in it. The time-based check was never 100% reliable, but it could be used as a good indicator in the past.
Thanks, Florian
On Fri, Nov 11, 2022 at 2:03 PM Florian Weimer fweimer@redhat.com wrote:
- Alexander Sosedkin:
On Fri, Nov 11, 2022 at 11:53 AM Petr Pisar ppisar@redhat.com wrote:
An RPM package itself carries a build time in its RPM header. Are we also going to fake this time in the name of reproducibility?
My opinion: yes, please do (%use_source_date_epoch_as_buildtime). And fake the builder hostname (%_buildhost). And enable back --enable-deterministic-archives in binutils: (https://bugzilla.redhat.com/show_bug.cgi?id=1195883). And do whatever else is necessary to stop shipping binary packages that users can't reproduce bit-to-bit.
The downside of doing this is that it's no longer possible to check whether a build happened against a buildroot with a particular fix in it. The time-based check was never 100% reliable, but it could be used as a good indicator in the past.
No, no, false dichotomy alert. This is not a case where reproducibility rules out auditability.
Not only can the build system (Koji) track exact versions of builddeps (and if it doesn't, it really should, regardless of reproducibility), I'm also not against including builddep versions in the artifacts, in any form, as long as it's done in a reproducible manner. E.g., I have no problem with NixOS having them hashed and used as the installation prefix, not at all.
In RPM world, I've even entertained an idea of having a subpackage for auditability not unlike how we have debuginfo, since rebuilding a package reproducibly requires builddep pinning. But if that's avoidable, I'd rather just not mix artifacts with meta.
Hi,
Alexander Sosedkin asosedkin@redhat.com wrote:
In RPM world, I've even entertained an idea of having a subpackage for auditability not unlike how we have debuginfo, since rebuilding a package reproducibly requires builddep pinning. But if that's avoidable, I’d rather just not mix artifacts with meta.
Debian is working on this already, they call those “buildinfo” files:
https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles https://manpages.debian.org/testing/dpkg-dev/deb-buildinfo.5.en.html
If we want something similar, I’d propose not to completely re-invent the wheel.
HTH, Clemens
On Fri, Nov 11, 2022 at 8:46 AM Clemens Lang cllang@redhat.com wrote:
Hi,
Alexander Sosedkin asosedkin@redhat.com wrote:
In RPM world, I've even entertained an idea of having a subpackage for auditability not unlike how we have debuginfo, since rebuilding a package reproducibly requires builddep pinning. But if that's avoidable, I’d rather just not mix artifacts with meta.
Debian is working on this already, they call those “buildinfo” files:
https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles https://manpages.debian.org/testing/dpkg-dev/deb-buildinfo.5.en.html
If we want something similar, I’d propose not to completely re-invent the wheel.
We've discussed an RPM-specific format upstream. Debian and Arch both have their own formats that are tailored to their package systems, and RPM may have one too, eventually.
On Fri, Nov 11, 2022 at 10:14:56AM -0500, Neal Gompa wrote:
On Fri, Nov 11, 2022 at 8:46 AM Clemens Lang cllang@redhat.com wrote:
Hi,
Alexander Sosedkin asosedkin@redhat.com wrote:
In RPM world, I've even entertained an idea of having a subpackage for auditability not unlike how we have debuginfo, since rebuilding a package reproducibly requires builddep pinning. But if that's avoidable, I’d rather just not mix artifacts with meta.
Debian is working on this already, they call those “buildinfo” files:
https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles https://manpages.debian.org/testing/dpkg-dev/deb-buildinfo.5.en.html
If we want something similar, I’d propose not to completely re-invent the wheel.
We've discussed an RPM-specific format upstream. Debian and Arch both have their own formats that are tailored to their package systems, and RPM may have one too, eventually.
For context, the discussion is here: https://github.com/rpm-software-management/rpm/pull/1532
The idea of faking this and that (timestamps, builder hostname, ... whatever) is weird. It always leads to a question: why do we even have / use such metadata, if we fake them anyway?
Only either ditching such values entirely or always honoring them makes sense to me. Or inventing new ones that better fit the various use cases we have (like the aforementioned splitting of metadata from artifacts and so on).
But if your proposal is the best we have, I'm fine with it. It's your time after all :) and I haven't noticed it would affect me (or rather "make my life harder") in any way both as a user and as a package maintainer.
--
Michal Schorm Software Engineer Core Services - Databases Team Red Hat
--
* Alexander Sosedkin:
On Fri, Nov 11, 2022 at 2:03 PM Florian Weimer fweimer@redhat.com wrote:
- Alexander Sosedkin:
On Fri, Nov 11, 2022 at 11:53 AM Petr Pisar ppisar@redhat.com wrote:
An RPM package itself carries a build time in its RPM header. Are we also going to fake this time in the name of reproducibility?
My opinion: yes, please do (%use_source_date_epoch_as_buildtime). And fake the builder hostname (%_buildhost). And enable back --enable-deterministic-archives in binutils: (https://bugzilla.redhat.com/show_bug.cgi?id=1195883). And do whatever else is necessary to stop shipping binary packages that users can't reproduce bit-to-bit.
The downside of doing this is that it's no longer possible to check whether a build happened against a buildroot with a particular fix in it. The time-based check was never 100% reliable, but it could be used as a good indicator in the past.
No, no, false dichotomy alert. This is not a case where reproducibility rules out auditability.
Sure, not in principle. I merely wanted to point out that this takes away a bit of information that was useful to some of us before.
Thanks, Florian
On Fri, Nov 11, 2022 at 12:05:03PM +0100, Alexander Sosedkin wrote:
On Fri, Nov 11, 2022 at 11:53 AM Petr Pisar ppisar@redhat.com wrote:
As a result, more RPM packages will be reproducible:
Where will this reproducibility stop?
Ideally, when it's achieved, and 100% of Fedora will be reproducible under reprotest =P
Exactly ;) My personal goal is to achieve 100% build reproducibility for packages and other build artifacts. This Change is just one of the first steps. Sources of non-reproducibility will be tackled one-by-one as we find them.
An RPM package itself carries a build time in its RPM header. Are we also going to fake this time in the name of reproducibility?
My opinion: yes, please do (%use_source_date_epoch_as_buildtime).
Oh, I'm surprised we don't do this yet. I think we should.
And fake the builder hostname (%_buildhost).
Yes. Maybe the best option is to set it when rebuilding an rpm to the original value.
And enable back --enable-deterministic-archives in binutils: (https://bugzilla.redhat.com/show_bug.cgi?id=1195883). And do whatever else is necessary to stop shipping binary packages that users can't reproduce bit-to-bit.
The compile-time setting is a brute hammer that affects too many things. Instead we should clamp the times intelligently where this makes sense. https://reproducible-builds.org/docs/archives/ describes the approaches that Debian took. E.g. for tar they add a --clamp-mtime flag, which seems a nicer approach. I think we'll want something similar for 'ar'.
Zbyszek
On 11. 11. 22 11:53, Petr Pisar wrote:
On Thu, Nov 10, 2022 at 03:23:49PM -0500, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/ReproducibleBuildsClampMtimes
== Summary ==
The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`. When an RPM package is built, mtimes of packaged files will be clamped to `$SOURCE_DATE_EPOCH`
Clamp as capping maximal mtime, or resetting mtime for all files? I.e. If I had a source file dated 1970-01-01 and installed it with "install -p", will the packaged file retain that 1970-01-01 date, or will it be set to the date of the latest changelog, e.g. 2022-11-11? In other words, will all files in a package have the same mtime, or there won't be an mtime newer than the changelog entry?
Capping maximal mtime. It's actually described in the detailed description:
"""Clamping means that all files which would otherwise have a modification datetime higher than $SOURCE_DATE_EPOCH will have the modification datetime changed to $SOURCE_DATE_EPOCH; files with mtime lower (or equal) to $SOURCE_DATE_EPOCH will retain the original mtimes."""
Possibly "higher" should say "greater" instead, not sure.
which is already set to the date of the latest `%changelog` entry.
What's a changelog entry date in case of rpmautospec changelog? Is it a git AuthorDate or CommitDate?
I don't know off the top of my head. There's also https://pagure.io/fedora-infra/rpmautospec/issue/209 which touches this topic a bit.
As a result, more RPM packages will be reproducible:
Where will this reproducibility stop? An RPM package itself carries a build time in its RPM header. Are we also going to fake this time in the name of reproducibility?
Not as part of this change proposal and I have no intention to propose such a thing.
What value do these faked timestamps have? E.g. a compiled file is a function not only of its source, but also of the compiler. This proposed change removes the compiler part from the timestamp. Will timestamps like this be helpful?
Are the current timestamps helpful?
Wouldn't it be easier to admit that timestamps are nonsense and simply eradicate all of them from the various data formats rather than trying to fake them?
I don't think it would be easier, but I have not tried that.
Simply changing rpmbuild to set timestamp to 0 for all contained files, or removing the time attribute from the RPM format completely?
RPM does not currently support this. RPM currently supports mtime clamping which is what we have proposed. You seem to not like the idea but you don't say so explicitly. If you prefer status quo over this change and would rather see the proposal rejected, please say so, so FESCo can evaluate your feedback when voting about the proposal.
On Fri, Nov 11, 2022 at 02:05:11PM +0100, Miro Hrončok wrote:
As a result, more RPM packages will be reproducible:
Where will this reproducibility stop? An RPM package itself carries a build time in its RPM header. Are we also going to fake this time in the name of reproducibility?
Not as part of this change proposal and I have no intention to propose such a thing.
Then a goal of this change cannot be a reproducible RPM package. We could rather speak about reproducible cpio archives inside the RPM packages.
What value do these faked timestamps have? E.g. a compiled file is a function not only of its source, but also of the compiler. This proposed change removes the compiler part from the timestamp. Will timestamps like this be helpful?
Are the current timestamps helpful?
None of the timestamps are reliable. But a universe where two versions of a file have the same timestamp but different content violates my perception of time. It's connected to the traceability touched on by Alexander.
Wouldn't it be easier to admit that timestamps are nonsense and simply eradicate all of them from the various data formats rather than trying to fake them?
I don't think it would be easier, but I have not tried that.
Simply changing rpmbuild to set timestamp to 0 for all contained files, or removing the time attribute from the RPM format completely?
RPM does not currently support this. RPM currently supports mtime clamping which is what we have proposed. You seem to not like the idea but you don't say so explicitly. If you prefer status quo over this change and would rather see the proposal rejected, please say so, so FESCo can evaluate your feedback when voting about the proposal.
I asked all the questions because I think it's quite a convoluted way to reproducible builds. If the purpose is just to normalize timestamps to the release date of the package, then fine.
I didn't write explicitly that I don't like this change, because I can see some advantages of it. I'm just not convinced that losing the advantages of the current system is worth it.
-- Petr
On Fri, Nov 11, 2022 at 02:05:11PM +0100, Miro Hrončok wrote:
On 11. 11. 22 11:53, Petr Pisar wrote:
On Thu, Nov 10, 2022 at 03:23:49PM -0500, Ben Cotton wrote:
https://fedoraproject.org/wiki/Changes/ReproducibleBuildsClampMtimes
== Summary ==
The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`. When an RPM package is built, mtimes of packaged files will be clamped to `$SOURCE_DATE_EPOCH`
Clamp as capping maximal mtime, or resetting mtime for all files? I.e. If I had a source file dated 1970-01-01 and installed it with "install -p", will the packaged file retain that 1970-01-01 date, or will it be set to the date of the latest changelog, e.g. 2022-11-11? In other words, will all files in a package have the same mtime, or there won't be an mtime newer than the changelog entry?
Capping maximal mtime. It's actually described in the detailed description:
"""Clamping means that all files which would otherwise have a modification datetime higher than $SOURCE_DATE_EPOCH will have the modification datetime changed to $SOURCE_DATE_EPOCH; files with mtime lower (or equal) to $SOURCE_DATE_EPOCH will retain the original mtimes."""
Possibly "higher" should say "greater" instead, not sure.
which is already set to the date of the latest `%changelog` entry.
What's a changelog entry date in case of rpmautospec changelog? Is it a git AuthorDate or CommitDate?
I don't know off the top of my head. There's also https://pagure.io/fedora-infra/rpmautospec/issue/209 which touches this topic a bit.
This is obviously an interesting question, but it's orthogonal to this proposal. Here we'll take whatever rpmautospec generates as the last timestamp. The details of that generation may even change over time, as discussed in the issue.
As a result, more RPM packages will be reproducible:
Where will this reproducibility stop? An RPM package itself carries a build time in its RPM header. Are we also going to fake this time in the name of reproducibility?
Not as part of this change proposal and I have no intention to propose such a thing.
What value do these faked timestamps have? E.g. a compiled file is a function not only of its source, but also of the compiler. This proposed change removes the compiler part from the timestamp. Will timestamps like this be helpful?
Are the current timestamps helpful?
Wouldn't it be easier to admit that timestamps are nonsense and simply eradicate all of them from the various data formats rather than trying to fake them?
I don't think it would be easier, but I have not tried that.
Simply changing rpmbuild to set timestamp to 0 for all contained files, or removing the time attribute from the RPM format completely?
RPM does not currently support this. RPM currently supports mtime clamping which is what we have proposed. You seem to not like the idea but you don't say so explicitly. If you prefer status quo over this change and would rather see the proposal rejected, please say so, so FESCo can evaluate your feedback when voting about the proposal.
I think we should not get rid of timestamps altogether. They are useful metadata and even end users look at them occasionally. Our packaging guidelines say that an effort must be made to preserve _upstream_ timestamps. We also know from the experience with ostree systems that setting timestamps to 0 confuses some software. Thus, I think we should clamp the timestamps as required to achieve reproducible builds, but not more.
Zbyszek
On Fri, Nov 11, 2022, at 5:53 AM, Petr Pisar wrote:
Wouldn't it be easier to admit that timestamps are nonsense and simply eradicate all of them from the various data formats rather than trying to fake them? Simply changing rpmbuild to set timestamp to 0 for all contained files, or removing the time attribute from the RPM format completely?
This is what ostree has done since its inception.
On Fri, 2022-11-11 at 12:42 -0500, Colin Walters wrote:
On Fri, Nov 11, 2022, at 5:53 AM, Petr Pisar wrote:
Wouldn't it be easier to admit that timestamps are nonsense and simply eradicate all of them from the various data formats rather than trying to fake them? Simply changing rpmbuild to set timestamp to 0 for all contained files, or removing the time attribute from the RPM format completely?
This is what ostree has done since its inception.
And it broke some software, I know because I had to fix it.
Simo.
On 10 Nov 2022, at 20:24, Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/ReproducibleBuildsClampMtimes
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.
== Summary ==
The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`. When an RPM package is built, mtimes of packaged files will be clamped to `$SOURCE_DATE_EPOCH` which is already set to the date of the latest `%changelog` entry. As a result, more RPM packages will be reproducible: The actual modification time of files that are e.g. modified in the `%prep` section or built in the `%build` section will not be reflected in the resulting RPM packages. Files in RPM packages will have mtimes that are independent of the time of the actual build.
== Owner ==
- Name: [[User:Churchyard|Miro Hrončok]], [[User:Zbyszek|Zbigniew Jędrzejewski-Szmek]]
- Email: mhroncok at redhat.com, zbyszek at in.waw.pl
== Detailed Description ==
This change exists to make RPM package builds more reproducible. A common problem that prevents [https://reproducible-builds.org/ build reproducibility] is the mtime (modification times) of the packaged files.
Suppose we package an RPM package of software called `skynet` in version `1.0`. Upstream released this version at datetime A. A Fedora packager creates the RPM package at datetime B. Unfortunately, the packager needs to patch the sources in the RPM `%prep` section. When the build runs at datetime C, the modification datetime of the patched file is set to C. When the build runs again in an otherwise identical environment at datetime D, the modification datetime of the patched file is set to D. As a result, the build is not bit-by-bit reproducible, because the datetime of the build is saved in the resulting package. Patching is not necessary to make this happen. When a source file is compiled into a binary file, the modification datetime is also set to the datetime of the build. In practice, the modification datetime of many files packaged in RPM packages is dependent on when the package was actually built.
To eliminate this problem, we propose to clamp build mtimes to `$SOURCE_DATE_EPOCH`. RPM build in Fedora already sets the `$SOURCE_DATE_EPOCH` environment variable based on the latest `%changelog` entry because the `%source_date_epoch_from_changelog` macro is set to `1`.
Change log has the date of a change but no time. What time of day and timezone is used? 00:00:00 UTC?
Barry
We will also set the `%clamp_mtime_to_source_date_epoch` macro to `1`. As a result, when files are packaged to the RPM package, their modification datetimes will be clamped to `$SOURCE_DATE_EPOCH` (to the latest changelog entry datetime). Clamping means that all files which would otherwise have a modification datetime higher than `$SOURCE_DATE_EPOCH` will have the modification datetime changed to `$SOURCE_DATE_EPOCH`; files with mtime lower (or equal) to `$SOURCE_DATE_EPOCH` will retain the original mtimes.
This functionality is already implemented in RPM. We will enable it by setting `%clamp_mtime_to_source_date_epoch` to `1`.
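The clamping rule described above amounts to taking the minimum of a file's mtime and `$SOURCE_DATE_EPOCH`. A minimal Python sketch of that rule follows; the function name `clamp_mtime` and the sample timestamps are made up for illustration, and this is not RPM's actual implementation.

```python
def clamp_mtime(mtime: int, source_date_epoch: int) -> int:
    """Return the mtime a packaged file would carry after clamping.

    Files newer than source_date_epoch are pulled down to it;
    older (or equal) files keep their original mtime.
    """
    return min(mtime, source_date_epoch)


# Illustrative values only (seconds since the Unix epoch).
SOURCE_DATE_EPOCH = 1668081600  # 2022-11-10 12:00:00 UTC
UPSTREAM_FILE_MTIME = 1600000000  # older than the changelog entry
BUILT_FILE_MTIME = 1700000000  # written during a later build

print(clamp_mtime(UPSTREAM_FILE_MTIME, SOURCE_DATE_EPOCH))  # stays 1600000000
print(clamp_mtime(BUILT_FILE_MTIME, SOURCE_DATE_EPOCH))  # becomes 1668081600
```

Because the result depends only on the source mtime and the changelog date, two builds run at different times produce the same packaged mtimes.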
=== Non-goal ===
We do not aim to make all Fedora packages reproducible (at least not as part of this change proposal). We just eliminate one problem that we consider the biggest blocker for reproducible builds.
=== Python bytecode ===
When Python bytecode cache (a `.pyc` file) is built, the mtime of the corresponding Python source file (`.py`) is included in it for invalidation purposes. Since the `.pyc` file is created before RPM clamps the mtime of the `.py` file, the mtime stored in the `.pyc` file might be higher than the corresponding mtime of the `.py` file.
With the previous example, if `skynet` is written in Python:
# `skynet.py` is modified in `%prep` and hence has mtime set to the time of the build
# `skynet.pyc` is generated in `%install` and the mtime of `skynet.py` is saved in it
# RPM clamps the mtime of `skynet.py`
# `skynet.pyc` is considered invalid by Python on runtime, as the stored and actual mtime of `skynet.py` don't match
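The mismatch in the steps above can be reproduced with a small Python sketch. It assumes upstream CPython's timestamp-based `.pyc` layout (a 16-byte header: magic number, flags, source mtime, source size), and it uses `os.utime` as a stand-in for the clamping that RPM performs at packaging time. The helper names and timestamps are hypothetical.

```python
import os
import py_compile
import struct
import tempfile


def stored_source_mtime(pyc_path):
    """Read the source mtime recorded in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags, mtime = struct.unpack("<II", header[4:12])
    if flags != 0:
        raise ValueError("not a timestamp-based .pyc")
    return mtime


def demo(build_time, source_date_epoch):
    """Return (mtime stored in the .pyc, mtime of the .py after clamping)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "skynet.py")
        pyc = os.path.join(tmp, "skynet.pyc")
        with open(src, "w") as f:
            f.write("x = 1\n")
        # Step 1: the source was touched during the build.
        os.utime(src, (build_time, build_time))
        # Step 2: the bytecode cache records the source mtime.
        py_compile.compile(
            src,
            cfile=pyc,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        # Step 3: stand-in for RPM clamping the mtime of the .py file.
        clamped = min(build_time, source_date_epoch)
        os.utime(src, (clamped, clamped))
        # Step 4: the two values no longer agree.
        return stored_source_mtime(pyc), int(os.stat(src).st_mtime)


stored, actual = demo(build_time=1700000000, source_date_epoch=1668081600)
print(stored, actual, stored == actual)
```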
To solve this, we will modify Python to clamp the stored mtime to `$SOURCE_DATE_EPOCH` as well (when building RPM packages). Upstream Python chooses to invalidate bytecode cache based on hashes instead of mtimes when `$SOURCE_DATE_EPOCH` is set, but that could cause performance issues for big files, so Fedora's Python already deviates from upstream behavior when building RPM packages. To avoid accidentally breaking the behavior when `%clamp_mtime_to_source_date_epoch` is not set to `1`, RPM macros and buildroot policy scripts for creating the Python bytecode cache will be modified to unset `$SOURCE_DATE_EPOCH` when `%clamp_mtime_to_source_date_epoch` is not set to `1`.
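As a rough illustration of the idea of clamping the stored mtime, the following Python sketch writes a timestamp-based `.pyc` whose header carries `min(source mtime, SOURCE_DATE_EPOCH)`. This is not Fedora's actual patch: the function `write_clamped_pyc` is hypothetical, and the header layout is assumed to be upstream CPython's 16-byte timestamp format.

```python
import importlib.util
import marshal
import os
import struct


def write_clamped_pyc(src_path, pyc_path, source_date_epoch):
    """Compile src_path and store a clamped source mtime in the .pyc header.

    Hypothetical helper for illustration; returns the mtime that was stored.
    """
    st = os.stat(src_path)
    # Clamp the recorded mtime the same way RPM will later clamp the file.
    mtime = min(int(st.st_mtime), source_date_epoch)
    with open(src_path, "rb") as f:
        code = compile(f.read(), src_path, "exec")
    header = importlib.util.MAGIC_NUMBER + struct.pack(
        "<III",
        0,  # flags: 0 means timestamp-based invalidation
        mtime & 0xFFFFFFFF,
        st.st_size & 0xFFFFFFFF,
    )
    with open(pyc_path, "wb") as f:
        f.write(header + marshal.dumps(code))
    return mtime
```

With this approach, the value in the `.pyc` header matches the mtime the `.py` file has once RPM has clamped it, so the cache stays valid at runtime.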
This behavior might be proposed upstream if it turns out to be superior to the current upstream choice, in case we [https://discuss.python.org/t/14594 won't redesign the bytecode-source relationship entirely] instead.
=== Opting out ===
Packages broken by this new behavior can unset `%clamp_mtime_to_source_date_epoch` but packagers are encouraged to fix the problem instead.
== Feedback ==
Enabling this RPM feature was [https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/126 proposed as a pull request] to {{package|redhat-rpm-config}} in April 2021. It received good feedback with the exception of the following:
- it was said the change needs to be coordinated with the Python maintainers
- it was said the change should be done via a change process for better coordination and exposure
We believe that by proposing this via the change process and planning for the changes needed in Python, both issues are addressed.
== Benefit to Fedora ==
We believe that many RPM packages will become reproducible and others will be more reproducible than before. The benefits of reproducible builds are better explained at https://reproducible-builds.org/
== Scope ==
- Proposal owners:
** Propose a PR for {{package|redhat-rpm-config}} (set `%clamp_mtime_to_source_date_epoch` to `1`, possibly only when `%source_date_epoch_from_changelog` is set)
** Propose a PR for {{package|python-rpm-macros}} (unset `$SOURCE_DATE_EPOCH` while creating `.pyc` files iff `%clamp_mtime_to_source_date_epoch` is not `1`)
** Propose a PR for [https://src.fedoraproject.org/rpms/python3.11/blob/b2d80045f9/f/00328-pyc-ti... the Python's bytecode invalidation mode patch] for all Python versions that have it
** Backport (the new portion of) the patch to older Pythons ({{package|python2.7}}, {{package|python3.6}} and PyPys)
** Test everything together in Copr and deploy it if it works.
** Optional: Run some reproducibility tests before and after this change and produce some statistics.
- Other developers:
** Test their packages with the new behavior, report problems, and opt-out if really needed.
- Release engineering: N/A (not needed for this Change)
- Policies and guidelines: N/A (not needed for this Change)
- Trademark approval: N/A (not needed for this Change)
- Alignment with Objectives: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
Nothing anticipated.
== How To Test ==
The change owners plan to perform a mass rebuild in Copr to see if this breaks anything significantly. If it actually works as anticipated, they also plan to run some reproducibility tests and hopefully produce some statistics before and after this change.
Other packagers can test by building their packages and verifying that they still work as expected and that no packaged files have mtimes higher than the last `%changelog` entry.
To verify if this change has landed, run: `rpm --eval '%clamp_mtime_to_source_date_epoch'` on Fedora 38. The result should be `1`.
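The check that no packaged file is newer than the last `%changelog` entry could be scripted roughly as follows. This Python sketch assumes the file list is obtained with `rpm -qp --queryformat '[%{FILEMTIMES} %{FILENAMES}\n]'`; the helper names are hypothetical, and the caller has to supply the expected `$SOURCE_DATE_EPOCH` value.

```python
import subprocess


def parse_mtimes(query_output):
    """Parse lines of '<mtime> <path>' into (mtime, path) tuples."""
    entries = []
    for line in query_output.splitlines():
        if not line.strip():
            continue
        mtime, path = line.split(" ", 1)
        entries.append((int(mtime), path))
    return entries


def files_newer_than(entries, source_date_epoch):
    """Return the paths whose mtime exceeds source_date_epoch."""
    return [path for mtime, path in entries if mtime > source_date_epoch]


def query_package(rpm_path):
    """Ask rpm for the mtime and name of every file in a built package."""
    result = subprocess.run(
        ["rpm", "-qp", "--queryformat", "[%{FILEMTIMES} %{FILENAMES}\\n]", rpm_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_mtimes(result.stdout)


# Example with canned output instead of a real package:
sample = "1600000000 /usr/bin/skynet\n1700000000 /usr/share/doc/skynet/README\n"
print(files_newer_than(parse_mtimes(sample), 1668081600))
```

An empty list means every packaged mtime is at or below the given epoch.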
== User Experience ==
Users of Fedora Linux on their machines should not be impacted at all. Users who build RPM packages atop Fedora will be impacted by this change the same way Fedora is.
== Dependencies ==
- RPM needs to support this (it already does)
- RPM needs to set `$SOURCE_DATE_EPOCH` (it already does)
== Contingency Plan ==
- Contingency mechanism: The change owners or {{package|redhat-rpm-config}} maintainers or proven packagers will revert the change in {{package|redhat-rpm-config}}. That should be enough to undo anything as the changes in Python should be dependent on that. If not enough, revert everything.
- Contingency deadline: Ideally, we should do this before the Mass Rebuild. Technically, we can land it any time before the Beta Freeze, but it would not change all the packages, which is a bit messy.
- Blocks release? No
== Documentation ==
This page is the documentation.
== Release Notes ==
-- Ben Cotton He / Him / His Fedora Program Manager Red Hat TZ=America/Indiana/Indianapolis
On 11. 11. 22 14:18, Barry wrote:
Change log has the date of a change but no time. What time of day and timezone is used? 00:00:00 UTC?
Changelogs can have times as well.
When they don't, they are considered 12:00 (noon) UTC:
https://github.com/rpm-software-management/rpm/blob/rpm-4.18.0-release/build... https://github.com/rpm-software-management/rpm/blob/rpm-4.18.0-release/build...
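To make the noon-UTC rule concrete, here is a small Python sketch that turns a date-only changelog header date into an epoch value. The function name and the month table are illustrative, and the parsing is deliberately simplified compared to what RPM does; only the "12:00 UTC when no time is given" behavior is taken from the reply above.

```python
from datetime import datetime, timezone

MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def changelog_date_to_epoch(date_text):
    """Convert a date such as 'Thu Nov 10 2022' to seconds since the epoch.

    A date without a time is treated as 12:00:00 UTC on that day.
    """
    _weekday, month, day, year = date_text.split()
    moment = datetime(
        int(year), MONTHS[month], int(day), 12, 0, 0, tzinfo=timezone.utc
    )
    return int(moment.timestamp())


print(changelog_date_to_epoch("Thu Nov 10 2022"))  # 1668081600
```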
On Thu Nov 10, 2022 at 15:23 -0500, Ben Cotton wrote:
The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`. When an RPM package is built, mtimes of packaged files will be clamped to `$SOURCE_DATE_EPOCH` which is already set to the date of the latest `%changelog` entry. As a result, more RPM packages will be reproducible: The actual modification time of files that are e.g. modified in the `%prep` section or built in the `%build` section will not be reflected in the resulting RPM packages. Files in RPM packages will have mtimes that are independent of the time of the actual build.
Will packagers still be required to use install -p, cp -p, etc. to preserve mtimes? For some packages, the source archive's mtimes will be lower than $SOURCE_DATE_EPOCH, but e.g. for ancillary Source files (e.g. systemd units) stored in distgit, using -p won't make a difference, because the mtimes aren't preserved when Koji clones the distgit repository.
-- Best,
Maxwell G (@gotmax23) Pronouns: He/Him/His
On 24. 11. 22 19:28, Maxwell G via devel wrote:
Will packagers still be required to use install -p, cp -p, etc. to preserve mtimes? For some packages, the source archive's mtimes will be lower than $SOURCE_DATE_EPOCH, but e.g. for ancillary Source files (e.g. systemd units) stored in distgit, using -p won't make a difference, because the mtimes aren't preserved when Koji clones the distgit repository.
Yes, that guideline will stay.
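Why preserving mtimes still matters with clamping enabled can be shown with a short Python sketch. It uses `shutil.copy2` as a rough analogue of `cp -p` and `shutil.copy` as an analogue of plain `cp`; the helper names and timestamps are made up, and the clamping is simulated with `min()` rather than performed by RPM.

```python
import os
import shutil
import tempfile


def packaged_mtime(path, source_date_epoch):
    """Simulate the mtime a file would get in the package after clamping."""
    return min(int(os.stat(path).st_mtime), source_date_epoch)


def demo(upstream_mtime, source_date_epoch):
    """Return packaged mtimes for a preserving and a non-preserving copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "README")
        with open(src, "w") as f:
            f.write("upstream documentation\n")
        # Pretend the file came from an upstream archive with an old mtime.
        os.utime(src, (upstream_mtime, upstream_mtime))
        preserved = shutil.copy2(src, os.path.join(tmp, "README.with-p"))
        fresh = shutil.copy(src, os.path.join(tmp, "README.without-p"))
        return (
            packaged_mtime(preserved, source_date_epoch),
            packaged_mtime(fresh, source_date_epoch),
        )


with_p, without_p = demo(upstream_mtime=1600000000, source_date_epoch=1668081600)
print(with_p)  # upstream mtime survives
print(without_p)  # copy time exceeds the epoch, so it is clamped
```

Both results are reproducible, but only the preserving copy keeps the upstream timestamp, which is what the guideline asks for.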
This change proposal has been shipped in redhat-rpm-config-238-1.fc38.
If you need to opt out, you can %undefine clamp_mtime_to_source_date_epoch or define it to 0.
If you encounter problems, report them in Bugzilla and preferably make the report block the change tracking bug: https://bugzilla.redhat.com/2149310
Or reply to this thread on the devel list.