Re: F35 Change proposal: RPM 4.17 (System-Wide Change proposal)

Tuesday, 6 April 2021

On 4/6/21 1:36 PM, Miro Hrončok wrote:
...
 On 06. 04. 21 11:16, Panu Matilainen wrote:
> On 4/1/21 12:45 AM, Miro Hrončok wrote:
>> On 31. 03. 21 21:52, Ben Cotton wrote:
>>> * Strict checking for unpackaged content in builds
>>  > ...
>>> * Many existing packages will fail to build due to the stricter
>>> buildroot content checking. Fixing this in the packaging is always
>>> backwards compatible. We could temporarily set
>>> `%_unpackaged_files_terminate_build 0` in rawhide to alleviate initial
>>> impact if necessary.
>>
>> This is my main concern with this update.
>>
>> tl;dr If you %exclude something and there is no other subpackage to 
>> own the files, the build fails:
>>
>>
>> This fails:
>>
>>    %install
>>    ...
>>    touch %{buildroot}/foo %{buildroot}/bar
>>
>>    %files
>>    /
>>    %exclude /foo
>>
>> This still succeeds:
>>
>>    %files
>>    /
>>    %exclude /foo
>>
>>    %files foo
>>    /foo
>>
>> Many packages do the former in Fedora for various different reasons, 
>> namely to, well... exclude files from the package (and not ship them 
>> at all). Sometimes a `rm` in %install can be used instead. Sometimes 
>> not, because the files are needed in the %{buildroot} for %check but 
>> not needed to be shipped.
>
> The bottom line is that the buildroot should not contain anything that 
> is not packaged. This has been the basic premise ever since the check 
> for unpackaged files was added somewhere around turn of the millenium, 
> but some loopholes were left around. Yes, %exclude is a loophole.
> So 'rm -f' at end of %install for undesired material is the expected fix.
>
> What %check may or may not do has probably never been designed, much 
> less documented or enforced. It runs after %install yes, so one might 
> assume it's ok/supposed to use what's in buildroot, but then it runs 
> from the build directory, not buildroot.

 It doesn't matter where does it run *from*, it needs to "see" the files 
 from %{buildroot} because that is what we want to test: The files we ship.

> The way I see it, %check should be able to use/access buildroot 
> contents, but it certainly should not write there. That might be 
> something to even enforce one day.

 The packages (that I know about) don't need to actually write there in 
 %check. They just need to to see the files that will be later excluded 
 (e.g. because they belong to a different package that the 
 soon-to-be-built package requires on runtime). 

And therein lies the problem. Like many things in rpm, %check isn't 
actually defined in any meaningful way, but in practise it's kinda 
limited to standalone unit/functional testing and this steps on the 
border of integration testing.

...

>> When this change was introduced upstream in November 2020, I've 
>> analyzed the impact on Fedora packages. Bare in mind that the data is 
>> 4+ months old.
>>
>> https://github.com/rpm-software-management/rpm/pull/1442#issuecomment-731... 
>>
>>
>>   - 1675 packages had %exclude in the spec file
>>   -  261 packages FTBFS for unrelated reason (incl. a very limited 
>> timeout)
>>   - 1414 packages actually tested
>>   -  537 packages built successfully, that is ~38%
>>   -  877 packages failed with unpackaged files, that is ~62%, list 
>> attached
>>
>> When I extrapolate the numbers to compensate the unrelated FTBFS, 
>> that's likely more than 1000 affected Fedora packages.
>>
>> OTOH ~500 packages are generated rubygem-* packages, automatically 
>> fixable.
>>
>> I'd like to know how are the affected packages supposed to migrate to 
>> RPM 4.17 behavior, especially if they cannot remove the files in 
>> %install prior to %check. Are they supposed to remove the files at 
>> the end of %check instead? What if the package is build without %check?
>
> In the order of preference:
>
> 1. 'rm -f <unwanted>' at end of %install

 That one is simple. I agree. If it is possible, let's do this. We could 
 even automate it if needed. (As a side note I'd say writing such 
 automation is a right thing to do when a change like this is proposed, 
 but I understand that we cannot have everything.)

> 2. change %check not to rely on unpackaged files in buildroot

 That one is non-trivial and depends on the reason it is needed. 
Of course.

...
 For example, what is common for Python "namepsace"
packages, e.g. 
 pkg_name.foo.

 1) We want to test installed files, not what is in $PWD, so we set 
 PYTHONPATH to
     %{buildroot}%{python3_sitearch}:%{buildroot}%{python3_sitelib} and we
     (try hard to) exclude $PWD from it. This is crucial to ensure the files
     we actually ship are working and the installed file set is complete.
     Our macros do this for the packagers.

 2) The %{python3_site...}/pkg_name/ directory and
     %{python3_site...}/pkg_name/__init__.py and
     %{python3_site...}/pkg_name/__pycache__/ and
     %{python3_site...}/pkg_name/__pycache__/__init__...pyc
     files must be present in %{buildroot} to successfully run the tests.

 3) The files from (2) must be excluded from the package because 
 *pkg_name* owns
     them, not *pkg_name.foo*.
     We Require the "toplevel" *pkg_name* package from *pkg_name.foo*.
     The files are not bit-by-bit+metadata identical,
     so both packages cannot ship them. 
This seems like a fairly exotic case to me - okay, a Python-peculiar 
problem. And a problem of stepping (not saying crossing, because I'm not 
sure it is) on the boundaries of the %check use-case I suppose.

%ghost'ing the files might be one option - I don't know about the Ruby 
cache case beyond that there is one.

...
 Or, let's assume I want to package libfoo-devel for EPEL 9
because RHEL 
 9 only ships libfoo. And I want to run %check, but only ship the 
 headers. There are plenty situations like this. 
I know the RHEL problem with -devel, but really? Shipping -devel from a 
different package than the actual lib sounds like a disaster waiting to 
happen to me.

That aside, based on personal experience most %check sections don't use 
the buildroot content at all, but rather use the build tree. For example 
autotools-based projects separate these cases: "make check" runs in the 
source tree, "make installcheck" the installed copies and the test might 
differ. I don't know, but my guess is that "installcheck" is a much 
later addition and %check was in fact modeled after "make check" 
automake target. And it's suitability to other cases ... varies.

...
 The case-by-case fix is also impossible to do at scale without a huge

 heroic effort.

> 3. short-term, "%_unpackaged_files_terminate_build 0" in the spec with 
> explanation (similar to eg unsupported architectures)

 I will address below why I don't like this (and why I consider it 
 dangerous).

> 2. would be case-by-case of course, but an "universal" solution is to 
> create a private install root inside %check, eg
>
> ---
> %check
> export CHECKROOT=${PWD}/_check
> %make_install DESTDIR=${CHECKROOT}
>
> # ...do what you normally do, just replace BUILDROOT with CHECKROOT
> ---
>
> This way %check will not be messing with the precious to-be-packaged 
> contents.

 Yes, this works. However it requires a tedious copy-paste boilerplate :/

 E.g. instead of this:

    %install
    %py3_install

    %check
    %pytest

    %files
    %exclude %{python3_sitelib}/...unwanted...
    ...

 I need to do this:

    %install
    %py3_install
    mkdir _check
    cp -a %{buildroot} _check
    rm -rf %{buildroot}%{python3_sitelib}/...unwanted...

    %check
    export PYTHONPATH=${PWD}/_check%{python3_sitelib}
    %pytest 
This isn't quite what I suggested. I suggested doing the 
install-for-check inside %check, ie basically all the added boilerplate 
would be handled inside %pytest, so you'd also only pay the extra price 
if %check is executed:

---
%install
%py3_install

rm -rf %{buildroot}%{python3_sitelib}/...unwanted...

%check
%pytest
---

...where %pytest would look like something like:

---
CHECKROOT=${PWD}/_check
%py3_install --root ${CHECKROOT}
PYTHONPATH=${CHECKROOT}
# ... pytest environment stuff
/usr/bin/pytest
---

So the whole thing is isolated within %check.

Of course at this point you're not literally testing the exact bits that 
will be packaged, but then that's also true about the %exclude pattern. 
To really test actual packaged bits, you want to test an actual 
installed package and %check wont work for that.

...

    %files
    ...

 Also suffers from "very hard to do this change at scale" problem.

> More light-weight options will exist on case-by-case basis, eg 
> typically a cache can be diverted to some other location with 
> environment variables or such.
>
>>
>> Using `s` in entire rawhide just to compensate this:
>>
>>   - is dangerous for other implications of the setting
>>   - only postpones the problem to a later time (when we will face the 
>> same issue)
>
> It's hardly "dangerous", because the only content that will
"leak" 
> because of it is buildid links, which is what happens now. It doesn't 
> affect content inclusion, just whether the message is a warning or error.

 Nothing will "leak", quite the opposite: files will be missing.

 Setting %_unpackaged_files_terminate_build to 0 globally is dangerous, 
 because packagers will make mistakes and they will forget to include 
 newly added files, e.g. on updates. Such packages will happily build but 
 will miss needed files, only to fail in unpredictable situations on 
 runtime.

 Consider this spec:

    Name:     hammer
    Version:  1.1
    ...
    %global _unpackaged_files_terminate_build 0

    %install
    %make_install

    %check
    %make test

    %files
    %{_bindir}/hammer
    %exclude %{_bindir}/test-hammer

 It works and builds fine.

 Now the maintainer updates to version 1.2, where %make_install also 
 installs /usr/share/hammer with some data files. When the package 
 builds, there is a warning (yet the packagers don't usually read the 
 Koji build logs when the build is successful) but /usr/share/hammer is 
 missing from the package.

 %check has passed, so the maintainer thinks everything works. However in 
 reality, hammer crashes on runtime. Even if the maintainer does their 
 due diligence to test the package before they ship it, they might not 
 notice the problem because hammer crashes only when the user tries to do 
 a very specific action.

 And let's face it: Many packagers will just bump the version in the spec 
 file and "if it builds, ship it". 
I'm not sure unpackaged files is the biggest threat in that case :D

...
 Setting %_unpackaged_files_terminate_build to 0 in the affected
packages 
 only is only less dangerous because of the limited impact, but can cause 
 the very same problem.

 OTOH if we had a %_unpackaged_excluded_files_terminate_build 0 setting, 
 I would argue that adding that to all affected packages until they are 
 fixed is a good backwards compatibility measure. Adding 
 %_unpackaged_files_terminate_build 0 is not. 
Not sanely possible, the context in which an %exclude is parsed is long 
lost by the time we get to the unpackaged files check. I can't see us 
adding a whole pile of extra tracking code just to handle that specific 
compat scenario.

For some cases, %ghost'ing such files might be a reasonable option (I 
don't know the rubygem cache details but perhaps)

...
>> And for a more specific problem, around ~100 Python packages
were 
>> affected when tested, many of them crucial (e.g. dnf), so this 
>> problem will block the upgrade to Python 3.10 if the change lands in 
>> Rawhide before we upgrade Python (which is the current plan) until we 
>> fix all the affected packages (by at least adding `%global 
>> _unpackaged_files_terminate_build 0` to them, which is a tad big 
>> hammer, but it will be our last-resort option).
>
> Which is why suggested the global "%_unpackaged_files_terminate_build 
> 0" in rawhide: just to move the impact to a less inconvenient time. No 
> kittens will be killed by doing so.

 We will ship broken packages in the transition period. Kittens may die 
 if we break the packages that take care of kitten-life-support systems.

 One could argue that kitten-lives-critical systems should not run 
 Rawhide, however once we end the transition period, the broken packages 
 will start to fail to build. And until we fix them, we will ship the 
 broken versions. Hence it is likely we will ship such packages with 
 missing files unnoticed to the stable release as well. 
Seems like it's the rare case where I'm the less pessimistic one :)

	- Panu -

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: F35 Change proposal: RPM 4.17 (System-Wide Change proposal)