Noarch subpackage problem

Wed Feb 25 01:53:51 UTC 2009

Mike Bonnet wrote:
> Toshio Kuratomi wrote:
>> So we had a discussion on IRC today about the failure cases of noarch
>> subpackages.  I think we should make some changes to the way we check
>> that noarch subpackages are sane.
>>
>> Currently, when a noarch subpackage is built, rpmdiff is run on the
>> noarch packages that were built by each builder.
>>
>> Of the checks that rpmdiff does, we discard all of them except Provides,
>> Requires, and the list of files.  My concern is that if you throw out
>> md5sum and filesize in these checks there's a lot of margin for creating
>> subpackages that are not actually noarch.
> 
> Actually the full list of tags we diff are:
> 
> name
> summary
> description
> group
> license
> url
> prein (script)
> postin (script)
> preun (script)
> postun (script)
> 
> For Requires, Provides, Conflicts, and Obsoletes we check that the lists
> (including versions) are identical across all subpackages.
> 
> We verify that the file lists are identical across all subpackages, and
> for each file in the file lists we verify that the following attributes
> are identical:
> 
> mode
> flags
> nlink
> state
> vflags
> user
> group
> 
>> For instance, if bitedness ends up in include files that are placed in a
>> noarch subpackage, those subpackages won't be caught by this check.
>> That would allow a package to go out that could prevent building with
>> the incorrect header.
>>
>> The reason that filesize and md5sum are discarded is that
>> arch-independent files can have timestamps embedded into them at build
>> time.  This means, for instance, that documentation can differ between
>> builds of a subpackage despite it being a prime candidate for a noarch
>> subpackage.
>>
>> An idea for a change would be to extend rpmdiff to be able to list
>> changes in md5sum between all files except those marked as %doc.  This
>> would let documentation packages through even if timestamps were
>> embedded but not let a noarch package with differing headers through,
>> for instance.
> 
> Another class of files that are noarch-but-different are .pyc/.pyo
> files, as Julian Sikorski found out:
> 
> https://www.redhat.com/archives/fedora-devel-list/2009-February/msg01826.html
> 
Actually, that's not correct.

*.pyc and *.pyo files encode the timestamp of the file they are
generated from.  So in a normal build you get matching *.pyc and *.pyo
files from builds on different arches.
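
To illustrate where that timestamp lives, here is a minimal Python sketch.
It assumes the Python 3.7+ header layout (magic, flags, source mtime, source
size); the Python 2 interpreters under discussion used a shorter header with
the mtime directly after the magic number.  The helper names
pyc_source_mtime and demo are made up for this example.

```python
import os
import py_compile
import tempfile


def pyc_source_mtime(pyc_path):
    """Return the source mtime recorded in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Assumed Python 3.7+ layout: magic (4), flags (4), mtime (4), size (4).
    flags = int.from_bytes(header[4:8], "little")
    if flags != 0:
        raise ValueError("hash-based .pyc, no source mtime recorded")
    return int.from_bytes(header[8:12], "little")


def demo():
    """Compile a throwaway module and return (recorded mtime, source mtime)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        pyc = os.path.join(tmp, "mod.pyc")
        with open(src, "w") as f:
            f.write("X = 1\n")
        # Arbitrary fixed timestamp, standing in for the mtime a file would
        # carry when unpacked from the upstream tarball.
        os.utime(src, (1235526831, 1235526831))
        py_compile.compile(
            src,
            cfile=pyc,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        return pyc_source_mtime(pyc), int(os.stat(src).st_mtime)


recorded, actual = demo()
print(recorded == actual)
```

Because the recorded value comes from the source file rather than from the
clock at build time, two builders that unpack the same source with the same
mtime would be expected to write the same header.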

What's happening in this build is actually a very interesting corner
case.  The source => byte code compilers on x86_64 and i86 generate
different code for the constant value 4294967295 (2**32 - 1), but each
interpreter reads the byte code from the other compiler successfully.

On x86_64, the constant is compiled into an "I" type (64 bit integer
type) and saved in the byte code as such.  On i86, the constant is
compiled into an "l" type (long integer, limited by the memory of the
machine rather than by a fixed number of bytes).

When the i86 byte code is loaded on x86_64, the type that was saved in
the pyc is used so it stays an "l".  When the x86_64 byte code is loaded
on i86, the type is converted from "I" to an "l".
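
The behaviour described above can be modelled in a few lines of Python.
This is only a simplified model of what Python 2's marshal does with integer
constants, written from the description in this mail; it does not call the
marshal module, and the function names py2_marshal_type and py2_loaded_type
are invented for the illustration.

```python
def py2_marshal_type(value, long_bits):
    """Type code a Python 2 compiler would save for an integer constant.

    long_bits is the width of a C long on the compiling machine
    (32 on i86, 64 on x86_64).  Model only, not the real marshal code.
    """
    max_int = 2 ** (long_bits - 1) - 1
    min_int = -max_int - 1
    if not (min_int <= value <= max_int):
        return "l"  # does not fit a machine int: arbitrary-precision long
    if -2 ** 31 <= value <= 2 ** 31 - 1:
        return "i"  # fits in 32 bits
    return "I"  # needs the 64 bit integer type


def py2_loaded_type(type_code, long_bits):
    """Python 2 type the loading interpreter ends up with (model only)."""
    if type_code == "i":
        return "int"
    if type_code == "I":
        # A 64 bit value cannot be a machine int on a 32 bit interpreter.
        return "int" if long_bits >= 64 else "long"
    return "long"  # "l" is loaded as a long everywhere


constant = 4294967295  # 2**32 - 1
for bits in (32, 64):
    print(bits, py2_marshal_type(constant, bits))
```

Under this model the same source line yields "l" on the 32 bit builder and
"I" on the 64 bit builder, so the two .pyc files differ byte for byte even
though either interpreter can load either file.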

So this is a false positive, but one that shows up infrequently in
practice.  It would probably be an *extremely* minor optimization to
build this particular file on the architecture it's going to run on to
avoid conversion costs on i86 and be able to operate with a long int on
x86_64 :-)

> 
> Issues like this, where files differed because of embedded
> timestamps/hostnames/etc. but were not different in any meaningful way
> came up during testing of this feature.  As a result it was decided to
> not fail a build due to differences in file size, digest, or mtime
> because this would have resulted in a lot of false positives, and
> significantly reduced the usability and usefulness of this feature.
> 
> The automated checks are not a replacement for diligent package review
> and testing, they are there to help package maintainers catch silly
> mistakes and oversights.  What we have now is a good balance between
> catching those oversights and not burdening maintainers with a huge
> number of false positives that (as in the python case above) they are
> unable to do anything about and thus unable to make use of this feature.
> 
Really?  If you have an exclusion for %doc marked files, I'd say that
would satisfy the #1 major use case of this feature.
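
The %doc exclusion suggested earlier in the thread could be sketched like
this.  The data layout (a dict mapping each path to a digest and an is_doc
flag) and the function name differing_files are hypothetical; this is not
how rpmdiff or koji represent packages, and a real check would read the
digests and the %doc flag from the RPM headers.

```python
def differing_files(build_a, build_b, skip_docs=True):
    """Return sorted paths whose digests differ between two builds.

    Each build maps path -> (digest, is_doc).  Hypothetical layout.
    """
    if build_a.keys() != build_b.keys():
        raise ValueError("file lists differ; the existing check already fails")
    changed = []
    for path in sorted(build_a):
        digest_a, is_doc = build_a[path]
        digest_b, _ = build_b[path]
        if skip_docs and is_doc:
            continue  # tolerate embedded timestamps in documentation
        if digest_a != digest_b:
            changed.append(path)
    return changed


# Made-up data: a header that encodes word size, and a doc with a build date.
i386 = {
    "/usr/include/foo/config.h": ("aaa1", False),
    "/usr/share/doc/foo/api.html": ("bbb1", True),
    "/usr/share/foo/data.xml": ("ccc", False),
}
x86_64 = {
    "/usr/include/foo/config.h": ("aaa2", False),
    "/usr/share/doc/foo/api.html": ("bbb2", True),
    "/usr/share/foo/data.xml": ("ccc", False),
}
print(differing_files(i386, x86_64))
```

With the made-up data above, only the header is reported; the documentation
file differs too but is skipped because it is flagged as %doc.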

OTOH, allowing people to use this feature makes more work for everyone.
Now reviewers and packagers have to check whether the noarch
subpackages they build really are noarch.  And whether they are noarch
after many upstream releases.  And while this is true of every noarch
package, it's worse for noarch subpackages because we're dealing with
more types of files now.

For a reviewer with a single computer or computers of only a single
architecture, the same check requires uploading the packages to koji to
be scratch built.  The reviewer then has to download the binary rpms.
And they have to run rpmdiff on the packages manually to determine if
there's a problem.  By contrast, koji can have an automated check
because packages get built in koji.

> Note that this feature is in no way more dangerous or prone to error
> than the existing method of creating noarch subpackages (extra-arches).
> 
It's no more dangerous but it is more prone to error.  For instance,
without the ability to make noarch subpackages, header files would be
included in an arch specific -devel package without question.  Now the
possibility arises to put headers in a noarch subpackage instead so
packagers need to check that the headers do not contain arch specific
code.  This is an additional piece of information and an additional
check that packagers will have to know to perform.

Any time you add things that the packager has to understand and has to do,
you add a step where things can be forgotten or misapplied.  Right now,
we have the ability to start with a small set of files that are
allowable in noarch subpackages and add more as we figure out heuristics
to allow things that we are pretty certain are correct (%doc, *.py,
etc.).  Once this is out there for a while, the only thing we can do is
impose additional burden on reviewers and packagers, as it's much harder
to take back a feature than it is to parcel it out a little at a time.

-Toshio
