On Tue, May 17, 2022 at 05:46:25PM +0000, Gary Buhrmaster wrote:
> On Tue, May 17, 2022 at 2:41 PM Vitaly Zaitsev via devel
> <devel(a)lists.fedoraproject.org> wrote:
> > But I think this change also requires automatic conversion of all
> > available SPECs, because manual conversion will take years.
> Automating where possible (where the existing license has a
> one-to-one mapping) is desirable, but realistically
> there are just too many packages that currently have
> a license such as the poster child "BSD" that are
> going to require someone(*) to actually look at the
> upstream license files to decide which SPDX id
> is the right one (not all upstreams even name
> their license files consistently, and the contents of
> those files sometimes have minor syntactic variations).
Automating the change of identifiers is only meaningful if the values we
currently have in the License field are correct. The only time someone other
than the package maintainer validates the License field against what is
actually in the software is during initial package review, so it is possible
that some packages have since added licenses or changed licenses and their
spec files are no longer in sync. We know license changes happen because
package maintainers announce upcoming license changes in their packages. Many
packagers are good about updating the License field, but it is easy to miss a
change when you are doing updates.
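To make this concrete, here is a rough Python sketch of what a purely
mechanical conversion amounts to. The mapping table is a small, hypothetical
excerpt for illustration, not Fedora's actual conversion data, and the
function only handles flat License fields without parentheses.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately tiny mapping table for illustration only; it is
# not Fedora's actual conversion data. Each entry is an old Fedora short name
# that corresponds to exactly one SPDX identifier.
ONE_TO_ONE = {
    "ASL 2.0": "Apache-2.0",
    "GPLv2+": "GPL-2.0-or-later",
    "GPLv3+": "GPL-3.0-or-later",
}

# Old names like "BSD" cover several distinct licenses, so picking an SPDX
# id requires reading the upstream license text.
AMBIGUOUS = {
    "BSD": ["BSD-2-Clause", "BSD-3-Clause", "BSD-4-Clause"],
}


def convert(license_field: str) -> tuple[str | None, list[str]]:
    """Rename identifiers in a flat License field (no parentheses).

    Returns (spdx_expression, problems). The expression is None if any
    term needs a human to look at the sources. Even a clean conversion is
    only as correct as the original field.
    """
    # The capturing group keeps the operators: "A and B" -> ["A", "and", "B"]
    parts = re.split(r"\s+(and|or)\s+", license_field.strip())
    converted = []
    problems = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            converted.append(part.upper())  # "and" -> "AND", "or" -> "OR"
        elif part in ONE_TO_ONE:
            converted.append(ONE_TO_ONE[part])
        elif part in AMBIGUOUS:
            choices = ", ".join(AMBIGUOUS[part])
            problems.append(f"{part}: ambiguous, one of {choices}")
        else:
            problems.append(f"{part}: no known mapping")
    if problems:
        return None, problems
    return " ".join(converted), problems


if __name__ == "__main__":
    print(convert("GPLv2+ and ASL 2.0"))  # ('GPL-2.0-or-later AND Apache-2.0', [])
    print(convert("GPLv2+ and BSD"))
```

The first call converts cleanly; the second stops at "BSD" and lists the
candidates, which is exactly the case that needs someone to read the upstream
files. And even the clean result says nothing about whether the package is
really GPL-2.0-or-later AND Apache-2.0 today.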
> (*) I suppose it is conceivable someone could
> create a sufficiently accurate AI/ML model
> to scan the spec file and all the sources, and choose
> correctly. If this were an ongoing activity, that
> might even make sense. But for a one-time
> activity I suspect packagers are going to have
> to do it manually unless you are volunteering to
> build and test that automation.
I think a better thing to do would be to use a scanner like scancode[1] to
check the source tree in question and then construct a License expression for
the spec file from its results. In many cases it will be the same as what we
have in the spec file, just with different identifiers. But we would be using
the opportunity to both move to new license identifiers and audit the
information at the same time. Note that scancode isn't perfect, but here it
would serve as a workflow tool while the contributor audits the licensing
information in a package.
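As a rough illustration of that audit step, the Python sketch below assumes
the scanner results have already been reduced to a hypothetical
{path: set of SPDX ids} mapping (real scancode output is much richer, and its
JSON layout depends on the version). It naively ANDs every detected id into a
proposed expression and diffs the detected ids against what the spec file
declares. Deciding where OR belongs (dual-licensed files) and which files
actually end up in the built package is still the contributor's job.

```python
from __future__ import annotations

import re


def spdx_ids(expression: str) -> set[str]:
    """Collect license ids from an SPDX expression (WITH is not handled)."""
    tokens = re.findall(r"[^\s()]+", expression)
    return {t for t in tokens if t not in ("AND", "OR")}


def propose_expression(findings: dict[str, set[str]]) -> str:
    """Naively AND together every id the scanner reported, in sorted order."""
    return " AND ".join(sorted(set().union(*findings.values())))


def audit(spec_license: str, findings: dict[str, set[str]]) -> dict[str, list[str]]:
    """Compare the spec's License expression with scanner findings."""
    declared = spdx_ids(spec_license)
    detected = set().union(*findings.values())
    return {
        # detected in the sources but absent from the spec's License field
        "missing_from_spec": sorted(detected - declared),
        # declared in the spec but never detected (stale, or a scanner miss)
        "not_detected": sorted(declared - detected),
    }


if __name__ == "__main__":
    # Hypothetical findings for a made-up package.
    findings = {
        "src/main.c": {"GPL-2.0-or-later"},
        "lib/util.c": {"BSD-3-Clause"},
        "README": set(),
    }
    print(propose_expression(findings))  # BSD-3-Clause AND GPL-2.0-or-later
    print(audit("GPL-2.0-or-later", findings))
    # {'missing_from_spec': ['BSD-3-Clause'], 'not_detected': []}
```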
I realize this is a lot of work. It would be best done in hackfest-type
sessions, with the work divided up into subsets of packages. It would also be
a good opportunity for new contributors to learn how things are structured and
send PRs to existing packages.
[1] https://github.com/nexB/scancode-licensedb
Thanks,
--
David Cantrell <dcantrell(a)redhat.com>
Red Hat, Inc. | Boston, MA | EST5EDT