On 24. 11. 23 at 20:07, Mark Wielaard wrote:
I think the main conflict is that SPDX identifiers and expressions are
meant to apply to individual source files (and not describe the general
intended license of the larger work), 
SPDX license IDs are used in SPDX SBOMs, which are intended to describe large works.
where the Fedora spec file
License tags is meant to provide the approximate license of the
(sub)package as a whole (which can consist of multiple larger, possibly
independently licensed, works).
The Fedora License tag should NOT provide an approximate license. It should provide the exact license.
Also I found the tooling around this hard to use/understand. It seems
various tools just haven't caught up or aren't even packaged for Fedora
itself and are only available in some giant container image blob (e.g.
fossology, really a webapp that you then should run "locally", which I
never could get working [or simply didn't understand how to use]).

True. We are trying to improve it every day. I have just packaged another dependency for a tool that will be helpful.

You can use the OSUOSL instance of FOSSology at https://fossology.osuosl.org/repo/ with login:password fossy:fossy

Or you can use scancode-toolkit from Copr repo https://copr.fedorainfracloud.org/coprs/eclipseo/scancode-toolkit/

The command to do the analysis is then:

$ scancode --license --html /tmp/spdx.html --json /tmp/spdx.json --license-references  .

The output is then: http://miroslav.suchy.cz/fedora/spdx-reports/mutt.html

For the elfutils project Housam Alamour created a new eu-srcfiles
utility for version 0.190 (already packaged for Fedora 37..Rawhide),
which might be helpful for native ELF plus DWARF based packages. At the
moment it does require you have the build requires and debugsources
installed, but a newer version will query debuginfod for that.

With that we might be able to build somewhat simpler command line tools
to help packagers extract all the license snippets found in the source
files included in each binary.

Great. I did not know this.

e.g. for the mutt package, you can get a rough estimate of licenses
used in all the binaries using (if that is really what you are
insisting packagers do instead of just using the declared licenses):

$ dnf install mutt
$ dnf builddep mutt
$ dnf debuginfo-install mutt

$ for i in `rpm -ql mutt`; do eu-elfclassify --elf --file $i; \
  if [ $? -eq 0 ]; then eu-srcfiles --exec $i; fi; done | sort -u \
  | xargs licensecheck --shortname-scheme spdx | cut -f2- -d: \
  | sort -u | sed -z -e 's/\n / AND /g'

 GPL-2.0-or-later AND GPL-3.0-or-later AND HPND-sell-variant AND LGPL
AND LGPL-2.1-or-later AND *No copyright* GPL-2.0-or-later AND *No
copyright* public-domain AND *No copyright* UNKNOWN AND UNKNOWN AND
Zlib

Which still requires lots of investigation (there are various
UNKNOWNs), but might be a good starting point. At least for me
something like that would be much more usable than a container image
packaged webapp.

There are many ways to do it, and we are all exploring them. If this one works better for you, then use it. I am afraid it will not work for noarch packages, though.
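The last step of the pipeline above (collecting the identifiers and joining them with AND) can be sketched as a small shell function. This is only a sketch under assumptions: the name `join_licenses` is hypothetical, the input is assumed to be one identifier per line as in the licensecheck output quoted above, and UNKNOWN lines are simply dropped, so they still need manual review.

```shell
# Hypothetical helper (not part of elfutils or licensecheck): read one
# license identifier per line on stdin, print one "A AND B" expression.
join_licenses() {
    # Trim whitespace and the "*No copyright* " prefix, drop empty and
    # UNKNOWN lines, de-duplicate, then join the rest with " AND ".
    sed -e 's/^[[:space:]]*//' -e 's/^\*No copyright\* //' -e 's/[[:space:]]*$//' \
        | grep -v -e '^$' -e 'UNKNOWN' \
        | LC_ALL=C sort -u \
        | awk 'NR > 1 { printf " AND " } { printf "%s", $0 } END { if (NR) print "" }'
}

# Usage with a few lines shaped like the output above:
printf ' Zlib\n *No copyright* GPL-2.0-or-later\n UNKNOWN\n GPL-2.0-or-later\n' | join_licenses
# prints: GPL-2.0-or-later AND Zlib
```

Dropping UNKNOWN keeps the result a valid expression, but it also hides exactly the files that need a human to look at them, so a real tool would probably report those separately.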



On Mon, 2023-09-18 at 20:47 -0400, Richard Fontana wrote:
On Sun, Sep 17, 2023 at 11:37 AM Mark Wielaard <mark@klomp.org> wrote:
To be clear I don't mind using a different set of short-hands in the
License tags. Although it feels a little odd to try to create separate
identifiers for lax-permissive MIT/BSD like licenses which sometimes
just differ in one or two words.
FWIW, usually a difference of one or two words wouldn't be enough to
result in creation of a distinct SPDX identifier. The standard applied
by SPDX is, informally, whether the difference is "legally
substantive" (this has its flaws but seems to work OK in practice).
But then for the Hybrid BSD license that parts of bzip and valgrind
use, it actually has different identifiers depending on the version of
the package (it actually has both bzip2-1.0.5 and bzip2-1.0.6 which are
literally exactly the same except for the version string and the
copyright year).

SPDX uses markup that allows variation in the license text. E.g. when you look at

https://spdx.org/licenses/BSD-3-Clause.html the parts in red allow variation. What may vary, and how, is easier to see in the source:

https://github.com/spdx/license-list-XML/blob/main/src/BSD-3-Clause.xml

The <copyrightText> tag allows any variation. The <alt> tag allows only variations that match a regexp.
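To illustrate the idea behind <alt>, here is a minimal sketch in shell. The pattern is hypothetical and NOT copied from the real SPDX template; it only shows how a fixed sentence with one regexp-controlled spot accepts some wordings and rejects others. The names `clause_pattern` and `matches_clause` are made up for this example.

```shell
# Hypothetical pattern, NOT taken from the real SPDX template: the sentence
# is fixed except for one spot where a regexp lists the accepted wordings.
clause_pattern='^Neither the name of (the copyright holder|the author) nor the names of its contributors may be used'

matches_clause() {
    # Collapse runs of whitespace (including line breaks) into single
    # spaces, then test the text against the pattern.
    printf '%s\n' "$1" | tr -s '[:space:]' ' ' | grep -Eq "$clause_pattern"
}

# Usage: the exit status tells whether the wording is an accepted variant.
if matches_clause "Neither the name of the author nor the names of
its contributors may be used"; then
    echo "accepted variant"
fi
```

A wording outside the listed alternatives fails the match, which is the point where a new identifier (or a change to the template) would have to be proposed.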

OK, so how would we do this for this Hybrid BSD license?
And what is "well defined"?

Open an issue with your proposal at https://gitlab.com/fedora/legal/fedora-license-data

"Well defined" means that there is a general consensus that the definition is clear. :)



What is the goal of dropping the effective license and make packagers
list all the licences of some code snippets originally incorporated
under lax-permissive licenses? Is that not just make work for the
packager if upstream just uses one effective license?
One rationale is given in Fedora legal documentation:
"There is no agreed-upon set of criteria or rules under which one can
make conclusions about “effective” licenses or reduce composite
license expressions to something simpler."
Isn't that not just like most other things fedora, we follow
upstream. Upstream states the (effective) license and we just adopt
that. If we notice that there might be a bug and the effective license
isn't exactly as the upstream project states, then we fix that
upstream?
I basically don't recognize "effective license" as a valid concept. I
see people using it, perhaps increasingly, but I never see any
definition of what it means.
It sounds like you are using it to mean "whatever the upstream project
seems to say the license is, despite possible evidence to the
contrary".  I'm not sure that's how other people are using "effective
license".
I would call it the intended license. Normally an (upstream) project
declares their intended license by placing a COPYING or LICENSE
document at the top-level (or different ones in subdirs if different
parts have different intended licenses). That intended license is the
effective license, meaning the license you would have to follow when
redistributing the project. Any other licenses used in the project
would only have requirements that are subsumed by the intended license.

No.

This was maybe enough in the past, when industry rarely used OSS projects. Now OSS is used massively, and we want industry to comply with our OSS licenses. If we want that, we should provide a good overview of which licenses are used.

Take your mutt package as an example. Upstream claims it is GPL-2.0-or-later https://gitlab.com/muttmua/mutt/-/blob/master/COPYRIGHT?ref_type=heads but there is also

https://gitlab.com/muttmua/mutt/-/blob/master/wcwidth.c?ref_type=heads#L8 which is HPND-Markus-Kuhn. Now imagine a user or company for which the HPND-Markus-Kuhn license is problematic, so they cannot use code under it. If you use only

 License: GPL-2.0-or-later

they will never know. But when you use

  License: GPL-2.0-or-later AND HPND-Markus-Kuhn

then it is pretty easy for them to do the audit, avoid this package, and use an alternative.
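Such an audit can be sketched in a few lines of shell. This is only an illustration: the function name `expression_mentions` is hypothetical, and the check is a plain token comparison, not a full SPDX expression parser.

```shell
# Hypothetical helper: succeed if the SPDX identifier $2 appears as a whole
# token in the license expression $1.
expression_mentions() {
    # Split the expression on spaces and parentheses, one token per line,
    # then look for an exact, whole-line match of the identifier.
    printf '%s\n' "$1" | tr '() ' '\n\n\n' | grep -Fxq -- "$2"
}

# Usage with the two License tags from the example above:
expression_mentions "GPL-2.0-or-later AND HPND-Markus-Kuhn" HPND-Markus-Kuhn \
    && echo "package needs review"

# Against an installed package (assumes rpm and the package are present):
#   expression_mentions "$(rpm -q --qf '%{LICENSE}\n' mutt)" HPND-Markus-Kuhn
```

With only "License: GPL-2.0-or-later" in the spec file, the same check finds nothing, which is exactly the problem described above.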

This example may look artificial, but I know a lot of companies that want to avoid GPL-3.0-or-later. And Fedora itself avoids many licenses that others find OK, e.g. JSON or BSD-3-Clause-Clear:

https://docs.fedoraproject.org/en-US/legal/not-allowed-licenses/

Both REUSE and SPDX are intended to be used at the individual source
file level. Trying to use them at the binary or package level as Fedora
wants to do seems to bring up these conflicts yes.
I disagree. As I mentioned, SPDX defines a whole SBOM model exactly for large projects and large deployments.
That doesn't mean there aren't "standards" for this. Like I said
upstream often has a top-level LICENSE, COPYING or README file
declaring the intended license. There also often is a NOTICES file
listing any legal notices subsumed by the intended/effective license.

This is not a standard. This is a habit. It is very far from anything that allows automation or machine parsing.


Just dropping the license tags from the spec file is an interesting
idea. Would we then adopt something like a separate copyright file like
Debian does?

Likely. It needs to be proposed as a Change, then discussed and approved. I expect a LOT of discussion about it.

I would be very glad if somebody drove this.


-- 
Miroslav Suchy, RHCA
Red Hat, Manager, Packit and CPT, #brno, #fedora-buildsys