Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
I do think that the current approach can be criticized as being overly pedantic, and perhaps also internally contradictory (some of Florian's recent comments get at the various ways in which we are being contradictory). We have a still-undocumented rule that what I call "true public domain" should not be reflected in the License: field (unless it would otherwise be empty), yet we have carefully attempted to collect nonstandard public domain dedication statements and cover those by `LicenseRef-Fedora-Public-Domain`. We have been using a similar approach with `LicenseRef-Fedora-UltraPermissive`. These basically replace Callaway system names "Public domain" (though this was sometimes used for "true public domain") and "Freely redistributable without restrictions", respectively.
I think it can reasonably be argued that there is little point in including `LicenseRef-Fedora-Public-Domain` and `LicenseRef-Fedora-UltraPermissive` in the License: field since they are associated with no conditions or obligations. In those special cases where the License: field would otherwise be empty, we can ask SPDX to create unique identifiers for the license text in question.
We might want to extend this principle to other things, such as GPL exceptions that entail no conditions in the use case encountered in particular packages. (There is already an old issue about this, I think concerning the Bison exception.)
This wouldn't do *that* much to make License: fields simpler, so maybe it's not particularly worthwhile. There is also the problem that if we make it optional, package maintainers may be less likely to scrutinize things that are assumed to fall into these kinds of categories, when in some cases they actually wouldn't, although I think it's now clear that those situations are uncommon. In theory we'd still expect package maintainers to submit issues to have things that seem to qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be challenging to enforce that expectation and the Fedora Legal team would have to end up doing all that work themselves, which might be a justifiable result.
As with abandoning the "license of the binary" rule, this would seemingly be a major departure from the principles established under the Callaway system.
Any thoughts on this?
Richard
On 24. 08. 23 at 20:52, Richard Fontana wrote:
Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
I do think that the current approach can be criticized as being overly pedantic, and perhaps also internally contradictory (some of Florian's recent comments get at the various ways in which we are being contradictory). We have a still-undocumented rule that what I call "true public domain" should not be reflected in the License: field
The problem is that leaving out this "true public domain" tag makes license review harder in a sense.
Let me explain. If I am reviewing a license and find that some file is "true public domain", leaving it out might mean that nothing records that it was already identified as "true public domain". At the next review, I (or somebody else) will need to find it the hard way again.
I think that the current license field is unfortunately very limited in expressing the source license. I wish we were able to record the license per file, or even per range of lines within a file. But admittedly, that would not be any easier.
Vít
(unless it would otherwise be empty), yet we have carefully attempted to collect nonstandard public domain dedication statements and cover those by `LicenseRef-Fedora-Public-Domain`. We have been using a similar approach with `LicenseRef-Fedora-UltraPermissive`. These basically replace Callaway system names "Public domain" (though this was sometimes used for "true public domain") and "Freely redistributable without restrictions", respectively.
I think it can reasonably be argued that there is little point in including `LicenseRef-Fedora-Public-Domain` and `LicenseRef-Fedora-UltraPermissive` in the License: field since they are associated with no conditions or obligations. In those special cases where the License: field would otherwise be empty, we can ask SPDX to create unique identifiers for the license text in question.
We might want to extend this principle to other things, such as GPL exceptions that entail no conditions in the use case encountered in particular packages. (There is already an old issue about this, I think concerning the Bison exception.)
This wouldn't do *that* much to make License: fields simpler, so maybe it's not particularly worthwhile. There is also the problem that if we make it optional, package maintainers may be less likely to scrutinize things that are assumed to fall into these kinds of categories, when in some cases they actually wouldn't, although I think it's now clear that those situations are uncommon. In theory we'd still expect package maintainers to submit issues to have things that seem to qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be challenging to enforce that expectation and the Fedora Legal team would have to end up doing all that work themselves, which might be a justifiable result.
As with abandoning the "license of the binary" rule, this would seemingly be a major departure from the principles established under the Callaway system.
Any thoughts on this?
Richard _______________________________________________ legal mailing list -- legal@lists.fedoraproject.org To unsubscribe send an email to legal-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
On Mon, Aug 28, 2023 at 10:31 AM Vít Ondruch vondruch@redhat.com wrote:
On 24. 08. 23 at 20:52, Richard Fontana wrote:
Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
I do think that the current approach can be criticized as being overly pedantic, and perhaps also internally contradictory (some of Florian's recent comments get at the various ways in which we are being contradictory). We have a still-undocumented rule that what I call "true public domain" should not be reflected in the License: field
The problem is that leaving out this "true public domain" tag makes license review harder in a sense.
Let me explain. If I am reviewing a license and find that some file is "true public domain", leaving it out might mean that nothing records that it was already identified as "true public domain". At the next review, I (or somebody else) will need to find it the hard way again.
I think that the current license field is unfortunately very limited in expressing the source license. I wish we were able to record the license per file, or even per range of lines within a file. But admittedly, that would not be any easier.
I guess we have been stretching the "License: " field beyond whatever purpose it was originally supposed to have (probably never well defined or thought out). It is not useful by itself for keeping track of source-file-specific license review.
The REUSE specification (https://reuse.software), which enforces a per-source-file license identification discipline, might be a way to facilitate that, but that is something that is generally adopted by upstream projects, not by downstream packagers.
fedora-license-data and fedora-legal-docs themselves conform to REUSE, largely relying on the use of a dep5 file (even though I think REUSE disapproves of that approach) and using some custom-defined license identifiers that I think REUSE might frown upon (but which do serve to keep track of "true public domain" stuff). See: https://gitlab.com/fedora/legal/fedora-license-data/-/blob/main/.reuse/dep5?... https://gitlab.com/fedora/legal/fedora-legal-docs/-/blob/main/.reuse/dep5?re...
Richard
On Thu, Aug 24, 2023 at 8:52 PM Richard Fontana rfontana@redhat.com wrote:
Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
I do think that the current approach can be criticized as being overly pedantic, and perhaps also internally contradictory (some of Florian's recent comments get at the various ways in which we are being contradictory). We have a still-undocumented rule that what I call "true public domain" should not be reflected in the License: field (unless it would otherwise be empty), yet we have carefully attempted to collect nonstandard public domain dedication statements and cover those by `LicenseRef-Fedora-Public-Domain`. We have been using a similar approach with `LicenseRef-Fedora-UltraPermissive`. These basically replace Callaway system names "Public domain" (though this was sometimes used for "true public domain") and "Freely redistributable without restrictions", respectively.
I think it can reasonably be argued that there is little point in including `LicenseRef-Fedora-Public-Domain` and `LicenseRef-Fedora-UltraPermissive` in the License: field since they are associated with no conditions or obligations. In those special cases where the License: field would otherwise be empty, we can ask SPDX to create unique identifiers for the license text in question.
We might want to extend this principle to other things, such as GPL exceptions that entail no conditions in the use case encountered in particular packages. (There is already an old issue about this, I think concerning the Bison exception.)
This wouldn't do *that* much to make License: fields simpler, so maybe it's not particularly worthwhile. There is also the problem that if we make it optional, package maintainers may be less likely to scrutinize things that are assumed to fall into these kinds of categories, when in some cases they actually wouldn't, although I think it's now clear that those situations are uncommon. In theory we'd still expect package maintainers to submit issues to have things that seem to qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be challenging to enforce that expectation and the Fedora Legal team would have to end up doing all that work themselves, which might be a justifiable result.
Wouldn't dropping licenses (or exceptions) that entail no conditions just be another way to do "effective license analysis" (i.e. who needs to decide whether the license entails no conditions)? Listing everything might be verbose, but it at least has the benefit of being *simple*, and doesn't involve judgement calls like "this license doesn't matter in this case".
As with abandoning the "license of the binary" rule, this would seemingly be a major departure from the principles established under the Callaway system.
I think keeping the distinction between contents of the upstream sources and contents of the packages we distribute is worthwhile. It's one way to make the License information more meaningful for actual users. Just listing the licenses of the files in the upstream project (whether the contents end up shipped in our packages or not) is "just passing through" information and not particularly useful (in which case we could just say "the license of this package is the license of this upstream project, go look it up yourself" instead of including a License tag in the RPM).
Fabio
On 30. 08. 23 at 19:07, Fabio Valentini wrote:
On Thu, Aug 24, 2023 at 8:52 PM Richard Fontana rfontana@redhat.com wrote:
Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
I do think that the current approach can be criticized as being overly pedantic, and perhaps also internally contradictory (some of Florian's recent comments get at the various ways in which we are being contradictory). We have a still-undocumented rule that what I call "true public domain" should not be reflected in the License: field (unless it would otherwise be empty), yet we have carefully attempted to collect nonstandard public domain dedication statements and cover those by `LicenseRef-Fedora-Public-Domain`. We have been using a similar approach with `LicenseRef-Fedora-UltraPermissive`. These basically replace Callaway system names "Public domain" (though this was sometimes used for "true public domain") and "Freely redistributable without restrictions", respectively.
I think it can reasonably be argued that there is little point in including `LicenseRef-Fedora-Public-Domain` and `LicenseRef-Fedora-UltraPermissive` in the License: field since they are associated with no conditions or obligations. In those special cases where the License: field would otherwise be empty, we can ask SPDX to create unique identifiers for the license text in question.
We might want to extend this principle to other things, such as GPL exceptions that entail no conditions in the use case encountered in particular packages. (There is already an old issue about this, I think concerning the Bison exception.)
This wouldn't do *that* much to make License: fields simpler, so maybe it's not particularly worthwhile. There is also the problem that if we make it optional, package maintainers may be less likely to scrutinize things that are assumed to fall into these kinds of categories, when in some cases they actually wouldn't, although I think it's now clear that those situations are uncommon. In theory we'd still expect package maintainers to submit issues to have things that seem to qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be challenging to enforce that expectation and the Fedora Legal team would have to end up doing all that work themselves, which might be a justifiable result.
Wouldn't dropping licenses (or exceptions) that entail no conditions just be another way to do "effective license analysis" (i.e. who needs to decide whether the license entails no conditions)? Listing everything might be verbose, but it at least has the benefit of being *simple*, and doesn't involve judgement calls like "this license doesn't matter in this case".
As with abandoning the "license of the binary" rule, this would seemingly be a major departure from the principles established under the Callaway system.
I think keeping the distinction between contents of the upstream sources and contents of the packages we distribute is worthwhile. It's one way to make the License information more meaningful for actual users. Just listing the licenses of the files in the upstream project (whether the contents end up shipped in our packages or not) is "just passing through" information and not particularly useful (in which case we could just say "the license of this package is the license of this upstream project, go look it up yourself" instead of including a License tag in the RPM).
I concur with this paragraph
Vít
Fabio
On 8/31/23 1:59 AM, Vít Ondruch wrote:
On 30. 08. 23 at 19:07, Fabio Valentini wrote:
On Thu, Aug 24, 2023 at 8:52 PM Richard Fontana rfontana@redhat.com wrote:
Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
While I can understand the "aesthetic distaste" for complex license expressions, they reflect the reality, which is that software licensing is complex. I'm not sure "complex" is even the right word; "extensive" might be a better way to put it. We always knew there were many variations of open source licenses out there (now I sound like Richard!), and we are merely reflecting that reality.
I would hope we can get past distastes or complaints or psychological desires that go against the given reality :)
I do think that the current approach can be criticized as being overly pedantic, and perhaps also internally contradictory (some of Florian's recent comments get at the various ways in which we are being contradictory). We have a still-undocumented rule that what I call "true public domain" should not be reflected in the License: field (unless it would otherwise be empty), yet we have carefully attempted to collect nonstandard public domain dedication statements and cover those by `LicenseRef-Fedora-Public-Domain`. We have been using a similar approach with `LicenseRef-Fedora-UltraPermissive`. These basically replace Callaway system names "Public domain" (though this was sometimes used for "true public domain") and "Freely redistributable without restrictions", respectively.
They do kind of replace the previous system, but I think we have articulated those categories more explicitly and are capturing the text so someone can "check" that if they so desire.
I think it can reasonably be argued that there is little point in including `LicenseRef-Fedora-Public-Domain` and `LicenseRef-Fedora-UltraPermissive` in the License: field since they are associated with no conditions or obligations. In those special cases where the License: field would otherwise be empty, we can ask SPDX to create unique identifiers for the license text in question.
We might want to extend this principle to other things, such as GPL exceptions that entail no conditions in the use case encountered in particular packages. (There is already an old issue about this, I think concerning the Bison exception.)
This wouldn't do *that* much to make License: fields simpler, so maybe it's not particularly worthwhile. There is also the problem that if we make it optional, package maintainers may be less likely to scrutinize things that are assumed to fall into these kinds of categories, when in some cases they actually wouldn't, although I think it's now clear that those situations are uncommon. In theory we'd still expect package maintainers to submit issues to have things that seem to qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be challenging to enforce that expectation and the Fedora Legal team would have to end up doing all that work themselves, which might be a justifiable result.
Wouldn't dropping licenses (or exceptions) that entail no conditions just be another way to do "effective license analysis" (i.e. who needs to decide whether the license entails no conditions)? Listing everything might be verbose, but it at least has the benefit of being *simple*, and doesn't involve judgement calls like "this license doesn't matter in this case".
I would say that dropping any information about what we actually found is not only a form of analysis in itself; it also creates a potential mismatch between what is reported and what is there. That can undermine the value of what we report to begin with, as it leaves a downstream recipient wondering, "what else did they leave out?"
As with abandoning the "license of the binary" rule, this would seemingly be a major departure from the principles established under the Callaway system.
I think keeping the distinction between contents of the upstream sources and contents of the packages we distribute is worthwhile. It's one way to make the License information more meaningful for actual users. Just listing the licenses of the files in the upstream project (whether the contents end up shipped in our packages or not) is "just passing through" information and not particularly useful (in which case we could just say "the license of this package is the license of this upstream project, go look it up yourself" instead of including a License tag in the RPM).
I concur with this paragraph
100% agree. What is distributed is key in terms of compliance for almost all open source licenses. If you hire an auditor to help you determine what open source licenses are present in your given code base and that you need to comply with - that process will begin with a scan of the source code and then an analysis from there as to what is actually distributed, which usually involves a conversation with the engineers involved in creating the code base.
Vít
Fabio
On Wed, Aug 30, 2023 at 1:08 PM Fabio Valentini decathorpe@gmail.com wrote:
Wouldn't dropping licenses (or exceptions) that entail no conditions just be another way to do "effective license analysis" (i.e. who needs to decide whether the license entails no conditions)? Listing everything might be verbose, but it at least has the benefit of being *simple*, and doesn't involve judgement calls like "this license doesn't matter in this case".
The assumption here is that packages will continue to be reviewed carefully wrt licensing and that new licenses encountered in source code will continue to go through the Fedora review process and be added to fedora-license-data. I was thinking that the excludability characteristic could be recorded at the time that a license identifier is added to fedora-license-data. It wouldn't be package-specific, except that if the otherwise excludable license is the only identifiable license text applicable to a package, it would have to be listed.
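A minimal Python sketch of how that rule could work, assuming a hypothetical per-identifier "excludable" flag. The `EXCLUDABLE` set and the `license_field` function are illustrative names only; fedora-license-data does not currently define such a flag.

```python
# Hypothetical set of identifiers flagged as excludable when they were
# added to fedora-license-data (an assumption for illustration).
EXCLUDABLE = {
    "LicenseRef-Fedora-Public-Domain",
    "LicenseRef-Fedora-UltraPermissive",
}


def license_field(found, excludable=EXCLUDABLE):
    """Build a License: value from the identifiers found during review.

    Excludable identifiers are dropped, unless dropping them would leave
    the field empty, in which case they are listed after all.
    """
    unique = sorted(set(found))
    kept = [ident for ident in unique if ident not in excludable]
    if not kept:
        # Nothing else applies to the package, so the otherwise
        # excludable identifiers have to be listed.
        kept = unique
    return " AND ".join(kept)


print(license_field(["MIT", "LicenseRef-Fedora-Public-Domain"]))
print(license_field(["LicenseRef-Fedora-Public-Domain"]))
```

The decision is made once per identifier rather than once per package, so the only package-specific step is the empty-field fallback.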
Just listing the licenses of the files in the upstream project (whether the contents end up shipped in our packages or not) is "just passing through" information and not particularly useful (in which case we could just say "the license of this package is the license of this upstream project, go look it up yourself" instead of including a License tag in the RPM).
Agreed, I think this was one of the main reasons why we ended up not adopting a "license of the source code" approach.
Richard
On Thu, Aug 24, 2023 at 02:52:21PM -0400, Richard Fontana wrote:
Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
Let's take the proposed change to the kernel spec:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2648/diffs#b49eec...
as an example of "complex license expressions" for which there is likely an aesthetic distaste. Each distinct SPDX-License-Identifier tag expression is combined such that we end up with:
License: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-2-Clause) AND ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) AND ((GPL-2.0-only WITH Linux-syscall-note) OR CDDL-1.0) AND ((GPL-2.0-only WITH Linux-syscall-note) OR Linux-OpenIB) AND ((GPL-2.0-only WITH Linux-syscall-note) OR MIT) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR BSD-3-Clause) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR MIT) AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND GPL-1.0-or-later AND (GPL-1.0-or-later OR BSD-3-Clause) AND (GPL-1.0-or-later WITH Linux-syscall-note) AND GPL-2.0-only AND (GPL-2.0-only OR Apache-2.0) AND (GPL-2.0-only OR BSD-2-Clause) AND (GPL-2.0-only OR BSD-3-Clause) AND (GPL-2.0-only OR CDDL-1.0) AND (GPL-2.0-only OR Linux-OpenIB) AND (GPL-2.0-only OR MIT) AND (GPL-2.0-only OR X11) AND (GPL-2.0-only WITH Linux-syscall-note) AND GPL-2.0-or-later AND (GPL-2.0-or-later OR BSD-2-Clause) AND (GPL-2.0-or-later OR BSD-3-Clause) AND (GPL-2.0-or-later OR MIT) AND (GPL-2.0-or-later WITH GCC-exception-2.0) AND (GPL-2.0-or-later WITH Linux-syscall-note) AND ISC AND LGPL-2.0-or-later AND (LGPL-2.0-or-later OR BSD-2-Clause) AND (LGPL-2.0-or-later WITH Linux-syscall-note) AND LGPL-2.1-only AND (LGPL-2.1-only OR BSD-2-Clause) AND (LGPL-2.1-only WITH Linux-syscall-note) AND LGPL-2.1-or-later AND (LGPL-2.1-or-later WITH Linux-syscall-note) AND (Linux-OpenIB OR GPL-2.0-only) AND (Linux-OpenIB OR GPL-2.0-only OR BSD-2-Clause) AND MIT AND (MIT OR Apache-2.0) AND (MIT OR GPL-2.0-only) AND (MIT OR GPL-2.0-or-later) AND (MIT OR LGPL-2.1-only) AND (MPL-1.1 OR GPL-2.0-only) AND (X11 OR GPL-2.0-only) AND (X11 OR GPL-2.0-or-later) AND Zlib AND (copyleft-next-0.3.1 OR GPL-2.0-or-later) AND (Redistributable, no modification permitted)
While the majority of files in the kernel are "GPL-2.0-only", a number of files are offered under a choice of licenses (OR). Even if 99% of files were simply GPL-2.0-only, it only takes a handful of files being offered under a choice to result in an enormous SPDX expression like the one above. In the above example, at a bare minimum it would only take 30 files, out of the kernel's 80,000, with distinct license choices to produce the above expression.
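A minimal Python sketch of how such an expression accumulates, assuming a hypothetical mapping from file paths to their SPDX-License-Identifier tag values. The function name `combine_file_tags` and the sample paths are illustrative and are not taken from the kernel spec tooling.

```python
def combine_file_tags(file_tags):
    """AND together every distinct per-file tag expression.

    file_tags maps a file path to the value of its
    SPDX-License-Identifier tag. Compound expressions (those containing
    OR or WITH) are parenthesized so they stay intact once joined.
    """
    distinct = sorted(set(file_tags.values()))

    def wrap(expr):
        return f"({expr})" if " " in expr else expr

    return " AND ".join(wrap(expr) for expr in distinct)


# Hypothetical sample: most files share one license, two do not.
tags = {
    "kernel/a.c": "GPL-2.0-only",
    "kernel/b.c": "GPL-2.0-only",
    "drivers/c.c": "GPL-2.0-only OR MIT",
    "include/uapi/d.h": "GPL-2.0-only WITH Linux-syscall-note",
}
print(combine_file_tags(tags))
```

However many files carry the dominant license, each additional distinct tag expression adds one more AND term, which is why a few dozen outliers are enough to dominate the result.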
While this is an accurate reflection of the range of distinct file license choices, I'm not convinced that this approach is especially beneficial to Fedora users.
What purpose does it serve to list "MPL-1.1 OR GPL-2.0-only" and "MIT OR LGPL-2.1-only", etc., if perhaps only < 1% of files carry this choice and we're not telling the user which 1% of files it applies to?
The previous effective license analysis addressed this problem, such that everything reduced down to "GPLv2 and Redistributable". I don't want to suggest going back to effective analysis, as I think that was overly simplified, but perhaps we can finesse what we're doing today.
I.e., rather than trying to maintain the full list of choices, can we eliminate all the OR clauses, such that we present just a flat list of each distinct SPDX license name that is found? IOW, the above kernel SPDX expression would be
License: Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND CDDL-1.0 AND copyleft-next-0.3.1 AND GPL-1.0-or-later AND GPL-1.0-or-later-WITH-Linux-syscall-note AND GPL-2.0-only AND GPL-2.0-only-WITH-Linux-syscall-note AND GPL-2.0-or-later AND GPL-2.0-or-later-WITH-GCC-exception-2.0 AND GPL-2.0-or-later-WITH-Linux-syscall-note AND ISC AND LGPL-2.0-or-later AND LGPL-2.0-or-later-WITH-Linux-syscall-note AND LGPL-2.1-only AND LGPL-2.1-only-WITH-Linux-syscall-note AND LGPL-2.1-or-later AND LGPL-2.1-or-later-WITH-Linux-syscall-note AND Linux-OpenIB AND MIT AND MPL-1.1 AND Redistributable, no modification permitted AND X11 AND Zlib
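A minimal Python sketch of that flattening, assuming the input uses only uppercase AND, OR and WITH operators plus parentheses. The function name `flatten_license_expression` is hypothetical, and a "license WITH exception" pair is kept as one hyphenated unit, mirroring the flat list above. Note that the result is an inventory of identifiers found, not an expression equivalent to the original, since the OR choices are discarded.

```python
import re


def flatten_license_expression(expression):
    """Reduce an SPDX-style expression to a flat, AND-joined list of
    the distinct license terms it mentions."""
    # Grouping no longer matters once OR choices are discarded.
    without_parens = expression.replace("(", " ").replace(")", " ")
    normalized = " ".join(without_parens.split())
    # Split on both operators; each piece is a license, optionally
    # followed by "WITH <exception>".
    terms = re.split(r" (?:AND|OR) ", normalized)
    units = {term.replace(" WITH ", "-WITH-") for term in terms if term}
    return " AND ".join(sorted(units, key=str.lower))


sample = (
    "((GPL-2.0-only WITH Linux-syscall-note) OR MIT) "
    "AND GPL-2.0-only AND (MIT OR Apache-2.0)"
)
print(flatten_license_expression(sample))
```

Non-SPDX terms that contain no operator keyword, such as "Redistributable, no modification permitted", pass through this sketch unchanged as a single unit.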
I do think that the current approach can be criticized as being overly pedantic, and perhaps also internally contradictory (some of Florian's recent comments get at the various ways in which we are being contradictory). We have a still-undocumented rule that what I call "true public domain" should not be reflected in the License: field (unless it would otherwise be empty), yet we have carefully attempted to collect nonstandard public domain dedication statements and cover those by `LicenseRef-Fedora-Public-Domain`. We have been using a similar approach with `LicenseRef-Fedora-UltraPermissive`. These basically replace Callaway system names "Public domain" (though this was sometimes used for "true public domain") and "Freely redistributable without restrictions", respectively.
I think it can reasonably be argued that there is little point in including `LicenseRef-Fedora-Public-Domain` and `LicenseRef-Fedora-UltraPermissive` in the License: field since they are associated with no conditions or obligations. In those special cases where the License: field would otherwise be empty, we can ask SPDX to create unique identifiers for the license text in question.
I think there is value in LicenseRef-Fedora-Public-Domain, etc because it expresses the fact that license analysis has actually been performed and these public domain choices have been correctly identified. I don't like the need to special case the omission to avoid an entirely empty License: field. If we have a need to record LicenseRef-Fedora-Public-Domain in any scenario, we should be consistent.
E.g., consider a package that is initially 100% public domain, so we have to record that to avoid an empty field:
License: LicenseRef-Fedora-Public-Domain
Then one day a file is added which is MIT. I would find it pretty strange for the rule to say we can now drop LicenseRef-Fedora-Public-Domain and record just:
License: MIT
when 99% of the files are still LicenseRef-Fedora-Public-Domain and only a single file is MIT.
IMHO the package should be changed to say
License: LicenseRef-Fedora-Public-Domain AND MIT
IOW, I think we should always be recording the license, even if it is a public domain LicenseRef term.
We might want to extend this principle to other things, such as GPL exceptions that entail no conditions in the use case encountered in particular packages. (There is already an old issue about this, I think concerning the Bison exception.)
Personally I like the way we're now recording the existence of each license and exception, just not the creation of the combinatorial expansion of each license choice.
This wouldn't do *that* much to make License: fields simpler, so maybe it's not particularly worthwhile. There is also the problem that if we make it optional, package maintainers may be less likely to scrutinize things that are assumed to fall into these kinds of categories, when in some cases they actually wouldn't, although I think it's now clear that those situations are uncommon. In theory we'd still expect package maintainers to submit issues to have things that seem to qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be challenging to enforce that expectation and the Fedora Legal team would have to end up doing all that work themselves, which might be a justifiable result.
As with abandoning the "license of the binary" rule, this would seemingly be a major departure from the principles established under the Callaway system.
Any thoughts on this?
With regards, Daniel
On 8/31/23 2:39 AM, Daniel P. Berrangé wrote:
On Thu, Aug 24, 2023 at 02:52:21PM -0400, Richard Fontana wrote:
Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
Let's take the proposed change to the kernel spec:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2648/diffs#b49eec...
as an example of "complex license expressions" for which there is likely an aesthetic distaste. Each distinct SPDX-License-Identifier tag expression is combined such that we end up with:
License: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-2-Clause) AND ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) AND ((GPL-2.0-only WITH Linux-syscall-note) OR CDDL-1.0) AND ((GPL-2.0-only WITH Linux-syscall-note) OR Linux-OpenIB) AND ((GPL-2.0-only WITH Linux-syscall-note) OR MIT) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR BSD-3-Clause) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR MIT) AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND GPL-1.0-or-later AND (GPL-1.0-or-later OR BSD-3-Clause) AND (GPL-1.0-or-later WITH Linux-syscall-note) AND GPL-2.0-only AND (GPL-2.0-only OR Apache-2.0) AND (GPL-2.0-only OR BSD-2-Clause) AND (GPL-2.0-only OR BSD-3-Clause) AND (GPL-2.0-only OR CDDL-1.0) AND (GPL-2.0-only OR Linux-OpenIB) AND (GPL-2.0-only OR MIT) AND (GPL-2.0-only OR X11) AND (GPL-2.0-only WITH Linux-syscall-note) AND GPL-2.0-or-later AND (GPL-2.0-or-later OR BSD-2-Clause) AND (GPL-2.0-or-later OR BSD-3-Clause) AND (GPL-2.0-or-later OR MIT) AND (GPL-2.0-or-later WITH GCC-exception-2.0) AND (GPL-2.0-or-later WITH Linux-syscall-note) AND ISC AND LGPL-2.0-or-later AND (LGPL-2.0-or-later OR BSD-2-Clause) AND (LGPL-2.0-or-later WITH Linux-syscall-note) AND LGPL-2.1-only AND (LGPL-2.1-only OR BSD-2-Clause) AND (LGPL-2.1-only WITH Linux-syscall-note) AND LGPL-2.1-or-later AND (LGPL-2.1-or-later WITH Linux-syscall-note) AND (Linux-OpenIB OR GPL-2.0-only) AND (Linux-OpenIB OR GPL-2.0-only OR BSD-2-Clause) AND MIT AND (MIT OR Apache-2.0) AND (MIT OR GPL-2.0-only) AND (MIT OR GPL-2.0-or-later) AND (MIT OR LGPL-2.1-only) AND (MPL-1.1 OR GPL-2.0-only) AND (X11 OR GPL-2.0-only) AND (X11 OR GPL-2.0-or-later) AND Zlib AND (copyleft-next-0.3.1 OR GPL-2.0-or-later) AND (Redistributable, no modification permitted)
Given that the kernel is a very large package with many files and it has adopted SPDX ids at the file level (which means the licensing info is far more complete and easier to parse :) - there is nothing surprising to me about the length of this string. It is what it is!
While the majority of files in the kernel are "GPL-2.0-only", a number of files are offered under a choice of licenses (OR). Even if 99% of files were simply GPL-2.0-only, it only takes a handful of files being offered under a choice to result in an enormous SPDX expression like the one above. In the above example, at a bare minimum it would only take 30 files, out of the kernel's 80,000, to have distinct licence choices to cause the existence of the above expression.
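The combination step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file names and tags are hypothetical sample data (not taken from the kernel tree), and `combine_file_tags` is an invented helper name, not the tooling used for the kernel spec. It shows how a few files with distinct tags are enough to grow the License: field, no matter how many files share the common tag.

```python
def combine_file_tags(file_tags):
    """AND together each distinct per-file SPDX tag expression."""
    distinct = sorted(set(file_tags.values()))

    def wrap(tag):
        # Parenthesize compound tags so operator precedence stays unambiguous.
        return f"({tag})" if " " in tag else tag

    return " AND ".join(wrap(tag) for tag in distinct)


# Hypothetical sample data: most files share one tag, a few differ.
file_tags = {
    "a.c": "GPL-2.0-only",
    "b.c": "GPL-2.0-only",
    "c.c": "GPL-2.0-only",
    "d.h": "GPL-2.0-only WITH Linux-syscall-note",
    "e.c": "GPL-2.0-only OR MIT",
    "f.c": "MIT OR GPL-2.0-only",
}

license_field = combine_file_tags(file_tags)
print("License:", license_field)
```

Note that `GPL-2.0-only OR MIT` and `MIT OR GPL-2.0-only` count as distinct tags here because the comparison is purely textual, which matches how both orderings appear in the expression above.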
That's an interesting point, but I'm not sure how we could justify some kind of an exception in such a case
While this is an accurate reflection of the range of distinct file license choices, I'm not convinced that this approach is especially beneficial to Fedora users.
well, it's not really just about Fedora users - besides the benefit downstream, I think there is some benefit to what Fedora is doing in a broader, example-setting, ecosystem sense. I guess part of this feeling comes from my thinking that any desire or attempt to obscure the license complexity is not a good thing and potentially creates more work or issues - reflecting the reality, to me, sets a good precedent
What purpose does it serve to list "MPL-1.1 OR GPL-2.0-only" and "MIT OR LGPL-2.1-only", etc if only perhaps < 1% of files carry this choice and we're not telling the user which 1% of files it applies to ?
they can run a license scanner and create an SPDX document that shows the file level license info to determine this. And that report will be far more complex and lengthy than what you came up with above ;) In that way, what you have above is a useful "summary" and accurate reflection of the big picture
The previous effective license analysis addressed this problem, such that everything reduced down to "GPLv2 and Redistributable". I don't want to suggest going back to effective analysis as I think that was overly simplified, but perhaps we can finesse what we're doing today.
i.e. rather than trying to maintain the full list of choices, can we eliminate all the OR clauses, such that we present just a flat list of each distinct SPDX license name that is found? IOW, the above kernel SPDX expression would be
License: Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND CDDL-1.0 AND copyleft-next-0.3.1 AND GPL-1.0-or-later AND GPL-1.0-or-later-WITH-Linux-syscall-note AND GPL-2.0-only AND GPL-2.0-only-WITH-Linux-syscall-note AND GPL-2.0-or-later AND GPL-2.0-or-later-WITH-GCC-exception-2.0 AND GPL-2.0-or-later-WITH-Linux-syscall-note AND ISC AND LGPL-2.0-or-later AND LGPL-2.0-or-later-WITH-Linux-syscall-note AND LGPL-2.1-only AND LGPL-2.1-only-WITH-Linux-syscall-note AND LGPL-2.1-or-later AND LGPL-2.1-or-later-WITH-Linux-syscall-note AND Linux-OpenIB AND MIT AND MPL-1.1 AND Redistributable, no modification permitted AND X11 AND Zlib
but then this would be an exception to our original policy? and how would we articulate that? I'm not sure why this is really any "better" than your original - it's just shorter and truncated.
oh, and we should take a look at the "Redistributable, no modification permitted" ones... that is likely the firmware licenses that were never captured
I do think that the current approach can be criticized as being overly pedantic, and perhaps also internally contradictory (some of Florian's recent comments get at the various ways in which we are being contradictory). We have a still-undocumented rule that what I call "true public domain" should not be reflected in the License: field (unless it would otherwise be empty), yet we have carefully attempted to collect nonstandard public domain dedication statements and cover those by `LicenseRef-Fedora-Public-Domain`. We have been using a similar approach with `LicenseRef-Fedora-UltraPermissive`. These basically replace Callaway system names "Public domain" (though this was sometimes used for "true public domain") and "Freely redistributable without restrictions", respectively.
I think it can reasonably be argued that there is little point in including `LicenseRef-Fedora-Public-Domain` and `LicenseRef-Fedora-UltraPermissive` in the License: field since they are associated with no conditions or obligations. In those special cases where the License: field would otherwise be empty, we can ask SPDX to create unique identifiers for the license text in question.
I think there is value in LicenseRef-Fedora-Public-Domain, etc because it expresses the fact that license analysis has actually been performed and these public domain choices have been correctly identified. I don't like the need to special case the omission to avoid an entirely empty License: field. If we have a need to record LicenseRef-Fedora-Public-Domain in any scenario, we should be consistent.
E.g., consider a package that is 100% public domain initially, so we have to record that to avoid an empty field:
License: LicenseRef-Fedora-Public-Domain
Then one day a file is added which is MIT. I would find it pretty strange for the rule to say we can now drop the LicenseRef-Fedora-Public-Domain and just record:
License: MIT
when 99% of the files are still LicenseRef-Fedora-Public-Domain and only a single file is MIT.
IMHO the package should be changed to say
License: LicenseRef-Fedora-Public-Domain AND MIT
IOW, I think we should always be recording the license, even if it is a public domain LicenseRef term.
100% agree
We might want to extend this principle to other things, such as GPL exceptions that entail no conditions in the use case encountered in particular packages. (There is already an old issue about this, I think concerning the Bison exception.)
Personally I like the way we're now recording the existence of each license and exception, just not the creation of the combinatorial expansion of each license choice.
This wouldn't do *that* much to make License: fields simpler, so maybe it's not particularly worthwhile. There is also the problem that if we make it optional, package maintainers may be less likely to scrutinize things that are assumed to fall into these kinds of categories, when in some cases they actually wouldn't, although I think it's now clear that those situations are uncommon. In theory we'd still expect package maintainers to submit issues to have things that seem to qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be challenging to enforce that expectation and the Fedora Legal team would have to end up doing all that work themselves, which might be a justifiable result.
As with abandoning the "license of the binary" rule, this would seemingly be a major departure from the principles established under the Callaway system.
Any thoughts on this?
With regards, Daniel
On Thu, Aug 31, 2023 at 7:13 PM Jilayne Lovejoy jlovejoy@redhat.com wrote:
License: Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND CDDL-1.0 AND copyleft-next-0.3.1 AND GPL-1.0-or-later AND GPL-1.0-or-later-WITH-Linux-syscall-note AND GPL-2.0-only AND GPL-2.0-only-WITH-Linux-syscall-note AND GPL-2.0-or-later AND GPL-2.0-or-later-WITH-GCC-exception-2.0 AND GPL-2.0-or-later-WITH-Linux-syscall-note AND ISC AND LGPL-2.0-or-later AND LGPL-2.0-or-later-WITH-Linux-syscall-note AND LGPL-2.1-only AND LGPL-2.1-only-WITH-Linux-syscall-note AND LGPL-2.1-or-later AND LGPL-2.1-or-later-WITH-Linux-syscall-note AND Linux-OpenIB AND MIT AND MPL-1.1 AND Redistributable, no modification permitted AND X11 AND Zlib
but then this would be an exception to our original policy? and how would we articulate that? I'm not sure why this is really any "better" than your original - it's just shorter and truncated.
oh, and we should take a look at the "Redistributable, no modification permitted" ones... that is likely the firmware licenses that were never captured
In the kernel specifically, I think 'Redistributable, no modification permitted' resulted from a bugzilla ticket filed many years ago by Alexandre Oliva where he pointed out that 'GPLv2' as the kernel license tag was incorrect because there was some firmware hex code in at least one source file (this was some years after the creation of the linux-firmware repository upstream). I remember verifying that he was correct but I wonder if that code is still in the kernel today.
Richard
On Thu, Aug 31, 2023 at 08:42:19PM -0400, Richard Fontana wrote:
On Thu, Aug 31, 2023 at 7:13 PM Jilayne Lovejoy jlovejoy@redhat.com wrote:
License: Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND CDDL-1.0 AND copyleft-next-0.3.1 AND GPL-1.0-or-later AND GPL-1.0-or-later-WITH-Linux-syscall-note AND GPL-2.0-only AND GPL-2.0-only-WITH-Linux-syscall-note AND GPL-2.0-or-later AND GPL-2.0-or-later-WITH-GCC-exception-2.0 AND GPL-2.0-or-later-WITH-Linux-syscall-note AND ISC AND LGPL-2.0-or-later AND LGPL-2.0-or-later-WITH-Linux-syscall-note AND LGPL-2.1-only AND LGPL-2.1-only-WITH-Linux-syscall-note AND LGPL-2.1-or-later AND LGPL-2.1-or-later-WITH-Linux-syscall-note AND Linux-OpenIB AND MIT AND MPL-1.1 AND Redistributable, no modification permitted AND X11 AND Zlib
but then this would be an exception to our original policy? and how would we articulate that? I'm not sure why this is really any "better" than your original - it's just shorter and truncated.
oh, and we should take a look at the "Redistributable, no modification permitted" ones... that is likely the firmware licenses that were never captured
In the kernel specifically, I think 'Redistributable, no modification permitted' resulted from a bugzilla ticket filed many years ago by Alexandre Oliva where he pointed out that 'GPLv2' as the kernel license tag was incorrect because there was some firmware hex code in at least one source file (this was some years after the creation of the linux-firmware repository upstream). I remember verifying that he was correct but I wonder if that code is still in the kernel today.
Some kernel folks have examined this in more detail[1] and thus far can't find any currently built files that justify the continued existence of the additional 'Redistributable, no modification permitted' clause, so they're intending to drop it.
If someone does later identify files that still justify this, the SPDX license expression can be adjusted at that time.
With regards, Daniel
[1] https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2648#note_1538895...
On Thu, Aug 31, 2023 at 05:13:10PM -0600, Jilayne Lovejoy wrote:
On 8/31/23 2:39 AM, Daniel P. Berrangé wrote:
On Thu, Aug 24, 2023 at 02:52:21PM -0400, Richard Fontana wrote:
Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
Let's take the proposed change to the kernel spec:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2648/diffs#b49eec...
as an example of "complex license expressions" for which there is likely an aesthetic distaste. Each distinct SPDX-License-Identifier tag expression is combined such that we end up with:
While the majority of files in the kernel are "GPL-2.0-only", a number of files are offered under a choice of licenses (OR). Even if 99% of files were simply GPL-2.0-only, it only takes a handful of files being offered under a choice to result in an enormous SPDX expression like the one above. In the above example, at a bare minimum it would only take 30 files, out of the kernel's 80,000, to have distinct licence choices to cause the existence of the above expression.
That's an interesting point, but I'm not sure how we could justify some kind of an exception in such a case
I'm not suggesting an exception for the kernel, as this also applies to other large projects to some degree. Rather I'm suggesting a change to the Fedora License: field guidelines to say that the expression should be simplified by not including "OR" clauses, only "AND", such that we don't repeat the same SPDX identifier over & over again with different combinatorial expansions.
While this is an accurate reflection of the range of distinct file license choices, I'm not convinced that this approach is especially beneficial to Fedora users.
well, it's not really just about Fedora users - besides the benefit downstream, I think there is some benefit to what Fedora is doing in a broader, example-setting, ecosystem sense. I guess part of this feeling comes from my thinking that any desire or attempt to obscure the license complexity is not a good thing and potentially creates more work or issues - reflecting the reality, to me, sets a good precedent
I'd say that the License field is inherently "obscuring" the complexity because it is trying to condense the reality of disparate licensing across 10s of 1000s of files into a single line of text. I don't feel my proposal to simplify the SPDX expressions is a major difference in that respect.
If not "obscuring the license complexity" is the benchmark, then I think our solution is very weak compared to Debian's approach of having the "copyright" file report the distinct licenses along with information about what files they found each license in.
What purpose does it serve to list "MPL-1.1 OR GPL-2.0-only" and "MIT OR LGPL-2.1-only", etc if only perhaps < 1% of files carry this choice and we're not telling the user which 1% of files it applies to ?
they can run a license scanner and create an SPDX document that shows the file level license info to determine this. And that report will be far more complex and lengthy than what you came up with above ;) In that way, what you have above is a useful "summary" and accurate reflection of the big picture
If we're pointing people in the direction of using a license scanner when they want the gory details, then I'd say that it is even more compelling to drop the combinatorial expansion of SPDX identifiers, and simply have a flat list of SPDX identifiers found.
The previous effective license analysis addressed this problem, such that everything reduced down to "GPLv2 and Redistributable". I don't want to suggest going back to effective analysis as I think that was overly simplified, but perhaps we can finesse what we're doing today.
i.e. rather than trying to maintain the full list of choices, can we eliminate all the OR clauses, such that we present just a flat list of each distinct SPDX license name that is found? IOW, the above kernel SPDX expression would be
License: Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND CDDL-1.0 AND copyleft-next-0.3.1 AND GPL-1.0-or-later AND GPL-1.0-or-later-WITH-Linux-syscall-note AND GPL-2.0-only AND GPL-2.0-only-WITH-Linux-syscall-note AND GPL-2.0-or-later AND GPL-2.0-or-later-WITH-GCC-exception-2.0 AND GPL-2.0-or-later-WITH-Linux-syscall-note AND ISC AND LGPL-2.0-or-later AND LGPL-2.0-or-later-WITH-Linux-syscall-note AND LGPL-2.1-only AND LGPL-2.1-only-WITH-Linux-syscall-note AND LGPL-2.1-or-later AND LGPL-2.1-or-later-WITH-Linux-syscall-note AND Linux-OpenIB AND MIT AND MPL-1.1 AND Redistributable, no modification permitted AND X11 AND Zlib
but then this would be an exception to our original policy? and how would we articulate that? I'm not sure why this is really any "better" than your original - it's just shorter and truncated.
I wouldn't call it an exception, I'd say this should be the Fedora standard.
Shorter is indeed better because it is far easier for humans to read when it is more concise, especially so because once the expression is flattened, the SPDX identifiers could be alphabetically ordered as in this example above. IMHO, there's a world of difference in readability between this shorter example, and the current kernel proposed SPDX expression.
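As a rough illustration of the flattening proposed here, a minimal Python sketch follows. It is an assumption-laden sketch, not an agreed Fedora algorithm: `flatten` is a hypothetical helper, the hyphenated `-WITH-` spelling simply mirrors the flattened example above (SPDX syntax itself writes WITH as a separate operator), and the ordering is a case-insensitive alphabetical sort.

```python
import re


def flatten(expression):
    """Reduce an SPDX expression to a sorted, flat AND list of distinct terms."""
    # Keep "X WITH Y" together as one term, spelled as in the flattened example.
    joined = re.sub(r"\s+WITH\s+", "-WITH-", expression)
    # Drop all grouping, then split on the AND/OR operators.
    ungrouped = joined.replace("(", " ").replace(")", " ").strip()
    terms = re.split(r"\s+(?:AND|OR)\s+", ungrouped)
    distinct = {term.strip() for term in terms if term.strip()}
    return " AND ".join(sorted(distinct, key=str.lower))


# A small excerpt of clauses taken from the kernel expression in this thread.
excerpt = (
    "((GPL-2.0-only WITH Linux-syscall-note) OR MIT) AND GPL-2.0-only "
    "AND (GPL-2.0-only OR MIT) AND (MIT OR Apache-2.0) "
    "AND (copyleft-next-0.3.1 OR GPL-2.0-or-later)"
)
print("License:", flatten(excerpt))
```

The result deliberately discards which licenses were offered as a choice; it only records that each identifier occurs somewhere in the package.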
oh, and we should take a look at the "Redistributable, no modification permitted" ones... that is likely the firmware licenses that were never captured
Yes, that looks dubious - I added a comment on the latest kernel MR saying that this needs replacing by actual SPDX identifiers for whatever files it was supposed to apply to.
With regards, Daniel
On 01. 09. 23 at 9:54 Daniel P. Berrangé wrote:
Shorter is indeed better because it is far easier for humans to read when it is more concise, especially so because once the expression is flattened, the SPDX identifiers could be alphabetically ordered as in this example above.
Do we want to make it easier for humans or for machines?
I think that we now target machines more (especially because of the "exchange" part of the SPDX abbreviation). If you want a more human-friendly form, you can feed the formula to a library and get an abbreviated form. E.g., you can use `simplify()` from
https://github.com/nexB/license-expression#usage-examples
It is packaged in Fedora.
And I agree that once we migrate to SPDX we should always present to **users** the simplified formula. But in sources (in spec) we should track the full formula.
On Mon, Sep 4, 2023 at 10:03 AM Miroslav Suchý msuchy@redhat.com wrote:
On 01. 09. 23 at 9:54 Daniel P. Berrangé wrote:
Shorter is indeed better because it is far easier for humans to read when it is more concise, especially so because once the expression is flattened, the SPDX identifiers could be alphabetically ordered as in this example above.
Do we want to make it easier for humans or for machines?
I think that we now target machines more (especially because of the "exchange" part of the SPDX abbreviation). If you want a more human-friendly form, you can feed the formula to a library and get an abbreviated form. E.g., you can use `simplify()` from
https://github.com/nexB/license-expression#usage-examples
It is packaged in Fedora.
And I agree that once we migrate to SPDX we should always present to **users** the simplified formula. But in sources (in spec) we should track the full formula.
I'm not sure if that `simplify()` does anything interesting - seems like it either just applies commutativity and associativity or else it may attempt to make further algebraic reductions that are questionable as equivalent expressions. I believe Mulhern made the related point in another thread that algebraic simplifications may not be possible (and I think the underlying point is that these are not true boolean expressions).
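To make the "just applies commutativity and associativity" case concrete, here is a small standard-library Python sketch. It does not use or reproduce the behaviour of `simplify()` from license-expression; `split_top_level`, `strip_outer_parens`, and `normalize` are hypothetical helpers written only for this illustration. It treats two clauses as the same when they differ only in operand order and removes exact duplicates, and it attempts no further algebraic reduction.

```python
def split_top_level(expression, operator):
    """Split on ' operator ' occurrences that sit outside any parentheses."""
    sep = f" {operator} "
    parts, depth, start, i = [], 0, 0, 0
    while i < len(expression):
        ch = expression[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and expression.startswith(sep, i):
            parts.append(expression[start:i])
            i += len(sep)
            start = i
            continue
        i += 1
    parts.append(expression[start:])
    return [part.strip() for part in parts]


def strip_outer_parens(term):
    """Remove parentheses that enclose the whole term."""
    term = term.strip()
    while term.startswith("(") and term.endswith(")"):
        depth = 0
        for i, ch in enumerate(term):
            depth += ch == "("
            depth -= ch == ")"
            if depth == 0:
                break
        if i != len(term) - 1:
            break  # the first "(" closes early, so the pair is not enclosing
        term = term[1:-1].strip()
    return term


def normalize(expression):
    """Sort OR operands inside each AND clause and drop duplicate clauses."""
    clauses = set()
    for clause in split_top_level(expression, "AND"):
        operands = split_top_level(strip_outer_parens(clause), "OR")
        clauses.add(" OR ".join(sorted(strip_outer_parens(o) for o in operands)))
    return " AND ".join(f"({c})" if " " in c else c for c in sorted(clauses))


# Clauses that appear in both orders in the kernel expression in this thread.
excerpt = (
    "(GPL-2.0-only OR MIT) AND (MIT OR GPL-2.0-only) AND "
    "(GPL-2.0-only OR X11) AND (X11 OR GPL-2.0-only) AND GPL-2.0-only"
)
print(normalize(excerpt))
```

Anything beyond this, such as dropping `(GPL-2.0-only OR MIT)` because `GPL-2.0-only` already appears on its own, would be the kind of algebraic reduction whose equivalence is questioned above.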
Richard
On 01. 09. 23 at 1:13 Jilayne Lovejoy wrote:
On 8/31/23 2:39 AM, Daniel P. Berrangé wrote:
On Thu, Aug 24, 2023 at 02:52:21PM -0400, Richard Fontana wrote:
Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
Let's take the proposed change to the kernel spec:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2648/diffs#b49eec...
as an example of "complex license expressions" for which there is likely an aesthetic distaste. Each distinct SPDX-License-Identifier tag expression is combined such that we end up with:
License: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-2-Clause) AND ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) AND ((GPL-2.0-only WITH Linux-syscall-note) OR CDDL-1.0) AND ((GPL-2.0-only WITH Linux-syscall-note) OR Linux-OpenIB) AND ((GPL-2.0-only WITH Linux-syscall-note) OR MIT) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR BSD-3-Clause) AND ((GPL-2.0-or-later WITH Linux-syscall-note) OR MIT) AND BSD-2-Clause AND BSD-3-Clause AND BSD-3-Clause-Clear AND GPL-1.0-or-later AND (GPL-1.0-or-later OR BSD-3-Clause) AND (GPL-1.0-or-later WITH Linux-syscall-note) AND GPL-2.0-only AND (GPL-2.0-only OR Apache-2.0) AND (GPL-2.0-only OR BSD-2-Clause) AND (GPL-2.0-only OR BSD-3-Clause) AND (GPL-2.0-only OR CDDL-1.0) AND (GPL-2.0-only OR Linux-OpenIB) AND (GPL-2.0-only OR MIT) AND (GPL-2.0-only OR X11) AND (GPL-2.0-only WITH Linux-syscall-note) AND GPL-2.0-or-later AND (GPL-2.0-or-later OR BSD-2-Clause) AND (GPL-2.0-or-later OR BSD-3-Clause) AND (GPL-2.0-or-later OR MIT) AND (GPL-2.0-or-later WITH GCC-exception-2.0) AND (GPL-2.0-or-later WITH Linux-syscall-note) AND ISC AND LGPL-2.0-or-later AND (LGPL-2.0-or-later OR BSD-2-Clause) AND (LGPL-2.0-or-later WITH Linux-syscall-note) AND LGPL-2.1-only AND (LGPL-2.1-only OR BSD-2-Clause) AND (LGPL-2.1-only WITH Linux-syscall-note) AND LGPL-2.1-or-later AND (LGPL-2.1-or-later WITH Linux-syscall-note) AND (Linux-OpenIB OR GPL-2.0-only) AND (Linux-OpenIB OR GPL-2.0-only OR BSD-2-Clause) AND MIT AND (MIT OR Apache-2.0) AND (MIT OR GPL-2.0-only) AND (MIT OR GPL-2.0-or-later) AND (MIT OR LGPL-2.1-only) AND (MPL-1.1 OR GPL-2.0-only) AND (X11 OR GPL-2.0-only) AND (X11 OR GPL-2.0-or-later) AND Zlib AND (copyleft-next-0.3.1 OR GPL-2.0-or-later) AND (Redistributable, no modification permitted)
Given that the kernel is a very large package with many files and it has adopted SPDX ids at the file level (which means the licensing info is far more complete and easier to parse :) - there is nothing surprising to me about the length of this string. It is what it is!
While the majority of files in the kernel are "GPL-2.0-only", a number of files are offered under a choice of licenses (OR). Even if 99% of files were simply GPL-2.0-only, it only takes a handful of files being offered under a choice to result in an enormous SPDX expression like the one above. In the above example, at a bare minimum it would only take 30 files, out of the kernel's 80,000, to have distinct licence choices to cause the existence of the above expression.
That's an interesting point, but I'm not sure how we could justify some kind of an exception in such a case
Can we attach a percentage to each license? E.g. "Kernel is from 99,9625 % GPL-2.0-only license"
I am proposing this mostly as a joke. Nevertheless, it is interesting information IMHO.
¯_(ツ)_/¯
Vít
On Fri, Sep 1, 2023 at 8:09 AM Vít Ondruch vondruch@redhat.com wrote:
Can we attach a percentage to each license? E.g. "Kernel is from 99,9625 % GPL-2.0-only license"
I am proposing this mostly as a joke. Nevertheless, it is interesting information IMHO.
This sounds similar to an idea I had a few years ago, when I was somewhat more enthusiastic about general use of ScanCode as a tool. I think ScanCode can give you results of licenses identified in source code in a percentage ranking, so my idea was "just list the top few detected licenses". The idea had the problem of being scanner-dependent (or else associated with inconsistent approaches based on the scanner used) and also the mere fact that a license shows up a lot in detections doesn't necessarily mean it *covers* content in the package to the extent that would suggest (i.e., some results would be pretty misleading, seemingly).
It also conflicts with what I think of as a "truth in labeling" principle which may be something that should guide us here, to some degree. It is not uncommon for a package to have a small portion of code covered by a license that is in some sense problematic or unexpected in a way that is disproportionate to how often it appears. To use the example of the kernel, there's the presence of Clear BSD (SPDX: BSD-3-Clause-Clear) on some source files. Arguably there is a value in exposing that fact, especially for those of us who don't consider that to be an open source license. But truth in labeling doesn't necessarily mean "list everything in precise detail".
Richard
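The "top few detected licenses" idea, and the way it can work against truth in labeling, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `detections` list is made-up data, not real kernel or ScanCode output, and `license_shares` and the 50% cutoff are hypothetical names and values chosen for the example.

```python
from collections import Counter


def license_shares(detections):
    """Return {license: percent of files} from (path, license) pairs."""
    counts = Counter(lic for _path, lic in detections)
    total = sum(counts.values())
    # most_common() orders the result from most to least frequent.
    return {lic: 100.0 * n / total for lic, n in counts.most_common()}


# Hypothetical per-file detections (assumption: one license per file).
detections = [
    ("a.c", "GPL-2.0-only"),
    ("b.c", "GPL-2.0-only"),
    ("c.c", "GPL-2.0-only"),
    ("d.c", "BSD-3-Clause-Clear"),
]

shares = license_shares(detections)

# "Just list the top few": keep only licenses at or above a cutoff.
top = [lic for lic, pct in shares.items() if pct >= 50.0]
```

With this toy input, `top` contains only `GPL-2.0-only`, so the rarely occurring `BSD-3-Clause-Clear` disappears from the summary even though it may be the license a reader most wants to know about. A share of files also says nothing about how much content each license actually covers.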
On Fri, 1 Sept 2023 at 15:53, Richard Fontana rfontana@redhat.com wrote:
> On Fri, Sep 1, 2023 at 8:09 AM Vít Ondruch vondruch@redhat.com wrote:
> > Can we attach percentage to each license? E.g. "Kernel is from 99,9625 % GPL-2.0-only license"
> > I am proposing this mostly as a joke. Nevertheless, it is interesting information IMHO.
> This sounds similar to an idea I had a few years ago, when I was somewhat more enthusiastic about general use of ScanCode as a tool. I think ScanCode can give you results of licenses identified in source code in a percentage ranking, so my idea was "just list the top few detected licenses". The idea had the problem of being scanner-dependent (or else associated with inconsistent approaches based on the scanner used) and also the mere fact that a license shows up a lot in detections doesn't necessarily mean it *covers* content in the package to the extent that would suggest (i.e., some results would be pretty misleading, seemingly).
> It also conflicts with what I think of as a "truth in labeling" principle which may be something that should guide us here, to some degree. It is not uncommon for a package to have a small portion of code covered by a license that is in some sense problematic or unexpected in a way that is disproportionate to how often it appears. To use the example of the kernel, there's the presence of Clear BSD (SPDX: BSD-3-Clause-Clear) on some source files. Arguably there is a value in exposing that fact, especially for those of us who don't consider that to be an open source license. But truth in labelling doesn't mean "list everything in precise detail" necessarily.
Apologies if this has been discussed before, but why not something like the debian/copyright file? The License tag could be just a list of all licenses found, without AND or OR, to avoid the combinatorial issue, and then a copyright file could exactly list what applies to what.
On Fri, Sep 1, 2023 at 11:56 AM Iñaki Ucar iucar@fedoraproject.org wrote:
> Apologies if this has been discussed before, but why not something like the debian/copyright file? The License tag could be just a list of all licenses found, without AND or OR, to avoid the combinatorial issue, and then a copyright file could exactly list what applies to what.
It has basically not been discussed before. I think if we were starting over from scratch I would probably suggest something like the Debian approach. That also has the advantage of solving the problem of what standards to adopt around license file inclusion (which currently haven't been touched in our post-July-2022 documentation and which really don't make much sense, at least in relation to the rest of the current Fedora legal guidelines).
My past assumption has been that it would be culturally too hard for Fedora to copy the Debian approach (although it occurs to me that Fedora could just reuse a lot of the Debian package data with limited modification). Maybe that's wrong? Maybe the current guidelines involve the same sort of effort that Debian package maintainers engage in.
One reason I don't like the "without AND or OR" approach is it is basically abandoning the idea of using per-RPM or per-package SPDX expressions, while retaining the idea that we should be using SPDX expressions at a more atomic level. But if we adopted the Debian approach I don't see why we should need to continue populating the License: field at all (not sure how Debian deals with this, offhand).
Richard
On Fri, 1 Sept 2023 at 18:12, Richard Fontana rfontana@redhat.com wrote:
> On Fri, Sep 1, 2023 at 11:56 AM Iñaki Ucar iucar@fedoraproject.org wrote:
> > Apologies if this has been discussed before, but why not something like the debian/copyright file? The License tag could be just a list of all licenses found, without AND or OR, to avoid the combinatorial issue, and then a copyright file could exactly list what applies to what.
> It has basically not been discussed before. I think if we were starting over from scratch I would probably suggest something like the Debian approach. That also has the advantage of solving the problem of what standards to adopt around license file inclusion (which currently haven't been touched in our post-July-2022 documentation and which really don't make much sense, at least in relation to the rest of the current Fedora legal guidelines).
> My past assumption has been that it would be culturally too hard for Fedora to copy the Debian approach (although it occurs to me that Fedora could just reuse a lot of the Debian package data with limited modification). Maybe that's wrong? Maybe the current guidelines involve the same sort of effort that Debian package maintainers engage in.
We already have a tool that analyzes all the files and reports the licenses found. This could generate an initial template for the packager to revise. I don't think it would be much more work than making sense of the License tag as it is now. It would be easier in many cases.
> One reason I don't like the "without AND or OR" approach is it is basically abandoning the idea of using per-RPM or per-package SPDX expressions, while retaining the idea that we should be using SPDX expressions at a more atomic level. But if we adopted the Debian approach I don't see why we should need to continue populating the License: field at all (not sure how Debian deals with this, offhand).
In that case, I see the License tag as a good "effective license" summary that is complementary to the detailed specification: the "98% GPLv2" you were talking about. It could be "License: GPL-2.0-only + COPYRIGHTS file" (i.e. a "see the details" file).
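A rough sketch of what "generate an initial template for the packager to revise" might look like, written in Python. Everything here is an assumption for illustration: the `file_licenses` mapping stands in for whatever the existing analysis tool reports (its real output format is not described in this thread), the stanza layout only imitates the style of Debian's machine-readable copyright file, and `Copyright: FIXME` is a placeholder the packager would have to fill in by hand.

```python
from collections import defaultdict


def copyright_template(file_licenses):
    """Group files by detected license and emit a draft, debian/copyright-style."""
    by_license = defaultdict(list)
    for path, lic in sorted(file_licenses.items()):
        by_license[lic].append(path)

    stanzas = []
    for lic in sorted(by_license):
        # Continuation lines start with whitespace, one file per line.
        files = "\n       ".join(by_license[lic])
        stanzas.append(f"Files: {files}\nCopyright: FIXME\nLicense: {lic}")
    return "\n\n".join(stanzas) + "\n"


def flat_license_list(file_licenses):
    """All licenses found, deduplicated, with no AND/OR structure."""
    return sorted(set(file_licenses.values()))


# Hypothetical scanner result: path -> SPDX identifier.
file_licenses = {
    "src/main.c": "GPL-2.0-only",
    "src/util.c": "MIT",
    "src/io.c": "GPL-2.0-only",
}

draft = copyright_template(file_licenses)
licenses = flat_license_list(file_licenses)
```

Here `licenses` is the flat "list of all licenses found" and `draft` is the per-file breakdown that says what applies to what. The draft is only a starting point: a scanner's per-file guess still needs human review before it is treated as accurate.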
On 24. 08. 23 20:52, Richard Fontana wrote:
> Some of the complaints that have surfaced since the migration from the Callaway system to SPDX seem to be, at root, an aesthetic distaste for complex license expressions in RPM license metadata. This may explain why some favor application of "effective license" analysis. I suspect there is also a sort of psychological desire to hide the underlying licensing complexity that characterizes many packages.
> I do think that the current approach can be criticized as being overly pedantic, and perhaps also internally contradictory (some of Florian's recent comments get at the various ways in which we are being contradictory). We have a still-undocumented rule that what I call "true public domain" should not be reflected in the License: field (unless it would otherwise be empty), yet we have carefully attempted to collect nonstandard public domain dedication statements and cover those by `LicenseRef-Fedora-Public-Domain`. We have been using a similar approach with `LicenseRef-Fedora-UltraPermissive`. These basically replace Callaway system names "Public domain" (though this was sometimes used for "true public domain") and "Freely redistributable without restrictions", respectively.
> I think it can reasonably be argued that there is little point in including `LicenseRef-Fedora-Public-Domain` and `LicenseRef-Fedora-UltraPermissive` in the License: field since they are associated with no conditions or obligations. In those special cases where the License: field would otherwise be empty, we can ask SPDX to create unique identifiers for the license text in question.
> We might want to extend this principle to other things, such as GPL exceptions that entail no conditions in the use case encountered in particular packages. (There is already an old issue about this, I think concerning the Bison exception.)
> This wouldn't do *that* much to make License: fields simpler, so maybe it's not particularly worthwhile. There is also the problem that if we make it optional, package maintainers may be less likely to scrutinize things that are assumed to fall into these kinds of categories, when in some cases they actually wouldn't, although I think it's now clear that those situations are uncommon. In theory we'd still expect package maintainers to submit issues to have things that seem to qualify for LicenseRef-Fedora-Public-Domain reviewed, but it might be challenging to enforce that expectation and the Fedora Legal team would have to end up doing all that work themselves, which might be a justifiable result.
> As with abandoning the "license of the binary" rule, this would seemingly be a major departure from the principles established under the Callaway system.
> Any thoughts on this?
Hi. Considering the idea behind all the public domain licenses is that anyone can take the code and include it anywhere without any strings attached, we can now easily end up in two different situations:
Situation 1: A project under MIT takes a public domain function (file, module, whatever) and copies it into the MIT project. Since there is no obligation to mention where this function comes from, they don't do that.
Situation 2: A project under MIT takes a public domain function (file, module, whatever) and copies it into the MIT project. For good measure, they annotate the function with "a public domain function from X by Y".
Should those two situations be handled differently in the License tag? Previously, we would simply say "MIT" in both situations, but the current no-effective-license-analysis rule kinda makes the second situation more complex for the packager. Being able to omit this would certainly make things a bit easier.
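To make the proposed omission concrete, here is a minimal Python sketch of dropping the no-obligation identifiers from a License: value. It is an illustration only, not an existing Fedora tool: `simplify_and_expression` is a hypothetical helper, it deliberately handles nothing beyond flat `A AND B` lists, and it leaves the expression alone when removal would empty it (the case where the proposal is to ask SPDX for a dedicated identifier instead).

```python
# The two identifiers named in the proposal as carrying no conditions.
NO_OBLIGATION = {
    "LicenseRef-Fedora-Public-Domain",
    "LicenseRef-Fedora-UltraPermissive",
}


def simplify_and_expression(expression):
    """Drop no-obligation identifiers from a flat 'A AND B AND C' expression."""
    tokens = expression.split()
    # Anything with OR, WITH, or parentheses is returned unchanged
    # rather than guessed at; this sketch is not an SPDX parser.
    if any(t in ("OR", "WITH") or "(" in t or ")" in t for t in tokens):
        return expression
    terms = [t for t in tokens if t != "AND"]
    kept = [t for t in terms if t not in NO_OBLIGATION]
    if not kept:
        # The field would otherwise be empty, so keep what is there.
        return expression
    return " AND ".join(kept)


situation_2 = simplify_and_expression("MIT AND LicenseRef-Fedora-Public-Domain")
only_pd = simplify_and_expression("LicenseRef-Fedora-Public-Domain")
```

Under this sketch, Situation 2 collapses to `MIT`, the same tag Situation 1 would get, while a package consisting solely of public domain material keeps its existing identifier.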