I think Richard said that he would start a thread like this, but it hasn't happened, so I feel like I should get this off my chest now.
https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_license_analysis starts with this:
| No “effective license” analysis
|
| The License: field is meant to provide a simple enumeration of the
| licenses found in the source code that are reflected in the binary
| package. No further analysis should be done regarding what the
| "effective" license is, such as analysis based on theories of GPL
| interpretation or license compatibility or suppositions that
| “top-level” license files somehow negate different licenses appearing
| on individual source files.
This is contradictory. I think there are two aspects here:
* Determine possible licenses that end up in the binary package.
* Perform algebraic simplifications on the license list.
Both analyses are forms of effective licensing analysis. Of course, you cannot derive an SPDX identifier without doing any analysis. However, I strongly believe that the first approach (determining the binary package license) is itself a form of effective licensing analysis, and similar reasons for package maintainers not doing it apply. The derived SPDX identifier will reflect both the package source code and what went into the build system.
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
* Most package maintainers probably assume that License: tags on all built RPMs (source RPMs and binary RPMs) should reflect binary package contents, at least when all subpackages are considered in aggregate. Often, source RPMs contain the same License: line as binary RPMs.
* No algebraic simplifications on License: lines are performed.
* All forms of dynamic linking are ignored for License: tags. This covers ELF (e.g., C, C++), but also Python, Java, and other languages with late binding.
* C/C++ header file contents are ignored for License: tags, regardless of header file complexity (e.g., substantial code in templates or inline functions is not treated specially).
* Statically linked GCC and glibc startup code is ignored and does not show up in License: lines. The license of glibc startup code isn't even in SPDX yet, so it's not just Fedora who is ignoring this.
* Statically linked libgcc support code is ignored (e.g., outline atomics on aarch64, FMV support code on x86-64). This code comes with the compiler, but is compiled from C sources that ship with the compiler. These items overlap with the startup code, but licensing could theoretically be different.
* Some shared objects come with statically linked support code. I doubt that many package maintainers are aware of that, so they effectively ignore its licensing impact. It's structurally similar to inline functions and templates in header files.
* Output from source code generators such as autoconf, bison and flex is often (but not always) ignored, in some cases even if the generated code ships in the source RPM and is compiled as-is, without regeneration. (autoconf can generate more than just build scripts.)
* Licenses of crate build-dependencies end up in License: tags of RPM packages. This is a form of static linking analysis for which we have tooling, and it is mandated by the guidelines. It only covers the Rust part; other gaps in filling out License: are still there. (I don't know if the generated License: tags are accurate for individual subpackages; it seems unlikely.) Go might have something similar.
* Sometimes we ignore upstream SPDX identifiers if we believe them to be incorrect, but that approach is not consistent, as far as I know.
* There seems to be some confusion about whether AND or OR is the right separator for SPDX identifiers in License: lines.
* Some package maintainers, when translating to SPDX, merely translate the existing License: line as best as they can, without looking at the actual sources or produced binaries.
I looked around a bit and there are no documented product requirements internally, so I don't think we can justify investing in tooling or training to improve data quality. (I'll keep digging, though.)
In the light of this, I would like to suggest updating the guidelines in the following way:
The License: line should be based on the sources only. Using a tool such as Fossology to discover relevant licenses and their SPDX tags is sufficient. No analysis of how licenses from package source code or the build environment propagate into binary RPMs should be performed. Individual SPDX identifiers that a tool has listed should be separated by AND. Package maintainers are encouraged to re-run license analysis tooling on the source code as part of major package rebases, and update the License: tag accordingly.
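A minimal sketch of the proposed rule, assuming a scanner has already produced a flat list of SPDX identifiers (the helper name `license_tag` is hypothetical and not part of any Fedora tooling). The License: value is then just the de-duplicated identifiers joined with AND; any compound expression a tool reports is parenthesized so that joining with AND cannot change its meaning, which also shows why OR is not a generic separator.

```python
def license_tag(identifiers):
    """Join scanner-reported SPDX identifiers into a License: value.

    Hypothetical helper: de-duplicates, sorts for a stable result,
    and separates entries with AND, with no further simplification.
    """
    unique = sorted(set(identifiers))
    # Compound expressions (containing OR or WITH) are wrapped in
    # parentheses so that AND does not change their meaning.
    parts = [f"({i})" if " " in i else i for i in unique]
    return " AND ".join(parts)


print(license_tag(["GPL-2.0-or-later", "MIT", "GPL-2.0-or-later", "BSD-3-Clause"]))
```

No license algebra is attempted: duplicates are dropped, but nothing is simplified or removed on compatibility grounds.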
To me, that seems to be much more manageable.
Thoughts?
Thanks, Florian
On Mon, Aug 21, 2023 at 01:04:29PM +0200, Florian Weimer wrote:
I think Richard said that he would start a thread like this, but it hasn't happened, so I feel like I should get this off my chest now.
https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_license_analysis starts with this:
| No “effective license” analysis
|
| The License: field is meant to provide a simple enumeration of the
| licenses found in the source code that are reflected in the binary
| package. No further analysis should be done regarding what the
| "effective" license is, such as analysis based on theories of GPL
| interpretation or license compatibility or suppositions that
| “top-level” license files somehow negate different licenses appearing
| on individual source files.
This is contradictory. I think there are two aspects here:
* Determine possible licenses that end up in the binary package.
* Perform algebraic simplifications on the license list.
Both analyses are forms of effective licensing analysis. Of course, you cannot derive an SPDX identifier without doing any analysis. However, I strongly believe that the first approach (determining the binary package license) is itself a form of effective licensing analysis, and similar reasons for package maintainers not doing it apply. The derived SPDX identifier will reflect both the package source code and what went into the build system.
It could perhaps be worded better, but I don't see this as contradictory; it is just a matter of what you consider "effective analysis" to refer to. The last sentence expands on this to say that 'effective' in this context is referring to the analysis of license compatibility that Fedora previously recommended.
The analysis maintainers are being asked to do today is not about interpreting licensing. They are "merely" being asked to determine which source files contain code that becomes part of the resulting binary RPM. This is more build system analysis than license analysis, and distinct from what Fedora would traditionally describe as "effective license analysis".
- Some package maintainers, when translating to SPDX, merely translate the existing License: line as best as they can, without looking at the actual sources or produced binaries.
This I think is probably the main flaw in the process we asked our maintainers to follow.
At a high level we portrayed the whole exercise as merely a terminology change, but it was not.
Given the removal of the effective license analysis requirement, that was / is an oversimplification.
Strictly speaking I think the exercise ought to have been portrayed as more of a license (re-)audit. In the general case maintainers ought to be redoing the license audit part of the new package review process, for all existing packages, not blindly converting existing terminology.
In the light of this, I would like to suggest updating the guidelines in the following way:
The License: line should be based on the sources only. Using a tool such as Fossology to discover relevant licenses and their SPDX tags is sufficient. No analysis of how licenses from package source code or the build environment propagate into binary RPMs should be performed. Individual SPDX identifiers that a tool has listed should be separated by AND. Package maintainers are encouraged to re-run license analysis tooling on the source code as part of major package rebases, and update the License: tag accordingly.
To me, that seems to be much more manageable.
What I'm not a fan of with this approach is that it would cause us to include licenses that are clearly irrelevant for Fedora binary packages. If we consider the "license" tag to be something for end users to look at, I think this will be misleading.
For example in one package I reviewed there is kernel code that is only built on Solaris which is under the CDDL. Including that in the Fedora binary RPM license feels totally wrong.
In many packages using autotools there are snippets of m4 code that are under a variety of licenses, again not affecting the output. Those would "bloat" the license tag for little obvious gain.
I do agree, though, that doing *perfect* build system analysis to figure out which source files become part of the binary RPMs is impractical for any non-trivial package.
My approach has been to scan the source for licenses, and then look at source files with any licenses I was surprised to see. Often it is possible to exclude these unexpected licenses, because they are obviously part of the build system, or are obviously for a different OS platform.
I would describe this as trying to meet the spirit of having the RPM license reflect binary content, while acknowledging the reality that maintainers won't fully analyse the build system as it is too time consuming & impractical.
I might suggest adding an extra sentence to make it more explicit that the binary RPM license is not a perfect representation of the binary content, as it may sometimes include extra licenses from source files that were not relevant. This would reflect the somewhat pragmatic approach that I think maintainers already take in practice:
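The triage step described above could be sketched roughly as follows, assuming per-file scanner results are available as a path-to-identifier mapping. The function `surprising_files` and the example paths are hypothetical; deciding whether a flagged file is really build-system-only or for another platform remains a manual judgment.

```python
def surprising_files(findings, expected):
    """Return files whose detected license is not in the expected set.

    findings: dict mapping a source path to the SPDX identifier a
    scanner reported for it (hypothetical input format).
    expected: set of identifiers the maintainer already anticipates.
    """
    return sorted(path for path, lic in findings.items() if lic not in expected)


findings = {
    "src/main.c": "GPL-2.0-or-later",
    "m4/example.m4": "FSFAP",
    "kernel/solaris/driver.c": "CDDL-1.0",
}
for path in surprising_files(findings, {"GPL-2.0-or-later"}):
    print("review manually:", path)
```

Only the flagged files need a closer look, which is what keeps this approach cheaper than a full build system analysis.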
| The License: field is meant to provide a simple enumeration of the
| licenses found in the source code that are reflected in the binary
| package. It may also include additional licenses for files that are
| not part of the binary where it is impractical to filter them out
| during license review. No further analysis should be done regarding
| what the "effective" license is, such as analysis based on theories
| of GPL interpretation or license compatibility or suppositions that
| “top-level” license files somehow negate different licenses appearing
| on individual source files.
With regards, Daniel
* Daniel P. Berrangé:
It could perhaps be worded better, but I don't see this as contradictory; it is just a matter of what you consider "effective analysis" to refer to. The last sentence expands on this to say that 'effective' in this context is referring to the analysis of license compatibility that Fedora previously recommended.
I think it goes beyond terminology. I think determining the binary RPM licenses has similar complexities to the license algebra. I can't imagine consensus emerging around that. There's just no firm reasoning why we ignore header files, dynamic linking, and the glibc startup code in the License: tag, but not static linking in general. I think coming up with a consistent set of rules is even more complicated than some sort of license algebra, or rules for ignoring certain copyright files. So I think the perceived simplification of the rules fell short, and the present rules are still unworkable.
The analysis maintainers are being asked to do today is not about interpreting licensing. They are "merely" being asked to determine which source files contain code that becomes part of the resulting binary RPM. This is more build system analysis than license analysis, and distinct from what Fedora would traditionally describe as "effective license analysis".
But that's extremely subjective until we have a consistent set of rules, preferably accompanied by training and tooling for automated license propagation according to the rules we set forth (similar to what we have today for Rust, but for C/C++ and other languages that use a mix of static and dynamic linking).
I just don't think we can come up with a consistent set of rules, accepted by the wider industry, on how a build process transforms the source code licenses and the licenses of the build environment into the binary output licenses. Until then, we are basically in the same spot as we were when there was some expectation to perform effective source license analysis.
For example, we have fairly strong evidence that the industry as a whole believes that the license of the statically linked glibc startup code can be ignored. Why is that so?
- Some package maintainers, when translating to SPDX, merely translate the existing License: line as best as they can, without looking at the actual sources or produced binaries.
This I think is probably the main flaw in the process we asked our maintainers to follow.
At a high level we portrayed the whole exercise as merely a terminology change, but it was not.
Given the removal of the effective license analysis requirement, that was / is an oversimplification.
Well, I don't agree with this characterization. We are required to determine binary RPM licenses, which still requires substantial license impact analysis. And we don't have guidelines for that.
Strictly speaking I think the exercise ought to have been portrayed as more of a license (re-)audit. In the general case maintainers ought to be redoing the license audit part of the new package review process, for all existing packages, not blindly converting existing terminology.
That's how my management and immediate colleagues have interpreted it, and how I looked at it as well, and it has enormous cost because even for core GNU packages, no one seems to have taken such a close look before. Maybe that's because there is traditionally little overlap between SPDX users and GPL users, but I can't really believe these groups are totally separate.
In the light of this, I would like to suggest updating the guidelines in the following way:
The License: line should be based on the sources only. Using a tool such as Fossology to discover relevant licenses and their SPDX tags is sufficient. No analysis of how licenses from package source code or the build environment propagate into binary RPMs should be performed. Individual SPDX identifiers that a tool has listed should be separated by AND. Package maintainers are encouraged to re-run license analysis tooling on the source code as part of major package rebases, and update the License: tag accordingly.
To me, that seems to be much more manageable.
What I'm not a fan of with this approach is that it would cause us to include licenses that are clearly irrelevant for Fedora binary packages. If we consider the "license" tag to be something for end users to look at, I think this will be misleading.
We can come up with something that looks at the state of the tree after %prep, or something like that.
The problem with dropping stuff arbitrarily is that it makes it again impossible to rely on tooling.
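A reproducible scan of the tree after %prep could start out roughly like this. It is a sketch that assumes upstream marks files with `SPDX-License-Identifier:` tags; real scanners such as Fossology also match full license texts and notices, which this does not. The function name `scan_tree` is hypothetical.

```python
import os
import re

# Matches lines such as "// SPDX-License-Identifier: GPL-2.0-or-later"
# or "/* SPDX-License-Identifier: MIT */" (trailing "*/" is dropped).
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def scan_tree(root):
    """Map each file under root to the SPDX tag it declares, if any."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for line in fh:
                        match = SPDX_TAG.search(line)
                        if match:
                            found[os.path.relpath(path, root)] = match.group(1)
                            break  # only the first tag per file is recorded
            except OSError:
                continue  # unreadable files are skipped
    return found
```

Because the result depends only on the unpacked tree, two maintainers running it on the same %prep output get the same list, which is the property that ad-hoc manual exclusions lose.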
For example in one package I reviewed there is kernel code that is only built on Solaris which is under the CDDL. Including that in the Fedora binary RPM license feels totally wrong.
I disagree. Upstream may have copied code from the CDDL part of the tree to other parts without updating the license. If we ignore the CDDL license, we say that hasn't happened, and I doubt we are in the position to make such a certification for most packages.
Of course someone may have copied code from a Stackoverflow answer (which is generally available under incompatible license terms), and we wouldn't know about that either. But suppressing license information actually present in the source package (although in a supposedly unused location) seems different.
In many packages using autotools there are snippets of m4 code that are under a variety of licenses, again not affecting the output. Those would "bloat" the license tag for little obvious gain.
We decided we had to include it because the m4 code generates config.h, which is included in the build as if it was a source file. Perhaps we can ignore that because of the general rule that licensing of header files does not matter. But that rule isn't part of the guidelines, even though it is a key part of what makes binary RPM license analysis workable (otherwise you'd end up with a lot of noise from system headers, leading to the problem you noted).
I do agree, though, that doing *perfect* build system analysis to figure out which source files become part of the binary RPMs is impractical for any non-trivial package.
It's not impractical, it's just rather costly (training and tooling).
My approach has been to scan the source for licenses, and then look at source files with any licenses I was surprised to see. Often it is possible to exclude these unexpected licenses, because they are obviously part of the build system, or are obviously for a different OS platform.
I would describe this as trying to meet the spirit of having the RPM license reflect binary content, while acknowledging the reality that maintainers won't fully analyse the build system as it is too time consuming & impractical.
That's not unreasonable.
I might suggest adding an extra sentence to make it more explicit that the binary RPM license is not a perfect representation of the binary content, as it may sometimes include extra licenses from source files that were not relevant. This would reflect the somewhat pragmatic approach that I think maintainers already take in practice.
I would welcome that. And update the Rust guidelines accordingly, to clarify that the kind of buildroot-to-binary-RPM propagation that the tooling performs is optional and not required by (the spirit of) the Fedora guidelines.
Thanks, Florian
On Mon, Aug 21, 2023 at 04:25:22PM +0200, Florian Weimer wrote:
- Daniel P. Berrangé:
It could perhaps be worded better, but I don't see this as contradictory; it is just a matter of what you consider "effective analysis" to refer to. The last sentence expands on this to say that 'effective' in this context is referring to the analysis of license compatibility that Fedora previously recommended.
I think it goes beyond terminology. I think determining the binary RPM licenses has similar complexities to the license algebra. I can't imagine consensus emerging around that. There's just no firm reasoning why we ignore header files, dynamic linking, and the glibc startup code in the License: tag, but not static linking in general. I think coming up with a consistent set of rules is even more complicated than some sort of license algebra, or rules for ignoring certain copyright files. So I think the perceived simplification of the rules fell short, and the present rules are still unworkable.
WRT header file / glibc startup / static linking licenses being ignored, the rationale I would express is that those pieces must (by implication) all already be license compatible (in some way) with the package consuming them. This is admittedly another case of "effective license" doctrine, though an implicit one rather than one made explicitly by the maintainer / package reviewer.
What I'm not a fan of with this approach is that it would cause us to include licenses that are clearly irrelevant for Fedora binary packages. If we consider the "license" tag to be something for end users to look at, I think this will be misleading.
We can come up with something that looks at the state of the tree after %prep, or something like that.
The problem with dropping stuff arbitrarily is that it makes it again impossible to rely on tooling.
IMHO no matter what we do, the value of the License field is rather limited for semantic interpretation by automated tooling, because it reduces a very complex situation down to a very crude expression.
It is notable that the Debian "copyright" file format and the REUSE format both provide a massively more granular expression of package licensing, targeted at machine processing.
Although our new SPDX expressions are better for machine readability than in the past, we should be explicit about the limitations of our data and the problems with attempting to do any semantic analysis based on it.
For example in one package I reviewed there is kernel code that is only built on Solaris which is under the CDDL. Including that in the Fedora binary RPM license feels totally wrong.
I disagree. Upstream may have copied code from the CDDL part of the tree to other parts without updating the license. If we ignore the CDDL license, we say that hasn't happened, and I doubt we are in the position to make such a certification for most packages.
Of course someone may have copied code from a Stackoverflow answer (which is generally available under incompatible license terms), and we wouldn't know about that either. But suppressing license information actually present in the source package (although in a supposedly unused location) seems different.
I don't think it is different. Both are a case of garbage-in == garbage-out.
If upstream copied CDDL code into a file and didn't record this in the file's stated license, then that's a problem whether the original CDDL code is part of the same project or from stack overflow. In both cases upstream made a mistake and failed to record accurate license info in the source file.
We're not making any judgement or statement about the accuracy of upstream's licensing record. We're summarizing what upstream has presented in its source files and taking that on faith (unless someone happens to notice some blatant inaccuracy).
This feels like a case where we should better document what our input assumptions are with License tag data.
Debian copyright files and REUSE data will suffer the same limitation as they're both promoting a view that license information is trackable and analysable per file, so if upstream fails to record a license the copyright/REUSE files will similarly be inaccurate.
I might suggest adding an extra sentence to make it more explicit that the binary RPM license is not a perfect representation of the binary content, as it may sometimes include extra licenses from source files that were not relevant. This would reflect the somewhat pragmatic approach that I think maintainers already take in practice.
I would welcome that. And update the Rust guidelines accordingly, to clarify that the kind of buildroot-to-binary-RPM propagation that the tooling performs is optional and not required by (the spirit of) the Fedora guidelines.
I agree with your general point that we've not adequately documented many of the assumptions / simplifications that maintainers will / should make when analysing license data in source files. Probably the various scenarios you've illustrated should be answered in some way in the licensing pages.
With regards, Daniel
On Mon, Aug 21, 2023 at 9:30 AM Daniel P. Berrangé berrange@redhat.com wrote:
It could perhaps be worded better, but I don't see this as contradictory; it is just a matter of what you consider "effective analysis" to refer to. The last sentence expands on this to say that 'effective' in this context is referring to the analysis of license compatibility that Fedora previously recommended.
The 'license compatibility' aspect is what is meant by the process here. Previously Fedora carried a large license compatibility chart and there were rules (both written and passed down through oral tradition) about what is and is not compatible with the different GPL licenses. A common case was a GPL project incorporating BSD licensed code which meant the entire collective work was GPL licensed _per FSF guidelines_. This is the part that Fedora Legal wants to do away with. Package maintainers do not need to perform this compatibility analysis.
So yes, it could probably be worded better.
The analysis maintainers are being asked to do today is not about interpreting licensing. They are "merely" being asked to determine which source files contain code that becomes part of the resulting binary RPM. This is more build system analysis than license analysis, and distinct from what Fedora would traditionally describe as "effective license analysis".
Correct.
- Some package maintainers, when translating to SPDX, merely translate the existing License: line as best as they can, without looking at the actual sources or produced binaries.
This I think is probably the main flaw in the process we asked our maintainers to follow.
At a high level we portrayed the whole exercise as merely a terminology change, but it was not.
Given the removal of the effective license analysis requirement, that was / is an oversimplification.
Strictly speaking I think the exercise ought to have been portrayed as more of a license (re-)audit. In the general case maintainers ought to be redoing the license audit part of the new package review process, for all existing packages, not blindly converting existing terminology.
We did portray this as a re-audit and not simply a change in abbreviations. I did many presentations on just that and we held numerous hack fests where we helped people analyze packages to determine the correct license expression in SPDX.
There are definitely maintainers who thought it was just a change in abbreviations and I do not think there is an easy way to stop that. But as we have been going through this process, the number of maintainers auditing packages has been high and we are seeing more licenses captured and added to SPDX that were not previously represented.
What I would like to see in the future is a tool connected to Bodhi that can run for each build of a package and perform a license analysis, build the License tag string, then compare it to what is in the spec file. If it's different then alert the package maintainer and say "Hey, I'm a script and I analyzed the licenses in this package and I *think* I found something different than what you put in the spec file License tag. If I were you, I would check it out because one of us is probably wrong." Maybe we'll have that at some point.
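A check of that kind might look roughly like the following. It assumes the tool's result and the spec's License: value are both flat AND-lists and only reads the first License: tag (per-subpackage tags are ignored); all names are hypothetical, and nothing here reflects an existing Bodhi integration.

```python
import re


def spec_license(spec_text):
    """Return the value of the first 'License:' tag in a spec file, or None."""
    match = re.search(r"^License:\s*(.+?)\s*$", spec_text, re.MULTILINE)
    return match.group(1) if match else None


def and_terms(expression):
    """Split a flat, AND-only SPDX expression into a set of identifiers."""
    terms = {term.strip() for term in expression.split(" AND ")}
    terms.discard("")
    return terms


def compare(spec_text, detected):
    """Report identifiers found by the scan but absent from the spec, and vice versa."""
    declared = and_terms(spec_license(spec_text) or "")
    found = and_terms(detected)
    return {"missing_from_spec": sorted(found - declared),
            "not_detected": sorted(declared - found)}


spec = "Name: demo\nLicense: GPL-2.0-or-later AND MIT\n"
print(compare(spec, "GPL-2.0-or-later AND BSD-3-Clause AND MIT"))
```

A non-empty result would only trigger the "one of us is probably wrong" notice; it does not say which side is correct.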
In the light of this, I would like to suggest updating the guidelines in the following way:
The License: line should be based on the sources only. Using a tool such as Fossology to discover relevant licenses and their SPDX tags is sufficient. No analysis of how licenses from package source code or the build environment propagate into binary RPMs should be performed. Individual SPDX identifiers that a tool has listed should be separated by AND. Package maintainers are encouraged to re-run license analysis tooling on the source code as part of major package rebases, and update the License: tag accordingly.
To me, that seems to be much more manageable.
What I'm not a fan of with this approach is that it would cause us to include licenses that are clearly irrelevant for Fedora binary packages. If we consider the "license" tag to be something for end users to look at, I think this will be misleading.
For example in one package I reviewed there is kernel code that is only built on Solaris which is under the CDDL. Including that in the Fedora binary RPM license feels totally wrong.
In many packages using autotools there are snippets of m4 code that are under a variety of licenses, again not affecting the output. Those would "bloat" the license tag for little obvious gain.
I do agree though that doing *perfect* build system analysis to figure out what source files become part of the binary RPMs is impractical for any non-trivial packages.
My approach has been to scan the source for licenses, and then look at source files with any licenses I was surprised to see. Often it is possible to exclude these unexpected licenses, because they are obviously part of the build system, or are obviously for a different OS platform.
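That triage step can be sketched as a filter over scanner findings. A minimal Python sketch, assuming findings arrive as a path-to-SPDX-identifier mapping. The directory and file-name lists are illustrative guesses at "obviously build system" and "obviously another platform", not a vetted list, and every bucket other than "expected" still needs a human to confirm it.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Illustrative heuristics only; tune them per package.
BUILD_SYSTEM_DIRS = {"m4", "build-aux"}
BUILD_SYSTEM_NAMES = {"configure", "ltmain.sh", "install-sh", "aclocal.m4"}
OTHER_PLATFORM_DIRS = {"solaris", "win32", "darwin"}


def looks_like_build_system(path: str) -> bool:
    p = PurePosixPath(path)
    return (p.name in BUILD_SYSTEM_NAMES or p.suffix == ".m4"
            or bool(BUILD_SYSTEM_DIRS & set(p.parts[:-1])))


def looks_like_other_platform(path: str) -> bool:
    return bool(OTHER_PLATFORM_DIRS & set(PurePosixPath(path).parts[:-1]))


def triage(findings: dict[str, str], expected: set[str]) -> dict[str, list[tuple[str, str]]]:
    """Sort (path, license) findings into buckets for manual review."""
    buckets: dict[str, list[tuple[str, str]]] = {
        "expected": [], "build-system?": [], "other-platform?": [], "investigate": []}
    for path, spdx in sorted(findings.items()):
        if spdx in expected:
            key = "expected"
        elif looks_like_build_system(path):
            key = "build-system?"
        elif looks_like_other_platform(path):
            key = "other-platform?"
        else:
            key = "investigate"
        buckets[key].append((path, spdx))
    return buckets


if __name__ == "__main__":
    findings = {
        "src/main.c": "MIT",
        "m4/ax_pthread.m4": "FSFULLR",
        "src/solaris/kmod.c": "CDDL-1.0",
        "src/util.c": "BSD-3-Clause",
    }
    for bucket, items in triage(findings, expected={"MIT"}).items():
        print(bucket, items)
```

The buckets only decide where to look first; a CDDL hit under solaris/ still deserves a quick check that nothing builds it on Linux.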
I agree here. To me, the source archive already includes its licensing information. What Fedora does is build this source in a curated way for distribution, so the licenses that apply to our build do not necessarily align with other distributions. This is the licensing information that I think is relevant for Fedora to understand and convey to users. There will always be additional licenses in the source archives (given your examples above such as autotools or optional code that is not enabled at build time), but the source archives include all of that licensing information. If they didn't, we would not be able to build anything from that source to include in Fedora.
I would describe this as trying to meet the spirit of having the RPM license reflect binary content, while acknowledging the reality that maintainers won't fully analyse the build system as it is too time-consuming & impractical.
I might suggest adding an extra sentence to make it more explicit that the binary RPM license is not a perfect representation of the binary content, as it may sometimes include extra licenses from source files that were not relevant. This would reflect the somewhat pragmatic approach that I think maintainers already take in practice.
One thing I have learned through this whole SPDX project is that being a convergence of something technical and something not makes it very difficult to arrive at what we think is an acceptable completion. I think what we have now in Fedora is pretty good compared to other distributions, but it can always improve. We just keep making improvements and keep refining the process. Shaving the yak, as it were.
Thanks,
| The License: field is meant to provide a simple enumeration of the
| licenses found in the source code that are reflected in the binary
| package. It may also include additional licenses for files that are
| not part of the binary where it is impractical to filter them out
| during license review. No further analysis should be done regarding
| what the "effective" license is, such as analysis based on theories
| of GPL interpretation or license compatibility or suppositions that
| “top-level” license files somehow negate different licenses appearing
| on individual source files.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
_______________________________________________
legal mailing list -- legal@lists.fedoraproject.org
To unsubscribe send an email to legal-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
On 8/22/23 15:40, David Cantrell wrote:
On Mon, Aug 21, 2023 at 9:30 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
On Mon, Aug 21, 2023 at 01:04:29PM +0200, Florian Weimer wrote:
> I think Richard said that he would start a thread like this, but it
> hasn't happened, so I feel like I should get this off my chest now.
>
> <https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_license_analysis>
> starts with this:
>
> | No “effective license” analysis
> |
> | The License: field is meant to provide a simple enumeration of the
> | licenses found in the source code that are reflected in the binary
> | package. No further analysis should be done regarding what the
> | "effective" license is, such as analysis based on theories of GPL
> | interpretation or license compatibility or suppositions that
> | “top-level” license files somehow negate different licenses appearing
> | on individual source files.
>
> This is contradictory. I think there are two aspects here:
>
> * Determine possible licenses that end up in the binary package.
>
> * Perform algebraic simplifications on the license list.
>
> Both analyses are forms of effective licensing analysis. Of course, you
> cannot derive an SPDX identifier without doing any analysis. However, I
> strongly believe that the first approach (determining the binary package
> license) is itself a form of effective licensing analysis, and similar
> reasons for package maintainers not doing this apply. The derived
> SPDX identifier will reflect both the package source code and what went
> into the build system.
It could perhaps be worded better, but I don't see this as contradictory; it is just a matter of what you consider "effective analysis" to refer to. The last sentence expands on this to say that 'effective' in this context is referring to the analysis of license compatibility that Fedora previously recommended.
The 'license compatibility' aspect is what is meant by the process here.
Previously Fedora carried a large license compatibility chart and there were rules (both written and passed down through oral tradition) about what is and is not compatible with the different GPL licenses. A common case was a GPL project incorporating BSD-licensed code, which meant the entire collective work was GPL licensed _per FSF guidelines_. This is the part that Fedora Legal wants to do away with. Package maintainers do not need to perform this compatibility analysis.
So yes, it could probably be worded better.
The analysis maintainers are being asked to do today is not about interpreting licensing. They are "merely" being asked to determine which source files contain code that becomes part of the resulting binary RPM. This is more build system analysis than license analysis, and distinct from what Fedora would traditionally describe as "effective license analysis".
Correct.
> * Some package maintainers, when translating to SPDX, merely translate
> the existing License: line as best as they can, without looking at the
> actual sources or produced binaries.
This I think is probably the main flaw in the process we asked our maintainers to follow. At a high level we portrayed the whole exercise as merely a terminology change, but it was not. Given the removal of the effective license analysis requirement, that was / is an oversimplification. Strictly speaking, I think the exercise ought to have been portrayed as more of a license (re-)audit. In the general case, maintainers ought to be redoing the license audit part of the new package review process for all existing packages, not blindly converting existing terminology.
We did portray this as a re-audit and not simply a change in abbreviations. I did many presentations on just that and we held numerous hack fests where we helped people analyze packages to determine the correct license expression in SPDX.
Hello,
Regarding this, I am testing a new tool for license analysis.
I took a random package from an SPDX PR in my Inbox.
https://src.fedoraproject.org/rpms/90-Second-Portraits/pull-request/1
Surprise, surprise, we have non free code, this is just amazing!
Analysis:
OFL-1.1 --- 90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/data/fonts/neuton.ttf
CC-BY-SA-4.0 AND CC-BY-3.0 AND Zlib --- 90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/LICENSE.txt
LicenseRef-proprietary-license --- 90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/data/fonts/yb.ttf
After research: https://www.dafont.com/young-beautiful.font
Note of the author: My fonts are free for PERSONAL use only. For any commercial use (anything you make money from), you must send a paypal donation.
Please visit my website http://mistifonts.com/ to see my affordable prices.
And on that site we have:
Terms Of Use
This font is free for personal use and non-profit use. If you make money from using this font, you must purchase a license. You may not trace or edit my fonts and then resell them as your own creation. For example: Taking my font, tracing over the letters or editing some of the letters, and then selling that font.
Please download, install and test the font before purchasing a license to ensure that it works for your intended use.
You can read license Terms of Use >>here<<
=> NON FREE (AS IN SPEECH, NOT BEER)
MIT ---
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/middleclass/CHANGELOG.md
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/middleclass/middleclass.lua
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/middleclass/MIT-LICENSE.txt
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/middleclass/README.md
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/middleclass/rockspecs/middleclass-3.0-0.rockspec
X11 ---
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/slam.lua
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/hump/camera.lua
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/hump/class.lua
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/hump/gamestate.lua
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/hump/README.md
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/hump/signal.lua
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/hump/timer.lua
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/hump/vector-light.lua
  90-Second-Portraits-1.01b-16.fc38.src.rpm-extract/90-Second-Portraits-1.01b.zip-extract/hump/vector.lua
So the License field should be:
# Zlib: Main package
# CC-BY-SA-4.0: assets by Tangram Games
# CC-BY-3.0: data/music/monkeys.ogg
# OFL-1.1: data/fonts/neuton.ttf
# MIT: middleclass/
# X11: slam.lua and hump/
License: Zlib AND CC-BY-SA-4.0 AND CC-BY-3.0 AND OFL-1.1 AND MIT AND X11
And then we have a non-free font data/fonts/yb.ttf that should be removed/replaced.
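The report above has a simple shape, `EXPRESSION --- path [path ...]`, which makes the final step easy to script. A minimal Python sketch, assuming exactly that line format (inferred from the sample output only, since the tool is not named): it collects identifiers for the License: line and separately lists every `LicenseRef-` identifier, with its files, for manual review, since that is where a file like yb.ttf surfaces.

```python
from __future__ import annotations

import re


def parse_report(report: str) -> tuple[list[str], dict[str, list[str]]]:
    """Parse 'EXPRESSION --- path [path ...]' lines from a license scan report.

    Returns (identifiers for the License: line, LicenseRef-* identifiers mapped
    to the files they were found in). Assumes AND-only expressions and paths
    without spaces.
    """
    identifiers: list[str] = []
    needs_review: dict[str, list[str]] = {}
    for line in report.splitlines():
        if " --- " not in line:
            continue  # blank or free-form lines
        expression, paths = line.split(" --- ", 1)
        for spdx in re.split(r"\s+AND\s+", expression.strip()):
            if spdx.startswith("LicenseRef-"):
                # Custom identifiers need a human look before packaging.
                needs_review.setdefault(spdx, []).extend(paths.split())
            elif spdx not in identifiers:
                identifiers.append(spdx)
    return identifiers, needs_review


if __name__ == "__main__":
    sample = """\
OFL-1.1 --- data/fonts/neuton.ttf
CC-BY-SA-4.0 AND CC-BY-3.0 AND Zlib --- LICENSE.txt
LicenseRef-proprietary-license --- data/fonts/yb.ttf
MIT --- middleclass/middleclass.lua middleclass/MIT-LICENSE.txt
X11 --- slam.lua hump/class.lua hump/vector.lua"""
    ids, review = parse_report(sample)
    print("License: " + " AND ".join(ids))
    print(review)
```

The identifier order differs from the hand-written tag above, which leads with the main package's Zlib; order within an AND list does not change its meaning.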
On Wed, Aug 23, 2023 at 4:00 PM Robert-André Mauchin zebob.m@gmail.com wrote:
https://src.fedoraproject.org/rpms/90-Second-Portraits/pull-request/1
Surprise, surprise, we have non free code, this is just amazing!
=> NON FREE (AS IN SPEECH, NOT BEER)
Thanks, I agree and also confirmed using 'strings' on the data/yb.ttf file.
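For anyone wanting to repeat that check without binutils, here is a rough Python approximation of what `strings` does by default: extract runs of at least four printable ASCII characters from the file, then filter for notice-like text. The keyword list and helper names are assumptions for the example; this is not a full reimplementation of `strings`.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len printable ASCII characters (tab, space to '~')."""
    pattern = rb"[\t\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


def notices(path: str) -> list[str]:
    """Strings from a file that mention copyright or licensing terms."""
    with open(path, "rb") as f:
        found = printable_strings(f.read())
    keywords = ("copyright", "license", "licence", "commercial", "personal use")
    return [s for s in found if any(k in s.lower() for k in keywords)]


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:  # e.g. data/fonts/yb.ttf
        for line in notices(path):
            print(f"{path}: {line}")
```

TrueType name tables often also store these strings as UTF-16BE, which neither this sketch nor plain `strings` will find; `strings -e b` reads 16-bit big-endian text.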
I submitted this ticket: https://bugzilla.redhat.com/show_bug.cgi?id=2234478
I think the atari.ttf font file may also be problematic.
Richard
Yeah, I also noticed another one of my packages has a possible problematic font, but it's unclear to me: https://src.fedoraproject.org/rpms/safetyblanket https://github.com/SimonLarsen/safetyblanket/blob/master/res/fonts/notalot35...
For yb.ttf, I just replaced it in the %prep section with another OFL-licensed font included in the source. I assume this is sufficient?
Any guidance is much appreciated.
This tool is quite useful though, as sometimes these things are easy to miss.
On Thu, Aug 24, 2023 at 12:57 PM Jeremy Newton mystro256@fedoraproject.org wrote:
Yeah, I also noticed another one of my packages has a possible problematic font, but it's unclear to me: https://src.fedoraproject.org/rpms/safetyblanket https://github.com/SimonLarsen/safetyblanket/blob/master/res/fonts/notalot35...
For yb.ttf, I just replaced it in the %prep section with another OFL-licensed font included in the source. I assume this is sufficient?
I think so.
However, it's possible the atari.ttf file is problematic too. It was not clear to me how this was licensed even after locating a likely origin for the file.
Richard
On 23-08-2023 21:59, Robert-André Mauchin wrote:
Regarding this, I am testing a new tool for license analysis.
With "new tool", are you referring to Fossology, as mentioned in Florian's initial message?
I took a random package from an SPDX PR in my Inbox.
https://src.fedoraproject.org/rpms/90-Second-Portraits/pull-request/1
Surprise, surprise, we have non free code, this is just amazing!
Surprise, surprise, indeed. I submitted that PR as part of the SPDX workshop during Flock given by Tom "spot" Callaway and The Right Honourable Miroslav Suchý [1].
Seeing what the tool unearthed, I'm wondering if this should be made part of fedora-review. The current license check detects neither all the licenses nor the problematic font that the new tool found. The license check is what I usually rely on when doing reviews, including of my own package submissions.
[1] https://www.youtube.com/live/Hjhe6jtx3Zw?feature=shared&t=8657&start...
-- Sandro
On Tue, Aug 22, 2023 at 9:41 AM David Cantrell dcantrell@redhat.com wrote:
We did portray this as a re-audit and not simply a change in abbreviations. I did many presentations on just that and we held numerous hack fests where we helped people analyze packages to determine the correct license expression in SPDX.
There are definitely maintainers who thought it was just a change in abbreviations and I do not think there is an easy way to stop that. But as we have been going through this process, the number of maintainers auditing packages has been high and we are seeing more licenses captured and added to SPDX that were not previously represented.
+1. Whatever else one thinks of the changes over the past year, this 'license re-audit' side effect has been a great ongoing success and improvement for Fedora and its downstreams, probably without parallel in Linux distributions (apart from Debian), and has also resulted in a vast improvement to SPDX.
Richard
I'm concerned about the assumption that algebraic simplifications on license expressions will "work". I believe that this is a thing that needs to be proved. I've spent some time trying to prove it, and I think I may have, but I am not done yet. Consider that every rule, exp_1 -> exp_2, for transforming a Boolean expression must obey the following meta-rule: For every valuation of the Boolean variables occurring in exp_1 and exp_2, the value of exp_1 must be equal to the value of exp_2. That is what justifies every rule. It is not actually clear how that idea extends to license expressions and hence not clear if the AND and OR operators in license expressions can be treated identically to those for Boolean expressions when transforming the license expressions.
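For ordinary Boolean expressions, the meta-rule can be checked mechanically by enumerating every valuation. A minimal Python sketch that treats license identifiers as plain Boolean variables, which is exactly the step whose validity for license expressions is in question here. A True result only shows that a rewrite is sound in Boolean algebra, not that it preserves licensing meaning. The tuple-based expression format is an ad hoc choice for the example.

```python
from itertools import product


def evaluate(expr, valuation):
    """Evaluate an expression tree: a variable name, or (op, left, right) with op AND/OR."""
    if isinstance(expr, str):
        return valuation[expr]
    op, left, right = expr
    if op == "AND":
        return evaluate(left, valuation) and evaluate(right, valuation)
    if op == "OR":
        return evaluate(left, valuation) or evaluate(right, valuation)
    raise ValueError(f"unknown operator: {op!r}")


def variables(expr):
    """Collect the variable names occurring in an expression tree."""
    if isinstance(expr, str):
        return {expr}
    _, left, right = expr
    return variables(left) | variables(right)


def equivalent(lhs, rhs):
    """True if lhs and rhs take the same value under every valuation (the meta-rule)."""
    names = sorted(variables(lhs) | variables(rhs))
    for values in product([False, True], repeat=len(names)):
        valuation = dict(zip(names, values))
        if evaluate(lhs, valuation) != evaluate(rhs, valuation):
            return False
    return True


if __name__ == "__main__":
    # Absorption: MIT AND (MIT OR Apache-2.0) -> MIT
    print(equivalent(("AND", "MIT", ("OR", "MIT", "Apache-2.0")), "MIT"))  # True
    # Not a valid rewrite: MIT OR Apache-2.0 -> MIT
    print(equivalent(("OR", "MIT", "Apache-2.0"), "MIT"))  # False
```

Whether a rule such as absorption is acceptable for license expressions remains the open question; the check only makes the Boolean half of the argument precise.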
On Mon, Aug 21, 2023 at 01:04:29PM +0200, Florian Weimer wrote:
- Most package maintainers probably assume that License: tags on all built RPMs (source RPMs and binary RPMs) should reflect binary package contents, at least when all subpackages are considered in aggregate. Often, Source RPMs contain the same License: line as binary RPMs.
That's a shortcoming of RPM. It reuses the License tag of the main subpackage for the source RPM. Current RPM has a new SourceLicense tag https://rpm-software-management.github.io/rpm/manual/spec.html#sourcelicense for those willing to declare a distinct license for the sources.
- All forms of dynamic linking are ignored for License: tags. This covers ELF (e.g., C, C++), but also Python, Java, and other languages with late binding.
I guess you really mean code which is dynamically linked into a process at run time, e.g. a library linked to an application. Not listing the license of that library in the RPM package of the application makes perfect sense to me: the packaged application does not contain the library's code when the application is distributed in the form of a binary RPM package. Even after the package is installed, the application still does not contain the library code. Only when the application is executed does the dynamic linker link in the library code. At that point the application's license changes.
- C/C++ header file contents are ignored for License: tags, regardless of header file complexity (e.g., substantial code in templates or inline functions is not treated specially).
- Statically linked GCC and glibc startup code is ignored and does not show up in License: lines. The license of glibc startup code isn't even in SPDX yet, so it's not just Fedora who is ignoring this.
- Statically linked libgcc support code is ignored (e.g., outline atomics on aarch64, FMV support code on x86-64). This code comes with the compiler, but is compiled from C sources that ship with the compiler. These items overlap with the startup code, but licensing could theoretically be different.
- Some shared objects come with statically linked support code. I doubt that many package maintainers are aware of that, so they effectively ignore its licensing impact. It's structurally similar to inline functions and templates in header files.
- Output from source code generators such as autoconf, bison and flex is often (but not always) ignored, in some cases even if the generated code ships in the source RPM and is compiled as-is, without regeneration. (autoconf can generate more than just build scripts.)
There is no means of retrieving the licenses of these third-party pieces of code at build time of my package. I do not want to hardcode their licenses and then worry about keeping them in sync.
Are you willing to define %glibc_startup_code_license macro delivered within glibc-devel so that I can use it in License of my packages?
In an ideal world I'd rather see compilers, linkers, and other processors process the license metadata automatically. The SPDX identifiers of input and output files could be stored in file extended attributes. rpmbuild would then simply collect these attributes and paste them into the License tag.
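A toy model of that idea in Python: each build step gives its output the union of its inputs' license identifiers, and rpmbuild ANDs together whatever reaches the packaged files. The proposal would keep the identifiers in extended attributes; to stay portable this sketch uses an in-memory dict instead. The file names, the build steps, and the `LicenseRef-glibc-startup` placeholder are all hypothetical. The blanket union rule is itself an assumption, and it is exactly what the dynamic-linking and header-file cases above call into question.

```python
from __future__ import annotations

# Stand-in for per-file extended attributes: path -> SPDX identifiers.
# "LicenseRef-glibc-startup" is a made-up placeholder, since the glibc
# startup code license is not in SPDX.
file_licenses: dict[str, set[str]] = {
    "foo.c": {"MIT"},
    "crt1.o": {"LicenseRef-glibc-startup"},
    "libgcc.a": {"GPL-3.0-or-later WITH GCC-exception-3.1"},
}


def build(output: str, inputs: list[str]) -> None:
    """Record a build step: the output inherits the union of its inputs' identifiers."""
    file_licenses[output] = set().union(*(file_licenses[i] for i in inputs))


def collect_license_tag(packaged: list[str]) -> str:
    """What rpmbuild might paste into License: for the given packaged files."""
    ids = set().union(*(file_licenses[p] for p in packaged))
    return " AND ".join(sorted(ids))


if __name__ == "__main__":
    build("foo.o", ["foo.c"])                      # compile
    build("foo", ["foo.o", "crt1.o", "libgcc.a"])  # link, pulling in startup code
    print("License:", collect_license_tag(["foo"]))
```

Even this toy shows the cost of being exact: the startup and libgcc identifiers appear in the tag of every linked binary.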
It reminds me of the vision of reproducible builds: a similarly noble, yet unachievable goal.
- Sometimes we ignore upstream SPDX identifiers if we believe them to be incorrect, but that approach is not consistent, as far as I know.
Not only SPDX identifiers. I have seen similar issues with license headers or even whole license texts. Many times upstreams declare "this file comes from..." and then forget to properly declare its license or to carry a copy of the license as mandated by the file's original license.
In the light of this, I would like to suggest updating the guidelines in the following way:
The License: line should be based on the sources only. Using a tool such as Fossology to discover relevant licenses and their SPDX tags is sufficient. No analysis of how licenses from package source code or the build environment propagate into binary RPMs should be performed. Individual SPDX identifiers that a tool has listed should be separated by AND. Package maintainers are encouraged to re-run license analysis tooling on the source code as part of major package rebases, and update the License: tag accordingly.
To me, that seems to be much more manageable.
Yes, that's a very realistic approach of what we can expect from the packagers.
I only worry whether the resulting License tag will be of any help to our users. E.g. most of the subpackages of perl.spec are "GPL-1.0-or-later OR Artistic-1.0-Perl". With your approach their License tags would gain ridiculous license identifiers for code they do not really contain.
-- Petr
On Tue, Aug 22, 2023 at 3:34 AM Petr Pisar ppisar@redhat.com wrote:
On Mon, Aug 21, 2023 at 01:04:29PM +0200, Florian Weimer wrote:
- Most package maintainers probably assume that License: tags on all built RPMs (source RPMs and binary RPMs) should reflect binary package contents, at least when all subpackages are considered in aggregate. Often, Source RPMs contain the same License: line as binary RPMs.
That's a shortcoming of RPM. It reuses the License tag of the main subpackage for the source RPM.
Out of curiosity, is the "main" subpackage a well-defined concept, or is it just "the built package that is described first in the spec file" and beyond that a matter of convention? I couldn't find the answer to this in a few minutes of naive searching.
I only worry whether the resulting License tag will be of any help to our users. E.g. most of the subpackages of perl.spec are "GPL-1.0-or-later OR Artistic-1.0-Perl". With your approach their License tags would gain ridiculous license identifiers for code they do not really contain.
This could possibly be addressed by Daniel's idea of a "use your judgment to exclude what is clearly irrelevant" standard (as I would put it).
Richard
On Sun, Aug 27, 2023 at 12:30:01PM -0400, Richard Fontana wrote:
Out of curiosity, is the "main" subpackage a well-defined concept, or is it just "the built package that is described first in the spec file" and beyond that a matter of convention? I couldn't find the answer to this in a few minutes of naive searching.
I don't think "main" is an official RPM term. Probably that's why you were unable to find it. It's used in a few places in the Fedora packaging guidelines https://docs.fedoraproject.org/en-US/packaging-guidelines/.
Otherwise, the concept is well defined. At least empirically.
The "main" subpackage is the subpackage which is defined with the Name spec tag. All subsequent subpackages can only be defined with a %package macro. RPM requires the Name tag to exist and come before any %package macros. Hence the main subpackage, which shares its License value with the source package, is usually and effectively the first subpackage.
I write "usually" because there can be spec files which do not produce a binary package for the first subpackage. Then the only package where the Name tag (and its License tag, if overridden in other subpackages) manifests is the source package. We use such spec files without a main binary subpackage rarely, in EPEL.
Nevertheless, in short, the main subpackage is the first package defined in a spec file.
-- Petr
On 28. 08. 23 at 17:26, Petr Pisar wrote:
I write "usually" because there can be spec files which do not produce a binary package for the first subpackage. Then the only package where the Name tag (and its License tag, if overridden in other subpackages) manifests is the source package. We use such spec files without a main binary subpackage rarely, in EPEL.
Isn't it the default case for python packages? IOW I don't think it is that rare (although I am not sure why you mention EPEL instead of Fedora).
Vít
Nevertheless, in short, the main subpackage is the first package defined in a spec file.
-- Petr
legal mailing list -- legal@lists.fedoraproject.org To unsubscribe send an email to legal-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
On Mon, Aug 28, 2023 at 06:39:55PM +0200, Vít Ondruch wrote:
On 28. 08. 23 at 17:26, Petr Pisar wrote:
I write "usually" because there can be spec files which do not produce a binary package for the first subpackage. Then the only package where the Name tag (and its License tag, if overridden in other subpackages) manifests is the source package. We use such spec files without a main binary subpackage rarely, in EPEL.
Isn't it the default case for Python packages?
It is. I completely forgot Python.
(although I am not sure why you mention EPEL instead of Fedora)
Because no better example (supplying -devel subpackages in EPEL for main packages delivered by RHEL) came to my mind.
-- Petr
On Mon, Aug 28, 2023 at 6:40 PM Vít Ondruch vondruch@redhat.com wrote:
On 28. 08. 23 at 17:26, Petr Pisar wrote:
On Sun, Aug 27, 2023 at 12:30:01PM -0400, Richard Fontana wrote:
On Tue, Aug 22, 2023 at 3:34 AM Petr Pisar ppisar@redhat.com wrote:
On Mon, Aug 21, 2023 at 01:04:29PM +0200, Florian Weimer wrote:
- Most package maintainers probably assume that License: tags on all built RPMs (source RPMs and binary RPMs) should reflect binary package contents, at least when all subpackages are considered in aggregate. Often, Source RPMs contain the same License: line as binary RPMs.
That's a shortcoming of RPM. It reuses the License tag of the main subpackage for the source RPM.
Out of curiosity, is which subpackage counts as the "main" subpackage a well-defined concept, or is it just "the built package that is described first in the spec file" and beyond that a matter of convention? I couldn't find the answer to this in a few minutes of naive searching.
I don't think "main" is an official RPM term. That's probably why you were unable to find it. It's used in a few places in the Fedora packaging guidelines https://docs.fedoraproject.org/en-US/packaging-guidelines/.
Otherwise, the concept is well defined. At least empirically.
The "main" subpackage is the subpackage defined by the Name spec tag. All subsequent subpackages can only be, and are only, defined with a %package macro. RPM requires the Name tag to exist and to come before any %package macros. Hence, the main subpackage, which shares its License value with the source package, is usually and effectively the first subpackage.
I write "usually" because there can be spec files which do not produce a binary package for the first subpackage. Then the only package where the Name tag (and its License tag, if overridden in other subpackages) manifests is the source package. We use such spec files without a main binary subpackage rarely, in EPEL.
Isn't it the default case for Python packages? IOW, I don't think it is that rare (although I am not sure why you mention EPEL instead of Fedora).
It's also the default case for all Rust crate packages (about 2200 packages in Fedora), and for most Go packages.
In the Rust case, the main package's name is rust-foo, but names of built binary packages are rust-foo-devel, rust-foo+bar-devel, and optionally, foo (but *not* rust-foo). This way, the license tag for the source package (i.e. the "main" package, the thing that doesn't really exist) applies to all these *-devel subpackages (which is what we want), and the "foo" package gets a separate license tag (to account for statically linked dependencies).
So yes, we rely on and adhere to the "License tag reflects binary package contents" rule.
Fabio
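The inheritance Petr and Fabio describe (subpackages reuse the main package's License: unless they override it, and the source RPM carries the main package's value even when no main binary RPM is built) can be modeled in a few lines. This is a minimal sketch under assumptions: `built_licenses` is a hypothetical helper, not an RPM API, it ignores everything in a spec file except names and License: overrides, and the package names and the override for `foo` are illustrative only.

```python
from typing import Dict, Optional


def built_licenses(
    name: str,
    main_license: str,
    subpackages: Dict[str, Optional[str]],
    build_main: bool = True,
) -> Dict[str, str]:
    """Map each RPM built from one (hypothetical) spec file to its License: value.

    name: value of the spec's Name: tag (the "main" package)
    main_license: License: tag declared for the main package
    subpackages: full subpackage name -> License: override, or None to inherit
    build_main: False when the main package has no %files section,
                so no binary RPM is produced for it
    """
    # The source RPM always reuses the main package's License: tag.
    result = {f"{name}.src": main_license}
    if build_main:
        result[name] = main_license
    for sub_name, override in subpackages.items():
        # Subpackages inherit the main License: unless they set their own.
        result[sub_name] = override if override is not None else main_license
    return result


# Rust-style layout: no "rust-foo" binary, only -devel subpackages and "foo".
layout = built_licenses(
    "rust-foo",
    "MIT OR Apache-2.0",
    {
        "rust-foo-devel": None,
        "rust-foo+bar-devel": None,
        "foo": "MIT AND (MIT OR Apache-2.0)",  # hypothetical statically linked deps
    },
    build_main=False,
)
for pkg, lic in layout.items():
    print(f"{pkg}: {lic}")
```

Running it prints one line per built RPM. The `rust-foo.src` entry shows why the SRPM's License: value only ever reflects the main package, even when that package is never built as a binary.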
On Mon, Aug 28, 2023 at 3:25 PM Fabio Valentini decathorpe@gmail.com wrote:
So yes, we rely on and adhere to the "License tag reflects binary package contents" rule.
So you are reasonably happy with the current rules as they affect the License: field for Rust crate packages? Or is there anything you would like to see changed?
Richard
On Wed, Aug 30, 2023 at 12:29 AM Richard Fontana rfontana@redhat.com wrote:
On Mon, Aug 28, 2023 at 3:25 PM Fabio Valentini decathorpe@gmail.com wrote:
So yes, we rely on and adhere to the "License tag reflects binary package contents" rule.
So you are reasonably happy with the current rules as they affect the License: field for Rust crate packages? Or is there anything you would like to see changed?
I wouldn't say I'm happy, but at this point it's well established, and we've updated our tools to automate almost all parts of the process (with human review, of course).
The only thing that would make me happy would be allowing basic "license arithmetic" (i.e. allowing the simplification of "MIT AND (MIT OR Apache-2.0)" to just "MIT", since "MIT" already implies "MIT OR Apache-2.0"), but I don't think this is going to happen ;)
Fabio
On Wed, Aug 30, 2023 at 06:53:41PM +0200, Fabio Valentini wrote:
simplification of "MIT AND (MIT OR Apache-2.0)" to just "MIT" (since "MIT" already implies "MIT OR Apache-2.0"
It does not imply that. The Apache-2.0 license requires distributing the Apache-2.0 license text, while MIT does not.
A practical implication: if you have a binary package with "License: MIT OR Apache-2.0", you have to include the Apache-2.0 license file with the %license macro. Omitting the file means you have breached the Apache-2.0 license.
Another implication: you can reduce "MIT OR Apache-2.0" to "MIT", but such a reduced package takes away the recipient's freedom to choose the Apache-2.0 license.
-- Petr
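The disagreement can be made concrete by treating an SPDX expression purely as Boolean logic, which is the kind of "license arithmetic" Fabio describes. The sketch below uses hypothetical helpers, not any SPDX library. It expands an expression into the sets of licenses a recipient could choose to comply with, then drops any set that strictly contains another (Boolean absorption). Under that model "MIT AND (MIT OR Apache-2.0)" collapses to the single choice {MIT}, while "MIT OR Apache-2.0" keeps a separate {Apache-2.0} choice, which is the freedom Petr says a reduction to "MIT" removes. Petr's first point, the obligation to ship the Apache-2.0 license text, is not something a purely Boolean model can express.

```python
from itertools import product


# Hypothetical mini-AST: a license ID is a str; compound nodes are (op, operands).
def AND(*operands):
    return ("AND", operands)


def OR(*operands):
    return ("OR", operands)


def choices(expr):
    """All license sets (frozensets) a recipient could comply with to satisfy expr."""
    if isinstance(expr, str):
        return {frozenset([expr])}
    op, operands = expr
    sub = [choices(o) for o in operands]
    if op == "OR":
        # Any one alternative is enough.
        return set().union(*sub)
    # AND: pick one alternative from every operand and combine them.
    return {frozenset().union(*combo) for combo in product(*sub)}


def minimal(options):
    """Boolean absorption: drop choices that strictly contain another choice."""
    return {o for o in options if not any(other < o for other in options)}


def show(options):
    """Deterministic, printable form of a set of choices."""
    return sorted(sorted(o) for o in options)


print(show(minimal(choices(AND("MIT", OR("MIT", "Apache-2.0"))))))
print(show(minimal(choices(OR("MIT", "Apache-2.0")))))
```

The first line prints `[['MIT']]` and the second `[['Apache-2.0'], ['MIT']]`: absorption is harmless for the AND case but would not justify dropping an OR alternative.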
On Mon, Aug 21, 2023 at 7:04 AM Florian Weimer fweimer@redhat.com wrote:
I think Richard said that he would start a thread like this, but it hasn't happened, so I feel like I should get this off my chest now.
https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_license_analysis starts with this:
| No “effective license” analysis
|
| The License: field is meant to provide a simple enumeration of the
| licenses found in the source code that are reflected in the binary
| package. No further analysis should be done regarding what the
| "effective" license is, such as analysis based on theories of GPL
| interpretation or license compatibility or suppositions that
| “top-level” license files somehow negate different licenses appearing
| on individual source files.
This is contradictory. I think there are two aspects here:
- Determine possible licenses that end up in the binary package.
- Perform algebraic simplifications on the license list.
Both analyses are forms of effective licensing analysis. Of course, you cannot derive an SPDX identifier without doing any analysis. However, I strongly believe that the first approach (determining the binary package license) is itself a form of effective licensing analysis, and similar reasons for package maintainers not doing it apply. The derived SPDX identifier will reflect both the package source code and what went into the build system.
We were using "effective license" somewhat more narrowly, referring to how that phrase was used in some of the legacy Fedora documentation as well as how it is used sometimes in non-Fedora FLOSS-legal contexts. I am certain the phrase was not invented by Fedora but somehow it crept into FLOSS legal commentary about 10 or so years ago and I wasn't even aware it was used in Fedora documentation until last year. It partially embodies (usually in a highly distorted way) a much older set of folk-understandings of the operation of the *GPL license family in particular but is often used more generally. It may have some connection to what SPDX calls the "concluded license" (which is contrasted with the "declared license") but to be honest I am not sure what those concepts mean.
It's true that in a less specific way we are doing lots of "effective license analysis", for example anytime I have said that something is "not a license" despite the license text appearing in some source code.
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
- Most package maintainers probably assume that License: tags on all built RPMs (source RPMs and binary RPMs) should reflect binary package contents, at least when all subpackages are considered in aggregate. Often, Source RPMs contain the same License: line as binary RPMs.
This is the most important issue I was hoping to raise, if we mean the same thing.
We (Jilayne and I and others who worked on the new Fedora license-related docs) did not invent this concept. The old Fedora documentation had a "license of the binary" policy, I assume developed mainly by Tom Callaway, that I always thought was a great analytical or representational advance. Here's what the old Fedora docs said:
The oldest archived version of http://fedoraproject.org/wiki/Packaging:LicensingGuidelines (dated from 2008) says "The License: field refers to the licenses of the contents of the *binary* rpm." The author was clearly at pains to make clear that it was not meant to encompass the entirety of the source code as packaged in source RPMs.
At least since 2009, this was followed by: "If a source package generates multiple binary packages, the License: field may differ between them if necessary. This implies that a single spec may have multiple per-subpackage License: tags. Each of those License: tags must comply with all applicable guidelines."
I thought I understood what that meant and I thought I saw examples of that in operation. Recently, I've started to wonder whether I misunderstood that all along, though I don't see how. The text seems very clear to me.
When I look randomly at spec files of Fedora packages, I begin to suspect that most Fedora package maintainers must have always ignored this directive and have continued to ignore it after the rule was recast in the post-July-2022 docs. In *most* cases of packages other than possibly those coming from ecosystems or historical contexts featuring highly uncomplicated licensing structures, there will be some differences in the makeup of binary packages from a built source code licensing standpoint. I only rarely see attempts to reflect this via multiple License: fields. While in the scheme of things I only look at a small sample of Fedora packages I suspect they are representative.
I can conclude one of two things:
1. The license of the binary rule is too hard for most Fedora package maintainers to comply with.
2. Fedora package maintainers are unaware of the rule and are substituting their own intuition, which I think must be something like "each RPM should have one License: field that reflects the makeup of all the binary RPMs without attempting to distinguish among them".
BTW, I don't think #1 is "The license of the binary rule is too hard for most Fedora package maintainers to comply with *without the application of effective licensing folkloric concepts*", because even when "effective licensing" was assumed by some Fedora package maintainers to be legitimate (even though it was never consistently endorsed in Fedora legal/packaging rules), it must be the case that most Fedora package maintainers were still ignoring the rule.
This puzzles and disappoints me since, as I have said, the license of the binary concept was in my view a major advance in the way people were thinking about appropriate ways of representing licenses of packages. If you look into SPDX, for example, SPDX doesn't even have (as far as I can tell) a sophisticated way of distinguishing between binary and source licensing. I believe this reflects the source code-centric and non-packaging-centric world view of many of the people who got involved with SPDX early on, but that may be unfair.
When we (a bunch of us inside Red Hat that is) started to think about revamping the rules on RPM license metadata, we thought about a number of options. One thing I should note is that my enthusiasm for a "license of the binary" rule was never really shared by anyone else I talked to at Red Hat (though I think this is partly because those who I discussed it with came from those "source code centric" backgrounds wrt open source license compliance and such). Anyway, we considered switching to a "license of the source" rule, sort of like how I think Petr Pisar is choosing to use the Source-License: field. We also considered a more complex sort of "license of the binary" rule that would attempt to do what I thought of as orthodox GPL-style analysis on the components of binary RPMs (so that a binary RPM might have "License: GPL-2.0-or-later AND GPL-2.0-or-later") but this was rejected as unnecessarily complicated. We ended up with the "simple enumeration of the licenses of the binary" rule which is in the current Fedora docs, which I think of as a restatement of the 2009 (or earlier) "license of the binary" rule. This was also discussed on this list prior to incorporation into the present-day legal docs.
I'm deliberately ignoring most of the rest of your comments in this message because I think they raise some additional topics and I want to make sure there is some focus on this one. What do we do about the "license of the binary" rule? If it is really too hard to comply with, I think we can only conclude that it has to be replaced with some other approach. Since I'm not a Fedora package maintainer I do not have good intuition for what's too hard vs. what's merely annoying or cumbersome. I know why I find it challenging to figure out what source files map to a given binary RPM, but I don't really directly understand why this is hard for a Fedora package maintainer who is theoretically highly familiar with the code they are packaging and theoretically has some expertise in the language(s) and build tools at issue. I just see the evidence suggesting that it is.
In the light of this, I would like to suggest updating the guidelines in the following way:
The License: line should be based on the sources only. Using a tool such as Fossology to discover relevant licenses and their SPDX tags is sufficient. No analysis of how licenses from package source code or the build environment propagate into binary RPMs should be performed. Individual SPDX identifiers that a tool has listed should be separated by AND. Package maintainers are encouraged to re-run license analysis tooling on the source code as part of major package rebases, and update the License: tag accordingly.
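Under the proposed wording, turning scanner output into a License: value is mechanical. A minimal sketch, assuming the scanner's findings have already been exported as a mapping from file path to the SPDX expressions detected in that file (this input format is hypothetical; Fossology's actual export formats are not modeled). Expressions are de-duplicated and joined with AND, compound ones are parenthesized, and, in keeping with the proposal, no further simplification is attempted.

```python
from typing import Dict, Iterable


def license_field(scan: Dict[str, Iterable[str]]) -> str:
    """Join every SPDX expression reported for any source file with AND."""
    found = set()
    for expressions in scan.values():
        found.update(expressions)
    parts = []
    for expr in sorted(found):
        # Parenthesize compound expressions so the AND-join stays unambiguous.
        parts.append(f"({expr})" if " " in expr else expr)
    return " AND ".join(parts)


# Hypothetical scanner output for three source files.
scan = {
    "src/a.c": ["MIT"],
    "src/b.c": ["GPL-2.0-or-later", "MIT"],
    "vendor/c.rs": ["MIT OR Apache-2.0"],
}
print("License:", license_field(scan))
```

This prints `License: GPL-2.0-or-later AND MIT AND (MIT OR Apache-2.0)`, deliberately leaving "MIT AND (MIT OR Apache-2.0)" unsimplified.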
This seems to be close to what is *really* happening today, except that there are categories of things that package maintainers know they can exclude as a matter of convention.
Richard
On Thu, Aug 24, 2023 at 12:15 PM Richard Fontana rfontana@redhat.com wrote:
When I look randomly at spec files of Fedora packages, I begin to suspect that most Fedora package maintainers must have always ignored this directive and have continued to ignore it after the rule was recast in the post-July-2022 docs. In *most* cases of packages other than possibly those coming from ecosystems or historical contexts featuring highly uncomplicated licensing structures, there will be some differences in the makeup of binary packages from a built source code licensing standpoint. I only rarely see attempts to reflect this via multiple License: fields. While in the scheme of things I only look at a small sample of Fedora packages I suspect they are representative.
If you only looked at a small sample, it may not have been representative. The package reviews I have been involved in recently, both as submitter and reviewer, have tried to faithfully reflect the license of the binary package. And, really, a single year isn't enough time for the new docs to have had a big effect. We have a huge number of packages in Fedora, so changes to packaging guidelines take quite a while to propagate throughout the collection. I suspect that if you narrow your sample to packages that have been reviewed in the last 12 months, you might get a quite different impression.
I can conclude one of two things:
1. The license of the binary rule is too hard for most Fedora package maintainers to comply with.
2. Fedora package maintainers are unaware of the rule and are substituting their own intuition, which I think must be something like "each RPM should have one License: field that reflects the makeup of all the binary RPMs without attempting to distinguish among them".
Or, as I suggested above: 3. Fedora packagers are (by and large) aware of the binary rule, but there's a lot of inertia to overcome in the existing corpus of packages.
This puzzles and disappoints me since, as I have said, the license of the binary concept was in my view a major advance in the way people were thinking about appropriate ways of representing licenses of packages. If you look into SPDX, for example, SPDX doesn't even have (as far as I can tell) a sophisticated way of distinguishing between binary and source licensing. I believe this reflects the source code-centric and non-packaging-centric world view of many of the people who got involved with SPDX early on, but that may be unfair.
I don't think you should be disappointed. Give it more time. I think the license of the binary concept is useful.
I'm deliberately ignoring most of the rest of your comments in this message because I think they raise some additional topics and I want to make sure there is some focus on this one. What do we do about the "license of the binary" rule? If it is really too hard to comply with, I think we can only conclude that it has to be replaced with some other approach. Since I'm not a Fedora package maintainer I do not have good intuition for what's too hard vs. what's merely annoying or cumbersome. I know why I find it challenging to figure out what source files map to a given binary RPM, but I don't really directly understand why this is hard for a Fedora package maintainer who is theoretically highly familiar with the code they are packaging and theoretically has some expertise in the language(s) and build tools at issue. I just see the evidence suggesting that it is.
There are cases that make it tricky. Florian mentioned C/C++ header files that contain inline functions, for example. Figuring out which header files have such definitions, and whether or not a given binary actually uses any such definition is nontrivial.
As others have suggested, to make this tractable, we really need some automation that can help us out. What I as a packager would really like to do when working on package P is:
1. List all of the licenses introduced by P itself.
2. List all of the packages that might inject something into the final binary package.
3. Use magic nonexistent tooling to automatically construct the final list of relevant licenses by starting with (1), and then iterating over the list of packages in (2) and extracting their licenses.
That will probably produce a list that is too big, as some package in (2) may only inject a single artifact covered by a single license, but itself be covered by a longer list of licenses. But it would be a start. Tracking down transitive licenses is most of the work, in my experience, and the results can be invalidated at any time by a change in some other package.
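The three steps above can be sketched as follows, as an illustration of the over-approximating approach described rather than a real tool: `injected_licenses` and the `license_of` lookup are hypothetical, and the package names are made up. In practice the lookup might be filled from installed packages with something like `rpm -q --qf '%{LICENSE}\n' <package>`. Step 2, deciding which packages can actually inject content, is the hard part, and this sketch simply takes it as input.

```python
from typing import Dict, Iterable, List


def injected_licenses(
    own: Iterable[str],
    injectors: Iterable[str],
    license_of: Dict[str, str],
) -> str:
    """Over-approximate a binary package's License: value.

    own: licenses found in package P's own sources (step 1)
    injectors: packages that may inject content into P's binaries (step 2)
    license_of: License: field of each such package (lookup for step 3)
    """
    terms: List[str] = []
    for expr in list(own) + [license_of[pkg] for pkg in injectors]:
        # Keep each whole License: field intact; parenthesize compound ones.
        term = f"({expr})" if " " in expr else expr
        if term not in terms:
            terms.append(term)
    return " AND ".join(terms)


# Hypothetical packages and License: fields.
license_of = {
    "libfoo-devel": "LGPL-2.1-or-later",
    "libbar-static": "BSD-3-Clause AND MIT",
}
print(injected_licenses(["MIT"], ["libfoo-devel", "libbar-static"], license_of))
```

The output, `MIT AND LGPL-2.1-or-later AND (BSD-3-Clause AND MIT)`, repeats MIT inside the second package's field: exactly the "list that is too big" the message anticipates when a package contributes only one artifact but carries a longer License: field.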
On Thu, Aug 24, 2023 at 02:15:21PM -0400, Richard Fontana wrote:
On Mon, Aug 21, 2023 at 7:04 AM Florian Weimer fweimer@redhat.com wrote:
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
- Most package maintainers probably assume that License: tags on all built RPMs (source RPMs and binary RPMs) should reflect binary package contents, at least when all subpackages are considered in aggregate. Often, Source RPMs contain the same License: line as binary RPMs.
This is the most important issue I was hoping to raise, if we mean the same thing.
When I look randomly at spec files of Fedora packages, I begin to suspect that most Fedora package maintainers must have always ignored this directive and have continued to ignore it after the rule was recast in the post-July-2022 docs. In *most* cases of packages other than possibly those coming from ecosystems or historical contexts featuring highly uncomplicated licensing structures, there will be some differences in the makeup of binary packages from a built source code licensing standpoint. I only rarely see attempts to reflect this via multiple License: fields. While in the scheme of things I only look at a small sample of Fedora packages I suspect they are representative.
I can conclude one of two things:
1. The license of the binary rule is too hard for most Fedora package maintainers to comply with.
2. Fedora package maintainers are unaware of the rule and are substituting their own intuition, which I think must be something like "each RPM should have one License: field that reflects the makeup of all the binary RPMs without attempting to distinguish among them".
FWIW, I was not even aware that it was possible to have multiple License fields, one per sub-RPM. I suspect many people will be in the same boat, because if it is used, it is very rare.
With regards, Daniel
On 25. 08. 23 at 9:52, Daniel P. Berrangé wrote:
FWIW, I was not even aware that it was possible to have multiple License fields, one per sub-RPM. I suspect many people will be in the same boat, because if it is used, it is very rare.
It is not so rare:
Fedora has 23075 spec files.
Together they contain 29469 License tags.
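Assuming each spec file declares at least one License: tag, the figures above imply 29469 - 23075 = 6394 additional tags, although some of those may sit in conditional branches rather than in separate subpackages. How the counts were actually produced isn't stated. A rough sketch of one way to gather similar numbers from a directory of checked-out spec files follows; the regular expression is an assumption and would also match a line starting with "License:" inside a %description body.

```python
import re
from pathlib import Path
from typing import Iterable, Tuple

# A License: tag at the start of a line, matched case-insensitively
# (rpmbuild accepts preamble tag names in any case).
LICENSE_TAG = re.compile(r"^[ \t]*License[ \t]*:", re.IGNORECASE | re.MULTILINE)


def summarize(spec_texts: Iterable[str]) -> Tuple[int, int, int]:
    """Return (spec files, License: tags, spec files with more than one tag)."""
    specs = tags = multi = 0
    for text in spec_texts:
        n = len(LICENSE_TAG.findall(text))
        specs += 1
        tags += n
        if n > 1:
            multi += 1
    return specs, tags, multi


def summarize_dir(root: str) -> Tuple[int, int, int]:
    """Apply summarize() to every *.spec file below root."""
    return summarize(
        p.read_text(errors="replace") for p in Path(root).rglob("*.spec")
    )


example = [
    "Name: a\nLicense: MIT\n",
    "Name: b\nLicense: MIT\n%package extra\nLicense: BSD-3-Clause\n",
]
print(summarize(example))
```

For the two inline examples this prints `(2, 3, 1)`; the third number, spec files with more than one tag, is the figure most relevant to how common per-subpackage License: tags are.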
* Richard Fontana:
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
- Most package maintainers probably assume that License: tags on all built RPMs (source RPMs and binary RPMs) should reflect binary package contents, at least when all subpackages are considered in aggregate. Often, Source RPMs contain the same License: line as binary RPMs.
This is the most important issue I was hoping to raise, if we mean the same thing.
We (Jilayne and I and others who worked on the new Fedora license-related docs) did not invent this concept. The old Fedora documentation had a "license of the binary" policy, I assume developed mainly by Tom Callaway, that I always thought was a great analytical or representational advance. Here's what the old Fedora docs said:
The oldest archived version of http://fedoraproject.org/wiki/Packaging:LicensingGuidelines (dated from 2008) says "The License: field refers to the licenses of the contents of the *binary* rpm." The author was clearly at pains to make clear that it was not meant to encompass the entirety of the source code as packaged in source RPMs.
At least since 2009, this was followed by: "If a source package generates multiple binary packages, the License: field may differ between them if necessary. This implies that a single spec may have multiple per-subpackage License: tags. Each of those License: tags must comply with all applicable guidelines."
Very interesting. I was not aware of this advice.
I thought I understood what that meant and I thought I saw examples of that in operation. Recently, I've started to wonder whether I misunderstood that all along, though I don't see how. The text seems very clear to me.
Yes, it's quite clear.
When I look randomly at spec files of Fedora packages, I begin to suspect that most Fedora package maintainers must have always ignored this directive and have continued to ignore it after the rule was recast in the post-July-2022 docs. In *most* cases of packages other than possibly those coming from ecosystems or historical contexts featuring highly uncomplicated licensing structures, there will be some differences in the makeup of binary packages from a built source code licensing standpoint. I only rarely see attempts to reflect this via multiple License: fields. While in the scheme of things I only look at a small sample of Fedora packages I suspect they are representative.
Yeah, I knew it was possible to have per-subpackage License: tags, but I haven't seen much use of them.
I can conclude one of two things:
1. The license of the binary rule is too hard for most Fedora package maintainers to comply with.
I don't see how this is practical to implement, particularly for projects with an assortment of licenses. Unbundling helps somewhat because if code comes from a system library, it seems that we still do not have to declare its license under the rules as written (except if it's a Rust package).
2. Fedora package maintainers are unaware of the rule and are substituting their own intuition, which I think must be something like "each RPM should have one License: field that reflects the makeup of all the binary RPMs without attempting to distinguish among them".
The multiple License: field thing is probably not well-known.
BTW, I don't think #1 is "The license of the binary rule is too hard for most Fedora package maintainers to comply with *without the application of effective licensing folkloric concepts*", because even when "effective licensing" was assumed by some Fedora package maintainers to be legitimate (even though it was never consistently endorsed in Fedora legal/packaging rules), it must be the case that most Fedora package maintainers were still ignoring the rule.
This puzzles and disappoints me since, as I have said, the license of the binary concept was in my view a major advance in the way people were thinking about appropriate ways of representing licenses of packages.
But the rules seem to exclude things that come from the buildroot instead of source package, so framing this as “license of the binary” is slightly misleading. It may not be what people want to know if they are looking for the “license of the binary”.
If you look into SPDX, for example, SPDX doesn't even have (as far as I can tell) a sophisticated way of distinguishing between binary and source licensing. I believe this reflects the source code-centric and non-packaging-centric world view of many of the people who got involved with SPDX early on, but that may be unfair.
SPDX seems to have evolved somewhat, and I found some talk of covering dynamic linking and self-configuring components:
| Packages and Relationships
|
| While the single package concept of files worked well, the notion of
| relationships was added beginning with SPDX 2.0. This allows SPDX
| documents to address more complex use cases by being able to refer to
| one another along with what the relationship is between them.
|
| As an example, consider a binary-only delivery or download as shown in
| the following figure.
|
| In this particular example, the binary SPDX document has two
| relationships:
|
| 1. That it was “generated from” these source files; and
| 2. It dynamically links (say at runtime) with this particular library.
|
| This now gives a complete licensing picture as you know the licenses
| of the sources used to build the application and then what it links
| with at runtime as well.
https://spdx.dev/resources/use/#fws_64e844a377638
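The two relationships in the quoted SPDX example correspond to the SPDX 2.x relationship types GENERATED_FROM and DYNAMIC_LINK. The following is a minimal sketch, not an SPDX library: the element IDs and license values are made up, and only the tag-value `Relationship:` line format is taken from the SPDX 2.x specification. It shows how a consumer might assemble the "complete licensing picture" the passage describes by following both kinds of relationship out of the binary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Relationship:
    element: str  # SPDX element the relationship starts from
    kind: str     # SPDX 2.x relationship type, e.g. "GENERATED_FROM"
    related: str  # SPDX element it points to


def tag_value(rels: List[Relationship]) -> List[str]:
    """Render relationships as SPDX tag-value lines."""
    return [f"Relationship: {r.element} {r.kind} {r.related}" for r in rels]


def licensing_picture(
    binary: str, rels: List[Relationship], licenses: Dict[str, str]
) -> List[str]:
    """Licenses of the sources a binary was built from plus what it links at runtime."""
    picture: List[str] = []
    for r in rels:
        if r.element == binary and r.kind in ("GENERATED_FROM", "DYNAMIC_LINK"):
            lic = licenses[r.related]
            if lic not in picture:
                picture.append(lic)
    return picture


# Hypothetical elements and licenses.
rels = [
    Relationship("SPDXRef-app", "GENERATED_FROM", "SPDXRef-app-src"),
    Relationship("SPDXRef-app", "DYNAMIC_LINK", "SPDXRef-libexample"),
]
licenses = {"SPDXRef-app-src": "MIT", "SPDXRef-libexample": "LGPL-2.1-or-later"}
print("\n".join(tag_value(rels)))
print(licensing_picture("SPDXRef-app", rels, licenses))
```

Keeping the runtime-linked library as a separate relationship, rather than folding its license into the binary's own, is what lets a consumer tell "built from" apart from "links with", a distinction a single License: field cannot carry.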
When we (a bunch of us inside Red Hat that is) started to think about revamping the rules on RPM license metadata, we thought about a number of options. One thing I should note is that my enthusiasm for a "license of the binary" rule was never really shared by anyone else I talked to at Red Hat
There might now be downstream requirements for something like this. Some people are trying to figure that out.
On the one hand, I'd be excited about tooling support for this. (There is significant overlap with recording “this program has components that used include files from glibc-devel-2.36-9.fc37.x86_64 for compilation”.) But it's a lot of work to implement across many components before such information can be propagated automatically. And I don't know if that matches commercial needs, or to what extent these needs can be met with tooling alone and without per-source-file markup.
It's also possible that people who are looking for “license of the binary” actually want information about potential run-time executions and what is constructed by very late binding.
Personally, I'm also worried that the data may be used to minimize shipping source code, although I don't think it's technically suited to that.
I'm deliberately ignoring most of the rest of your comments in this message because I think they raise some additional topics and I want to make sure there is some focus on this one. What do we do about the "license of the binary" rule?
I think we should definitely try to get a downstream view on this, if there is one.
If it is really too hard to comply with, I think we can only conclude that it has to be replaced with some other approach. Since I'm not a Fedora package maintainer I do not have good intuition for what's too hard vs. what's merely annoying or cumbersome. I know why I find it challenging to figure out what source files map to a given binary RPM, but I don't really directly understand why this is hard for a Fedora package maintainer who is theoretically highly familiar with the code they are packaging and theoretically has some expertise in the language(s) and build tools at issue.
Detailed knowledge of a package (not even its implementation language) or of the changes that come in from upstream (which might invalidate previous license analysis) is not actually required. There's definitely a generic maintainer skillset that focuses more on scale than on deep knowledge of one upstream project: things like knowing how to report issues to upstreams in such a way that they are likely to give you a workaround or a fix in the form of a patch.
And even with deep technical knowledge, the license aspects are probably not on many people's radar. For many packages with uniform licensing or without subpackages, the licensing changes due to ongoing development are likely minimal. But otherwise, it's really not something that is on the radar. From time to time, the matter of dependency management (and minimizing unwanted dependencies) gets increased attention, but that is generally only a concern across packages, and as far as I understand it, we are ignoring the cross-package aspects of licensing.
One possible outcome is that we need to produce per-file instead of per-binary-package licensing metadata, with reasonable efforts to reduce over-reporting. That would extend the complications of the multi-binary-package scenario to many more source packages.
Thanks, Florian
On Fri, Aug 25, 2023 at 5:58 AM Florian Weimer fweimer@redhat.com wrote:
- Richard Fontana:
When we (a bunch of us inside Red Hat that is) started to think about revamping the rules on RPM license metadata, we thought about a number of options. One thing I should note is that my enthusiasm for a "license of the binary" rule was never really shared by anyone else I talked to at Red Hat
There might now be downstream requirements for something like this. Some people are trying to figure that out.
[. . .]
On the one hand, I'd be excited about tooling support for this. (There is significant overlap with recording “this program has components that used include files from glibc-devel-2.36-9.fc37.x86_64 for compilation”.) But it's a lot of work to implement across many components before such information can be propagated automatically. And I don't know if that matches commercial needs, or to what extent these needs can be met with tooling alone, without per-source-file markup.
It's also possible that people who are looking for “license of the binary” actually want information about potential run-time executions and what is constructed by very late binding.
Personally, I'm also worried that the data may be used to minimize shipping source code, although I don't think it's technically suited to that.
I'm deliberately ignoring most of the rest of your comments in this message, because I think they raise some additional topics and I want to make sure there is some focus on this one. What do we do about the "license of the binary" rule?
I think we should definitely try to get a downstream view on this, if there is one.
I assume you primarily mean the view of engineers working on packaging for CentOS Stream/RHEL, but the main downstream interest in package license metadata I am aware of doesn't relate directly to engineering. Red Hat periodically gets requests from some customers or partners for detailed lists of information about components, primarily licensing information. For reasons that go well beyond the need to respond to such requests, Red Hat Product Security has also invested in developing systems for producing SBOMs and this is now being used as a basis for responding to those kinds of requests.
Anyway, the approach that has always been taken in responding to these requests for RPMs, at least those coming from RHEL specifically, has been to use the License: field contents (ignoring any varying information for subpackages). So basically there is one list item corresponding to each SRPM. This is justified partly by the quality we associate with the Fedora-based approach, i.e. we feel we can report the contents of the License: field in most cases rather than scan or otherwise review the package anew.
Note that there are other important ways in which the license review and categorization work done by Fedora benefits Red Hat. Most notably, the data on allowed and not allowed licenses serves as Red Hat's own policy on what licenses are allowed and not allowed. The switch to SPDX identifiers has facilitated this because it enables a much more precise definition of allowed/not allowed than was possible under the Callaway system. But none of that is really directly dependent on any particular rule adopted for what license information gets reported in RPM license metadata.
Richard
* Richard Fontana:
I think we should definitely try to get a downstream view on this, if there is one.
I assume you primarily mean the view of engineers working on packaging for CentOS Stream/RHEL,
No, not engineering actually.
[requests for SBOMs]
Anyway, the approach that has always been taken in responding to these requests for RPMs, at least those coming from RHEL specifically, has been to use the License: field contents (ignoring any varying information for subpackages). So basically there is one list item corresponding to each SRPM. This is justified partly by the quality we associate with the Fedora-based approach, i.e. we feel we can report the contents of the License: field in most cases rather than scan or otherwise review the package anew.
Yes, but that doesn't need SPDX, as you point out below.
Some people on the development side (not those who drive SPDX adoption in Fedora as far as I know) have started to talk about compliance in this context, and this makes me nervous because they don't say where these alleged compliance requirements come from.
Thanks, Florian
On 24. 08. 23 at 20:15, Richard Fontana wrote:
On Mon, Aug 21, 2023 at 7:04 AM Florian Weimer fweimer@redhat.com wrote:
I think Richard said that he would start a thread like this, but it hasn't happened, so I feel like I should get this off my chest now. https://docs.fedoraproject.org/en-US/legal/license-field/#_no_effective_license_analysis starts with this:
| No “effective license” analysis
|
| The License: field is meant to provide a simple enumeration of the
| licenses found in the source code that are reflected in the binary
| package. No further analysis should be done regarding what the
| "effective" license is, such as analysis based on theories of GPL
| interpretation or license compatibility or suppositions that
| “top-level” license files somehow negate different licenses appearing
| on individual source files.
This is contradictory. I think there are two aspects here:
* Determine possible licenses that end up in the binary package.
* Perform algebraic simplifications on the license list.
Both analyses are forms of effective licensing analysis. Of course, you cannot derive an SPDX identifier without doing any analysis. However, I strongly believe that the first approach (determining the binary package license) is itself a form of effective licensing analysis, and similar reasons for package maintainers not doing this apply. The derived SPDX identifier will reflect both the package source code and what went into the build system.
We were using "effective license" somewhat more narrowly, referring to how that phrase was used in some of the legacy Fedora documentation as well as how it is used sometimes in non-Fedora FLOSS-legal contexts. I am certain the phrase was not invented by Fedora but somehow it crept into FLOSS legal commentary about 10 or so years ago and I wasn't even aware it was used in Fedora documentation until last year. It partially embodies (usually in a highly distorted way) a much older set of folk-understandings of the operation of the *GPL license family in particular but is often used more generally. It may have some connection to what SPDX calls the "concluded license" (which is contrasted with the "declared license") but to be honest I am not sure what those concepts mean.
It's true that in a less specific way we are doing lots of "effective license analysis", for example anytime I have said that something is "not a license" despite the license text appearing in some source code.
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
- Most package maintainers probably assume that License: tags on all built RPMs (source RPMs and binary RPMs) should reflect binary package contents, at least when all subpackages are considered in aggregate. Often, Source RPMs contain the same License: line as binary RPMs.
This is the most important issue I was hoping to raise, if we mean the same thing.
We (Jilayne and I and others who worked on the new Fedora license-related docs) did not invent this concept. The old Fedora documentation had a "license of the binary" policy, I assume developed mainly by Tom Callaway, that I always thought was a great analytical or representational advance. Here's what the old Fedora docs said:
The oldest archived version of http://fedoraproject.org/wiki/Packaging:LicensingGuidelines (dated from 2008) says "The License: field refers to the licenses of the contents of the *binary* rpm." The author was clearly at pains to make clear that it was not meant to encompass the entirety of the source code as packaged in source RPMs.
At least since 2009, this was followed by: "If a source package generates multiple binary packages, the License: field may differ between them if necessary. This implies that a single spec may have multiple per-subpackage License: tags. Each of those License: tags must comply with all applicable guidelines."
I thought I understood what that meant and I thought I saw examples of that in operation. Recently, I've started to wonder whether I misunderstood that all along, though I don't see how. The text seems very clear to me.
When I look randomly at spec files of Fedora packages, I begin to suspect that most Fedora package maintainers must have always ignored this directive and have continued to ignore it after the rule was recast in the post-July-2022 docs. In *most* cases of packages, other than possibly those coming from ecosystems or historical contexts featuring highly uncomplicated licensing structures, there will be some differences in the makeup of binary packages from a built source code licensing standpoint. I only rarely see attempts to reflect this via multiple License: fields. While in the scheme of things I only look at a small sample of Fedora packages, I suspect they are representative.
I can conclude one of two things:
1. The license of the binary rule is too hard for most Fedora package maintainers to comply with.
2. Fedora package maintainers are unaware of the rule and are substituting their own intuition, which I think must be something like "each RPM should have one License: field that reflects the makeup of all the binary RPMs without attempting to distinguish among them".
BTW, I don't think #1 is really "The license of the binary rule is too hard for most Fedora package maintainers to comply with *without the application of effective licensing folkloric concepts*". That is because even when "effective licensing" was assumed by some Fedora package maintainers to be legitimate (even though it was never consistently endorsed in Fedora legal/packaging rules), it must be the case that most Fedora package maintainers were still ignoring the rule.
We try to follow this guideline in e.g. Ruby:
https://src.fedoraproject.org/rpms/ruby/blob/rawhide/f/ruby.spec
However, I am afraid we completely fail for all rubygem-*-doc subpackages, where the largest part of the payload is typically generated documentation, which contains bundled fonts and libraries. Addressing this is non-trivial (mainly because we don't really want to diverge from upstream, while other users consuming similar content via the `gem` command don't care too much). I have been trying to find a reasonable solution (and closing my eyes) for years, and at least to keep in the spirit of how upstream works:
https://bugzilla.redhat.com/show_bug.cgi?id=1224715
https://lists.fedoraproject.org/archives/list/ruby-sig@lists.fedoraproject.o...
This puzzles and disappoints me since, as I have said, the license of the binary concept was in my view a major advance in the way people were thinking about appropriate ways of representing licenses of packages.
Even if the situation is not perfect (or maybe it is even worse ;) ), I still think it is the right thing to try to do.
Vít
If you look into SPDX, for example, SPDX doesn't even have (as far as I can tell) a sophisticated way of distinguishing between binary and source licensing. I believe this reflects the source code-centric and non-packaging-centric world view of many of the people who got involved with SPDX early on, but that may be unfair.
When we (a bunch of us inside Red Hat that is) started to think about revamping the rules on RPM license metadata, we thought about a number of options. One thing I should note is that my enthusiasm for a "license of the binary" rule was never really shared by anyone else I talked to at Red Hat (though I think this is partly because those who I discussed it with came from those "source code centric" backgrounds wrt open source license compliance and such). Anyway, we considered switching to a "license of the source" rule, sort of like how I think Petr Pisar is choosing to use the Source-License: field. We also considered a more complex sort of "license of the binary" rule that would attempt to do what I thought of as orthodox GPL-style analysis on the components of binary RPMs (so that a binary RPM might have "License: GPL-2.0-or-later AND GPL-2.0-or-later") but this was rejected as unnecessarily complicated. We ended up with the "simple enumeration of the licenses of the binary" rule which is in the current Fedora docs, which I think of as a restatement of the 2009 (or earlier) "license of the binary" rule. This was also discussed on this list prior to incorporation into the present-day legal docs.
I'm deliberately ignoring most of the rest of your comments in this message, because I think they raise some additional topics and I want to make sure there is some focus on this one. What do we do about the "license of the binary" rule? If it is really too hard to comply with, I think we can only conclude that it has to be replaced with some other approach. Since I'm not a Fedora package maintainer I do not have good intuition for what's too hard vs. what's merely annoying or cumbersome. I know why I find it challenging to figure out what source files map to a given binary RPM, but I don't really directly understand why this is hard for a Fedora package maintainer who is theoretically highly familiar with the code they are packaging and theoretically has some expertise in the language(s) and build tools at issue. I just see the evidence suggesting that it is.
In the light of this, I would like to suggest updating the guidelines in the following way:
The License: line should be based on the sources only. Using a tool such as Fossology to discover relevant licenses and their SPDX tags is sufficient. No analysis of how licenses from package source code or the build environment propagate into binary RPMs should be performed. Individual SPDX identifiers that a tool has listed should be separated by AND. Package maintainers are encouraged to re-run license analysis tooling on the source code as part of major package rebases, and update the License: tag accordingly.
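The mechanical part of this proposal (joining the identifiers a scanner reports with AND) could look roughly like the following Python sketch. It assumes the scanner output has already been reduced to a list of SPDX expressions; the function name `join_with_and` is hypothetical, and dropping exact duplicates and sorting are presentation choices, not part of the proposed rule. One detail worth noting: in SPDX expressions AND binds more tightly than OR, so any reported compound expression containing OR has to be parenthesized before joining.

```python
def join_with_and(identifiers):
    """Join scanner-reported SPDX expressions into one License: value.

    Hypothetical helper: assumes `identifiers` is a list of strings such as
    "MIT" or "Apache-2.0 OR MIT", as a license scanner might report them.
    """
    # Drop blanks and exact duplicates, keeping each expression once.
    unique = dict.fromkeys(s.strip() for s in identifiers if s.strip())
    parts = []
    for expr in sorted(unique):
        # AND binds tighter than OR, so "A OR B" must be wrapped in
        # parentheses or it would change meaning once ANDed with others.
        parts.append(f"({expr})" if " OR " in expr else expr)
    return " AND ".join(parts)


print(join_with_and(["MIT", "BSD-3-Clause", "MIT", "Apache-2.0 OR MIT"]))
# prints: (Apache-2.0 OR MIT) AND BSD-3-Clause AND MIT
```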
This seems to be close to what is *really* happening today, except that there are categories of things that package maintainers know they can exclude as a matter of convention.
Richard _______________________________________________ legal mailing list -- legal@lists.fedoraproject.org To unsubscribe send an email to legal-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
On Mon, Aug 21, 2023 at 7:04 AM Florian Weimer fweimer@redhat.com wrote:
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
I recently noticed something that could be added to this list. There's a package that generates a '-docs' subpackage using Doxygen. Apparently Doxygen injects various pieces of minified JavaScript (mostly from the jQuery ecosystem, mostly MIT-licensed) in a way that is not obvious from analyzing the source code of the package that uses Doxygen. I assume this must be compliant with Fedora packaging guidelines -- although I could not verify this from reading Fedora guidelines on bundling and JavaScript.
Anyway, I would guess no Fedora package maintainer of a package that has a Doxygen docs subpackage is taking this issue into account when thinking about License: tags. Should they? I am having trouble seeing why the licensing of the Doxygen pieces should be deliberately ignored. But I also am not sure if a Fedora package maintainer should realistically be expected to know that this situation occurs. I was moving toward the view that if the package build process results in the inclusion of some licensed material from another package, this can be ignored if (a) the inclusion occurs in huge numbers of Fedora packages and (b) most normal Fedora installs will have the other package. I was thinking that would take care of Florian's gcc and glibc statically-linked startup code examples, but surely neither (a) nor (b) apply to the Doxygen case which seems sort of analogous.
Richard
On Sat, Oct 28, 2023 at 6:05 PM Richard Fontana rfontana@redhat.com wrote:
On Mon, Aug 21, 2023 at 7:04 AM Florian Weimer fweimer@redhat.com wrote:
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
I recently noticed something that could be added to this list. There's a package that generates a '-docs' subpackage using Doxygen. Apparently Doxygen injects various pieces of minified JavaScript (mostly from the jQuery ecosystem, mostly MIT-licensed) in a way that is not obvious from analyzing the source code of the package that uses Doxygen. I assume this must be compliant with Fedora packaging guidelines -- although I could not verify this from reading Fedora guidelines on bundling and JavaScript.
Anyway, I would guess no Fedora package maintainer of a package that has a Doxygen docs subpackage is taking this issue into account when thinking about License: tags. Should they?
This might have been the case a few years ago, but no longer.
New packages often don't bother building documentation (be it with doxygen, sphinx, or rubygems), because the effort to make this work "properly" for the latest guidelines with respect to bundling and license tags makes it very onerous. You will find many mentions of this, especially for sphinx, in recent Python package reviews, or on the mailing lists.
Fabio
I am having trouble seeing why the licensing of the Doxygen pieces should be deliberately ignored. But I also am not sure if a Fedora package maintainer should realistically be expected to know that this situation occurs. I was moving toward the view that if the package build process results in the inclusion of some licensed material from another package, this can be ignored if (a) the inclusion occurs in huge numbers of Fedora packages and (b) most normal Fedora installs will have the other package. I was thinking that would take care of Florian's gcc and glibc statically-linked startup code examples, but surely neither (a) nor (b) apply to the Doxygen case which seems sort of analogous.
On 28-10-2023 18:33, Fabio Valentini wrote:
On Sat, Oct 28, 2023 at 6:05 PM Richard Fontana rfontana@redhat.com wrote:
On Mon, Aug 21, 2023 at 7:04 AM Florian Weimer fweimer@redhat.com wrote:
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
I recently noticed something that could be added to this list. There's a package that generates a '-docs' subpackage using Doxygen. Apparently Doxygen injects various pieces of minified JavaScript (mostly from the jQuery ecosystem, mostly MIT-licensed) in a way that is not obvious from analyzing the source code of the package that uses Doxygen. I assume this must be compliant with Fedora packaging guidelines -- although I could not verify this from reading Fedora guidelines on bundling and JavaScript.
Anyway, I would guess no Fedora package maintainer of a package that has a Doxygen docs subpackage is taking this issue into account when thinking about License: tags. Should they?
This might have been the case a few years ago, but no longer.
New packages often don't bother building documentation (be it with doxygen, sphinx, or rubygems), because the effort to make this work "properly" for the latest guidelines with respect to bundling and license tags makes it very onerous. You will find many mentions of this, especially for sphinx, in recent Python package reviews, or on the mailing lists.
Since this came up again in a review I took part in regarding Sphinx-generated docs, here is a mailing list thread where this has been discussed:
https://lists.fedoraproject.org/archives/list/packaging@lists.fedoraproject....
I also decided to no longer bother with Sphinx-generated documentation. It would be great if this could be mentioned in the packaging guidelines, since it appears many packagers and reviewers are still unaware of it.
-- Sandro
Richard Fontana wrote:
Apparently Doxygen injects various pieces of minified JavaScript
Sphinx does similar things.
Anyway, I would guess no Fedora package maintainer of a package that has a Doxygen docs subpackage is taking this issue into account when thinking about License: tags.
I'm a counterexample, or maybe I'm just the exception that confirms the rule. I have text like this in some of my spec files:
%package doc
Summary: Documentation for the XML/Ada library
BuildArch: noarch
License: AdaCore-doc AND MIT AND BSD-2-Clause
# License for the documentation is AdaCore-doc. The Javascript and CSS files
# that Sphinx includes with the documentation are BSD 2-Clause and MIT-licensed.
I wish Sphinx would put those files in a subpackage that documentation packages could depend on, instead of multiplying them all over the distribution. Then I wouldn't have to analyze their licenses for my License tags, and would have more time for useful work. The dependency should preferably be added automatically.
Björn Persson
On Sat, Oct 28, 2023 at 12:05:06PM -0400, Richard Fontana wrote:
On Mon, Aug 21, 2023 at 7:04 AM Florian Weimer fweimer@redhat.com wrote:
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
I recently noticed something that could be added to this list. There's a package that generates a '-docs' subpackage using Doxygen. Apparently Doxygen injects various pieces of minified JavaScript (mostly from the jQuery ecosystem, mostly MIT-licensed) in a way that is not obvious from analyzing the source code of the package that uses Doxygen. I assume this must be compliant with Fedora packaging guidelines -- although I could not verify this from reading Fedora guidelines on bundling and JavaScript.
FWIW, in my recent re-reviews of packages I came across one that was shipping Doxygen-generated content in its tarball, including jQuery JS.
Anyway, I would guess no Fedora package maintainer of a package that has a Doxygen docs subpackage is taking this issue into account when thinking about License: tags. Should they? I am having trouble seeing why the licensing of the Doxygen pieces should be deliberately ignored.
The act of copying content from one RPM build dependency package into the RPM output is a more general problem than Doxygen, with static linking being the poster child for it. This static linking happens continually with some languages like Rust/Go/OCaml, and AFAICT the license tag implications have been essentially ignored.
I doubt that maintainers of packages with statically linked libraries are remembering to copy the needed license info from all the dependent static libs they link against, either.
But I also am not sure if a Fedora package maintainer should realistically be expected to know that this situation occurs.
For Doxygen, the copying is certainly very subtle, so it is trivial to overlook. When considering licensing, I think maintainers are generally only looking at the source tarball contents, and not even thinking about files copied into the output RPMs from other RPMs.
I was moving toward the view that if the package build process results in the inclusion of some licensed material from another package, this can be ignored if (a) the inclusion occurs in huge numbers of Fedora packages and (b) most normal Fedora installs will have the other package. I was thinking that would take care of Florian's gcc and glibc statically-linked startup code examples, but surely neither (a) nor (b) apply to the Doxygen case which seems sort of analogous.
There is a practicality question in expressing copied content in the License tag. Even if developers identify copied content, which is certainly not easy in many cases, watching deps forevermore to see if the copied content gains or loses licenses is even harder.
When we bundle content from multiple upstream source tarballs in a source package, we record it via fake provides:
$ rpm -qa --provides | grep bundle | head -10
bundled(python3dist(appdirs)) = 1.4.3
bundled(python3dist(importlib-metadata)) = 4.11.1
bundled(python3dist(importlib-resources)) = 5.4
bundled(python3dist(jaraco-text)) = 3.7
bundled(python3dist(more-itertools)) = 8.8
bundled(python3dist(ordered-set)) = 3.1.1
bundled(python3dist(packaging)) = 21.3
bundled(python3dist(pyparsing)) = 3.0.9
bundled(python3dist(tomli)) = 2.0.1
bundled(python3dist(typing-extensions)) = 4.0.1
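Because these virtual provides follow a regular `bundled(NAME) = VERSION` shape, they can be extracted mechanically from `rpm -qa --provides` output. A minimal Python sketch, assuming the output has been captured as a string; the helper name `parse_bundled` is hypothetical. The NAME part may itself contain parentheses (as in `python3dist(...)`), which is why the pattern matches greedily up to the last closing parenthesis before the optional version.

```python
import re

# Matches lines like "bundled(python3dist(appdirs)) = 1.4.3" or "bundled(foo)".
# The greedy NAME group runs up to the last ")" that is followed only by an
# optional " = VERSION", so nested parentheses inside NAME are preserved.
BUNDLED_RE = re.compile(r"^bundled\((?P<name>.+)\)(?:\s*=\s*(?P<version>\S+))?$")


def parse_bundled(provides_text):
    """Return (name, version) pairs for bundled() provides; version may be None."""
    result = []
    for line in provides_text.splitlines():
        m = BUNDLED_RE.match(line.strip())
        if m:
            result.append((m.group("name"), m.group("version")))
    return result


sample = "bundled(python3dist(appdirs)) = 1.4.3\nconfig(foo) = 1\nbundled(libfoo)\n"
print(parse_bundled(sample))
# prints: [('python3dist(appdirs)', '1.4.3'), ('libfoo', None)]
```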
If we want to keep track of copied content (either statically linked libraries or simply copied files like Doxygen's), perhaps we should do so via another kind of fake provides tag:
e.g. for the Doxygen case:
copied(doxygen)
e.g. for qemu-user-static, which statically links glibc and glib2:
copied(glibc)
copied(glib2)
Then if any tool needs to consider the full binary license, it can follow the fake 'copied' Provides tags. The downside is that this is not as precise: we are not copying the whole of Doxygen, just the jQuery files, so some of the Doxygen.spec license may be inappropriate.
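A tool following the proposed `copied(...)` Provides could assemble a broader binary license roughly as sketched below in Python. All of it is hypothetical: `copied()` is only a proposal in this thread, and the `license_of` mapping stands in for whatever lookup a real tool would use (for example, reading the License: tag of the named package). Package names and license values in the example are placeholders. The sketch also shows the imprecision mentioned above: it pulls in the whole License: expression of the copied-from package, not just the licenses of the files actually copied.

```python
def aggregate_license(own_license, provides, license_of):
    """Combine a package's License: with those of packages it copied from.

    own_license: the package's own License: expression.
    provides:    Provides strings, e.g. ["copied(example-tool)"].
    license_of:  mapping from package name to its License: expression
                 (hypothetical stand-in for real RPM metadata).
    """
    parts = [own_license]
    for prov in provides:
        if prov.startswith("copied(") and prov.endswith(")"):
            name = prov[len("copied("):-1]
            # A missing entry raises KeyError, making gaps in the data visible.
            parts.append(license_of[name])
    # Keep each distinct expression once, in first-seen order, and
    # parenthesize OR expressions so they do not bind across the AND.
    unique = dict.fromkeys(parts)
    return " AND ".join(f"({e})" if " OR " in e else e for e in unique)


print(aggregate_license(
    "MIT",
    ["copied(example-tool)", "bundled(example-lib)"],
    {"example-tool": "BSD-3-Clause"},
))
# prints: MIT AND BSD-3-Clause
```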
With regards, Daniel
On Mon, Oct 30, 2023 at 10:56 AM Daniel P. Berrangé berrange@redhat.com wrote:
The act of copying content from one RPM build dependency package into the RPM output is a more general problem than Doxygen, with static linking being the poster child for it. This static linking happens continually with some languages like Rust/Go/OCaml, and AFAICT the license tag implications have been essentially ignored.
This is just not true, especially for Rust. We do our best to accurately reflect statically linked dependencies, and we have helper scripts / macros to determine *which* dependencies are actually linked into the final binaries, and which are only needed at build-time. And I also do similar things for the one Go package I maintain.
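The distinction described here, between dependencies that are actually linked into the shipped binaries and those needed only at build time, is what decides which licenses belong in the License: tag. A rough Python illustration follows; the `Dep` record and its `kind` values are hypothetical simplifications, and this is not Fedora's actual Rust or Go tooling, which works from each ecosystem's own dependency metadata. Real dependency graphs also have optional, feature-gated, and target-specific entries that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Dep:
    name: str
    license: str  # SPDX expression declared by the dependency
    kind: str     # hypothetical: "normal" (linked), "build", or "dev"


def linked_license(own_license, deps):
    """Return a License: value covering only dependencies linked into the binary."""
    # Assumption: only "normal" dependencies end up statically linked;
    # build-time helpers and test-only ("dev") dependencies do not.
    exprs = [own_license] + [d.license for d in deps if d.kind == "normal"]
    unique = dict.fromkeys(exprs)  # first-seen order, exact repeats dropped
    return " AND ".join(f"({e})" if " OR " in e else e for e in unique)


deps = [
    Dep("crate-a", "Apache-2.0", "normal"),
    Dep("crate-b", "BSD-3-Clause", "build"),
    Dep("crate-c", "MIT OR Apache-2.0", "normal"),
    Dep("crate-d", "Zlib", "dev"),
]
print(linked_license("MIT", deps))
# prints: MIT AND Apache-2.0 AND (MIT OR Apache-2.0)
```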
Fabio
Hello,
With regard to Go, new and updated packages that are statically linked have correct license info. I have written a script to generate it in the Go-sig tools on Pagure.
We probably need to do a pass on the whole set of binary packages, but this is being taken into account.
Best regards,
Robert-André
On Mon, 30 Oct 2023, 12:02 Fabio Valentini, decathorpe@gmail.com wrote:
On Mon, Oct 30, 2023 at 10:56 AM Daniel P. Berrangé berrange@redhat.com wrote:
The act of copying content from one RPM build dependency package into the RPM output is a more general problem than Doxygen, with static linking being the poster child for it. This static linking happens continually with some languages like Rust/Go/OCaml, and AFAICT the license tag implications have been essentially ignored.
This is just not true, especially for Rust. We do our best to accurately reflect statically linked dependencies, and we have helper scripts / macros to determine *which* dependencies are actually linked into the final binaries, and which are only needed at build-time. And I also do similar things for the one Go package I maintain.
Fabio
On Mon, Oct 30, 2023 at 12:01:54PM +0100, Fabio Valentini wrote:
On Mon, Oct 30, 2023 at 10:56 AM Daniel P. Berrangé berrange@redhat.com wrote:
The act of copying content from one RPM build dependency package into the RPM output is a more general problem than Doxygen, with static linking being the poster child for it. This static linking happens continually with some languages like Rust/Go/OCaml, and AFAICT the license tag implications have been essentially ignored.
This is just not true, especially for Rust. We do our best to accurately reflect statically linked dependencies, and we have helper scripts / macros to determine *which* dependencies are actually linked into the final binaries, and which are only needed at build-time. And I also do similar things for the one Go package I maintain.
My bad, I see we did actually call out static linking implications on licenses for Rust in particular:
https://docs.fedoraproject.org/en-US/legal/license-field/#_rust_packages
That would probably benefit from being expanded since it impacts more than just the Rust ecosystem.
With regards, Daniel
This discussion made me look into this:
https://github.com/ruby/rdoc/pull/1019/files
This means that RDoc now embeds Racc. Luckily, the embedding is mentioned in Racc README.rdoc [1]:
~~~
Note that you do NOT need to follow ruby license for your own parser (racc outputs). You can distribute those files under any licenses you want.
~~~
and source code [2]:
~~~
# As a special exception, when this code is copied by Racc
# into a Racc output file, you may use that output file
# without restriction.
~~~
So this seems to be an explicit "don't bother" exception.
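For spotting files where Racc has copied its runtime along with this exception, searching for the exception wording is one possible approach. A naive grep for the whole sentence fails because the text is wrapped across several `#` comment lines, so the Python sketch below strips comment markers and collapses whitespace before matching. It assumes embedded copies keep the comment verbatim; the function name `has_racc_exception` is hypothetical, and a match only shows that the notice is present, not that a given file is covered by it.

```python
import re

# Exception wording as quoted from the Racc source, joined into one line.
EXCEPTION_PHRASE = (
    "when this code is copied by Racc into a Racc output file, "
    "you may use that output file without restriction"
)


def has_racc_exception(ruby_source):
    """Return True if the Racc runtime exception text appears in the source."""
    # Strip a leading "#" (plus one optional space) from each line, then
    # collapse all whitespace so a comment wrapped across lines still matches.
    stripped = (re.sub(r"^\s*#\s?", "", line) for line in ruby_source.splitlines())
    text = re.sub(r"\s+", " ", " ".join(stripped))
    return EXCEPTION_PHRASE in text


src = (
    "# As a special exception, when this code is copied by Racc\n"
    "# into a Racc output file, you may use that output file\n"
    "# without restriction.\n"
)
print(has_racc_exception(src))
# prints: True
```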
However, shouldn't we require such a license for every tool that produces some content? IMHO, GCC, for example, is probably not any different after all: it takes source code and embeds fragments of itself into the resulting binary.
What also bothers me is that, while I have now learnt that the embedding is fine from a licensing POV, I don't have any meaningful way to share this knowledge more broadly.
Vít
[1] https://github.com/ruby/racc#label-License
[2] https://github.com/ruby/racc/blob/5eb07b28bfb3e193a1cac07798fe7be7e1e246c4/l...
On 28. 10. 23 at 18:05, Richard Fontana wrote:
On Mon, Aug 21, 2023 at 7:04 AM Florian Weimer fweimer@redhat.com wrote:
Below, I'm collecting a list of observations of what I believe is the current approach in this area, as taken by package maintainers carrying out the SPDX conversion. To me, it strongly suggests that the SPDX identifiers we derive today do not accurately reflect binary RPM package licensing, even when lots of package maintainers put in the extra effort to determine binary package licenses.
I recently noticed something that could be added to this list. There's a package that generates a '-docs' subpackage using Doxygen. Apparently Doxygen injects various pieces of minified JavaScript (mostly from the jQuery ecosystem, mostly MIT-licensed) in a way that is not obvious from analyzing the source code of the package that uses Doxygen. I assume this must be compliant with Fedora packaging guidelines -- although I could not verify this from reading Fedora guidelines on bundling and JavaScript.
Anyway, I would guess no Fedora package maintainer of a package that has a Doxygen docs subpackage is taking this issue into account when thinking about License: tags. Should they? I am having trouble seeing why the licensing of the Doxygen pieces should be deliberately ignored. But I also am not sure if a Fedora package maintainer should realistically be expected to know that this situation occurs. I was moving toward the view that if the package build process results in the inclusion of some licensed material from another package, this can be ignored if (a) the inclusion occurs in huge numbers of Fedora packages and (b) most normal Fedora installs will have the other package. I was thinking that would take care of Florian's gcc and glibc statically-linked startup code examples, but surely neither (a) nor (b) apply to the Doxygen case which seems sort of analogous.
Richard
_______________________________________________
legal mailing list -- legal@lists.fedoraproject.org
To unsubscribe send an email to legal-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/legal@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
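Since the injected JavaScript is not visible in the package's own sources, one way a maintainer might notice it is to inspect the built documentation tree instead. The following is a minimal Python sketch under stated assumptions: `find_js_license_hints` is a hypothetical helper, the set of "license-like" markers it looks for (`@license`, `@licstart`, `SPDX-License-Identifier`, the word `MIT`) is an illustrative guess, and it makes no assumption about which file names Doxygen actually generates. It only surfaces candidates for a human to review.

```python
import os
import re
import tempfile

# Strings that often mark license notices in bundled JavaScript. This list is
# an assumption for illustration, not an exhaustive or authoritative set.
LICENSE_HINT = re.compile(r"@license|@licstart|SPDX-License-Identifier|\bMIT\b")


def find_js_license_hints(root):
    """Map each .js file under root (relative path) to its license-like lines.

    Files without any matching line are kept with an empty list: an
    unlabeled minified file is exactly the kind that needs a human look.
    """
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".js"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                hits = [line.strip() for line in fh if LICENSE_HINT.search(line)]
            results[os.path.relpath(path, root)] = hits
    return results


if __name__ == "__main__":
    # Made-up demo tree; in practice root would be the unpacked contents of
    # the built -docs subpackage (hypothetical usage, not a Fedora tool).
    with tempfile.TemporaryDirectory() as docs:
        with open(os.path.join(docs, "vendor.min.js"), "w") as fh:
            fh.write("/* sample bundled library, @license MIT */\nvar a=1;\n")
        with open(os.path.join(docs, "local.js"), "w") as fh:
            fh.write("function initMenu(){}\n")
        with open(os.path.join(docs, "index.html"), "w") as fh:
            fh.write("<html></html>\n")
        for rel, hits in sorted(find_js_license_hints(docs).items()):
            print(rel, hits)
```

Because it reads the built files, a check like this can catch material injected by build tools that never appears in the package's own sources; it cannot tell whether a notice is complete or which SPDX identifier applies.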
The irony is also that the RDoc source repository itself does not contain a copy of Racc, AFAICT, but the .gem file we use for packaging already contains the Racc copy. So the original source would have a different license than the .gem.
And the Ruby repository, which has yet another copy of Racc, does not contain the original *.ry files, but it does contain the processed *.rb files with Racc embedded:
https://github.com/ruby/ruby/blob/14fa5e39d72c84d3e12e10dc5d77a6e6200c10f5/l...
What a mess.
The worst thing is that it appears to me that often only we care about such subtleties.
Vít
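The mismatch described above (Racc present in the packaged .gem but not in the upstream RDoc repository) can be surfaced mechanically by comparing the file list of an upstream checkout with that of the unpacked gem. A minimal sketch, assuming both trees are already unpacked into local directories; the function names and the demo file names are hypothetical:

```python
import os
import tempfile


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def gem_only_files(upstream_dir, gem_dir):
    """Files shipped in the gem but absent from the upstream checkout.

    These are the files whose licensing cannot be read off the upstream
    repository (e.g. generated or vendored code) and need a separate look.
    """
    return sorted(relative_files(gem_dir) - relative_files(upstream_dir))


if __name__ == "__main__":
    # Made-up demo trees: upstream has a grammar file, the gem ships only
    # the generated parser. File names are illustrative only.
    with tempfile.TemporaryDirectory() as up, tempfile.TemporaryDirectory() as gem:
        for base in (up, gem):
            os.makedirs(os.path.join(base, "lib"))
            open(os.path.join(base, "lib", "common.rb"), "w").close()
        open(os.path.join(up, "lib", "grammar.ry"), "w").close()
        open(os.path.join(gem, "lib", "grammar.rb"), "w").close()
        print(gem_only_files(up, gem))
```

A plain set difference is deliberately crude: it flags every gem-only file, generated or not, and leaves the licensing judgment to the maintainer.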
On 30 Oct 2023 at 13:32, Vít Ondruch wrote:
This discussion made me look into this:
https://github.com/ruby/rdoc/pull/1019/files
This means that RDoc now embeds Racc. Luckily, the embedding is mentioned in the Racc README.rdoc [1]:
    Note that you do NOT need to follow ruby license for your own parser (racc outputs). You can distribute those files under any licenses you want.
and in the source code [2]:
    # As a special exception, when this code is copied by Racc
    # into a Racc output file, you may use that output file
    # without restriction.
So this seems to be an explicit "don't bother" exception.
However, shouldn't we require such a license exception for every tool that produces some content? IMHO, GCC, for example, is probably no different after all: it takes source code and embeds fragments of itself into the binary it produces.
What also bothers me is that, while I have now learnt that the embedding is fine from a licensing POV, I don't have any meaningful way to share this knowledge more broadly.
Vít
[1] https://github.com/ruby/racc#label-License
[2] https://github.com/ruby/racc/blob/5eb07b28bfb3e193a1cac07798fe7be7e1e246c4/l...
On Mon, Oct 30, 2023 at 8:45 AM Vít Ondruch vondruch@redhat.com wrote:
The worst thing is that it appears to me that often only we care about such subtleties.
Welcome to my life. :)
Richard