[Fedora-legal-list] Re: Determining minimum package review requirements relating to licenses

Friday, 31 July 2020

On Fri, Jul 31, 2020 at 1:59 PM Richard Fontana <rfontana(a)redhat.com> wrote:
...

 On Fri, Jul 24, 2020 at 3:16 AM Jason Tibbitts <tibbs(a)math.uh.edu> wrote:
 >
 > One of the various reasons for having package reviews is having a human
 > verify that the packager's choice of License: tag is valid.  The
 > Packaging Committee was faced with a request
 > (https://pagure.io/packaging-committee/issue/1007) that has us
 > questioning just how much license review is required.
 >
 > Are any of the following acceptable?
 >
 > 1) Trust the packager to do a license review, with no reviewer
 >    verification.

 It seems like two different things may be getting conflated here:

 1. Review of a package to determine whether it satisfies Fedora
 licensing policy.
 2. Choice of what to put for the License: tag -- assuming 1 has been done.

 I don't have a view on whether the existing approach of having a human
 reviewer is absolutely needed. However, a general point I'd make is
 that Red Hat has been assuming a certain level of very high quality in
 community legal review of new Fedora packages -- this assumption is
 baked into certain internal processes we have at Red Hat for RHEL --
 and the additional human review probably contributes to this. If we
 relax some elements of Fedora legal review perhaps we'll need to
 introduce others to compensate for this, whether on the Fedora side or
 the RHEL side, or both.

 On the other hand, that observation doesn't really apply to the
 License: tags themselves. I would say we (or I, anyway) don't really
 find the Fedora License: tags that helpful to begin with because they
 are generally not super-accurate (by my own standards, at least) and
 there seems to be substantial inconsistency in how they are selected
 across different packages and reviewers/maintainers.

 > 2) Trust the output of an automated tool which attempts to detect
 >    project licenses (such as askalono).

 In general, license scanning tools are pretty bad, probably
 unavoidably so, since there are limits to how much you can automate
 license detection. The best or least bad one, and the only one I would
 personally vouch for, is ScanCode (mentioned by David Cantrell in his
 response). I would encourage Fedora to consider some sort of formal
 expectation for the use of ScanCode to aid in one or both of the
 distinct tasks noted above (review for conformance to Fedora licensing
 policy, decision for choice of License: tag). But even with a high
 quality tool like ScanCode you can't "trust" its output for purposes
 of task 1 (whereas its output might be okay enough to be used in a
 fairly mechanical way for task 2).

 > 3) Trust the license tag from a project hosting service such as github?
 >    (I understand that the answer may depend on the hosting service.)

 This is just another license scanning tool. There's no particular
 reason to "trust" it any more than use of non-hosted tools. My
 impression in the past was that the GitHub license identification was
 based on the licensee tool which seemed to be pretty primitive and
 naive in its assumptions. Maybe that's good enough for task 2, but not
 for task 1, IMO.

 Incidentally I also would encourage Fedora to look into the potential
 for collaboration with the ClearlyDefined project
 (https://clearlydefined.io/) which is currently not oriented towards
 Linux distributions or RPM-based packages at all. I could imagine a
 future where ClearlyDefined could be helpful in both tasks 1 and 2
 identified above.

One thing I've been slowly working on packaging and making usable in
Fedora is openSUSE's Cavil tool[1]. It does not make a judgement on
licenses per se, but it does a deep scan similar to our licensecheck
tool and presents the output in a far more understandable form. It
also does some pattern matching and confidence scoring, which can help
in determining the effective license of a project, something human
reviewers tend to struggle with.

The problem with most license scanning tools is that they try to
"judge" the result based on a very limited set of heuristics, mostly
based on license file detection. As we know, this is insufficient for
figuring out the true nature of a project's licensing, and this is one
thing Cavil is better at handling.

Perhaps this could make the legal-review part of package reviews easier
to get through.

[1]: https://github.com/openSUSE/cavil

-- 
真実はいつも一つ！/ Always, there's only one truth!
