Sometime over the past few days the status of license tags crossed the 50% mark. Current status:
47.03% of spec files (2204 out of 4686) have invalid licenses (as of Wed, 29 Aug 2007 19:26:29 +0000)
Attached are lists grouped by owner and package.
I have a few questions after poking through these lists.
There are multiple versions of the GFDL license (currently it's at 1.2 on the FSF site). However, the Licensing page doesn't mention any versions. Several packages (21 to be precise) use GFDL+ as part of the license tag. This is flagged as incorrect in the current report (and by rpmlint). But should it be? If for some reason a package ends up using GFDL 1.1 without any "or later version" statement, shouldn't that be respected?
Several perl packages (including perl itself) use the license tag:
(GPL+ or Artistic) and (GPLv2+ or Artistic)
This isn't being parsed correctly by the regex used in rpmlint (which I've stolen and used in the check-licenses script). The regex is:
'\s(?:and|or)\s|[()]'
Does anyone have suggestions for improving this regex so it won't fail to parse the above license tag and others like it?
On Wed, 2007-08-29 at 15:31 -0400, Todd Zullinger wrote:
Sometime over the past few days the status of license tags crossed the 50% mark. Current status:
47.03% of spec files (2204 out of 4686) have invalid licenses (as of Wed, 29 Aug 2007 19:26:29 +0000)
Attached are lists grouped by owner and package.
I have a few questions after poking through these lists.
There are multiple versions of the GFDL license (currently it's at 1.2 on the FSF site). However, the Licensing page doesn't mention any versions. Several packages (21 to be precise) use GFDL+ as part of the license tag. This is flagged as incorrect in the current report (and by rpmlint). But should it be? If for some reason a package ends up using GFDL 1.1 without any "or later version" statement, shouldn't that be respected?
Yeah, GFDL+ should be ok.
Several perl packages (including perl itself) use the license tag:
(GPL+ or Artistic) and (GPLv2+ or Artistic)
This isn't being parsed correctly by the regex used in rpmlint (which I've stolen and used in the check-licenses script). The regex is:
'\s(?:and|or)\s|[()]'
Does anyone have suggestions for improving this regex so it won't fail to parse the above license tag and others like it?
How did that regex get in there?
When I did the first pass of the changes for rpmlint, this was my regex:
'\sand\s|\sor\s|(|)'
It works properly on the perl license tag, with the exception that "GPL+ or Artistic" and "GPLv2+ or Artistic" are OK, and should be special-cased in rpmlint and your script (just "Artistic" is not OK).
~spot
On Wednesday 29 August 2007, Tom "spot" Callaway wrote:
On Wed, 2007-08-29 at 15:31 -0400, Todd Zullinger wrote:
Several perl packages (including perl itself) use the license tag:
(GPL+ or Artistic) and (GPLv2+ or Artistic)
This isn't being parsed correctly by the regex used in rpmlint (which I've stolen and used in the check-licenses script). The regex is:
'\s(?:and|or)\s|[()]'
Does anyone have suggestions for improving this regex so it won't fail to parse the above license tag and others like it?
How did that regex get in there?
That's how I committed it.
When I did the first pass of the changes for rpmlint, this was my regex:
'\sand\s|\sor\s|(|)'
The version currently in rpmlint works equivalently to that.
It works properly on the perl license tag, with the exception that "GPL+ or Artistic" and "GPLv2+ or Artistic" are OK, and should be special-cased in rpmlint and your script (just "Artistic" is not OK).
I mentioned to you in private mail that the "or" in "GPL+ or Artistic" will cause problems in rpmlint with its current implementation of the license tag check. When "GPL+ or Artistic" is used alone, it works because that string is in the license list as is and thus no splitting will occur. But when it is combined with anything else (or even put in parentheses), it no longer works as intended because the special case is no longer found as is, and thus the tag will be split at "or" and the split parts are checked separately.
By the way, "GPLv2+ or Artistic" is not in the license list in Wiki nor thus in rpmlint.
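The behaviour described above can be reproduced with a short Python sketch. This is only an illustration, not the rpmlint code itself: `VALID_LICENSES` is a hypothetical four-entry stand-in for the real license list, and `invalid_parts` is a made-up helper name. It checks the whole tag first and otherwise splits with the regex quoted in this thread.

```python
import re

# Hypothetical subset of the license list, for illustration only.
VALID_LICENSES = {"GPL+", "GPLv2+", "BSD", "GPL+ or Artistic"}

# The regex quoted in this thread: split on " and ", " or ", "(" and ")".
LICENSE_SPLIT = re.compile(r"\s(?:and|or)\s|[()]")


def invalid_parts(tag):
    """Return the pieces of a license tag that are not in the list."""
    # A tag that appears in the list as is gets accepted without splitting.
    if tag in VALID_LICENSES:
        return []
    # Otherwise every "and"/"or"/parenthesis is a split point, so the
    # special case "GPL+ or Artistic" is broken into "GPL+" and "Artistic".
    pieces = [p for p in LICENSE_SPLIT.split(tag) if p]
    return [p for p in pieces if p not in VALID_LICENSES]


print(invalid_parts("GPL+ or Artistic"))            # accepted as a whole
print(invalid_parts("(GPL+ or Artistic)"))          # "Artistic" is reported
print(invalid_parts("(GPL+ or Artistic) and BSD"))  # "Artistic" is reported
```

With this stand-in list, only the bare "GPL+ or Artistic" tag passes. Wrapping it in parentheses or combining it with another license is enough to get "Artistic" reported on its own.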
On Wed, 2007-08-29 at 23:41 +0300, Ville Skyttä wrote:
I mentioned to you in private mail that the "or" in "GPL+ or Artistic" will cause problems in rpmlint with its current implementation of the license tag check. When "GPL+ or Artistic" is used alone, it works because that string is in the license list as is and thus no splitting will occur. But when it is combined with anything else (or even put in parentheses), it no longer works as intended because the special case is no longer found as is, and thus the tag will be split at "or" and the split parts are checked separately.
OK. Not sure how I missed this, but I've been wildly multi-tasking lately.
The only way I think we can sanely handle this case is like this:
A. Does the License Tag string match an entry in the License list?
   [Yes] Stop, it's fine.
   [No] Goto B.
B. Does the License Tag have any parentheses in it?
   [Yes] Goto C.
   [No] Goto D.
C. Find parenthesis-wrapped strings, make them into substrings. For each substring, goto D.
D. Parse string through regex. Does each remaining item (things not regexed to /dev/null) match an entry in the License list?
   [Yes] Do nothing, keep looking until we're out of strings.
   [No] Print out the non-matching item. Keep looking until we're out of strings.
Anyone want to take a crack at coding that in python? (Or propose a better method?)
By the way, "GPLv2+ or Artistic" is not in the license list in Wiki nor thus in rpmlint.
Ahh. I'll fix that right now.
~spot
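One possible reading of steps A through D, as a minimal Python sketch. It is not the patch that was later applied to rpmlint. `VALID_LICENSES` is a hypothetical subset of the list that assumes "GPLv2+ or Artistic" has been added, and the names `split_license` and `invalid_parts` are made up for illustration. One assumption goes beyond the steps as written: each parenthesized group is first compared against the list as a whole, since splitting it straight away would report "Artistic" again.

```python
import re

# Hypothetical subset of the license list, for illustration only.
# Assumes "GPLv2+ or Artistic" has been added to the list.
VALID_LICENSES = {
    "BSD", "MPL", "GPL+", "GPLv2+",
    "GPL+ or Artistic", "GPLv2+ or Artistic",
}

# Match either a whole parenthesized group (contents captured, so
# re.split keeps them) or a bare " and " / " or " operator (dropped).
GROUP_OR_OPERATOR = re.compile(r"\(([^()]+)\)|\s(?:and|or)\s")


def split_license(tag):
    """Split a tag into groups/items, dropping empty and None pieces."""
    return [p.strip() for p in GROUP_OR_OPERATOR.split(tag) if p and p.strip()]


def invalid_parts(tag):
    """Return the items of a license tag that are not in the list."""
    # Step A: the whole tag matches a list entry.
    if tag in VALID_LICENSES:
        return []
    bad = []
    # Steps B and C: pull out parenthesized groups and top-level items.
    for part in split_license(tag):
        # Assumption: a group that is itself a list entry is accepted.
        if part in VALID_LICENSES:
            continue
        # Step D: split what is left and check each item.
        for item in split_license(part):
            if item not in VALID_LICENSES:
                bad.append(item)
    return bad


print(invalid_parts("(GPL+ or Artistic) and (GPLv2+ or Artistic)"))
print(invalid_parts("(GPL+ or Foo) and BSD"))
```

The sketch handles only one level of parentheses. A tag with nested groups would leave stray "(" or ")" characters in the reported items.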
Tom spot Callaway wrote:
The only way I think we can sanely handle this case is like this:
A. Does the License Tag string match an entry in the License list?
   [Yes] Stop, it's fine.
   [No] Goto B.
B. Does the License Tag have any parentheses in it?
   [Yes] Goto C.
   [No] Goto D.
C. Find parenthesis-wrapped strings, make them into substrings. For each substring, goto D.
D. Parse string through regex. Does each remaining item (things not regexed to /dev/null) match an entry in the License list?
   [Yes] Do nothing, keep looking until we're out of strings.
   [No] Print out the non-matching item. Keep looking until we're out of strings.
Anyone want to take a crack at coding that in python? (Or propose a better method?)
Don't sign me up yet, but I'll look at it when I'm trying to avoid doing other things (the structured procrastination method). Hopefully someone will beat me to it and do it better than I could.
By the way, "GPLv2+ or Artistic" is not in the license list in Wiki nor thus in rpmlint.
Ahh. I'll fix that right now.
While we still need to handle cases like this, in the particular case of "(GPL+ or Artistic) and (GPLv2+ or Artistic)", isn't it rather pointless? GPLv2+ or Artistic is a subset of GPL+ or Artistic. Why is there any need to complicate the license tag like this? It seems as silly as saying GPL+ or GPLv2+ or GPLv3+.
I think I must be missing something peculiar and historic about the Perl license
On Wed, 2007-08-29 at 17:23 -0400, Todd Zullinger wrote:
While we still need to handle cases like this, in the particular case of "(GPL+ or Artistic) and (GPLv2+ or Artistic)", isn't it rather pointless? GPLv2+ or Artistic is a subset of GPL+ or Artistic. Why is there any need to complicate the license tag like this? It seems as silly as saying GPL+ or GPLv2+ or GPLv3+.
I think I must be missing something peculiar and historic about the Perl license
The Fedora perl package is derived from the upstream perl tarball. That tarball is a "meta" tarball, containing not just base perl, but also some perl modules which perl upstream has deemed for various reasons (good, bad, otherwise) to be included as well. One of these addon modules is explicitly licensed as GPLv2+ or Artistic. The rest of the modules (and base perl) are GPL+ or Artistic. Thus, the unique licensing.
~spot
I wrote:
While we still need to handle cases like this, in the particular case of "(GPL+ or Artistic) and (GPLv2+ or Artistic)", isn't it rather pointless? GPLv2+ or Artistic is a subset of GPL+ or Artistic. Why is there any need to complicate the license tag like this? It seems as silly as saying GPL+ or GPLv2+ or GPLv3+.
I think I must be missing something peculiar and historic about the Perl license
Or, I'm missing the large comment right above the License tag in the spec file. D'oh!
Some of the other perl packages use this same license without any such comment, which makes me wonder if they have just copied the perl license tag or if they truly need such a license tag:
devel/perl-Jcode/perl-Jcode.spec
devel/perl-Unicode-Map8/perl-Unicode-Map8.spec
devel/perl-Unicode-Map/perl-Unicode-Map.spec
devel/perl-Unicode-MapUTF8/perl-Unicode-MapUTF8.spec
devel/perl-Unicode-String/perl-Unicode-String.spec
On Wed, 2007-08-29 at 17:30 -0400, Todd Zullinger wrote:
I wrote:
While we still need to handle cases like this, in the particular case of "(GPL+ or Artistic) and (GPLv2+ or Artistic)", isn't it rather pointless? GPLv2+ or Artistic is a subset of GPL+ or Artistic. Why is there any need to complicate the license tag like this? It seems as silly as saying GPL+ or GPLv2+ or GPLv3+.
I think I must be missing something peculiar and historic about the Perl license
Or, I'm missing the large comment right above the License tag in the spec file. D'oh!
Some of the other perl packages use this same license without any such comment, which makes me wonder if they have just copied the perl license tag or if they truly need such a license tag:
devel/perl-Jcode/perl-Jcode.spec
devel/perl-Unicode-Map8/perl-Unicode-Map8.spec
devel/perl-Unicode-Map/perl-Unicode-Map.spec
devel/perl-Unicode-MapUTF8/perl-Unicode-MapUTF8.spec
devel/perl-Unicode-String/perl-Unicode-String.spec
Almost certainly, this is not correct for these packages.
~spot
I wrote:
Tom spot Callaway wrote:
Anyone want to take a crack at coding that in python? (Or propose a better method?)
Don't sign me up yet, but I'll look at it when I'm trying to avoid doing other things (the structured procrastination method). Hopefully someone will beat me to it and do it better than I could.
Well, the door's still wide open for someone to do this better, but here's a crack at it. The attached patch updates rpmlint-0.80-3 (which is itself already patched for the Fedora licensing changes).
It's surely not as elegant as it could be and there are likely bugs in it that I'll notice after sending this. I have used this same code in the check_licenses script and it seems to produce good results, which are attached. If you notice incorrect results, please let me know.
Should "(L)GPLv2+ with exceptions" be considered valid? A few packages use one of the two: fltk, gcc, glibc, jokosher, and rpm.
On Thursday 30 August 2007, Todd Zullinger wrote:
I wrote:
Tom spot Callaway wrote:
Anyone want to take a crack at coding that in python? (Or propose a better method?)
Don't sign me up yet, but I'll look at it when I'm trying to avoid doing other things (the structured procrastination method). Hopefully someone will beat me to it and do it better than I could.
Well, the door's still wide open for someone to do this better, but here's a crack at it. The attached patch updates rpmlint-0.80-3 (which is itself already patched for the Fedora licensing changes).
It's surely not as elegant as it could be and there are likely bugs in it that I'll notice after sending this. I have used this same code in the check_licenses script and it seems to produce good results, which are attached. If you notice incorrect results, please let me know.
Looks good to me, applied in 0.81-1. Thanks.
Ville Skyttä wrote:
Looks good to me, applied in 0.81-1. Thanks.
Thanks Ville. There's one issue (that I know of) with the code I submitted, but it depends on whether or not it's possible to have a License with nested groups.
Is something like this ever possible:
License: BSD and (MPL or (GPL+ or Artistic))
If it is, the current regex won't split it properly. If such a license isn't possible, then there's probably no need to worry about it (except maybe to flag it as invalid right up front in the tag check). If it is, then we'll need to improve the regex a little.
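If nested groups were simply to be flagged up front, a depth counter would be enough, with no regex needed. The following is a minimal sketch under that assumption. `max_group_depth` and `has_nested_groups` are hypothetical helper names, not part of rpmlint or the check_licenses script.

```python
def max_group_depth(tag):
    """Return the deepest parenthesis nesting level found in a license tag."""
    depth = 0
    deepest = 0
    for ch in tag:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in license tag")
    if depth != 0:
        raise ValueError("unbalanced '(' in license tag")
    return deepest


def has_nested_groups(tag):
    """True when a group appears inside another group."""
    return max_group_depth(tag) > 1


print(has_nested_groups("BSD and (MPL or (GPL+ or Artistic))"))
print(has_nested_groups("(GPL+ or Artistic) and (GPLv2+ or Artistic)"))
```

A tag check could call `has_nested_groups` before any splitting and report the tag as unsupported. It would fall back to the normal split only for depth 0 or 1.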
Tom spot Callaway wrote:
By the way, "GPLv2+ or Artistic" is not in the license list in Wiki nor thus in rpmlint.
Ahh. I'll fix that right now.
It seems that this never made it into the wiki.
Aside from perl (and a few perl modules that appear to have wrongly copied the perl license tag), procmail uses GPLv2+ or Artistic as its license.
BTW, current status:
Invalid licenses: 2086 out of 4706 (44.33%) [as of 2007-09-06 22:26 UTC]
Tom spot Callaway wrote:
Yeah, GFDL+ should be ok.
What about the different license versions?
GFDL+ GFDLv1.1 GFDLv1.1+ GFDLv1.2 GFDLv1.2+
AFAIK, the first version was 1.1. So following what's done with LGPL, both GFDL and GFDL+ could be removed. It would all be easier if the short license tag was just GFDL, but if the license is versioned, it could make a difference in the future[*], so it seems like it'd be best to use the version numbers from the start.
Of course, I'd be very glad to hear that we don't need to be that pedantic.
[*] if that weren't true, we'd still be happily using GPL as the license tag, right? :)
On Thu, 2007-08-30 at 02:46 -0400, Todd Zullinger wrote:
Tom spot Callaway wrote:
Yeah, GFDL+ should be ok.
What about the different license versions?
GFDL+ GFDLv1.1 GFDLv1.1+ GFDLv1.2 GFDLv1.2+
AFAIK, the first version was 1.1. So following what's done with LGPL, both GFDL and GFDL+ could be removed. It would all be easier if the short license tag was just GFDL, but if the license is versioned, it could make a difference in the future[*], so it seems like it'd be best to use the version numbers from the start.
Of course, I'd be very glad to hear that we don't need to be that pedantic.
[*] if that weren't true, we'd still be happily using GPL as the license tag, right? :)
We're only not using GPL as the license tag because the version matters for its interoperability between other licenses (including older versions of the GPL).
But you're right. It's better to be safe than sorry here. I'll update the table.
~spot
On Thu, 2007-08-30 at 09:47 -0400, Tom "spot" Callaway wrote:
On Thu, 2007-08-30 at 02:46 -0400, Todd Zullinger wrote:
Tom spot Callaway wrote:
Yeah, GFDL+ should be ok.
What about the different license versions?
GFDL+ GFDLv1.1 GFDLv1.1+ GFDLv1.2 GFDLv1.2+
AFAIK, the first version was 1.1. So following what's done with LGPL, both GFDL and GFDL+ could be removed. It would all be easier if the short license tag was just GFDL, but if the license is versioned, it could make a difference in the future[*], so it seems like it'd be best to use the version numbers from the start.
Of course, I'd be very glad to hear that we don't need to be that pedantic.
[*] if that weren't true, we'd still be happily using GPL as the license tag, right? :)
We're only not using GPL as the license tag because the version matters for its interoperability between other licenses (including older versions of the GPL).
But you're right. It's better to be safe than sorry here. I'll update the table.
Actually, I've changed my mind. This is a bit unnecessary for a documentation license, where we're not so worried about interoperability. If it becomes a problem, we can always introduce this versioning scheme later.
~spot