License problem: ConvertUTF is non-free, use libicu instead #349

Closed
sebastic opened this issue Jan 22, 2017 · 22 comments

@sebastic
Contributor

sebastic commented Jan 22, 2017

The lintian QA tool reported a license problem with the ConvertUTF.{c,h} files included in ncgen (license-problem-convert-utf-code):

The following source files include material under a non-free license from Unicode Inc. Therefore, it is not possible to ship this in main or contrib.

This license does not grant any permission to modify the files (thus failing DFSG#3). Moreover, the license grant seems to attempt to restrict use to "products supporting the Unicode Standard" (thus failing DFSG#6).

In this case a solution is to use libicu and to remove this code by repacking.

If this is a false-positive, please report a bug against Lintian.

Refer to https://bugs.debian.org/823100 for details.

Quoting the mentioned Debian Free Software Guidelines (DFSG) paragraphs:

3. Derived Works

The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.

6. No Discrimination Against Fields of Endeavor

The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

Please remove the problematic ConvertUTF.{c,h} files and use libicu instead.

@DennisHeimbigner
Collaborator

I'm not sure I see this as a problem. AFAIK we have not modified it, and since it was included
to support utf8 in netcdf-3, it meets that criterion. Is the issue transitivity? That is,
that a program using the netcdf-c library only indirectly supports utf8 by using
netcdf-c? Please elaborate on your concerns.
In any case, I will look at libicu.

@DennisHeimbigner
Collaborator

Ok, so after a very quick look, the problem with libicu is that it is serious overkill
for our purposes and is way too general. We need something with a very small footprint.
It appears to me that I would have to do major surgery on the source code to
extract just the parts I need. So this switch will take a while; it will not happen
any time soon.

@WardF
Member

WardF commented Jan 22, 2017

I agree libicu is overkill. On Monday I'll take a closer look at the convertutf license and see if there are other alternatives; I'll also contribute to the conversation regarding the potential problem for NetCDF that it may pose.

@sebastic
Contributor Author

The problem with the ConvertUTF code is that its license is incompatible with the license of NetCDF. The NetCDF license explicitly allows modification, which the ConvertUTF license does not.

The ghostscript bugreport linked from the Debian bugreport has more information:

According to http://unicode.org/forum/viewtopic.php?f=9&t=90 - summarized at http://stackoverflow.com/questions/2685004/why-does-unicode-org-no-longer-offer-a-reference-utf-8-16-32-converter . ConvertUTF is obsolete and buggy.

According to discussion at https://lists.debian.org/debian-legal/2006/01/msg00534.html, Richard Stallman and the Unicode consortium have both acknowledged compatibility issues with licensing of the code. These issues have been solved for later code releases issued by the Unicode consortium, but according to https://web.archive.org/web/20081228105917/http://www.unicode.org/Public/PROGRAMS/CVTUTF/ there has been no newer release of ConvertUTF since 2004.

Because NetCDF does not comply with the DFSG due to the inclusion of the ConvertUTF files, which don't allow modification, NetCDF and all its reverse dependencies need to be removed from Debian & Ubuntu if this issue is not resolved, which would be a great disservice to our users.

@DennisHeimbigner
Collaborator

I found an alternative that claims to be under the MIT license.
I have attached (below) the actual LICENSE file; does it look acceptable?
=Dennis Heimbigner

Copyright (C) 2014-2016 Quinten Lansu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

@sebastic
Contributor Author

Yes, the MIT-licensed alternative would be a good replacement (license-wise), since both it and the NetCDF license explicitly allow modification and neither contains terms contrary to the other.

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Feb 16, 2017 via email

@sebastic
Contributor Author

Unfortunately the Unicode data license is non-free due to the advertising clause (like BSD-4-Clause).

@DennisHeimbigner
Collaborator

Interesting. You are aware, I presume, that libicu also has this same restriction. Hence
we cannot use that either. In fact, my guess is that all utf software suffers from this same
problem.

@sebastic
Contributor Author

I was not aware that icu used the same license terms. Since the icu license terms were apparently deemed acceptable for Debian main by the FTP masters (although that's no precedent), it's probably fine to adopt the utf8proc from Julia. If they reject the netcdf upload due to those license terms, I'll raise that issue then.

DennisHeimbigner added a commit that referenced this issue Feb 16, 2017
It turns out that the utf8proc software we are using
was turned over to the Julia Language developers
and the license terms changed to allow modification.
(https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).

So the fix here is as follows:
1. Wrap the library with a fixed interface: libdispatch/dutf8.c
   and include/ncutf8.h.
2. Replace the existing utf8proc code with the new version
   from https://github.com/JuliaLang/utf8proc.
3. Add a couple more test cases: nc_test/tst_utf8_validate.c
   and nc_test_utf8_phrases.c.  If/when I can find a usable
   normalization test, I will incorporate that later.
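
To illustrate step 1, here is a minimal sketch of what a fixed wrapper interface around a UTF-8 validator might look like. The function name `sketch_utf8_validate` and the `UTF8_*` return codes are hypothetical and are not the actual declarations in include/ncutf8.h; the validation logic is a plain hand-written check rather than a call into utf8proc, so that the example stands on its own.

```c
#include <stddef.h>

/* Hypothetical return codes; callers depend only on these, not on
   whichever UTF-8 library sits behind the wrapper. */
#define UTF8_OK       0
#define UTF8_EINVALID 1

/* Return UTF8_OK if the NUL-terminated string s is well-formed UTF-8,
   UTF8_EINVALID otherwise. Illustrative name, not the netcdf-c API. */
int sketch_utf8_validate(const unsigned char *s)
{
    while (*s) {
        unsigned char c = *s++;
        int ntrail = 0;          /* continuation bytes still expected */
        unsigned long cp = 0;    /* code point being decoded */
        unsigned long min = 0;   /* smallest code point legal for this length */

        if (c < 0x80) {
            continue;            /* plain ASCII */
        } else if ((c & 0xE0) == 0xC0) {
            ntrail = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            ntrail = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            ntrail = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return UTF8_EINVALID; /* stray continuation or invalid lead byte */
        }

        while (ntrail-- > 0) {
            /* A NUL here also fails the test, so truncated input is rejected. */
            if ((*s & 0xC0) != 0x80) return UTF8_EINVALID;
            cp = (cp << 6) | (unsigned long)(*s++ & 0x3F);
        }

        if (cp < min) return UTF8_EINVALID;                      /* overlong form */
        if (cp > 0x10FFFF) return UTF8_EINVALID;                 /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return UTF8_EINVALID;  /* surrogate */
    }
    return UTF8_OK;
}
```

Keeping callers on an interface like this means the underlying library can be swapped again later without touching the rest of the code base, which is the point of wrapping it.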
@DennisHeimbigner
Collaborator

DennisHeimbigner commented Feb 16, 2017 via email

@WardF WardF added this to the 4.4.2 milestone Feb 16, 2017
DennisHeimbigner added a commit that referenced this issue Feb 16, 2017
Update utf8proc.[ch] to use the version now
maintained by the Julia Language project
(https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).
The license for the previous version was
unacceptable for the Debian and Ubuntu release
systems. The new version both updates the code
and addresses the license issue.

It turns out that the utf8proc software we are using
was turned over to the Julia Language developers
and the license terms changed to allow modification.
(https://github.com/JuliaLang/utf8proc/blob/master/LICENSE.md).

So the fix here is as follows:
1. Wrap the library with a fixed interface: libdispatch/dutf8.c
   and include/ncutf8.h.
2. Replace the existing utf8proc code with the new version
   from https://github.com/JuliaLang/utf8proc.
3. Add a couple more test cases: nc_test/tst_utf8_validate.c
   and nc_test_utf8_phrases.c.  If/when I can find a usable
   normalization test, I will incorporate that later.
@WardF
Member

WardF commented Mar 24, 2017

This issue is resolved, closing.

@WardF WardF closed this as completed Mar 24, 2017
@sebastic
Contributor Author

sebastic commented Jun 6, 2017

ncgen/ConvertUTF.c & ncgen/ConvertUTF.h are still included in 4.5.0-rc1, please re-open this issue and remove/replace those files.

@WardF
Member

WardF commented Jun 6, 2017

@DennisHeimbigner Can the solution you provided for libdispatch/ in #364 also be applied in ncgen/?

@WardF WardF reopened this Jun 6, 2017
@DennisHeimbigner
Collaborator

I did not remember that this code was being used in ncgen. I will take responsibility for it.
Also, odd because it means we are still including the old code?

@WardF
Member

WardF commented Jun 6, 2017

The old code (convertUTF.c/h) is currently only in ncgen; it was removed from libdispatch and the new code was put in place. I looked at libicu and I'm glad you found this solution as libicu is not practical for our purposes; it is too large, too difficult to deploy, and is an unnecessary dependency.

If you have yet to create a branch to work from, would you ~~fork~~ branch from v4.5.0-release-branch? If it's too late, no worries, I will make the necessary merges.

@DennisHeimbigner
Collaborator

Ok, I will fork the release branch. This is going to be harder than I thought.
The old convert code was used only to convert utf8 to utf16 for java. The new
code apparently has no utf16 support. Since I sincerely doubt that the cdl->java
code is being used, I may take the easy way out.

@WardF
Member

WardF commented Jun 6, 2017

Ok, I will fork the release branch. This is going to be harder than I thought.
The old convert code was used only to convert utf8 to utf16 for java. The new
code apparently has no utf16 support. Since I sincerely doubt that the cdl->java
code is being used, I may take the easy way out.

To make sure I understand, it was only used to convert utf8 to utf16 when having ncgen generate Java code? If this is the case I'd be loath to rip it out completely, as that is very useful; maybe just leave the hooks in and commented out or something. I dug into this a bit and it wouldn't be impossible to write our own converter if need be. But having this functionality removed for the next release candidate wouldn't be a problem, and it would give people a chance to speak up if they need or rely on this.

@WardF
Member

WardF commented Jun 6, 2017

Also, thanks for forking that branch; I've set it up so that anything in that branch can propagate downstream into a release candidate as well as upstream back into master, but the inverse would be messy.

@DennisHeimbigner
Collaborator

It turns out that I do have utf8 -> utf32 conversion. And converting
utf32 -> utf16 can be approximated by truncating the 32 bits to 16 bits.
I will put in an error for when the approximation fails. In any case, this
fix should be "good enough".
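
A minimal sketch of the approach described above, assuming the input has already been decoded into 32-bit code points. The function name `sketch_utf32_to_utf16_bmp` and the `CONV_*` codes are hypothetical and are not taken from the netcdf-c sources. Truncation is only lossless for code points that fit in a single 16-bit unit and are not in the surrogate range, so anything else is reported as an error rather than silently mangled.

```c
#include <stddef.h>
#include <stdint.h>

#define CONV_OK     0
#define CONV_ERANGE 1   /* code point does not fit in one UTF-16 unit */

/* Convert n UTF-32 code points to UTF-16 by truncation.
   out must have room for at least n 16-bit units.
   Returns CONV_ERANGE at the first code point for which truncation
   would not produce valid UTF-16. Illustrative only. */
int sketch_utf32_to_utf16_bmp(const uint32_t *in, size_t n, uint16_t *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        uint32_t cp = in[i];
        /* Above U+FFFF, truncation would drop the high bits. */
        if (cp > 0xFFFF) return CONV_ERANGE;
        /* U+D800..U+DFFF are reserved for surrogates, not characters. */
        if (cp >= 0xD800 && cp <= 0xDFFF) return CONV_ERANGE;
        out[i] = (uint16_t)cp;
    }
    return CONV_OK;
}
```

A complete converter would instead encode code points above U+FFFF as a surrogate pair (two 16-bit units); the truncating version trades that coverage for simplicity.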

@WardF
Member

WardF commented Jul 14, 2017

@DennisHeimbigner Is this issue ready to be closed out? I think it is but I thought I'd double check before closing it.

@WardF
Member

WardF commented Jul 14, 2017

Actually, the fix was merged, so I'm closing this out; I'll reopen if I hear I need to.
