Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix case error when opendap keyword is used as an identifier. #434

Merged
merged 12 commits into from
Jun 29, 2017

Conversation

DennisHeimbigner
Copy link
Collaborator

Fix for github issue #310.
The problem was that for opendap, it is possible to use keywords
as identifiers when there is no ambiguity. However, the DAP2
parser lost the case of the identifier used the lower case version.
Fix is to use the actual text of the symbol when it is used as an identifier.
Also added a test case for this (kwcase.*).

Additionally cleaned up some misc. dap2 testing problems.

  1. ncdap_test/tst_ncdap3.sh was using an empty test set.
    restored the testing of datasets.
  2. as a consequence of Netcdf cmake #1, some tests needed to be updated with minor
    tweeks.
  3. fix dapmerge to handle multiple DODS_EXTRAS attributes.
  4. modify buildattribute to suppress nul characters and terminate
    the name at the first nul.
  5. clean up various test scripts to remove residual, unused
    references to obsolete netcdf-4 translation.
  6. export e.g. NCDUMP from test_common.in so that non-top-level
    shell scripts can access it.

DennisHeimbigner and others added 12 commits June 6, 2017 15:23
This is a follow-on in that the old utf8 code was still being
used in ncgen to convert utf8->utf16 when converting cdl to Java
(see genj.c).

The new code apparently has no utf16 support, but it does have
utf32 support. Converting utf32 -> utf16 can be approximated by
truncating the 32bits to 16 bits, unless the top 16 bits are
not zero. This latter condition is unlikely to be common because
it implies use of some rather obscure characters.

So solution is to convert to utf32 and truncate to 16 bits to
get utf16. An error is reported if the high-order truncated 16
bits are not zero. If we get complaints, then I will figure out
how to convert full utf32 to a utf16 pair.

Also removed the old code from ncgen.
Running a build on the .nc file corresponding to the below ncdump output
with -fsanitize=undefined raises

libsrc/ncx.c:4722:26: runtime error: left shift of 255 by 24 places cannot be represented in type 'int'

This is due to *cp being promoted to int before doing the left shift, instead
of the intended unsigned. So do the cast to unsigned internally rather than
externally

ncdump file to reproduce:

netcdf temp {
dimensions:
	y = UNLIMITED ; // (0 currently)
	x = 109067 ;
variables:
	byte t(y, x, x) ;
data:
}

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2265

Credit to OSS Fuzz
This is a follow-on in that the old utf8 code was still being
used in ncgen to convert utf8->utf16 when converting cdl to Java
(see genj.c).

The new code apparently has no utf16 support, but it does have
utf32 support. Converting utf32 -> utf16 can be approximated by
truncating the 32bits to 16 bits, unless the top 16 bits are
not zero. This latter condition is unlikely to be common because
it implies use of some rather obscure characters.

So solution is to convert to utf32 and truncate to 16 bits to
get utf16. An error is reported if the high-order truncated 16
bits are not zero. If we get complaints, then I will figure out
how to convert full utf32 to a utf16 pair.

Other changes:
1. removed the old code from ncgen.
2. changed UTF8PROC_DLLEXPORT (in utf8proc) to EXTERNL
   and added appropriate includes. This should fix
   issue #404,
   but since we cannot duplicate the failure, I am not quite
   sure.
Getting the value of the x variable on the file corresponding to the below ncdump output
with -fsanitize=undefined raises

ncx.c:1034:14: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'

This is due to *cp being promoted to int before doing the left shift, instead
of the intended unsigned. So do the cast to unsigned internally rather than
externally

ncdump file to reproduce:

netcdf temp {
dimensions:
	x = 2 ;
	y = 2 ;
	v = 2 ;
variables:
	int x(v) ;
	byte y(y, x) ;
data:

 x = _, _ ;

 y =
  -127, -127,
  -127, -127 ;
}

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2356

Credit to OSS Fuzz
The problem was that for opendap, it is possible to use keywords
as identifiers
 when there is no ambiguity. However, the DAP2
parser lost the case of the identifier used the lower case version.
Fix is to use the actual text of the symbol when it is used as an identifier.
Also added a test case for this (kwcase.*).

Additionally cleaned up some misc. dap2 testing problems.
1. ncdap_test/tst_ncdap3.sh was using an empty test set.
   restored the testing of datasets.
2. as a consequence of #1, some tests needed to be updated with minor
   tweeks.
3. fix dapmerge to handle multiple DODS_EXTRAS attributes.
4. modify buildattribute to suppress nul characters and terminate
   the name at the first nul.
5. clean up various test scripts to remove residual, unused
   references to obsolete netcdf-4 translation.
6. export e.g. NCDUMP from test_common.in so that non-top-level
   shell scripts can access it.
@WardF WardF merged commit 0ae2232 into master Jun 29, 2017
@DennisHeimbigner DennisHeimbigner deleted the issue310.dmh branch January 4, 2019 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants