-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for HDF5 transient types #2655
Conversation
These are compound or enum datatypes stored directly in the dataset itself, and so aren't read as standalone objects during the initial metadata gathering when the file is opened
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To expand on Dennis' comment, we'll need to consider the same issues outlined in #2652. I've been occupied with summer intern stuff the last few days, let me refresh my memory and tag in folk as/if need be.
I'm pretty sure this should be fine for netCDF Java, assuming it uses the C library to read HDF5 files, as the unnamed type gets read in by the netCDF C library on the initial file open as it iterates over all objects in the HDF5 file. It's just that instead of being encountered as a standalone HDF5 object in the file, it's encountered as part of reading the metadata for the associated dataset. It's still added to the list of known types in the file, and can be queried: if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
if (nc_inq_typeids(ncid, &num_types, NULL)) ERR;
typeids = (int*)malloc(num_types * sizeof(int));
if (nc_inq_typeids(ncid, NULL, typeids)) ERR;
if (nc_inq_user_type(ncid, typeids[0], name, &size, &base_nc_type, &nfields, &class)) ERR;
printf("\n\n***** type name: '%s', size: %zu, fields: %zu\n\n", name, size, nfields); prints: The missing type name is perhaps the only point of contention in regards to the data model, but an unambiguous name could be given to it on the netCDF side, e.g. There's also no issues of backwards compatibility that I can see -- no previously valid files will become invalid. It also only affects reading, so newer versions of netCDF won't make possibly invalid files for older versions. |
The appveyor build is failing, I guess because it's not using a C99 compliant compiler and the new test uses The failing MacOS build has this error:
It looks like a network failure? But everything else works, so that's a bit odd |
You could use your hdf5 test program to create the test file and then see what happens when |
@ZedThree - The netCDF-Java library uses its all Java implementation to read nc-4/HDF5 files. Writing, on the other hand, is done through the netCDF-C library. |
Right, but it must be able to read the file created by your hdf5 program. |
The one failing test is a dap4 test, which this PR doesn't touch at all:
I can only reproduce this locally very rarely -- are there any known issues with this fileserver? |
Try the ncdump tab also on toolsui |
I tried the ncdump tab on a few files, and it didn't work for any of them. It doesn't produce any error messages, so I can't debug it |
Is it possible for someone with access to re-run the nc-cmake-tests-oneoff-parallel (1.12.2) build to see if it is an intermittent failure, please? |
Re-running jobs. I'll echo what @ethanrd and @DennisHeimbigner mentioned above; if netCDF-Java can read these out of the box, great, if not, we'll need to have that broader conversation. But first things first, I've started the tests. Thanks! |
It looks like netCDF-Java can already read these variables out of the box:
This is the file created as part of the new test for transient types. With this PR, netCDF-C can now read the file:
and without:
I'm still not sure why that one test is failing. I notice that #2681 was recently merged, maybe that will help? |
* main: (56 commits) Fix a mismatched if,endif block label. Update issue with make distcheck Update dap4_test for systems without getopt Missed unit_test Update release notes Fix bug in szip handling. Update release notes "Simplify" XGetopt usage added fortran documenting logging documenting logging documenting logging Fix issue Unidata#2674 Update to latest main ubuntufix1 ub1" bad file reference oops Maintainer mode should only be turned on prior to minting a release. Fix run_jsonconvention.sh to be resilient against irrelevant changes to _NCProperties. ...
Hello, is there anything I can do to help move this along? It would be a very useful feature to enable greater compatibility between different netCDF implementations: netCDF-Java and h5netcdf can both already read these kinds of files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ZedThree thanks for the ping!
This PR needs to be reversed. It is creating types with empty names and that violates the netcdf-4 |
It passed all the tests; do we need to update our test suite in a way that will test for the failures you're observing? |
The test is bogus. It never checks for the type name. |
This is absolutely not the case, it only adds the ability to read such types. From my comment above:
NetCDF-Java can already read such types, so this only brings netCDF-C functionality in line with the Java library. It's also not clear that it is against the spec. Is there a more formal spec than this page in the docs? In relation to user-defined types, it says:
I don't see the issue here.
If you can provide a minimal reproducer, I will be happy to fix any bugs I've introduced and expand the test suite. As it is, the entire existing test suite passes along with the additional tests I wrote. As you requested, I also verified that netCDF-Java can already correctly read files with unnamed types in them. |
Modify your test program to print the type name. |
netcdf-java assigns names to those types. your code does not. |
But no other tests operate on a transient type. The only test that does is your test case. |
The spec requires type names to be unique within a group. Your code produces types all with the same empty name. |
The right thing to do here is to make sure that these transient types have proper names. |
re: PR Unidata#2655 This PR modifies the transient types PR so that all created transient types are given a created unique name (within a group). The form of the name is "_Anonymous<Class>NN". The class is the user-defined type class: Enum, Compound, Opaque, or Vlen. NN is an integer identifier to ensure uniqueness. Additionally, this was applied to DAP/4 anonymous dimensions. This also required some test baseline data changes. The transient test case is modified to verify that the name exists.
These are compound or enum datatypes stored directly in the dataset itself, and so aren't read as standalone objects during the initial metadata gathering when the file is opened
Closes #2651
Fixes #267
I've added a fairly simple test, maybe it should be expanded to include transient types on attributes, and other types (like enums, as in #267)
This change unfortunately also causesFixednc_test4_tst_h_refs
to fail, and I've not managed to work out why.