Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

h5py Datatypes #8

Closed
bnlawrence opened this issue Jul 10, 2024 · 4 comments
Closed

h5py Datatypes #8

bnlawrence opened this issue Jul 10, 2024 · 4 comments
Assignees

Comments

@bnlawrence
Copy link
Collaborator

H5py has a class Datatype. This is part of uber ticket #7!

Unfortunately, when creating NetCDF groups, h5netcdf does it's own iteration over an HDF5 group, and looks for instances of DataTypes:

for k, v in self._h5group.items():
            if isinstance(v, self._root._h5py.Group):
                # add to the groups collection if this is a h5py(d) Group
                # instance
                self._groups.add(k)
            # todo: add other user types here
            elif isinstance(
                v, self._root._h5py.Datatype
            ) and self._root._h5py.check_enum_dtype(v.dtype):
                self._enumtypes.add(k)
            else:
                if v.attrs.get("CLASS") == b"DIMENSION_SCALE":
                    # add dimension and retrieve size
                    self._dimensions.add(k)
                else:
                    if self._root._phony_dims_mode is not None:
                        # check if malformed variable and raise
                        if _unlabeled_dimension_mix(v) == "unlabeled":
                            # if unscaled variable, get phony dimensions
                            phony_dims |= Counter(v.shape)

                if not _netcdf_dimension_but_not_variable(v):
                    if isinstance(v, self._root._h5py.Dataset):
                        self._variables.add(k)

In this self._root.h5py is in fact, pyfive after my backend substitution. That means we need to have pyfive support for Datatype and check_enum_dtype or I need another strategy.

In pyfive, I think the Datatype corresponds to DatatypeMessage. IF it's that simple, there are a couple of routes to solving this, we modify pyfive to look more like h5py, or we create a new Datatype class which subclasses DatatypeMessage. I suspect I will go that route ...

The H5py datatype is defined here: https://github.com/h5py/h5py/blob/master/h5py/_hl/datatype.py

@bnlawrence
Copy link
Collaborator Author

Yeah, well, I got this wrong:

The relevant loop inside pyfive h5netcdf branch as it currently stands is this one:

dataobjs = DataObjects(self.file._fh, link_target)
if dataobjs.is_dataset:
       if additional_obj != '.':
              raise KeyError('%s is a dataset, not a group' % (obj_name))
       return Dataset(obj_name, DatasetDataObject(self.file._fh, link_target), self)
return Group(obj_name, dataobjs, self)[additional_obj]

Which unfortunately means that when faced with an actual Datatype enum, it reports it as a group. That is definitely a bug.

@bnlawrence
Copy link
Collaborator Author

At this point the logic is that everything that is not a dataset is a group, but we know that's not true, this is a portion of the h5dump of the offending content:

DATATYPE "enum_t" H5T_ENUM {
      H5T_STD_U8LE;
      "stratus"          1;
      "cumulus"          2;
      "nimbus"           3;
      "missing"          255;
   };
DATASET "enum_var" {
      DATATYPE  H5T_ENUM {
         H5T_STD_U8LE;
         "stratus"          1;
         "cumulus"          2;
         "nimbus"           3;
         "missing"          255;
      } ...
";

So we need to do something a wee bit different right at this point, so we can warn the user that we are ignoring the not implemented type rather than pretend it is a group, and then if they try and access enum_var, they get a NotImplementedError.

@bnlawrence
Copy link
Collaborator Author

Ok, so now - 9994598 - at least pyfive can read a file with this in it, but raises a warning when it finds the datatype message, and raises a notimplementederror when one tries to read that particular data variable. That seems to be the right sort of behaviour until such time as we implement enum support (#9, which we may not).

@bnlawrence
Copy link
Collaborator Author

Test support for "sensible things" is here: c12b5b3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant