Reading HDF5 dataset with two equal-sized unlimited dimensions results in only one dimension #945

Closed
kmuehlbauer opened this issue Jul 5, 2019 · 10 comments


@kmuehlbauer

I'm reading an HDF5 file like this:

import netCDF4 as nc
filename = 'test.h5'
ds = nc.Dataset(filename, diskless=True, persist=False)
ds['scan0']['moment_0']

Result:

<class 'netCDF4._netCDF4.Variable'>
uint8 moment_0(phony_dim_0, phony_dim_0)
    moment: Zh
    format: UV8
    dyn_range_max: 95.5
    dyn_range_min: -32.0
    is_dft: 0
    unit: dBZ
path = /scan0
unlimited dimensions: phony_dim_0, phony_dim_0
current shape = (360, 360)
filling off

h5dump -H

HDF5 "test.h5" {
GROUP "/" {
   GROUP "scan0" {
      DATASET "moment_0" {
         DATATYPE  H5T_STD_U8LE
         DATASPACE  SIMPLE { ( 360, 360 ) / ( H5S_UNLIMITED, H5S_UNLIMITED ) }
         ATTRIBUTE "dyn_range_max" {
            DATATYPE  H5T_IEEE_F32LE
            DATASPACE  SCALAR
         }
         ATTRIBUTE "dyn_range_min" {
            DATATYPE  H5T_IEEE_F32LE
            DATASPACE  SCALAR
         }
         ATTRIBUTE "format" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
         ATTRIBUTE "is_dft" {
            DATATYPE  H5T_STD_U8LE
            DATASPACE  SCALAR
         }
         ATTRIBUTE "moment" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
         ATTRIBUTE "unit" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
         }
      }
   }
}
}

Is there any way to get two separate dimensions? I did not find anything related via an internet search. The test file is attached.
test.zip

@jswhit
Collaborator

jswhit commented Jul 5, 2019

I see

[jeff-whitakers-imac-9:~/Downloads] jsw% ncdump -h test.h5
netcdf test {

group: scan0 {
  dimensions:
  	phony_dim_0 = UNLIMITED ; // (360 currently)
  variables:
  	ubyte moment_0(phony_dim_0, phony_dim_0) ;
  		string moment_0:moment = "Zh" ;
  		string moment_0:format = "UV8" ;
  		moment_0:dyn_range_max = 95.5f ;
  		moment_0:dyn_range_min = -32.f ;
  		moment_0:is_dft = 0UB ;
  		string moment_0:unit = "dBZ" ;
  } // group scan0
}

and the netcdf4-python output is consistent with that. The fact that the variable has the same dimension associated with it twice can't be changed without re-creating the file.

@kmuehlbauer
Author

@jswhit Thanks for looking into this. Do you happen to know how the HDF5 file should be created so that netCDF is able to detect two different dimensions?

@jswhit
Collaborator

jswhit commented Jul 5, 2019

I can tell you how to create it with netcdf (netcdf4-python), but not with hdf5 (h5py). How are you creating the file now?
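
For reference, a minimal netcdf4-python sketch of a file where the two axes stay distinct (the file, group and dimension names below are only illustrative):

import numpy as np
import netCDF4 as nc

with nc.Dataset('two_dims.nc', 'w') as ds:
    grp = ds.createGroup('scan0')
    # two separately named unlimited dimensions, even though both
    # currently have the same size (360)
    grp.createDimension('azimuth', None)
    grp.createDimension('range', None)
    var = grp.createVariable('moment_0', 'u1', ('azimuth', 'range'))
    var[:360, :360] = np.zeros((360, 360), dtype='u1')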

@kmuehlbauer
Author

@jswhit With netcdf4-python I know how to do it too 😀
Currently the file is created by a C/C++ application via HDF5. Not much I can do about that, unfortunately.

IIUC, when reading with netCDF the dimension mapping is done by the netcdf-c library. If the array were 360x361, I would get two dimensions (phony_dim_0, phony_dim_1), so there must be some logic to detect this.
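
A quick sketch of what I mean (not the original C/C++ writer): the same array written with h5py once square and once 360x361, then read back with netCDF4; the file names are arbitrary.

import numpy as np
import h5py
import netCDF4 as nc

for shape in [(360, 360), (360, 361)]:
    fname = 'phony_{}x{}.h5'.format(*shape)
    with h5py.File(fname, 'w') as f:
        f.create_dataset('moment_0', data=np.zeros(shape, dtype='u1'),
                         maxshape=(None, None))
    with nc.Dataset(fname, diskless=True, persist=False) as ds:
        # the square case yields a single phony_dim_0, the 360x361 case two phony dims
        print(shape, list(ds.dimensions))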

Maybe the current behaviour could be overridden by introducing some switch at load time (e.g. squeeze_dims=False)?

@jswhit
Collaborator

jswhit commented Jul 5, 2019

There's only one dimension in the h5 file - so the netcdf library doesn't have much choice in this case. I think the dimensions are associated with variables in hdf5 using the "dimension scales" API (http://docs.h5py.org/en/stable/high/dims.html).
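
Roughly, attaching a scale per axis with h5py would look something like this (a sketch only; the 'azimuth'/'range' names are made up, and make_scale needs a reasonably recent h5py):

import numpy as np
import h5py

with h5py.File('two_dims.h5', 'w') as f:
    grp = f.create_group('scan0')
    dset = grp.create_dataset('moment_0', shape=(360, 360),
                              maxshape=(None, None), dtype='u1')
    # one dimension-scale dataset per axis, attached so readers can tell
    # the two axes apart
    az = grp.create_dataset('azimuth', data=np.arange(360, dtype='f4'),
                            maxshape=(None,))
    rg = grp.create_dataset('range', data=np.arange(360, dtype='f4'),
                            maxshape=(None,))
    az.make_scale('azimuth')
    rg.make_scale('range')
    dset.dims[0].attach_scale(az)
    dset.dims[1].attach_scale(rg)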

@kmuehlbauer
Author

@jswhit Yes, there is a good chance the problem is at creation time. Using h5py to retrieve the dimensions, I get two different objects. I'll need to investigate a bit more to track this down. Thanks for the pointer!
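
Roughly what I'm doing to look at them (a sketch):

import h5py

with h5py.File('test.h5', 'r') as f:
    dset = f['scan0']['moment_0']
    # each axis has its own DimensionProxy object, even without attached scales
    for i, dim in enumerate(dset.dims):
        print(i, dim, list(dim.keys()))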

I'll close the issue for now. Would you be happy if I reopen it when I have more information?

@jswhit
Collaborator

jswhit commented Jul 5, 2019

Sure - but you may need to open it under the netcdf-c project if ncdump is not showing the extra dimension.

@jswhit
Collaborator

jswhit commented Jul 5, 2019

I bet that the C code is not creating any dimensions for the variables, so the netcdf lib has to guess (or create its own 'phony' dimensions based on the shape of the variable).

@kmuehlbauer
Author

@jswhit OK, I'll try at netcdf-c next time. Thanks for the help so far.

kmuehlbauer added a commit to kmuehlbauer/wradlib that referenced this issue Jul 29, 2019
…unking, georeferencing), introduce two classes for holding open netcdf-filehandles (also for properly closing), only hold sweep-data in Dataset-dict, workaround Unidata/netcdf4-python#945, properly load multiple OdimH5 files into one volume (DWD one sweep one moment files), several simplifications
kmuehlbauer added a commit to kmuehlbauer/wradlib that referenced this issue Jul 29, 2019
…unking, georeferencing), introduce two classes for holding open netcdf-filehandles (also for properly closing), only hold sweep-data in Dataset-dict, workaround Unidata/netcdf4-python#945, properly load multiple OdimH5 files into one volume (DWD one sweep one moment files), several simplifications
kmuehlbauer added a commit to kmuehlbauer/wradlib that referenced this issue Jul 29, 2019
…hunking, georeferencing), introduce two classes for holding open netcdf-filehandles (also for properly closing), only hold sweep-data in Dataset-dict, workaround Unidata/netcdf4-python#945, properly load multiple OdimH5 files into one volume (DWD one sweep one moment files), several simplifications
kmuehlbauer added a commit to wradlib/wradlib that referenced this issue Jul 29, 2019
…hunking, georeferencing), introduce two classes for holding open netcdf-filehandles (also for properly closing), only hold sweep-data in Dataset-dict, workaround Unidata/netcdf4-python#945, properly load multiple OdimH5 files into one volume (DWD one sweep one moment files), several simplifications (#367)
@kmuehlbauer
Author

@jswhit, FYI, I created an issue outlining the problem at netcdf-c Unidata/netcdf-c#1484

kmuehlbauer added a commit to kmuehlbauer/wradlib that referenced this issue Mar 31, 2022
…hunking, georeferencing), introduce two classes for holding open netcdf-filehandles (also for properly closing), only hold sweep-data in Dataset-dict, workaround Unidata/netcdf4-python#945, properly load multiple OdimH5 files into one volume (DWD one sweep one moment files), several simplifications (wradlib#367)