Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attributes of type NDAttrString are not written as H5T_STRING in HDF5 files #176

Closed
MarkRivers opened this issue May 1, 2016 · 3 comments
Assignees

Comments

@MarkRivers
Copy link
Member

NDFileHDF5 does not treat attributes of type string consistently. Constant attributes are written with datatype H5T_C_S1 and appear as type H5T_STRING. However, NDArray attributes of type NDAttrString are written as type H5T_NATIVE_CHAR, and appear as 2-D arrays of type H5T_STD_I8LE (on my Linux system). This seems to me to be in violation of the HDF5 documentation, which states that H5T_NATIVE_CHAR is a numeric datatype, not intended for storing string data.

This is an example of using h5dump to display the CameraModel attribute. Note that h5dump is displaying integer values. This is not very user-friendly to say the least. Note also that the NDAttrDescription is an H5T_STRING, and is nicely legible.

corvette:~/scratch>h5dump --dataset=/entry/instrument/NDAttributes/CameraModel test_004.h5
HDF5 "test_004.h5" {
DATASET "/entry/instrument/NDAttributes/CameraModel" {
   DATATYPE  H5T_STD_I8LE
   DATASPACE  SIMPLE { ( 1, 256 ) / ( H5S_UNLIMITED, H5S_UNLIMITED ) }
   DATA {
   (0,0): 66, 97, 115, 105, 99, 32, 115, 105, 109, 117, 108, 97, 116, 111,
   (0,14): 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,35): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,57): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,79): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,101): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,123): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,145): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,167): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,189): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,211): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,233): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,255): 0
   }
   ATTRIBUTE "NDAttrDescription" {
      DATATYPE  H5T_STRING {
            STRSIZE 12;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
      DATASPACE  SCALAR
      DATA {
      (0): "Camera model"
      }
   } (0,255): 0
   }

I made a trivial 1-line change to the plugin to use type H5T_C_S1 for NDAttributes of type NDAttrString. This results in the following output:

corvette:~/scratch>h5dump --dataset=/entry/instrument/NDAttributes/CameraModel test_010.h5
HDF5 "test_010.h5" {
DATASET "/entry/instrument/NDAttributes/CameraModel" {
   DATATYPE  H5T_STRING {
         STRSIZE 1;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
   DATASPACE  SIMPLE { ( 1, 256 ) / ( H5S_UNLIMITED, H5S_UNLIMITED ) }
   DATA {
   (0,0): "B", "a", "s", "i", "c", " ", "s", "i", "m", "u", "l", "a", "t",
   (0,13): "o", "r", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,29): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,45): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,61): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,77): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,93): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,109): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,125): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,141): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,157): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,173): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,189): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,205): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,221): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,237): "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
   (0,253): "", "", ""
   }
   ATTRIBUTE "NDAttrDescription" {
      DATATYPE  H5T_STRING {
            STRSIZE 12;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
      DATASPACE  SCALAR
      DATA {
      (0): "Camera model"
      }
   }
...

This is better, since at least the ASCII value of each character is shown. However, it is a 2-D array of 1 character strings, which is of course not what we really want, which is a 1-D array of strings.

I am willing to change the code to do this. However, I have a few questions:

  1. Was there a good reason for using H5T_NATIVE_CHAR in the first place?
  2. Will there be problems trying to use 1-D array of fixed length character strings instead (i.e. performance, or it is actually not possible, etc.)?
  3. Do we need to preserve backwards compatibility, i.e. the default is still to write H5T_NATIVE_CHAR, but with an option to use fixed-length strings instead? My inclination is yes, since DLS and probably other sites have written data analysis codes that depend on the current layout.
  4. If it is an option, then should this option be at compile time, run-time in the constructor, or run-time via an EPICS PV? My inclination would be run-time in the constructor, but I am interested in what others think.

Mark

@MarkRivers
Copy link
Member Author

I have implemented a fix for this issue in branch hdf5_string2.

I added a new parameter, NDFileHDF5_stringAttributeDataType. This has 2 possible values: hdf5::nativeChar and hdf5::CString. These are new enums added to NDFile HDF5Layout.h

The default is hdf5::nativeChar, which uses the current data type of an array of H5T_NATIVE_CHAR[NumArrays, MAX_ATTRIBUTE_STRING_SIZE]

If the parameter is set to hdf5::CString then NDAttributes of type NDAttrString are written as H5T_C_S1[numArrays] with size MAX_ATTRIBUTE_STRING_SIZE.

There is a new bo record in NDFileHDF5.template, $(P)$(R)strAttrDataType that controls this parameter. The enum values are "Char" (0) and "String" (1). So the parameter can now be changed a run-time from the OPI and is in save/restore.

This is the contents of an NDAttribute of type NDAttrString when strAttrDataType=Char (default, old behavior)

corvette:~/scratch>h5dump --dataset=/entry/instrument/NDAttributes/CameraModel test_001.h5
HDF5 "test_001.h5" {
DATASET "/entry/instrument/NDAttributes/CameraModel" {
   DATATYPE  H5T_STD_I8LE
   DATASPACE  SIMPLE { ( 1, 256 ) / ( H5S_UNLIMITED, H5S_UNLIMITED ) }
   DATA {
   (0,0): 66, 97, 115, 105, 99, 32, 115, 105, 109, 117, 108, 97, 116, 111,
   (0,14): 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,35): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,57): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,79): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,101): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,123): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,145): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,167): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,189): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,211): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,233): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   (0,255): 0
   }

This is the result when strAttrDataType=String

corvette:~/scratch>h5dump --dataset=/entry/instrument/NDAttributes/CameraModel test_003.h5
HDF5 "test_003.h5" {
DATASET "/entry/instrument/NDAttributes/CameraModel" {
   DATATYPE  H5T_STRING {
         STRSIZE 256;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
   DATASPACE  SIMPLE { ( 1 ) / ( H5S_UNLIMITED ) }
   DATA {
   (0): "Basic simulator"
   }

This is the result for a file containing 10 arrays rather than just 1:

corvette:~/scratch>h5dump --dataset=/entry/instrument/NDAttributes/CameraModel test_004.h5
HDF5 "test_004.h5" {
DATASET "/entry/instrument/NDAttributes/CameraModel" {
   DATATYPE  H5T_STRING {
         STRSIZE 256;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
   DATASPACE  SIMPLE { ( 10 ) / ( H5S_UNLIMITED ) }
   DATA {
   (0): "Basic simulator", "Basic simulator", "Basic simulator",
   (3): "Basic simulator", "Basic simulator", "Basic simulator",
   (6): "Basic simulator", "Basic simulator", "Basic simulator",
   (9): "Basic simulator"
   }
 

It seems like a reasonable fix to me, so I am issuing a pull request for this branch.

Mark

@MarkRivers
Copy link
Member Author

Note that when the NDFileNexus plugin is used, the resulting HDF5 file for this same NDAttribute CameraModel is

corvette:~/scratch>h5dump --dataset=/Collected_Data/rawData/model nexus_test_001.h5
HDF5 "nexus_test_001.h5" {
DATASET "/Collected_Data/rawData/model" {
   DATATYPE  H5T_STRING {
         STRSIZE 16;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
   DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
   DATA {
   (0): "Basic simulator"
   }
}
}

So it has DATATYPE=H5T_STRING, which is what makes sense. There appears to be no way with the existing version of the HDF5 plugin to produce such a Nexus file.

@MarkRivers
Copy link
Member Author

Fixed in #177, merged into master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants