-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update AIORI HDF5 for better compatibility with VOL connectors #441
Conversation
Fix IOR to work with native and non-native VOL connectors Add support for HDF5_Access with H5Fis_accessible if available Clean up HDF5 backend and move MPI_Init call Add support for H5Fdelete
Retrieve HDF5 dataset's file dataspace upon dataset setup rather than dataset data transfer
src/ior.c
Outdated
@@ -196,14 +196,14 @@ int ior_main(int argc, char **argv) | |||
out_logfile = stdout; | |||
out_resultfile = stdout; | |||
|
|||
/* start the MPI code */ | |||
MPI_CHECK(MPI_Init(&argc, &argv), "cannot initialize MPI"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With parallel-enabled HDF5 VOL connectors, we've usually found that some command-line options may initialize HDF5 before MPI has been initialized and cause the connectors to believe they're running in serial mode. I see the comment below about check -h option from commandline without starting MPI;
, but it seems like it would be ok to start MPI first, even if all the user is doing is printing out the help summary. Looking for any feedback on whether this is ok or if there might be a better approach for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some MPI implementations that fail if one doesn't run them inside a submitted batch job.
This is a nuisance for the users as they have to run a parallel job with 1 proc to then print out the help.
Isn't there a proper way inside the HDF5 VOL or plugin to fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see; thanks for the info! I'll restore the original behavior and revisit.
Isn't there a proper way inside the HDF5 VOL or plugin to fix this?
Unfortunately there's a bit of an ordering problem right now where we assume that MPI has been initialized before HDF5. Some VOL connectors may attempt to bring up MPI if it isn't brought up when they initialize, but that would only break IOR here since it would fail when it tries to call MPI_Init. This ordering issue was initially a problem with the DAOS VOL connector, but I believe that occurred due to the "get version" callback being called and initializing HDF5 and the connector before MPI was initialized. I don't know if that particular situation is still an issue; I would need to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually not quite sure how it happens that they try to initialize themself. Is that triggered by H5get_libversion()? Which call does it?
The best strategy would be to initialize HDF5 in the plugin by implementing the AIORI initialize function in the
ior_aiori_t hdf5_aiori struct.
A prospect strategy could be:
initialize options (doesn't require bringing up any VOL connector and such).
initialize plugin => initialize anything necessary...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe the H5get_libversion call was the routine bringing everything up the last time I tried this. I don't know if that's still the case since it's been a while, but I brought that change over because I know it was needed previously. I can certainly try implementing the initialization callback to just call H5open and then see if there are any other paths that may prematurely bring up the library and address those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks alright except for the MPI Init situation.
Hi @JulianKunkel, if the PR looks okay in its current form can it be merged? I believe at this point the MPI_Init change may no longer be needed and if I still have issues related to HDF5 initialization (I haven't had much time to test that yet) I can set up a separate PR for discussion. |
Jep, great, thanks for the reminder, I didn't see the update in the PR before. |
No description provided.