MAINT: deprecate --file-list and --file-list-detailed options in darshan-parser #782

Closed
carns opened this issue Jul 26, 2022 · 3 comments · Fixed by #783

@carns
Contributor

carns commented Jul 26, 2022

These two options are not frequently used; they produce a text-formatted table with a row for each unique file and columns for a few selected metrics of interest. The maintenance problem is that these options significantly increase the complexity of the darshan-parser utility, particularly in terms of which capabilities could be refactored in #677.

In the long run, it would be better to handle this as described in #781: you would first convert a log to collapse partially shared files and then use Python scripts or other conventional analysis tools on the resulting log.
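As a rough illustration of the "Python scripts" route, here is a minimal sketch that rebuilds a per-file table from darshan-parser's default text output. It assumes the tab-separated column order module, rank, record id, counter, value, file name, mount pt, fs type, and that file names contain no whitespace; please check both against the header of your own parser output. The function name `per_file_totals` is made up for this example and is not part of darshan-util or PyDarshan.

```python
from collections import defaultdict

# Assumed column order of darshan-parser's default text output:
#   module, rank, record id, counter, value, file name, mount pt, fs type
# Verify this against the "#" header lines of your own output.


def per_file_totals(lines, counters=("POSIX_BYTES_READ", "POSIX_BYTES_WRITTEN")):
    """Sum the selected integer counters per record id across all ranks.

    Returns {record_id: (file_name, {counter: total})}.
    """
    totals = defaultdict(lambda: dict.fromkeys(counters, 0))
    names = {}
    for line in lines:
        # Skip blank lines and the "#" comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            continue
        record_id, counter, value, file_name = fields[2:6]
        if counter in counters:
            # Only integer counters are handled here; floating-point
            # counters would need float() instead.
            totals[record_id][counter] += int(value)
            names[record_id] = file_name
    return {rid: (names[rid], vals) for rid, vals in totals.items()}
```

The input can be any iterable of lines, for example an open file holding the saved output of `darshan-parser`. Summing over ranks is a simplification and is only meant to show the idea of one row per file.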

@pramodk

pramodk commented Oct 17, 2023

Hello @carns, @shanedsnyder !

I have a question about this. According to my old notes, my workflow for analysing our simulations was as follows:

[Image: screenshot of the workflow notes]

@1uc pointed me to this issue after finding out that these options have been removed. I am wondering what the equivalent way to achieve the same result is in newer releases. (I must admit I haven't tested or played much with a newer version of darshan-util or PyDarshan.)

Thank you very much in advance!

@shanedsnyder
Contributor

Hi @pramodk, thanks for reaching out!

For some brief background, we've been working on moving our analysis code to Python via the PyDarshan package you mention, including the job summary tool. As part of that process, we refactored some code that previously lived exclusively in darshan-parser to make it more generally usable in PyDarshan. We found that this new interface was getting too complex because it tried to support too many use cases, particularly the --file-list option, so we ultimately opted to simplify darshan-parser and no longer support that option.

That said, we'd like to make this a simple process using PyDarshan going forward, though I don't think we have that capability quite yet. We have an open PR (#954) for a PyDarshan tool that I think could effectively replace darshan-parser --file-list for providing details about the most I/O intensive files, for instance. I'll try my best to make forward progress on getting that merged and to do a new release that should provide what you need. Now that this issue is linked, I can make sure to keep you posted on progress.

We could probably also further simplify aspects of the workflow you mention. For instance, we could probably modify our PyDarshan job summary tool to generate summaries for given file record IDs (or even file names), so you don't have to bother with darshan-convert.

FYI, here's a link to our PyDarshan docs if you're interested in trying out the new tool: https://www.mcs.anl.gov/research/projects/darshan/docs/pydarshan/index.html

@1uc

1uc commented Oct 19, 2023

Thank you for the explanation. Someone else has already tried PyDarshan out and is very positive about it.

In case someone else comes looking for how to convert a filename/path to the hash used by Darshan, one manual way is to run something like:

darshan-parser *.darshan | grep XYZ.h5 | head -n 1000 | less 

The hash is one of the first large numbers on each line.
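The same lookup can be scripted. Below is a minimal sketch, assuming the parser's text output uses the column order module, rank, record id, counter, value, file name, mount pt, fs type, so that the hash (record id) is the third column and the file name the sixth. It also assumes file names contain no whitespace. The helper name `find_record_ids` is made up for this example.

```python
def find_record_ids(lines, name_fragment):
    """Return {record id: file name} for records whose file name contains name_fragment.

    Assumes darshan-parser's text columns are:
    module, rank, record id, counter, value, file name, mount pt, fs type.
    """
    matches = {}
    for line in lines:
        # Skip "#" header lines and lines that cannot match at all.
        if line.startswith("#") or name_fragment not in line:
            continue
        fields = line.split()
        # Column 3 (index 2) is the record id, column 6 (index 5) the file name.
        if len(fields) >= 6 and name_fragment in fields[5]:
            matches[fields[2]] = fields[5]
    return matches
```

For example, after saving the output with `darshan-parser mylog.darshan > mylog.txt` (file names here are placeholders), calling `find_record_ids(open("mylog.txt"), "XYZ.h5")` would return the record id for each matching file.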
