MAINT: deprecate --file-list and --file-list-detailed options in darshan-parser #782

carns · 2022-07-26T19:15:04Z

These two options are not frequently used; they produce a text-formatted table with a row for each unique file and columns for a few selected metrics of interest. The maintenance problem is that these options significantly increase the complexity of the darshan-parser utility, particularly in terms of which capabilities could be refactored in #677.

In the long run it would be better to handle this as described in #781: you would first convert a log to collapse partially shared files and then use Python scripts or other conventional analysis tools on the resulting log.
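
As a rough illustration of the kind of Python post-processing mentioned above, the sketch below sums bytes read and written per file from plain darshan-parser text output. It is not part of Darshan or PyDarshan. The function name per_file_bytes is hypothetical, and the column layout (module, rank, record id, counter, value, file name, ...) and the counter names POSIX_BYTES_READ and POSIX_BYTES_WRITTEN are assumptions about the parser output that should be checked against the header of your own logs. File names containing whitespace are not handled.

```python
from collections import defaultdict


def per_file_bytes(parser_lines):
    """Sum POSIX bytes read/written per file from darshan-parser text lines.

    Assumed column layout (verify against the header of your own output):
    module, rank, record id, counter, value, file name, mount point, fs type.
    """
    # Map each counter of interest to its slot in the per-file totals.
    wanted = {"POSIX_BYTES_READ": 0, "POSIX_BYTES_WRITTEN": 1}
    totals = defaultdict(lambda: [0, 0])
    for line in parser_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        if len(fields) < 6 or fields[3] not in wanted:
            continue  # not a counter row we care about
        try:
            value = int(fields[4])
        except ValueError:
            continue  # skip rows whose value is not an integer
        # Rows for the same file (e.g. one per rank) are added together.
        totals[fields[5]][wanted[fields[3]]] += value
    return {name: tuple(counts) for name, counts in totals.items()}
```

The function accepts any iterable of lines, so it can be fed an open file or sys.stdin when the parser output is piped into a script.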


pramodk · 2023-10-17T13:58:55Z

Hello @carns, @shanedsnyder !

I have a question about this: with my old notes, my workflow for analysing our simulations was as follows:

@1uc pointed out this issue after finding that these options have now been removed. I am wondering what the equivalent way to achieve the same result is in newer releases. (I must admit I haven't tested or played much with a newer version of darshan-util or PyDarshan.)

Thank you very much in advance!

shanedsnyder · 2023-10-18T21:02:54Z

Hi @pramodk, thanks for reaching out!

For some brief background, we've been working on moving our analysis code to Python via the PyDarshan package you mention, including the job summary tool. As part of that process, we refactored some code that previously lived exclusively in darshan-parser to make it more generally usable in PyDarshan. We found that this new interface was getting too complex trying to support too many use cases, particularly the --file-list command, so we ultimately opted to simplify darshan-parser and no longer support it.

That said, we'd like to make this a simple process using PyDarshan going forward, though I don't think we have that capability quite yet. We have an open PR (#954) for a PyDarshan tool that I think could effectively replace darshan-parser --file-list for providing details about the most I/O intensive files, for instance. I'll try my best to make forward progress on getting that merged and to do a new release that should provide what you need. Now that this issue is linked, I can make sure to keep you posted on progress.

We could probably also further simplify aspects of the workflow you mention. For instance, we could probably modify our PyDarshan job summary tool to generate summaries for given file record IDs (or even file names), so you don't have to bother with darshan-convert.

FYI, here's a link to our PyDarshan docs if you're interested in trying out the new tool: https://www.mcs.anl.gov/research/projects/darshan/docs/pydarshan/index.html

1uc · 2023-10-19T12:09:19Z

Thank you for the explanation. Someone else has already tried PyDarshan out and is very positive about it.

In case someone else comes looking for how to convert a filename/path to the hash used by Darshan, one manual way is to run something like:

darshan-parser *.darshan | grep XYZ.h5 | head -n 1000 | less

The hash is one of the first large numbers on each line.
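
The same lookup can be scripted instead of read off the screen. The sketch below is only an illustration under stated assumptions: find_record_ids is a hypothetical helper, not a Darshan or PyDarshan function, and it assumes that the record hash is the third whitespace-separated column and the file name the sixth, which should be checked against the column header that darshan-parser prints.

```python
def find_record_ids(parser_lines, name_fragment):
    """Map matching file names to their record IDs (hashes).

    Assumes darshan-parser rows of the form:
    module, rank, record id, counter, value, file name, ...
    """
    matches = {}
    for line in parser_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        if len(fields) < 6 or not fields[2].isdigit():
            continue  # not a counter row in the assumed layout
        record_id, file_name = int(fields[2]), fields[5]
        if name_fragment in file_name:
            matches[file_name] = record_id
    return matches
```

For example, the output of darshan-parser could be piped into a script that calls find_record_ids(sys.stdin, "XYZ.h5") and prints the resulting dictionary.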

carns added the maintenance label Jul 26, 2022

carns self-assigned this Jul 26, 2022

carns mentioned this issue Jul 26, 2022

Remove --file-list and --file-list-detailed options from darshan-parser #783

Merged

shanedsnyder closed this as completed in #783 Aug 15, 2022
