Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[245] Wildcard support and refactoring of runner task outputs and inputs expansion logic #261

Merged
merged 12 commits into from
Jul 19, 2023

Conversation

giventocode
Copy link
Contributor

@giventocode giventocode commented Jun 14, 2023

Closes #245
Addresses #8

This PR includes:

  • A new class: FileOperationResolver that performs the resolution process for inputs and outputs of the runner task.
    • The resolution process consists of:
      • Expanding, directories or paths with wildcards (if a path_prefix is provided).
        • Wildcard support is limited to the functionality of the standard .NET library (Directory.GetFiles()).
      • Apply the SAS resolution strategy for each expanded item.
  • Additional changes:
    • Renamed, FullFileName property to Path in the runner task model (NodeTask), to better align with the TES spec and its semantics.
    • Added the property PathPrefix to the TestOut model in the Web API - 1.1 prerequisite.

@BMurri
Copy link
Collaborator

BMurri commented Jul 1, 2023

If wildcards are not provided, pathprefix should be ignored. If output is a directory and there are no wildcards, the directory's path should be used as the pathprefix. The question is whether to figure this out in the node agent or in TES itself.

Either way, this code doesn't work as intended.

@giventocode
Copy link
Contributor Author

giventocode commented Jul 2, 2023

If wildcards are not provided, pathprefix should be ignored.

I don't see why this is necessary. The implementation assumes that if a pathprefix is set, then pattern search is intended. A pattern without wildcards is a valid scenario as the underlying API (GetFiles) supports it - returns a matching file. This results in a clear and simpler rule: "if you provide a path prefix the path property is a search pattern". Looking for wildcards in the path, and selectively ignoring the pathprefix, seems confusing to me and could obfuscate an incorrect client behavior.

If output is a directory and there are no wildcards, the directory's path should be used as the pathprefix. The question is whether to figure this out in the node agent or in TES itself.

If the output is a directory, then it returns all files in the directory and the pathprefix is ignored, I am open to revisit this, however, I am failing to understand why this is required as well.

@BMurri
Copy link
Collaborator

BMurri commented Jul 5, 2023

If the output is a directory, then it returns all files in the directory and the pathprefix is ignored, I am open to revisit this, however, I am failing to understand why this is required as well.

This is required because 4 days ago the entire VM path of each file was appended to the storage directory (instead of the relative path) in the URL, resulting in file paths in the cromwell-executions storage container like the following: Mutect2/e37d88dc-3c69-4131-a4dc-22cdedca8ead/call-SplitIntervals/execution/glob-0fc990c5ca95eebc97c4c204e3e303e1/mnt/batch/tasks/workitems/TES-TSCF7VLI-D4ads_v5-wnvch6bw475so4yjp3hrea56pqnqytwp-oylhbm2k/job-1/e37d88dc_de398853f4b7407bb6335b254c58c6b4-1/wd/wd/cromwell-executions/Mutect2/e37d88dc-3c69-4131-a4dc-22cdedca8ead/call-SplitIntervals/execution/glob-0fc990c5ca95eebc97c4c204e3e303e1/filename which resulted in Cromwell failing the workflow because it was expecting that file's path in the storage container to be Mutect2/e37d88dc-3c69-4131-a4dc-22cdedca8ead/call-SplitIntervals/execution/glob-0fc990c5ca95eebc97c4c204e3e303e1/filename.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[#148 Implement TES Task Runner] Implement upload directory handling via an OperationsResolver in the library
2 participants