Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update execution path #222

Merged
merged 11 commits into from
May 15, 2023
Merged

Update execution path #222

merged 11 commits into from
May 15, 2023

Conversation

ashanhol
Copy link
Contributor

@ashanhol ashanhol commented May 10, 2023

Fixes #112
This PR further aims to untangle CoA from TES by separating cromwell's execution files from TES's batch files. The majority of this work is done in BatchScheduler.cs's ConvertTesTaskToBatchTaskAsync function.

  • Created a new container in our storage account called tes-internal which is the new location for the upload, download, and batch scripts as well as metrics.txt. New storage path convention is /tes-internal/task.id/[file]
  • On the batch node, truncated the long paths that were previously something like cromwell-executions/assemble_refbased/0fb40b27-dffe-4d3f-b6da-02ea0b82aedd/call-align_to_ref/shard-0/execution/__batch/batch_script to wd/batch_script.
    • This required some finagling with the mounting directories.
  • Updated any tests that were relying on the previous way of doing things.

image

Copy link
Collaborator

@BMurri BMurri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to run this through our large battery of integration tests (workflows). Matt can help with that.

@ashanhol ashanhol force-pushed the adinas/UpdaeExecutionPath branch from 36c7bd3 to 0e10d54 Compare May 15, 2023 20:46
{
return $"{ExecutionsPathPrefix.TrimStart('/')}";
}
return $"{storageAccountName}/{TesExecutionsPathPrefix.TrimStart('/')}/{task.Id}";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of: /{TesExecutionsPathPrefix.TrimStart('/')}

more performant to do:

{TesExecutionsPathPrefix}

var batchExecutionDirectoryPath = GetBatchExecutionDirectoryPath(task);
var metricsPath = $"/{batchExecutionDirectoryPath}/metrics.txt";
var storageUploadPath = GetStorageUploadPath(task, defaultStorageAccountName);
var metricsPath = $"/{storageUploadPath}/metrics.txt";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"metrics.txt" could be a constant, such as "const string metricFilename", because it's used a lot in the rest of this PR

{
var blobsInExecutionDirectory = (await azureProxy.ListBlobsAsync(executionDirectoryUri)).Where(b => !b.EndsWith($"/{CromwellScriptFileName}"));
var cromwellExecutionDirectory = $"/{cromwellExecutionDirectoryPath.Split("/", StringSplitOptions.RemoveEmptyEntries)[0]}";
additionalInputFiles = blobsInExecutionDirectory.Select(b => $"{cromwellExecutionDirectory}/{b}").Select(b => new TesInput { Content = null, Path = b, Url = b, Name = Path.GetFileName(b), Type = TesFileType.FILEEnum }).ToList();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's canonical in C# to use the full noun for the lambda parameter, instead of b it looks like it's a blobName , which would make those code more clear to read (it was probably me who originally did this :) )

@@ -1013,18 +1009,24 @@ private async Task<CloudTask> ConvertTesTaskToBatchTaskAsync(string taskId, TesT

return $" && echo $(date +%T) && {setVariables} && {downloadSingleFile} && {exitIfDownloadedFileIsNotFound} && {incrementTotalBytesTransferred}";
}))
+ $" && echo FileDownloadSizeInBytes=$total_bytes >> {metricsPath}";
+ $" && echo FileDownloadSizeInBytes=$total_bytes >> metrics.txt";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics.txt could be a const

@@ -1039,25 +1041,31 @@ private async Task<CloudTask> ConvertTesTaskToBatchTaskAsync(string taskId, TesT

return $" && {{ {setVariables} && [ ! -e \"$path\" ] && : || {{ {blobxferCommand} && {incrementTotalBytesTransferred}; }} }}";
}))
+ $" && echo FileUploadSizeInBytes=$total_bytes >> {metricsPath}";
+ $" && echo FileUploadSizeInBytes=$total_bytes >> metrics.txt";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics.txt could be a const


var executorImageIsPublic = containerRegistryProvider.IsImagePublic(executor.Image);
var dockerInDockerImageIsPublic = containerRegistryProvider.IsImagePublic(dockerInDockerImageName);
var blobXferImageIsPublic = containerRegistryProvider.IsImagePublic(blobxferImageName);

var sb = new StringBuilder();

sb.AppendLinuxLine($"write_kv() {{ echo \"$1=$2\" >> $AZ_BATCH_TASK_WORKING_DIR{metricsPath}; }} && \\"); // Function that appends key=value pair to metrics.txt file
sb.AppendLinuxLine($"write_kv() {{ echo \"$1=$2\" >> $AZ_BATCH_TASK_WORKING_DIR/metrics.txt; }} && \\"); // Function that appends key=value pair to metrics.txt file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics.txt could be a const

sb.AppendLinuxLine($"write_ts UploadEnd && \\");
sb.AppendLinuxLine($"/bin/bash -c 'disk=( `df -k $AZ_BATCH_TASK_WORKING_DIR | tail -1` ) && echo DiskSizeInKiB=${{disk[1]}} >> $AZ_BATCH_TASK_WORKING_DIR{metricsPath} && echo DiskUsedInKiB=${{disk[2]}} >> $AZ_BATCH_TASK_WORKING_DIR{metricsPath}' && \\");
sb.AppendLinuxLine($"/bin/bash -c 'disk=( `df -k $AZ_BATCH_TASK_WORKING_DIR | tail -1` ) && echo DiskSizeInKiB=${{disk[1]}} >> $AZ_BATCH_TASK_WORKING_DIR/metrics.txt && echo DiskUsedInKiB=${{disk[2]}} >> $AZ_BATCH_TASK_WORKING_DIR/metrics.txt' && \\");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics.txt could be a const

@@ -1156,6 +1164,9 @@ private async Task<CloudTask> ConvertTesTaskToBatchTaskAsync(string taskId, TesT

return cloudTask;

string MountBlobxferScriptAndMetrics(string scriptFileName)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics.txt could be a const

@ashanhol ashanhol merged commit 5722a8e into main May 15, 2023
@ashanhol ashanhol deleted the adinas/UpdaeExecutionPath branch May 15, 2023 21:40
ashanhol added a commit to microsoft/CromwellOnAzure that referenced this pull request May 15, 2023
Sister PR to microsoft/ga4gh-tes#222
Issue: microsoft/ga4gh-tes#112

Creates the tes-internal container for convenience to TES.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add config and/or modify implementation of execution paths
3 participants