Reusable run-build workflow #44

joverlee521 · 2023-05-26T00:20:33Z

Description of proposed changes

Adds a reusable workflow to run Nextstrain builds.

Currently only supports the docker, conda, and aws-batch runtimes because I didn't want to spend the extra time / add complexity to support ambient & singularity runtimes.

I took a lot of pieces from other workflows and modified them into scripts that can be called from within the workflow. If these look good, we can refactor other workflows to use these shared scripts.

Related issue(s)

Alternative of #39

Testing

Checks pass
Added tests for docker/conda runtimes using the zika-tutorial repo. (Not sure it's worth adding a aws-batch test)
Tested aws-batch runtime in seasonal-flu

@tsibley

Add `write-envdir` script for managing envvars that the Nextstrain CLI does not automatically forward to builds. This will be used in our resuable workflow for running Nextstrain builds. Originally written by @tsibley, copied unmodified from the ncov-ingest repo.¹ ¹ https://github.com/nextstrain/ncov-ingest/blob/2a8f5bb4419998d5e3e13c97ebd03a1780ea47c2/bin/write-envdir

Modifies the "Set environment variables" step from the pathogen-repo-ci workflow¹ into two separate scripts since we will use the same method to set environment variables to other reusable workflows. I decided to create `json-to-envvars` as a separate script that `yaml-to-envvars` calls because I will be using it to set secrets as environment variables as well. The `--varnames` option will allow us to be explicit about which secrets we want to set as environment variables. ¹ https://github.com/nextstrain/.github/blob/cc6f4385a45bd6ed114ab4840416fd90cc46cd1b/.github/workflows/pathogen-repo-ci.yaml#L183-L198

Modifies the "Generate summary" step from the conda-base repo CI workflow¹ into a separate script that can be used to interpolate any text template with environment variables. I will be using this in the reusable run-build workflow to create the GitHub Action summary for AWS Batch builds. I also see this being useful for creating Slack messages for automated builds such as the onstart and onerror notifications that we use in ncov² if we decide to move the notifications out of the pathogen workflows and into the GitHub Action workflows. ¹ https://github.com/nextstrain/conda-base/blob/5655133be6864ede0acbf40b6b06222f9f2dc031/.github/workflows/ci.yaml#L107-L112 ² https://github.com/nextstrain/ncov/blob/fcac1d16410b1b119a9af70afe7b9325a6e25083/workflow/snakemake_rules/export_for_nextstrain.smk#L499-L519

Add a reusable workflow to run Nextstrain builds so that we can centralize and standardize how our pathogen repos set up these builds. Motivated by the latest hiccup with the Snakemake upgrade that required fixes across multiple pathogen repos.¹ There are some "hacky" steps to get around the limitations of reusable workflows: 1. The 'ref' input is required to ensure that we checkout the matching ref within the reusable workflow since it is currently not possible to detect the called workflow's ref.² 2. Since reusable workflows cannot access environment variables set in the calling workflow, this uses the yaml-to-envvars workaround with an `env` input already used in the patheogen-repo-ci workflow.³ 3. We use a json-to-envvars workaround to allow for setting secrets as environment variables for the build runtime dynamically. ¹ https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1681328249848399 ² actions/toolkit#1264 ³ https://github.com/nextstrain/.github/blob/cc6f4385a45bd6ed114ab4840416fd90cc46cd1b/.github/workflows/pathogen-repo-ci.yaml#L25-L45

Notice this is calling the local run-build workflow with the input ref pointing at the commit SHA that triggered the workflow so they should be using the same ref.

joverlee521 · 2023-05-26T00:28:39Z

.github/workflows/run-build.yaml

There is a lot of overlap with the pathogen-repo-ci workflow, but I think okay to keep separate for better separation of concerns?

We could go down the nested reusable workflow route, but I haven't really tested that to know if there are any pitfalls.

A more straightforward path is to refactor the pathogen-repo-ci workflow to use the same ref input method and use the same local scripts that I had pulled out.

I think fine to keep separate, especially for now. If they continue to be highly parallel over time, we can converge them then.

jameshadfield · 2023-05-29T04:34:01Z

.github/workflows/run-build.yaml

Came to review this PR thinking it was unrelated, and now I see that it's an alternative to #39. I think this PR is headed in a better direction, but the discussions happening over in #39 are still very relevant for this approach I think.

Agreed with this, and I think worth resolving/addressing all the comments over in #39 as part of the review/iteration on this PR.

tsibley · 2023-06-01T21:30:03Z

@joverlee521 wrote:

Currently only supports the docker, conda, and aws-batch runtimes because I didn't want to spend the extra time / add complexity to support ambient & singularity runtimes.

Noting that this is just fine. We really only need one local runtime (e.g. Docker) and one remote runtime (e.g. AWS Batch) for the goals of this workflow. The Conda runtime is probably easy enough to support to allow it, but we're best off preferring containerized runtimes (e.g. Docker, Singularity, AWS Batch) for their better isolation/reproducibility/robustness.

text-templates/attach-aws-batch.md

tsibley

I think this version as a workflow is nicer and a better direction! Thanks for doing it. Lots of little comments, mostly minor and about polish/consistency.

bin/json-to-envvars

bin/yaml-to-envvars

bin/interpolate-text-template

.github/workflows/run-build.yaml

This provides an `envdir` program to our Conda runtime since our other runtimes also provide this useful program and we've felt its absence.¹ Since there's no existing Conda packaging of the Python envdir distribution² we use elsewhere or any other `envdir` program AFAICT³, the particular program provided is a port I wrote long ago⁴ and still use daily on my own machines. It's vendored into this repository and built into the Conda packages we're already producing here. This is a quick and low-overhead solution, and we may want to switch to a packaging of the Python envdir distribution later (particularly if we want to use the Python envdir module that distribution also provides). Resolves <#35>. ¹ <nextstrain/.github#44 (comment)> ² <https://pypi.org/project/envdir/> ³ I checked for both daemontools, which provides the original, and s6, which provides another port, as well as a general search for "envdir". ⁴ <https://github.com/tsibley/envdir>

Convert the null values to empty strings so that we do not set the value "null" as any environment variable values. I had considered just skipping key/value pairs where the value is null, but I think I'd want to see that the environment variable is empty in the GitHub Action logs. This change was motivated by a failed test seasonal-flu build on AWS Batch. The workflow passed `NEXTSTRAIN_DOCKER_IMAGE: null` to the Nextstrain CLI and the AWS Batch job tried to pull a `null` image. With this change, the Nextstrain CLI is able to see that the environment variable as a falsy value and the AWS Batch job pulled the default latest image.

@jameshadfield

Include a direct URL to the AWS Batch console in our AWS Batch summary as requested by @jameshadfield¹ Note: in order for the URL in the summary to _not_ be masked, the repo cannot have the `us-east-1` value as a secret. This means we would need to audit repos of the `AWS_DEFAULT_REGION` secret and remove them where possible. ¹ #39 (comment)

@jameshadfield

Update input descriptions based on feedback from @jameshadfield in review: - #39 (comment) - #39 (comment)

@tsibley

Adds a composite action that can be used to get a reusable workflow's own context within itself. Modified from @tsibley's prototype: https://github.com/tsibley/github-actions-testing/blob/7fbf8f0e895c6e80f2722e3a08925a4e27536cc9/.github/workflows/oidc.yaml#L6-L25 This will allow us to use actions and scripts within the `run-build` workflow at the same calling ref without asking the calling workflow to provide the `ref` input.

Replace the `ref` input with our new `workflow-context` composite action to get the repository and sha of the called reusable workflow of the current run. This removes the burden of having to provide the ref as an input by the caller workflow. However, the caller workflow does need to provide the permissions `id-token: write` in order to use this workflow. This has been reflected in the workflow-template.

Rename to `bin/interpolate-env` to be clearer on what the script is doing and require caller to provide text via stdin for better interoperability.

Because "build" is such an overloaded term, rename the workflow to be more specific and to parallel the existing `pathogne-repo-ci` workflow.

Includes a default set of paths that are the common output paths for pathogen builds based on the artifact paths used in pathogen-repo-ci.

.github/workflows/pathogen-repo-build.yaml

Set the default shell to the same command as the GitHub Action `bash` keyword¹ Completely spelling out the bash command here so that GitHub can't change it out from under us and we don't have to refer to the docs to know the expected behavior. ¹ https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsshell

Switch the directory organization within the workflow so that the nextstrain/.github repo is in a subdir and the pathogen repo is now in the top level working dir. This ensures that we don't have to worry about prefixing the paths of the artifact paths. We are cloning the nextstrain/.github repo into a hardcoded excluded path¹ for AWS Batch jobs so that it does not get uploaded with the job. I had explored other ways to try to exclude the extra support files from the AWS Batch jobs, but they required some extra workarounds that we don't want to deal with due to limitations of GitHub Actions.² ¹ https://github.com/nextstrain/cli/blob/6e36f6d7bdafcb984f62e2a9cc2a9cf9adf0be57/nextstrain/cli/runner/aws_batch/s3.py#L51-L66 ² #44 (comment)

Include the default artifact paths in the upload-artifact step so that callers do not have to copy/paste defaults to add an additional path. Callers will have to add exclusion patterns for any default paths that they do not want to include in the uploaded artifact.

tsibley

This is looking really good! A couple small fixups I noticed…

tsibley · 2023-07-18T21:22:36Z

.github/workflows/pathogen-repo-build.yaml

+          if [[ "$NEXTSTRAIN_BUILD_COMMAND" == "nextstrain build "* ]]; then
+            eval "$NEXTSTRAIN_BUILD_COMMAND" | tee build.log
+          else
+            echo "::error::The run input must be a `nextstrain build` command"
+            exit 1
+          fi


non-blocking

@joverlee521 wrote elsewhere about this bit of code:

Updated in latest push. I would love to know if there's a better way to run the provided command besides using eval.

Instead of eval you could directly interpolate at the workflow level (i.e. pre-execution):

Suggested change

if [[ "$NEXTSTRAIN_BUILD_COMMAND" == "nextstrain build "* ]]; then

eval "$NEXTSTRAIN_BUILD_COMMAND" | tee build.log

else

echo "::error::The run input must be a `nextstrain build` command"

exit 1

fi

${{ inputs.run }} |& tee build.log

The indirection of defining NEXTSTRAIN_BUILD_COMMAND isn't needed for input robustness since we're running it as a bit of shell anyway. I'd also not worry about if the input starts with nextstrain build or not. Maybe emitting a warning is useful? but wouldn't enforce a restriction.

Would also direct stderr to the build log (|&), not just stdout (|).

Ah, that makes sense. Updated in force-push.

Err, slight hiccup with direct interpolation of ${{ inputs.run }}. This retains newlines at the end of the command resulting in a syntax error in a test seasonal-flu run. I thought changing the run block style to > in 357a79b would replace the newline with a space, but that did not work.

Going back to eval again in 8440ef6 and this worked as expected for the test seasonal-flu run.

Ah, yeah. Using the > block style won't work because GitHub's ${{ … }} interpolation is evaluated after parsing the YAML (an ordering which is quite important). An alternative that I think would work is to setup the redirection before hand rather than as a pipeline, e.g.:

exec &> >(tee build.log) ${{ inputs.run }}

But I think also fine to keep eval.

Alternatively alternatively, you could also encase the thing in a subshell:

( ${{ inputs.build.log }} ) | tee build.log

or group command:

{ ${{ inputs.build.log }} ; } | tee build.log

(But really fine to keep eval.)

README.md

tsibley · 2023-07-18T21:25:25Z

.github/workflows/pathogen-repo-build.yaml

+          The aws-batch runtime requires the secrets:
+
+          - AWS_ACCESS_KEY_ID
+          - AWS_SECRET_ACCESS_KEY
+
+          They must be defined in the repo's Actions secrets and passed to this
+          workflow with `secrets: inherit`.


non-blocking

We can loosen this restriction in the future by having this workflow automatically arrange for short-lived session keys using the OIDC token exchange.

Ah, we would still have to require an input from the caller workflow for the role-to-assume, but that can just be a plain string input rather than a secret or environment variable.

I was thinking we'd have a single role used by all our AWS Batch builds, so the workflow here could assume it without further input.

Instead of recapitulating the `nextstrain build` options as inputs, use a single run input for callers to provide the full command. This seems like a simpler transition to be able to copy/paste the full `nextstrain build` command already used locally to the automated GH Action. The expected `nextstrain build` commands are also simplified and will no longer need to use the `--exec` option since the Nextstrain CLI now has the `--env` option to pass environment variables to the build runtime. This allows us to remove the write-envdir script and any references to envdir.

Allow callers to just pass secrets via `secrets: inherit` so that we don't have to maintain a list of allowed secrets. The only required/predefined secrets for setting up the aws-batch runtime are added to the input description. All other secrets for builds can have arbitrary names and can be passed to the build runtime via the `--env` option.

tsibley

🎉

(Remember to drop that testing-only commit before merge.)

Using `eval` to run the command to avoid potential errors as we don't want limit the block scalars for the run input. If run input includes a newline at the end, the direct interpolation of `${{ inputs.run }}` results in a syntax error¹: ``` line 11: syntax error near unexpected token `|&' ``` ¹ https://github.com/nextstrain/seasonal-flu/actions/runs/5604334827/jobs/10252159727#step:7:327

The entire working directory gets zipped and uploaded to AWS Batch when using the `aws-batch` runtime, which includes an early `build.log` file. When the build completes and downloads the build outputs, the early `build.log` overwrites the local `build.log` of the GH Action. This results in a truncated file that does not include the AWS_BATCH_JOB_ID and prevents the generation of the AWS Batch summary. This commit moves the hard-coded `build.log` file to an ignored directory (the same directory that we are using for the nextstrain/.github repo¹). This prevents the `build.log` from being uploaded with the build. Previously tried other paths, but they all caused errors. - "../build.log" - ".." pattern is not allowed for the `upload-artifact` action² - "~/build.log" - `hashfiles` requires files to be in the `GITHUB_WORKSPACE`³ ¹ #44 (comment) ² actions/upload-artifact#176 ³ https://docs.github.com/en/actions/learn-github-actions/expressions#hashfiles

The entire working directory gets zipped and uploaded to AWS Batch when using the `aws-batch` runtime, which includes an early `build.log` file. When the build completes and downloads the build outputs, the early `build.log` overwrites the local `build.log` of the GH Action. This results in a truncated file that does not include the AWS_BATCH_JOB_ID and prevents the generation of the AWS Batch summary and the re-attachment to the AWS Batch job in subsequent `wait` jobs. This commit moves the hard-coded `build.log` file to an ignored directory (the same directory that we are using for the nextstrain/.github repo¹). This prevents the `build.log` from being uploaded with the build. Previously tried other paths, but they all caused errors. - "../build.log" - ".." pattern is not allowed for the `upload-artifact` action² - "~/build.log" - `hashfiles` requires files to be in the `GITHUB_WORKSPACE`³ ¹ #44 (comment) ² actions/upload-artifact#176 ³ https://docs.github.com/en/actions/learn-github-actions/expressions#hashfiles

joverlee521 added 5 commits May 25, 2023 13:45

CI: test the run-build workflow

fca2568

Notice this is calling the local run-build workflow with the input ref pointing at the commit SHA that triggered the workflow so they should be using the same ref.

joverlee521 requested a review from a team May 26, 2023 00:20

joverlee521 commented May 26, 2023

View reviewed changes

jameshadfield reviewed May 29, 2023

View reviewed changes

joverlee521 mentioned this pull request May 30, 2023

Add actions/run-nextstrain-build-on-aws-batch #39

Closed

2 tasks

tsibley self-requested a review May 31, 2023 18:46

joverlee521 force-pushed the run-build-workflow branch 2 times, most recently from c0f10a0 to a57a1b6 Compare May 31, 2023 23:00

joverlee521 commented Jun 1, 2023

View reviewed changes

text-templates/attach-aws-batch.md Show resolved Hide resolved

tsibley requested changes Jun 2, 2023

View reviewed changes

tsibley mentioned this pull request Jun 2, 2023

envdir program is missing nextstrain/conda-base#35

Closed

tsibley mentioned this pull request Jun 2, 2023

Add envdir program nextstrain/conda-base#36

Merged

2 tasks

joverlee521 force-pushed the run-build-workflow branch 3 times, most recently from 34102a6 to 29e0a58 Compare June 2, 2023 23:36

joverlee521 added 8 commits June 6, 2023 10:24

run-build: update input descriptions

392fd63

Update input descriptions based on feedback from @jameshadfield in review: - #39 (comment) - #39 (comment)

Rename bin/interpoate-text-template and require stdin

c198151

Rename to `bin/interpolate-env` to be clearer on what the script is doing and require caller to provide text via stdin for better interoperability.

Rename run-build to pathogen-repo-build

0d5d985

Because "build" is such an overloaded term, rename the workflow to be more specific and to parallel the existing `pathogne-repo-ci` workflow.

pathogen-repo-build: update to artifact-paths

135e1cd

Includes a default set of paths that are the common output paths for pathogen builds based on the artifact paths used in pathogen-repo-ci.

joverlee521 force-pushed the run-build-workflow branch from 29e0a58 to f82eba9 Compare June 6, 2023 18:53

tsibley reviewed Jun 17, 2023

View reviewed changes

.github/workflows/pathogen-repo-build.yaml Outdated Show resolved Hide resolved

.github/workflows/pathogen-repo-build.yaml Outdated Show resolved Hide resolved

.github/workflows/pathogen-repo-build.yaml Outdated Show resolved Hide resolved

joverlee521 force-pushed the run-build-workflow branch from 9dda436 to ce76a84 Compare June 21, 2023 01:06

joverlee521 mentioned this pull request Jun 28, 2023

Send error logs to Slack nextstrain/ncov-ingest#408

Open

joverlee521 force-pushed the run-build-workflow branch from 7155b1c to 9c96172 Compare July 10, 2023 21:23

joverlee521 force-pushed the run-build-workflow branch from 9c96172 to b9983c0 Compare July 10, 2023 21:35

joverlee521 force-pushed the run-build-workflow branch from b9983c0 to aa7ce26 Compare July 10, 2023 21:36

joverlee521 added 2 commits July 10, 2023 14:59

joverlee521 force-pushed the run-build-workflow branch 3 times, most recently from c32bdca to 1097fc9 Compare July 12, 2023 20:45

tsibley requested changes Jul 18, 2023

View reviewed changes

joverlee521 added 2 commits July 19, 2023 14:12

joverlee521 force-pushed the run-build-workflow branch 2 times, most recently from 357a79b to 8440ef6 Compare July 19, 2023 23:03

tsibley approved these changes Jul 20, 2023

View reviewed changes

joverlee521 force-pushed the run-build-workflow branch from 8440ef6 to 6aed9e2 Compare July 20, 2023 23:55

joverlee521 merged commit 261c64f into master Jul 20, 2023

joverlee521 deleted the run-build-workflow branch July 20, 2023 23:56

joverlee521 mentioned this pull request Sep 6, 2023

pathogen-repo-build: Display AWS Batch summary even when job was attached #50

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reusable run-build workflow #44

Reusable run-build workflow #44

joverlee521 commented May 26, 2023

joverlee521 May 26, 2023

tsibley Jun 1, 2023

jameshadfield May 29, 2023

tsibley Jun 2, 2023

tsibley commented Jun 1, 2023 •

edited

Loading

tsibley left a comment

tsibley left a comment

tsibley Jul 18, 2023 •

edited

Loading

joverlee521 Jul 19, 2023

joverlee521 Jul 19, 2023

tsibley Jul 20, 2023

tsibley Jul 20, 2023

tsibley Jul 20, 2023

tsibley Jul 18, 2023

joverlee521 Jul 20, 2023

tsibley Jul 20, 2023

tsibley left a comment •

edited

Loading

Reusable run-build workflow #44

Reusable run-build workflow #44

Conversation

joverlee521 commented May 26, 2023

Description of proposed changes

Related issue(s)

Testing

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsibley commented Jun 1, 2023 • edited Loading

tsibley left a comment

Choose a reason for hiding this comment

tsibley left a comment

Choose a reason for hiding this comment

tsibley Jul 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsibley left a comment • edited Loading

Choose a reason for hiding this comment

tsibley commented Jun 1, 2023 •

edited

Loading

tsibley Jul 18, 2023 •

edited

Loading

tsibley left a comment •

edited

Loading