Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reusable run-build workflow #44

Merged
merged 28 commits into from
Jul 20, 2023
Merged

Reusable run-build workflow #44

merged 28 commits into from
Jul 20, 2023

Conversation

joverlee521
Copy link
Contributor

Description of proposed changes

Adds a reusable workflow to run Nextstrain builds.

Currently only supports the docker, conda, and aws-batch runtimes because I didn't want to spend the extra time / add complexity to support ambient & singularity runtimes.

I took a lot of pieces from other workflows and modified them into scripts that can be called from within the workflow. If these look good, we can refactor other workflows to use these shared scripts.

Related issue(s)

Alternative of #39

Testing

  • Checks pass
  • Added tests for docker/conda runtimes using the zika-tutorial repo. (Not sure it's worth adding a aws-batch test)
  • Tested aws-batch runtime in seasonal-flu

Add `write-envdir` script for managing envvars that the Nextstrain CLI
does not automatically forward to builds. This will be used in our
resuable workflow for running Nextstrain builds.

Originally written by @tsibley, copied unmodified from the
ncov-ingest repo.¹

¹ https://github.com/nextstrain/ncov-ingest/blob/2a8f5bb4419998d5e3e13c97ebd03a1780ea47c2/bin/write-envdir
Modifies the "Set environment variables" step from the pathogen-repo-ci
workflow¹ into two separate scripts since we will use the same method to
set environment variables to other reusable workflows.

I decided to create `json-to-envvars` as a separate script that
`yaml-to-envvars` calls because I will be using it to set secrets as
environment variables as well. The `--varnames` option will allow us to
be explicit about which secrets we want to set as environment variables.

¹ https://github.com/nextstrain/.github/blob/cc6f4385a45bd6ed114ab4840416fd90cc46cd1b/.github/workflows/pathogen-repo-ci.yaml#L183-L198
Modifies the "Generate summary" step from the conda-base repo CI
workflow¹ into a separate script that can be used to interpolate any
text template with environment variables.

I will be using this in the reusable run-build workflow to create the
GitHub Action summary for AWS Batch builds. I also see this being useful
for creating Slack messages for automated builds such as the onstart
and onerror notifications that we use in ncov² if we decide to move the
notifications out of the pathogen workflows and into the GitHub Action
workflows.

¹ https://github.com/nextstrain/conda-base/blob/5655133be6864ede0acbf40b6b06222f9f2dc031/.github/workflows/ci.yaml#L107-L112
² https://github.com/nextstrain/ncov/blob/fcac1d16410b1b119a9af70afe7b9325a6e25083/workflow/snakemake_rules/export_for_nextstrain.smk#L499-L519
Add a reusable workflow to run Nextstrain builds so that we can
centralize and standardize how our pathogen repos set up these builds.
Motivated by the latest hiccup with the Snakemake upgrade
that required fixes across multiple pathogen repos.¹

There are some "hacky" steps to get around the limitations of reusable
workflows:

1. The 'ref' input is required to ensure that we checkout the matching
ref within the reusable workflow since it is currently not possible to
detect the called workflow's ref.²
2. Since reusable workflows cannot access environment variables set in
the calling workflow, this uses the yaml-to-envvars workaround with an
`env` input already used in the patheogen-repo-ci workflow.³
3. We use a json-to-envvars workaround to allow for setting secrets
as environment variables for the build runtime dynamically.

¹ https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1681328249848399
² actions/toolkit#1264
³ https://github.com/nextstrain/.github/blob/cc6f4385a45bd6ed114ab4840416fd90cc46cd1b/.github/workflows/pathogen-repo-ci.yaml#L25-L45
Notice this is calling the local run-build workflow with the input
ref pointing at the commit SHA that triggered the workflow so they
should be using the same ref.
@joverlee521 joverlee521 requested a review from a team May 26, 2023 00:20
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of overlap with the pathogen-repo-ci workflow, but I think okay to keep separate for better separation of concerns?

We could go down the nested reusable workflow route, but I haven't really tested that to know if there are any pitfalls.

A more straightforward path is to refactor the pathogen-repo-ci workflow to use the same ref input method and use the same local scripts that I had pulled out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fine to keep separate, especially for now. If they continue to be highly parallel over time, we can converge them then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came to review this PR thinking it was unrelated, and now I see that it's an alternative to #39. I think this PR is headed in a better direction, but the discussions happening over in #39 are still very relevant for this approach I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with this, and I think worth resolving/addressing all the comments over in #39 as part of the review/iteration on this PR.

@tsibley tsibley self-requested a review May 31, 2023 18:46
@joverlee521 joverlee521 force-pushed the run-build-workflow branch 2 times, most recently from c0f10a0 to a57a1b6 Compare May 31, 2023 23:00
@tsibley
Copy link
Member

tsibley commented Jun 1, 2023

@joverlee521 wrote:

Currently only supports the docker, conda, and aws-batch runtimes because I didn't want to spend the extra time / add complexity to support ambient & singularity runtimes.

Noting that this is just fine. We really only need one local runtime (e.g. Docker) and one remote runtime (e.g. AWS Batch) for the goals of this workflow. The Conda runtime is probably easy enough to support to allow it, but we're best off preferring containerized runtimes (e.g. Docker, Singularity, AWS Batch) for their better isolation/reproducibility/robustness.

Copy link
Member

@tsibley tsibley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this version as a workflow is nicer and a better direction! Thanks for doing it. Lots of little comments, mostly minor and about polish/consistency.

tsibley added a commit to nextstrain/conda-base that referenced this pull request Jun 2, 2023
This provides an `envdir` program to our Conda runtime since our other
runtimes also provide this useful program and we've felt its absence.¹

Since there's no existing Conda packaging of the Python envdir
distribution² we use elsewhere or any other `envdir` program AFAICT³,
the particular program provided is a port I wrote long ago⁴ and still
use daily on my own machines.  It's vendored into this repository and
built into the Conda packages we're already producing here.  This is a
quick and low-overhead solution, and we may want to switch to a
packaging of the Python envdir distribution later (particularly if we
want to use the Python envdir module that distribution also provides).

Resolves <#35>.

¹ <nextstrain/.github#44 (comment)>
² <https://pypi.org/project/envdir/>

³ I checked for both daemontools, which provides the original, and s6,
  which provides another port, as well as a general search for "envdir".

⁴ <https://github.com/tsibley/envdir>
@joverlee521 joverlee521 force-pushed the run-build-workflow branch 3 times, most recently from 34102a6 to 29e0a58 Compare June 2, 2023 23:36
Convert the null values to empty strings so that we do not set the
value "null" as any environment variable values.

I had considered just skipping key/value pairs where the value is null,
but I think I'd want to see that the environment variable is empty in
the GitHub Action logs.

This change was motivated by a failed test seasonal-flu build on AWS
Batch. The workflow passed `NEXTSTRAIN_DOCKER_IMAGE: null` to the
Nextstrain CLI and the AWS Batch job tried to pull a `null` image.
With this change, the Nextstrain CLI is able to see that the environment
variable as a falsy value and the AWS Batch job pulled the default
latest image.
Include a direct URL to the AWS Batch console in our AWS Batch summary
as requested by @jameshadfield¹

Note: in order for the URL in the summary to _not_ be masked, the repo
cannot have the `us-east-1` value as a secret. This means we would need
to audit repos of the `AWS_DEFAULT_REGION` secret and remove them where
possible.

¹ #39 (comment)
Update input descriptions based on feedback from @jameshadfield
in review:

- #39 (comment)
- #39 (comment)
Adds a composite action that can be used to get a reusable workflow's
own context within itself. Modified from @tsibley's prototype:
https://github.com/tsibley/github-actions-testing/blob/7fbf8f0e895c6e80f2722e3a08925a4e27536cc9/.github/workflows/oidc.yaml#L6-L25

This will allow us to use actions and scripts within the `run-build`
workflow at the same calling ref without asking the calling workflow to
provide the `ref` input.
Replace the `ref` input with our new `workflow-context` composite action
to get the repository and sha of the called reusable workflow of
the current run.

This removes the burden of having to provide the ref as an input by
the caller workflow. However, the caller workflow does need to provide
the permissions `id-token: write` in order to use this workflow. This
has been reflected in the workflow-template.
Rename to `bin/interpolate-env` to be clearer on what the script is
doing and require caller to provide text via stdin for better
interoperability.
Because "build" is such an overloaded term, rename the workflow to be
more specific and to parallel the existing `pathogne-repo-ci` workflow.
Includes a default set of paths that are the common output paths
for pathogen builds based on the artifact paths used in pathogen-repo-ci.
@joverlee521 joverlee521 force-pushed the run-build-workflow branch from 29e0a58 to f82eba9 Compare June 6, 2023 18:53
Set the default shell to the same command as the GitHub Action `bash`
keyword¹

Completely spelling out the bash command here so that GitHub can't
change it out from under us and we don't have to refer to the docs to
know the expected behavior.

¹ https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsshell
joverlee521 added a commit that referenced this pull request Jul 10, 2023
Switch the directory organization within the workflow so that the
nextstrain/.github repo is in a subdir and the pathogen repo is now in
the top level working dir. This ensures that we don't have to worry
about prefixing the paths of the artifact paths.

We are cloning the nextstrain/.github repo into a hardcoded excluded
path¹ for AWS Batch jobs so that it does not get uploaded with the job.
I had explored other ways to try to exclude the extra support files
from the AWS Batch jobs, but they required some extra workarounds that
we don't want to deal with due to limitations of GitHub Actions.²

¹ https://github.com/nextstrain/cli/blob/6e36f6d7bdafcb984f62e2a9cc2a9cf9adf0be57/nextstrain/cli/runner/aws_batch/s3.py#L51-L66
² #44 (comment)
joverlee521 added a commit that referenced this pull request Jul 10, 2023
Switch the directory organization within the workflow so that the
nextstrain/.github repo is in a subdir and the pathogen repo is now in
the top level working dir. This ensures that we don't have to worry
about prefixing the paths of the artifact paths.

We are cloning the nextstrain/.github repo into a hardcoded excluded
path¹ for AWS Batch jobs so that it does not get uploaded with the job.
I had explored other ways to try to exclude the extra support files
from the AWS Batch jobs, but they required some extra workarounds that
we don't want to deal with due to limitations of GitHub Actions.²

¹ https://github.com/nextstrain/cli/blob/6e36f6d7bdafcb984f62e2a9cc2a9cf9adf0be57/nextstrain/cli/runner/aws_batch/s3.py#L51-L66
² #44 (comment)
joverlee521 added a commit that referenced this pull request Jul 10, 2023
Switch the directory organization within the workflow so that the
nextstrain/.github repo is in a subdir and the pathogen repo is now in
the top level working dir. This ensures that we don't have to worry
about prefixing the paths of the artifact paths.

We are cloning the nextstrain/.github repo into a hardcoded excluded
path¹ for AWS Batch jobs so that it does not get uploaded with the job.
I had explored other ways to try to exclude the extra support files
from the AWS Batch jobs, but they required some extra workarounds that
we don't want to deal with due to limitations of GitHub Actions.²

¹ https://github.com/nextstrain/cli/blob/6e36f6d7bdafcb984f62e2a9cc2a9cf9adf0be57/nextstrain/cli/runner/aws_batch/s3.py#L51-L66
² #44 (comment)
joverlee521 added a commit that referenced this pull request Jul 10, 2023
Switch the directory organization within the workflow so that the
nextstrain/.github repo is in a subdir and the pathogen repo is now in
the top level working dir. This ensures that we don't have to worry
about prefixing the paths of the artifact paths.

We are cloning the nextstrain/.github repo into a hardcoded excluded
path¹ for AWS Batch jobs so that it does not get uploaded with the job.
I had explored other ways to try to exclude the extra support files
from the AWS Batch jobs, but they required some extra workarounds that
we don't want to deal with due to limitations of GitHub Actions.²

¹ https://github.com/nextstrain/cli/blob/6e36f6d7bdafcb984f62e2a9cc2a9cf9adf0be57/nextstrain/cli/runner/aws_batch/s3.py#L51-L66
² #44 (comment)
Switch the directory organization within the workflow so that the
nextstrain/.github repo is in a subdir and the pathogen repo is now in
the top level working dir. This ensures that we don't have to worry
about prefixing the paths of the artifact paths.

We are cloning the nextstrain/.github repo into a hardcoded excluded
path¹ for AWS Batch jobs so that it does not get uploaded with the job.
I had explored other ways to try to exclude the extra support files
from the AWS Batch jobs, but they required some extra workarounds that
we don't want to deal with due to limitations of GitHub Actions.²

¹ https://github.com/nextstrain/cli/blob/6e36f6d7bdafcb984f62e2a9cc2a9cf9adf0be57/nextstrain/cli/runner/aws_batch/s3.py#L51-L66
² #44 (comment)
Include the default artifact paths in the upload-artifact step so that
callers do not have to copy/paste defaults to add an additional path.

Callers will have to add exclusion patterns for any default paths that
they do not want to include in the uploaded artifact.
@joverlee521 joverlee521 force-pushed the run-build-workflow branch 3 times, most recently from c32bdca to 1097fc9 Compare July 12, 2023 20:45
Copy link
Member

@tsibley tsibley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good! A couple small fixups I noticed…

Comment on lines 222 to 227
if [[ "$NEXTSTRAIN_BUILD_COMMAND" == "nextstrain build "* ]]; then
eval "$NEXTSTRAIN_BUILD_COMMAND" | tee build.log
else
echo "::error::The run input must be a `nextstrain build` command"
exit 1
fi
Copy link
Member

@tsibley tsibley Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking

@joverlee521 wrote elsewhere about this bit of code:

Updated in latest push. I would love to know if there's a better way to run the provided command besides using eval.

Instead of eval you could directly interpolate at the workflow level (i.e. pre-execution):

Suggested change
if [[ "$NEXTSTRAIN_BUILD_COMMAND" == "nextstrain build "* ]]; then
eval "$NEXTSTRAIN_BUILD_COMMAND" | tee build.log
else
echo "::error::The run input must be a `nextstrain build` command"
exit 1
fi
${{ inputs.run }} |& tee build.log

The indirection of defining NEXTSTRAIN_BUILD_COMMAND isn't needed for input robustness since we're running it as a bit of shell anyway. I'd also not worry about if the input starts with nextstrain build or not. Maybe emitting a warning is useful? but wouldn't enforce a restriction.

Would also direct stderr to the build log (|&), not just stdout (|).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense. Updated in force-push.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, slight hiccup with direct interpolation of ${{ inputs.run }}. This retains newlines at the end of the command resulting in a syntax error in a test seasonal-flu run. I thought changing the run block style to > in 357a79b would replace the newline with a space, but that did not work.

Going back to eval again in 8440ef6 and this worked as expected for the test seasonal-flu run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah. Using the > block style won't work because GitHub's ${{ … }} interpolation is evaluated after parsing the YAML (an ordering which is quite important). An alternative that I think would work is to setup the redirection before hand rather than as a pipeline, e.g.:

exec &> >(tee build.log)
${{ inputs.run }}

But I think also fine to keep eval.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively alternatively, you could also encase the thing in a subshell:

( ${{ inputs.build.log }} ) | tee build.log

or group command:

{ ${{ inputs.build.log }} ; } | tee build.log

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But really fine to keep eval.)

Comment on lines +32 to +38
The aws-batch runtime requires the secrets:

- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY

They must be defined in the repo's Actions secrets and passed to this
workflow with `secrets: inherit`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking

We can loosen this restriction in the future by having this workflow automatically arrange for short-lived session keys using the OIDC token exchange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, we would still have to require an input from the caller workflow for the role-to-assume, but that can just be a plain string input rather than a secret or environment variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we'd have a single role used by all our AWS Batch builds, so the workflow here could assume it without further input.

Instead of recapitulating the `nextstrain build` options as inputs,
use a single run input for callers to provide the full command. This
seems like a simpler transition to be able to copy/paste the full
`nextstrain build` command already used locally to the automated
GH Action.

The expected `nextstrain build` commands are also simplified and will
no longer need to use the `--exec` option since the Nextstrain CLI now
has the `--env` option to pass environment variables to the build runtime.
This allows us to remove the write-envdir script and any references
to envdir.
Allow callers to just pass secrets via `secrets: inherit` so that
we don't have to maintain a list of allowed secrets.

The only required/predefined secrets for setting up the aws-batch
runtime are added to the input description. All other secrets for builds
can have arbitrary names and can be passed to the build runtime
via the `--env` option.
@joverlee521 joverlee521 force-pushed the run-build-workflow branch 2 times, most recently from 357a79b to 8440ef6 Compare July 19, 2023 23:03
Copy link
Member

@tsibley tsibley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

(Remember to drop that testing-only commit before merge.)

Using `eval` to run the command to avoid potential errors
as we don't want limit the block scalars for the run input.

If run input includes a newline at the end, the direct interpolation of
`${{ inputs.run }}` results in a syntax error¹:
```
line 11: syntax error near unexpected token `|&'
```

¹ https://github.com/nextstrain/seasonal-flu/actions/runs/5604334827/jobs/10252159727#step:7:327
@joverlee521 joverlee521 merged commit 261c64f into master Jul 20, 2023
@joverlee521 joverlee521 deleted the run-build-workflow branch July 20, 2023 23:56
joverlee521 added a commit that referenced this pull request Mar 15, 2024
The entire working directory gets zipped and uploaded to AWS Batch when
using the `aws-batch` runtime, which includes an early `build.log` file.
When the build completes and downloads the build outputs, the early
`build.log` overwrites the local `build.log` of the GH Action.
This results in a truncated file that does not include the
AWS_BATCH_JOB_ID and prevents the generation of the AWS Batch summary.

This commit moves the hard-coded `build.log` file to an ignored directory
(the same directory that we are using for the nextstrain/.github repo¹).
This prevents the `build.log` from being uploaded with the build.

Previously tried other paths, but they all caused errors.
- "../build.log" - ".." pattern is not allowed for the `upload-artifact` action²
- "~/build.log" - `hashfiles` requires files to be in the `GITHUB_WORKSPACE`³

¹ #44 (comment)
² actions/upload-artifact#176
³ https://docs.github.com/en/actions/learn-github-actions/expressions#hashfiles
joverlee521 added a commit that referenced this pull request Mar 15, 2024
The entire working directory gets zipped and uploaded to AWS Batch when
using the `aws-batch` runtime, which includes an early `build.log` file.
When the build completes and downloads the build outputs, the early
`build.log` overwrites the local `build.log` of the GH Action.
This results in a truncated file that does not include the
AWS_BATCH_JOB_ID and prevents the generation of the AWS Batch summary.

This commit moves the hard-coded `build.log` file to an ignored directory
(the same directory that we are using for the nextstrain/.github repo¹).
This prevents the `build.log` from being uploaded with the build.

Previously tried other paths, but they all caused errors.
- "../build.log" - ".." pattern is not allowed for the `upload-artifact` action²
- "~/build.log" - `hashfiles` requires files to be in the `GITHUB_WORKSPACE`³

¹ #44 (comment)
² actions/upload-artifact#176
³ https://docs.github.com/en/actions/learn-github-actions/expressions#hashfiles
joverlee521 added a commit that referenced this pull request Mar 15, 2024
The entire working directory gets zipped and uploaded to AWS Batch when
using the `aws-batch` runtime, which includes an early `build.log` file.
When the build completes and downloads the build outputs, the early
`build.log` overwrites the local `build.log` of the GH Action.
This results in a truncated file that does not include the
AWS_BATCH_JOB_ID and prevents the generation of the AWS Batch summary
and the re-attachment to the AWS Batch job in subsequent `wait` jobs.

This commit moves the hard-coded `build.log` file to an ignored directory
(the same directory that we are using for the nextstrain/.github repo¹).
This prevents the `build.log` from being uploaded with the build.

Previously tried other paths, but they all caused errors.
- "../build.log" - ".." pattern is not allowed for the `upload-artifact` action²
- "~/build.log" - `hashfiles` requires files to be in the `GITHUB_WORKSPACE`³

¹ #44 (comment)
² actions/upload-artifact#176
³ https://docs.github.com/en/actions/learn-github-actions/expressions#hashfiles
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

3 participants