
Upload artifacts and logs to S3 bucket #114

Merged 4 commits into main from upload-artifacts on Feb 26, 2025
Conversation

@marbre (Contributor) commented Feb 25, 2025:

  • Uses the configure-aws-credentials action to retrieve short-lived
    credentials using GitHub's OIDC provider
  • Creates index of files using the MIT-licensed indexer.py script
  • Uploads artifacts and logs to the S3 bucket
  • Adds links pointing to the bucket to the job summary

Creating index files and uploading files is to be moved to scripts in a
follow up.
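The steps above could be sketched as a workflow fragment roughly like the following. This is an illustrative reconstruction, not the PR's exact YAML: the action version pin, the `indexer.py` invocation, and the summary echo are assumptions, while the step names, region, role ARN, and bucket appear in the review threads below.

```yaml
# Hypothetical sketch of the steps described above.
- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    aws-region: us-east-2
    # Short-lived credentials via GitHub's OIDC provider; no long-lived keys.
    role-to-assume: arn:aws:iam::692859939525:role/therock-artifacts

# TODO: Move to script
- name: Create Index Files
  run: python indexer.py build/artifacts/  # assumed invocation of the MIT-licensed indexer.py

- name: Upload Artifacts
  run: |
    aws s3 cp build/artifacts/ s3://therock-artifacts/${{ github.run_id }}/ \
      --recursive

- name: Add Links to Job Summary
  run: |
    # Assumed shape: link the job summary to the per-run index page.
    echo "[Artifacts](https://therock-artifacts.s3.us-east-2.amazonaws.com/${{ github.run_id }}/index.html)" >> "$GITHUB_STEP_SUMMARY"
```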

@marbre marbre force-pushed the upload-artifacts branch 6 times, most recently from 694b322 to 87e3a92 Compare February 25, 2025 22:20
@marbre marbre changed the title Upload artifacts to S3 bucket Upload artifacts and logs to S3 bucket Feb 25, 2025
@marbre (Contributor Author) commented Feb 25, 2025:

See for example https://github.com/nod-ai/TheRock/actions/runs/13532379300 for a run limited to build therock-archive-core-runtime.

@marbre marbre marked this pull request as ready for review February 25, 2025 23:24
@marbre marbre marked this pull request as draft February 26, 2025 00:33
@marbre (Contributor Author) commented Feb 26, 2025:

Artifacts were successfully uploaded (https://therock-artifacts.s3.us-east-2.amazonaws.com/13532940414/index.html) but it seems logs were not. Looking into it.

@marbre (Contributor Author) commented Feb 26, 2025:

Fixed the issue.

@marbre marbre marked this pull request as ready for review February 26, 2025 13:24
@marbre marbre requested a review from ScottTodd February 26, 2025 13:24
@stellaraccident (Contributor) left a comment:
Thanks. Left some comments for next steps.

uses: aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 # v4.1.0
with:
  aws-region: us-east-2
  role-to-assume: arn:aws:iam::692859939525:role/therock-artifacts
Contributor:
For later: should we have this in cleartext like this? Not just for security, but forks will get it as well. If it were a secret, you could also make the step conditional on the secret existing.

marbre (Contributor Author):
This is not a secret and just a role to assume. This role is used to fetch credentials and is configured in AWS IAM to be only accessible from the nod-ai/TheRock repository. Not even another repository within our org is able to assume it. Happy to provide further details.

Contributor:
Yes, I was more suggesting that storing it in a secret may be a convenient mechanism to make this whole thing fork-tolerant: have a key on which you can condition any of the AWS steps, running them only if it is not empty.

The fork behavior is the main thing I was poking at. What happens if a run fails to get this credential? Does the whole run fail (vs. uploading simply being skipped)?
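The fork-tolerance idea could look roughly like the sketch below, assuming the role ARN were moved into a repository secret with a hypothetical name like `AWS_ROLE_TO_ASSUME`. Step-level `if:` expressions can read the `secrets` context, and secrets are empty in fork-initiated runs, so the AWS steps would simply be skipped there.

```yaml
# Hypothetical: skip AWS steps entirely when the secret is absent (e.g. in forks).
# AWS_ROLE_TO_ASSUME is an assumed secret name, not one defined by this PR.
- name: Configure AWS Credentials
  if: ${{ secrets.AWS_ROLE_TO_ASSUME != '' }}
  uses: aws-actions/configure-aws-credentials@v4
  with:
    aws-region: us-east-2
    role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
```

Later upload steps could be conditioned the same way, so a fork's run completes without uploading rather than failing outright.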

marbre (Contributor Author):
Ah, got your point now. Happy to improve that.

--exclude "*" \
--include "*.log" --include "index.html"

- name: Add Links to Job Summary
Contributor:
Nice!

role-to-assume: arn:aws:iam::692859939525:role/therock-artifacts

# TODO: Move to script
- name: Create Index Files
Contributor:
If we move this into the build, we're going to want another way to collect the index. Probably by having specially named marker files for each entry in a logs subdir (because each action that writes one will need to be able to add a file without stepping on anyone else).

Alternatively, we could have a single "stream_artifacts" dir, and if files go in there, they get uploaded. If the script you write were able to run in the background, you could start it at the beginning of a run and have it loop to the end, conditioned on something like:

while not done():

where done() returns true once a .done file exists. The loop would upload each file that shows up in that directory. At the end of a run, you would write the .done file, then wait, and run it one more time to pick up any last bits that weren't found in the background. You'd need to manage a couple more marker files to let it run multiple times without reuploading.

If you did that, it would be better to advertise the location link at the beginning of the job and write an updated index page on each iteration.
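A minimal sketch of that background-uploader loop might look like this. All names here are hypothetical, and `upload` is a stand-in callback for whatever actually pushes a file to S3 (e.g. an `aws s3 cp` subprocess); the in-memory `seen` set plays the role of the "don't reupload" marker files.

```python
import time
from pathlib import Path


def stream_artifacts(watch_dir, upload, poll_interval=1.0):
    """Upload files appearing in watch_dir until a .done marker is written.

    `upload` is a callback taking a Path; it stands in for an S3 upload.
    Files are tracked in `seen` so each one is uploaded only once.
    """
    watch = Path(watch_dir)
    seen = set()

    def sweep():
        for f in sorted(watch.iterdir()):
            if f.name == ".done" or f in seen or not f.is_file():
                continue
            upload(f)
            seen.add(f)

    # Keep sweeping until the run writes the .done marker...
    while not (watch / ".done").exists():
        sweep()
        time.sleep(poll_interval)
    # ...then sweep once more to catch files written just before the marker.
    sweep()
    return seen
```

Writing the `.done` file at the end of the job stops the loop, and the final sweep covers the race between the last artifact landing and the marker appearing.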

@marbre (Contributor Author) commented Feb 26, 2025:

> Thanks. Left some comments for next steps.

Thanks for the pointers, I will gladly take them into account!

@marbre marbre merged commit 3d579ae into main Feb 26, 2025
2 checks passed
@marbre marbre deleted the upload-artifacts branch February 26, 2025 14:21
# TODO: Move to script
- name: Upload Artifacts
run: |
aws s3 cp build/artifacts/ s3://therock-artifacts/${{github.run_id}}/ \
Contributor:

You'll probably want to add a "linux" prefix to this path once we start uploading Windows artifacts.

marbre (Contributor Author):

Good point! While github.run_id is unique for each workflow run within a repository, we currently have a single workflow defined in ci.yml, which then triggers for the different systems. Would you prefer a prefix over a suffix? Or maybe even a separate S3 bucket for Windows artifacts?

@amd-chrissosa (Contributor) commented Feb 28, 2025:

Not opinionated on prefix vs. suffix. I suggest we only use different S3 buckets for different data retention policies (for example, release builds vs. CI builds). I expect we will be adding additional suffixes for variants and such over time.
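For illustration, the suffix variant being discussed might look like the fragment below. This is a hypothetical sketch, not the merged YAML; the PR itself uploads to the bare run ID path, and the "-linux" suffix is only the idea floated in this thread.

```yaml
# Hypothetical: per-OS suffix under the run ID, single bucket.
- name: Upload Artifacts
  run: |
    aws s3 cp build/artifacts/ \
      s3://therock-artifacts/${{ github.run_id }}-linux/ \
      --recursive
```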

@amd-chrissosa (Contributor):
Left a quick comment. Thanks for getting this in - makes sense!
