Move sharktank and regression-test MI300 jobs to the ossci cluster. #20359
…g merge conflicts on unchanged files Signed-off-by: Elias Joseph <eljoseph@amd.com>
Signed-off-by: Elias Joseph <eljoseph@amd.com>
Nice, workflows are getting cache hits and the right sets of tests are running. I do want to remove the copy/paste logic from the workflow so it will be easier to maintain going forward, then this should be ready to merge.
case "${{ matrix.name }}" in
  "amdgpu_rocm_mi300_gfx942") IREE_TEST_FILES="/shark-cache/data/iree-regression-cache" ;;
  *) echo "No cache directory assigned for ${{ matrix.name }}" ;;
esac
if [[ -n "$IREE_TEST_FILES" ]]; then
  export IREE_TEST_FILES
fi
Rather than copy/paste this into three steps, add a step that writes to GITHUB_ENV:
- https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#passing-values-between-steps-and-jobs-in-a-workflow
- https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#setting-an-environment-variable
I saw some commits on your other PR trying variations on that. In general, prefer to debug on smaller workflows (possibly on a fork) instead of triggering the full production CI. Each run has material costs associated with it.
Signed-off-by: Elias Joseph <eljoseph@amd.com>
case "${{ matrix.name }}" in
  "amdgpu_rocm_mi300_gfx942") IREE_TEST_FILES="/shark-cache/data/iree-regression-cache" >> $GITHUB_ENV;;
  *) echo "No cache directory assigned for ${{ matrix.name }}" ;;
esac
@ScottTodd This doesn't work unless I conditionally set IREE_TEST_FILES in the specific run clause
https://github.com/iree-org/iree/actions/runs/14045768606/job/39326526495#step:7:207
You need to write to the file, e.g. using echo >>. Test in a smaller workflow in a test repository / fork if you need to.
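For reference, a minimal sketch of what writing to the file could look like. It is not necessarily the exact change that landed in this PR. The MATRIX_NAME variable and the mktemp fallback are hypothetical stand-ins so the snippet can run outside a workflow; in a real step the runner provides GITHUB_ENV and the name comes from ${{ matrix.name }}.

```shell
# Hypothetical stand-ins so this sketch runs outside GitHub Actions.
# In a real workflow the runner sets GITHUB_ENV and the matrix name
# would come from ${{ matrix.name }}.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"
MATRIX_NAME="${MATRIX_NAME:-amdgpu_rocm_mi300_gfx942}"

case "${MATRIX_NAME}" in
  "amdgpu_rocm_mi300_gfx942")
    # Append a NAME=value line to the file. A bare assignment followed by
    # a redirect produces no output, so nothing would reach the file.
    echo "IREE_TEST_FILES=/shark-cache/data/iree-regression-cache" >> "${GITHUB_ENV}"
    ;;
  *)
    echo "No cache directory assigned for ${MATRIX_NAME}"
    ;;
esac
```

Variables written this way become visible to later steps in the same job, not to the step that writes them, so a single setup step can replace the repeated case blocks.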
Signed-off-by: Elias Joseph <eljoseph@amd.com>
Looks good now!
Progress on nod-ai/shark-ai#793.
This moves the sharktank and regression-test workflow jobs using MI300 runners to the ossci cluster.
ci-exactly: build_packages,regression_test,test_sharktank