Move sharktank and regression-test MI300 jobs to the ossci cluster. #20359

Merged
Eliasj42 merged 4 commits into main from users/eliasj42/new-move-pkgci-to-ossci-cluster
Mar 24, 2025

Conversation

Eliasj42
Contributor

@Eliasj42 Eliasj42 commented Mar 24, 2025

Progress on nod-ai/shark-ai#793.

This moves the sharktank and regression-test workflow jobs using MI300 runners to the ossci cluster.

ci-exactly: build_packages,regression_test,test_sharktank

Elias Joseph added 2 commits March 24, 2025 12:19
…g merge conflicts on unchanged files

Signed-off-by: Elias Joseph <eljoseph@amd.com>
Signed-off-by: Elias Joseph <eljoseph@amd.com>
@Eliasj42 Eliasj42 marked this pull request as ready for review March 24, 2025 17:59
@Eliasj42 Eliasj42 requested a review from ScottTodd as a code owner March 24, 2025 17:59
@Eliasj42 Eliasj42 requested a review from yamiyysu March 24, 2025 19:39
Member

@ScottTodd ScottTodd left a comment

Nice, workflows are getting cache hits and the right sets of tests are running. I do want to remove the copy/paste logic from the workflow so it will be easier to maintain going forward, then this should be ready to merge.

Comment on lines 98 to 104
case "${{ matrix.name }}" in
"amdgpu_rocm_mi300_gfx942") IREE_TEST_FILES="/shark-cache/data/iree-regression-cache" ;;
*) echo "No cache directory assigned for ${{ matrix.name }}" ;;
esac
if [[ -n "$IREE_TEST_FILES" ]]; then
export IREE_TEST_FILES
fi
Member

Rather than copy/pasting this into three steps, add a single step that writes to GITHUB_ENV.

I saw some commits on your other PR trying variations on that. In general, prefer to debug on smaller workflows (possibly on a fork) instead of triggering the full production CI. Each run has material costs associated with it.
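
A minimal sketch of what the body of such a step could look like, not the change that was merged. It assumes a hypothetical MATRIX_NAME variable standing in for the ${{ matrix.name }} expression, and it falls back to a temporary file when GITHUB_ENV is unset so the snippet can be tried outside of GitHub Actions:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Outside GitHub Actions GITHUB_ENV is unset; use a temp file so this runs locally.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"

# Hypothetical stand-in for the `${{ matrix.name }}` workflow expression.
MATRIX_NAME="${MATRIX_NAME:-amdgpu_rocm_mi300_gfx942}"

case "${MATRIX_NAME}" in
  "amdgpu_rocm_mi300_gfx942")
    # Append a NAME=value line; the runner exports it to later steps in the job.
    echo "IREE_TEST_FILES=/shark-cache/data/iree-regression-cache" >> "${GITHUB_ENV}"
    ;;
  *)
    echo "No cache directory assigned for ${MATRIX_NAME}"
    ;;
esac
```

Values written this way become visible to the steps that follow, not to the step that writes them, so the three test steps would read IREE_TEST_FILES from the environment without repeating the case statement.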

@ScottTodd ScottTodd added the infrastructure Relating to build systems, CI, or testing label Mar 24, 2025
@ScottTodd ScottTodd changed the title moved sharktank-ci and regression-test-ci to the ossci cluster Move sharktank and regression-test MI300 jobs to the ossci cluster. Mar 24, 2025
Signed-off-by: Elias Joseph <eljoseph@amd.com>
case "${{ matrix.name }}" in
"amdgpu_rocm_mi300_gfx942") IREE_TEST_FILES="/shark-cache/data/iree-regression-cache" >> $GITHUB_ENV;;
*) echo "No cache directory assigned for ${{ matrix.name }}" ;;
esac
Contributor Author

@ScottTodd This doesn't work unless I conditionally set IREE_TEST_FILES in the specific run clause:
https://github.com/iree-org/iree/actions/runs/14045768606/job/39326526495#step:7:207

Member

You need to write to the file, e.g. using echo >>. Test in a smaller workflow in a test repository / fork if you need to.
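
To illustrate the difference, here is a small sketch that can be run locally. It assumes a temporary file (the hypothetical env_file) in place of the file that GITHUB_ENV points to. A bare assignment followed by a redirect opens the file but produces no output, so nothing is appended; echo is what writes the line:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the file that $GITHUB_ENV points to on a runner.
env_file="$(mktemp)"

# Assignment with a redirect: the variable is set in this shell, the file is
# opened for appending, but an assignment prints nothing, so no line is written.
IREE_TEST_FILES="/shark-cache/data/iree-regression-cache" >> "${env_file}"
lines_after_assignment="$(wc -l < "${env_file}")"

# echo emits the NAME=value line, which the redirect appends to the file.
echo "IREE_TEST_FILES=${IREE_TEST_FILES}" >> "${env_file}"
lines_after_echo="$(wc -l < "${env_file}")"

echo "lines after assignment: ${lines_after_assignment}"
echo "lines after echo: ${lines_after_echo}"
```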

@Eliasj42 Eliasj42 requested a review from ScottTodd March 24, 2025 21:20
Signed-off-by: Elias Joseph <eljoseph@amd.com>
@ScottTodd
Member

This syntax picked up no jobs:

ci-exactly: PkgCI

I believe you'll want this instead:

ci-exactly: build_packages,regression_test,test_sharktank

Member

@ScottTodd ScottTodd left a comment

Looks good now!

@Eliasj42 Eliasj42 merged commit 31863ab into main Mar 24, 2025
123 of 145 checks passed
@Eliasj42 Eliasj42 deleted the users/eliasj42/new-move-pkgci-to-ossci-cluster branch March 24, 2025 22:56
Labels
infrastructure Relating to build systems, CI, or testing
2 participants