Add an initial test_linux_packages sub-workflow to our CI #123
base: main
Conversation
About ready - fixing a couple of last things, possibly with the rebase, but ready for inspection!
Mysteriously, the pipeline failed even though I was able to download the file that is logged as nonexistent. I re-triggered the pipeline step to see where it takes us.
Also left some comments. I think we should use lower privileges for download and maybe even for unpacking.
# MOSTLY BOILERPLATE ABOVE.
steps:
  - name: Install the AWS tool
    run: sudo apt-get install -y awscli
We could use https://raw.githubusercontent.com/nod-ai/TheRock/cc71196999a431e1c31d706771675c3823c96aa1/dockerfiles/install_awscli.sh which implements what is described in https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html and probably gets us a far more recent version than what apt installs.
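A sketch of what that step might look like with the pinned script (assuming the script can be fetched and executed directly; the step name is illustrative):

```yaml
# Hypothetical replacement for the apt-based step; the script URL is the
# pinned one from the comment above.
- name: Install the AWS CLI
  run: |
    curl -sSfO https://raw.githubusercontent.com/nod-ai/TheRock/cc71196999a431e1c31d706771675c3823c96aa1/dockerfiles/install_awscli.sh
    chmod +x install_awscli.sh
    sudo ./install_awscli.sh
    aws --version
```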
Added a TODO for now and will evaluate on the side.
.github/workflows/ci.yml
Outdated
permissions:
  id-token: write
No need for write permissions if the Configure AWS Credentials step is dropped.
Will drop for now, though we'll probably want to upload test logs from here eventually.
echo "Unpacking artifacts"
pushd "${BUILD_ARTIFACTS_DIR}"
for a in *.tar.xz; do sudo tar -xvpf $(readlink -f .)/"$a" -C / --keep-directory-symlink; done
Suggested change:
for a in *.tar.xz; do sudo tar -xvpf $(readlink -f .)/"$a" -C / --keep-directory-symlink; done
for a in *.tar.xz; do tar -xvpf $(readlink -f .)/"$a" -C /tmp --keep-directory-symlink; done
I think we can unpack wherever we want, for example in /tmp (or in any other directory the user has write permissions to). Thus we don't need to unpack to /lib and /bin, which allows us to drop root permissions when running tar.
This makes it necessary to also call the executable from its new location in the next step, but that should be the only downside.
I'm assuming I'd also have to set the library path etc., right? I'd argue for keeping it this way to mimic the intended install destination rather than doing something bespoke. I agree it could work without that, but you're getting further from the intended install destination if you do it that way.
If you have the libs in e.g. /tmp/${TESTDIR}/lib/ and /tmp/${TESTDIR}/lib/rocm_sysdeps/lib, then /tmp/${TESTDIR}/bin/rocminfo finds everything it needs in the right place without further ado:
linux-vdso.so.1 (0x00007ffe2fbd3000)
libhsa-runtime64.so.1 => /tmp/TheRockTestDir/bin/../lib/libhsa-runtime64.so.1 (0x00007f1cbb8df000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/14/libstdc++.so.6 (0x00007f1cbb649000)
libm.so.6 => /lib64/libm.so.6 (0x00007f1cbb59e000)
libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/14/libgcc_s.so.1 (0x00007f1cbb571000)
libc.so.6 => /lib64/libc.so.6 (0x00007f1cbb3a9000)
librocm_sysdeps_elf.so.1 => /tmp/TheRockTestDir/bin/../lib/rocm_sysdeps/lib/librocm_sysdeps_elf.so.1 (0x00007f1cbb38a000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f1cbb385000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1cbb380000)
librt.so.1 => /lib64/librt.so.1 (0x00007f1cbb37b000)
librocprofiler-register.so.0 => /tmp/TheRockTestDir/bin/../lib/librocprofiler-register.so.0 (0x00007f1cbb2b3000)
librocm_sysdeps_drm.so.2 => /tmp/TheRockTestDir/bin/../lib/rocm_sysdeps/lib/librocm_sysdeps_drm.so.2 (0x00007f1cbb29b000)
librocm_sysdeps_drm_amdgpu.so.1 => /tmp/TheRockTestDir/bin/../lib/rocm_sysdeps/lib/librocm_sysdeps_drm_amdgpu.so.1 (0x00007f1cbb28b000)
librocm_sysdeps_numa.so.1 => /tmp/TheRockTestDir/bin/../lib/rocm_sysdeps/lib/librocm_sysdeps_numa.so.1 (0x00007f1cbb27b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1cbbc4a000)
librocm_sysdeps_z.so.1 => /tmp/TheRockTestDir/bin/../lib/rocm_sysdeps/lib/librocm_sysdeps_z.so.1 (0x00007f1cbb25c000)
librocm_sysdeps_zstd.so.1 => /tmp/TheRockTestDir/bin/../lib/rocm_sysdeps/lib/librocm_sysdeps_zstd.so.1 (0x00007f1cbb16a000)
librocm_sysdeps_bz2.so => /tmp/TheRockTestDir/bin/../lib/rocm_sysdeps/lib/librocm_sysdeps_bz2.so (0x00007f1cbb151000)
libatomic.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/14/libatomic.so.1 (0x00007f1cbb145000)
Looks good for the initial tests. I wonder if we could pair build_linux_packages with this to avoid having to re-download the artifacts, although the file sizes are so small it should be golden. May be something to consider when we start looking into larger artifacts.
That would mean merging the two jobs, as files can only be shared within a job. However, we're planning to break the …
We also don't want any needless compute on MI300s, since they are a very hot commodity. We don't want to compile/build on MI300s.
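Keeping the jobs separate while still sequencing them might look like this (a sketch; the runner labels and step bodies are hypothetical):

```yaml
# Sketch: build and test stay separate jobs, so only the test job occupies
# an MI300 runner while the build runs on cheap CPU runners.
jobs:
  build_linux_packages:
    runs-on: ubuntu-latest          # build on an ordinary hosted runner
    steps: []                       # ... build and upload artifacts to S3
  test_linux_packages:
    needs: build_linux_packages     # runs only after the build job succeeds
    runs-on: [self-hosted, mi300]   # hypothetical label for the MI300 fleet
    steps: []                       # ... download artifacts and run tests
```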
echo "Downloading artifacts"
aws s3 cp s3://therock-artifacts/${{github.run_id}}/core-runtime_run"${VARIANT}".tar.xz "${BUILD_ARTIFACTS_DIR}"
aws s3 cp s3://therock-artifacts/${{github.run_id}}/core-runtime_lib"${VARIANT}".tar.xz "${BUILD_ARTIFACTS_DIR}"
aws s3 cp s3://therock-artifacts/${{github.run_id}}/sysdeps_lib.tar"${VARIANT}".xz "${BUILD_ARTIFACTS_DIR}"
aws s3 cp s3://therock-artifacts/${{github.run_id}}/base_lib.tar"${VARIANT}".xz "${BUILD_ARTIFACTS_DIR}"
Suggested change:

echo "Downloading artifacts"
aws s3 cp s3://therock-artifacts/${{github.run_id}}/core-runtime_run"${VARIANT}".tar.xz "${BUILD_ARTIFACTS_DIR}"
aws s3 cp s3://therock-artifacts/${{github.run_id}}/core-runtime_lib"${VARIANT}".tar.xz "${BUILD_ARTIFACTS_DIR}"
aws s3 cp s3://therock-artifacts/${{github.run_id}}/sysdeps_lib.tar"${VARIANT}".xz "${BUILD_ARTIFACTS_DIR}"
aws s3 cp s3://therock-artifacts/${{github.run_id}}/base_lib.tar"${VARIANT}".xz "${BUILD_ARTIFACTS_DIR}"

becomes:

echo "Downloading artifacts"
aws s3 cp s3://therock-artifacts/${{github.run_id}}/core-runtime_run"${VARIANT}".tar.xz "${BUILD_ARTIFACTS_DIR}" --no-sign-request
aws s3 cp s3://therock-artifacts/${{github.run_id}}/core-runtime_lib"${VARIANT}".tar.xz "${BUILD_ARTIFACTS_DIR}" --no-sign-request
aws s3 cp s3://therock-artifacts/${{github.run_id}}/sysdeps_lib"${VARIANT}".tar.xz "${BUILD_ARTIFACTS_DIR}" --no-sign-request
aws s3 cp s3://therock-artifacts/${{github.run_id}}/base_lib"${VARIANT}".tar.xz "${BUILD_ARTIFACTS_DIR}" --no-sign-request
Besides the typo, it needs --no-sign-request: otherwise it will fail with fatal error: Unable to locate credentials, and even if credentials are loaded, it fails when that account has no access to the bucket.
- name: Run rocminfo
  run: |
    echo "Running rocminfo"
    /bin/rocminfo
With the artifacts stored for run 13583436756, the recent rework of the layering, and the (slightly adjusted) steps above, rocminfo isn't unpacked to /tmp/bin/rocminfo for me locally. Are you running something that is already pre-existing on the system?
This PR introduces testing on our MI300 fleet, focused on installing rocminfo and its dependencies that were built and uploaded to S3 during the previous build_linux_packages phase.
A few callouts here:
A few other notes: iterating on this quickly is a bit challenging, and we may want to add a couple of other GitHub Actions affordances to make debugging easier. What I ended up doing was debugging with only the test part and a hardcoded prior CI run on an Ubuntu machine first, then switching to an MI300, and finally depending on the build_linux_packages workflow.
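One way to make that hardcoded-run debugging flow repeatable without editing the workflow each time would be a manual-dispatch input (a sketch; the input name is hypothetical):

```yaml
# Hypothetical workflow_dispatch input that points the test job at the
# artifacts of an existing run instead of the current one.
on:
  workflow_dispatch:
    inputs:
      artifact_run_id:
        description: "Existing run id to pull artifacts from (debug only)"
        required: false
        default: ""

# ...then in the download step, fall back to the current run:
#   ${{ inputs.artifact_run_id || github.run_id }}
```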