
Install PyTorch for ROCM instead of CPU-only #1032

Draft · marbre wants to merge 1 commit into main
Conversation

@marbre (Collaborator) commented Mar 4, 2025

These workflows run on MI300 machines but install a CPU-only version of PyTorch instead of installing the ROCm enabled one.

marbre marked this pull request as draft March 4, 2025 10:32
@marbre (Collaborator, Author) commented Mar 4, 2025

While these tests use a runner with an MI300, they might not use torch+ROCm to run anything on the GPU. This PR would switch all of those workflows to PyTorch+ROCm. Instead of switching in our workflows (if not needed), we might want to be more specific in the developer_guide.md#install-pytorch-for-your-system docs.

pip install --no-compile -r pytorch-cpu-requirements.txt
pip install --no-compile -r pytorch-rocm-requirements.txt
A project member commented:

Our CI should install what is minimally needed, and our stack is designed from the ground up to avoid dependencies on kernel libraries and the other bloat that makes its way into ML frameworks. Users can install whatever they want, for example if they are mixing stock PyTorch with our packages.

I would be okay with deleting https://github.com/nod-ai/shark-ai/blob/main/pytorch-rocm-requirements.txt and instead directing users to either the official PyTorch/ROCm install instructions or linking to our other recommendations (e.g. https://github.com/nod-ai/TheRock/, once it is ready/tested).
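For reference, judging from the install logs quoted further down ("Looking in indexes: https://download.pytorch.org/whl/rocm6.2" and "Collecting torch>=2.3.0 ... (line 2)"), the requirements file under discussion presumably looks roughly like this (a reconstruction, not a verbatim copy of the repo file):

```
--index-url https://download.pytorch.org/whl/rocm6.2
torch>=2.3.0
```

Deleting it would mean users pick an index URL themselves, e.g. by following the official PyTorch install instructions for their ROCm version.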


See how much this slows down CI:

  • Before: https://github.com/nod-ai/shark-ai/actions/runs/13645323949/job/38143083590#step:6:35

    40s for "Install pip deps"

    Tue, 04 Mar 2025 03:05:47 GMT Looking in indexes: https://download.pytorch.org/whl/cpu/
    Tue, 04 Mar 2025 03:05:48 GMT Collecting torch==2.3.0 (from -r pytorch-cpu-requirements.txt (line 2))
    Tue, 04 Mar 2025 03:05:48 GMT   Downloading https://download.pytorch.org/whl/cpu/torch-2.3.0%2Bcpu-cp311-cp311-linux_x86_64.whl (190.4 MB)
    Tue, 04 Mar 2025 03:05:49 GMT      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 190.4/190.4 MB 215.3 MB/s eta 0:00:00
    
  • After: https://github.com/nod-ai/shark-ai/actions/runs/13648884431/job/38152825413?pr=1032#step:6:35

    2m40s for "Install pip deps", downloading 4GB+

    Tue, 04 Mar 2025 08:00:18 GMT Looking in indexes: https://download.pytorch.org/whl/rocm6.2
    Tue, 04 Mar 2025 08:00:18 GMT Collecting torch>=2.3.0 (from -r pytorch-rocm-requirements.txt (line 2))
    Tue, 04 Mar 2025 08:00:18 GMT   Downloading https://download.pytorch.org/whl/rocm6.2/torch-2.5.1%2Brocm6.2-cp311-cp311-linux_x86_64.whl (3973.6 MB)
    Tue, 04 Mar 2025 08:00:45 GMT      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.0/4.0 GB 24.3 MB/s eta 0:00:00
    

A Collaborator commented:

Agree with Scott here. sharktank uses PyTorch only for very minimal dependencies. All CI jobs changed here use the GPUs, but not in eager mode, so they lose nothing without torch+ROCm.
I am currently working on enabling ci_eval.yaml to use the GPU in eager mode, which will require this change in the future.
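The policy the reviewers converge on can be sketched as a tiny helper (hypothetical; `pick_requirements` is not part of the repo, just an illustration of the decision rule):

```python
# Hypothetical helper illustrating the policy discussed above: CI jobs that
# only drive the GPU through compiled runtimes keep the small CPU wheel
# (~190 MB download); only eager-mode GPU tests need the 4 GB+ ROCm build.
def pick_requirements(needs_eager_gpu: bool) -> str:
    if needs_eager_gpu:
        return "pytorch-rocm-requirements.txt"
    return "pytorch-cpu-requirements.txt"


if __name__ == "__main__":
    print(pick_requirements(False))  # pytorch-cpu-requirements.txt
    print(pick_requirements(True))   # pytorch-rocm-requirements.txt
```

Under this rule, only a future eager-mode job like ci_eval.yaml would switch requirements files; everything else stays on the CPU wheel.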
