Commit 0a28e0f

Add entrypoint in docker file and update document (#159)
**Description** Run `ldconfig` as the entrypoint in the Dockerfiles. Update the documentation to use CUDA 12.2. Update the GPT performance numbers. Without `ldconfig` in the entrypoint, the torch2.1 Docker container hits link errors.
1 parent ac666ac commit 0a28e0f

6 files changed, +26 -7 lines changed

.github/workflows/unit-tests.yaml

+5 -1
@@ -17,7 +17,7 @@ jobs:
 - torch: "1.14"
   nvcr: 22.12-py3
   dir: torch1
-# 2.1.0a0+fe05266f
+# 2.1.0a0+32f93b1
 - torch: "2.1"
   nvcr: 23.10-py3
   dir: torch2
@@ -57,6 +57,10 @@ jobs:
     export LD_PRELOAD="/usr/local/lib/libmsamp_dist.so:/usr/local/lib/libnccl.so:${LD_PRELOAD}"
     cd ${{ matrix.dir }}/
     python3 setup.py test
+- name: Clean repository
+  if: always()
+  run: |
+    rm -rf ${{ matrix.dir }}/
 # - name: Report coverage results
 # run: |
 # bash <(curl -s https://codecov.io/bash)
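
The new `Clean repository` step uses `if: always()`, so it runs even when the test step before it fails. A rough local analog in plain bash is an `EXIT` trap, sketched below under stated assumptions: `work_dir` is a hypothetical stand-in for the matrix directory and `false` simulates a failing test run. This is not what the workflow itself executes.

```shell
#!/bin/bash
# Hypothetical stand-in for the per-matrix directory (${{ matrix.dir }} in the workflow).
work_dir="$(mktemp -d)"
touch "${work_dir}/artifact.txt"

run_tests_with_cleanup() {
  (
    # The EXIT trap fires when this subshell ends, whether the tests pass or fail.
    trap 'rm -rf "${work_dir}"' EXIT
    false  # simulate a failing test command
  )
}

status=0
run_tests_with_cleanup || status=$?

echo "test exit status: ${status}"
if [ -d "${work_dir}" ]; then
  echo "directory still present"
else
  echo "directory removed"
fi
```

The failure status is still visible to the caller, which mirrors how the workflow job still fails even though the cleanup step ran.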

dockerfile/entrypoint.sh

+5
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+ldconfig
+
+exec "$@"
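
The entrypoint script above follows a common wrapper pattern: run one-time setup, then `exec "$@"` so the container's command replaces the shell and receives every argument unchanged. The sketch below illustrates that behavior without needing root. `SETUP_CMD` and the temporary script are hypothetical stand-ins for illustration only; the real script calls `ldconfig` at that point.

```shell
#!/bin/bash
# Write a throwaway copy of the wrapper pattern into a temporary directory.
demo_dir="$(mktemp -d)"
cat > "${demo_dir}/entrypoint.sh" <<'EOF'
#!/bin/bash
# One-time setup step. SETUP_CMD is a hypothetical stand-in; the real script runs ldconfig.
${SETUP_CMD:-true}
# Replace this shell with the requested command, keeping each argument intact.
exec "$@"
EOF

# Arguments containing spaces survive because "$@" expands to one word per argument.
result="$(bash "${demo_dir}/entrypoint.sh" printf '[%s]' "hello world" "second")"
echo "${result}"

rm -rf "${demo_dir}"
```

Because of `exec`, the command runs in place of the wrapper shell rather than as its child, which is why this form is commonly paired with the exec-form `ENTRYPOINT ["/entrypoint.sh"]` used in the Dockerfiles in this commit.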

dockerfile/torch1.14-cuda11.8.dockerfile

+5
@@ -57,3 +57,8 @@ RUN python3 -m pip install . && \
     make postinstall
 
 ENV LD_PRELOAD="/usr/local/lib/libmsamp_dist.so:/usr/local/lib/libnccl.so:${LD_PRELOAD}"
+
+# Set up entrypoint
+COPY dockerfile/entrypoint.sh /entrypoint.sh
+RUN chmod +x /entrypoint.sh
+ENTRYPOINT ["/entrypoint.sh"]

dockerfile/torch2.1-cuda12.2.dockerfile

+7 -2
@@ -1,11 +1,11 @@
 FROM nvcr.io/nvidia/pytorch:23.10-py3
 
 # Ubuntu: 22.04
-# Python: 3.8
+# Python: 3.10
 # CUDA: 12.2.0
 # cuDNN: 8.9.5
 # NCCL: v2.16.2-1 + FP8 Support
-# PyTorch: 2.1.0a0+fe05266f
+# PyTorch: 2.1.0a0+32f93b1
 
 LABEL maintainer="MS-AMP"
 
@@ -57,3 +57,8 @@ RUN python3 -m pip install . && \
     make postinstall
 
 ENV LD_PRELOAD="/usr/local/lib/libmsamp_dist.so:/usr/local/lib/libnccl.so:${LD_PRELOAD}"
+
+# Set up entrypoint
+COPY dockerfile/entrypoint.sh /entrypoint.sh
+RUN chmod +x /entrypoint.sh
+ENTRYPOINT ["/entrypoint.sh"]
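
Both Dockerfiles set `LD_PRELOAD` to a colon-separated list of libraries, so a quick sanity check inside a container is to confirm that each listed file exists. The helper below is a sketch for that purpose; `missing_preloads` is a hypothetical name and is not part of MS-AMP or of this commit.

```shell
#!/bin/bash
# Print every entry of a colon-separated preload list that is missing on disk.
# missing_preloads is a hypothetical helper for illustration only.
missing_preloads() {
  local list="$1" entry
  local -a entries
  IFS=':' read -r -a entries <<< "${list}"
  for entry in "${entries[@]}"; do
    # Skip empty entries, such as the one left behind when LD_PRELOAD was previously unset.
    [ -z "${entry}" ] && continue
    [ -e "${entry}" ] || echo "${entry}"
  done
}

# Check the current environment; prints nothing when every listed library exists.
missing_preloads "${LD_PRELOAD:-}"
```

This only checks the preloaded paths. To see what the dynamic linker cache contains after the entrypoint has run, `ldconfig -p` prints the cached libraries.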

docs/assets/gpt-performance.png

-9.69 KB

docs/getting-started/installation.mdx

+4 -4
@@ -18,7 +18,7 @@ Here're the system requirements for MS-AMP.
 * CUDA version 11 or later (which can be checked by running `nvcc --version`).
 * PyTorch version 1.14 or later (which can be checked by running `python -c "import torch; print(torch.__version__)"`).
 
-You can try MS-AMP in two ways: Using Docker or installing from source:
+You can try MS-AMP in two ways: Using Docker or installing from source.
 
 * Using Docker is a convenient way to get started with MS-AMP. You can use the pre-built Docker image to quickly set up an environment for running MS-AMP.
 * On the other hand, installing from source gives you more control over the installation process and allows you to customize the installation to your needs.
@@ -28,8 +28,8 @@ You can try MS-AMP in two ways: Using Docker or installing from source:
 You can try the latest MS-AMP Docker container with the following commands:
 
 ```bash
-sudo docker run -it -d --name=msampcu121 --privileged --net=host --ipc=host --gpus=all -v /:/hostroot ghcr.io/azure/msamp:main-cuda12.1 bash
-sudo docker exec -it msampcu121 bash
+sudo docker run -it -d --name=msampcu122 --privileged --net=host --ipc=host --gpus=all -v /:/hostroot ghcr.io/azure/msamp:main-cuda12.2 bash
+sudo docker exec -it msampcu122 bash
 ```
 
 MS-AMP is pre-installed in Docker container and you can verify it by running:
@@ -46,7 +46,7 @@ We strongly recommend using [PyTorch NGC Container](https://catalog.ngc.nvidia.c
 For example, to start PyTorch 2.1 container, run the following command:
 
 ```bash
-sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:23.04-py3 bash
+sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:23.10-py3 bash
 sudo docker exec -it msamp bash
 ```
 