Commit aaf59dc
Improve simple demo for multi-nodes with README and minor changes (#201) (#202)
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
1 parent 0254790 commit aaf59dc

1 file changed: +34 -0 lines changed

demo/README.md

+34
@@ -0,0 +1,34 @@
# Simple Demo for Intel® oneCCL Bindings for PyTorch*

This simple demo showcases the collective communication primitives in Intel® oneCCL Bindings for PyTorch*.

## Single Node Run

To run the simple demo on a single node with 2 instances, run:

```bash
mpirun -n 2 -l python demo.py
```

The demo can also be run on XPU by passing the `--device xpu` argument:

```bash
mpirun -n 2 -l python demo.py --device xpu
```
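
When `mpirun -n 2` starts two instances, each process has to learn its own rank and the total instance count before it can join a collective. The sketch below shows one common way to do that: Intel MPI's launcher exports `PMI_RANK` and `PMI_SIZE` to every process, and these can be mapped to the `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` variables that `torch.distributed` reads for environment-based initialization. This is an illustration only: `rendezvous_env` is a hypothetical helper, the default address and port are assumptions, and `demo.py` itself may handle this differently.

```python
import os


def rendezvous_env(environ, master_addr="127.0.0.1", master_port=29500):
    """Map launcher-provided PMI variables to the variable names that
    torch.distributed's environment-based initialization reads.

    `environ` is any mapping (for example os.environ). When the PMI
    variables are absent, fall back to a single-process run (rank 0 of 1).
    The helper name and its defaults are hypothetical, not part of the demo.
    """
    return {
        "RANK": str(environ.get("PMI_RANK", 0)),
        "WORLD_SIZE": str(environ.get("PMI_SIZE", 1)),
        "MASTER_ADDR": str(master_addr),
        "MASTER_PORT": str(master_port),
    }


if __name__ == "__main__":
    # Launched via `mpirun -n 2 ...`, each instance reports its own rank.
    for key, value in rendezvous_env(os.environ).items():
        print(f"{key}={value}")
```

A script would typically copy these values into `os.environ` before initializing the process group with the `ccl` backend.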

## Multiple Nodes Run

To run the simple demo on multiple nodes, follow the instructions below.

### Ethernet

1. Identify the network interface name for collective communication, e.g. `eth0`.
2. Identify the IPs of all nodes, e.g. `10.0.0.1,10.0.0.2`.
3. Identify the master node IP, e.g. `10.0.0.1`.
4. Set the value of `np` to the total number of instances, e.g. `2`.
5. Set the value of `ppn` to the number of instances per node, e.g. `1`.

Here is an example run command for CPU based on the steps above:

```bash
FI_TCP_IFACE=eth0 I_MPI_OFI_PROVIDER=tcp I_MPI_HYDRA_IFACE=eth0 I_MPI_DEBUG=121 mpirun -host 10.0.0.1,10.0.0.2 -np 2 -ppn 1 --map-by node python demo.py --device cpu --dist_url 10.0.0.1 --dist_port 29500
```

The demo can also be run on XPU by changing `--device cpu` to `--device xpu`.
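
The five values gathered in the Ethernet steps map directly onto the environment variables and `mpirun` options of the example command. As a rough sketch of that mapping, the hypothetical helpers below (`build_mpirun_command` and `format_command` are illustrative names, not part of the demo) assemble the same command line from the interface name, node IPs, master IP, `np`, and `ppn`. The fixed `I_MPI_DEBUG=121` and `--map-by node` settings are simply copied from the example command.

```python
import shlex


def build_mpirun_command(iface, hosts, master_ip, np_total, ppn,
                         device="cpu", port=29500):
    """Return (env, argv) for the multi-node launch.

    iface     -- network interface name (step 1)
    hosts     -- list of node IPs (step 2)
    master_ip -- master node IP (step 3)
    np_total  -- total number of instances (step 4)
    ppn       -- number of instances per node (step 5)
    """
    env = {
        "FI_TCP_IFACE": iface,
        "I_MPI_OFI_PROVIDER": "tcp",
        "I_MPI_HYDRA_IFACE": iface,
        "I_MPI_DEBUG": "121",  # copied from the README example command
    }
    argv = [
        "mpirun", "-host", ",".join(hosts),
        "-np", str(np_total), "-ppn", str(ppn),
        "--map-by", "node",
        "python", "demo.py",
        "--device", device,
        "--dist_url", master_ip,
        "--dist_port", str(port),
    ]
    return env, argv


def format_command(env, argv):
    """Render env and argv as a single shell command line."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    env, argv = build_mpirun_command(
        "eth0", ["10.0.0.1", "10.0.0.2"], "10.0.0.1", np_total=2, ppn=1
    )
    print(format_command(env, argv))
```

With the example values from the steps, the printed line matches the CPU command in this README; passing `device="xpu"` yields the XPU variant.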