Commit 1d7bea4

Merge pull request #55 from carpentriesoffline/jsteyn-patch-6
Update miniHPC_step_by_step.md
2 parents 1e1dcfe + 6b5cac2 commit 1d7bea4

1 file changed

+212
-43
lines changed

miniHPC_step_by_step.md

+212-43
Original file line number | Diff line number | Diff line change
@@ -4,49 +4,127 @@ title: "Setting up the miniHPC login node"
44
---
55

66
{% include sidebar.md %}
7-
# Setting up the miniHPC login node
8-
## (Work in progress)
7+
This is a step by step guide on how to set up a miniHPC using Raspberry Pis.
8+
9+
# 1. Hardware requirement
10+
11+
## Minimal requirements
12+
- Raspberry Pi (RPi) 4 2GB+ single board computers (SBC): 1 for the head node, plus as many compute nodes as you want
13+
- A multiport Netgear switch (at least as many ports as Raspberry Pis)
14+
- Cat6 Ethernet cables (1 per Raspberry Pi)
15+
- Power supplies for each Raspberry Pi (alternatively: use a PoE switch to power all Raspberry Pis)
16+
- An 8GB flash drive for shared storage
17+
- A 32GB SD card to boot the main node from
18+
- Cooling device (e.g. USB desktop fan)
19+
20+
## Optional
21+
- Example of casing:
22+
- 3D printed DIN Rail stand
23+
- 3D printed RPi cases
24+
25+
# 2. Initial configuration
26+
_TODO From https://github.com/carpentriesoffline/CarpentriesOffline.github.io/blob/main/rpiimage_step_by_step.md_
27+
28+
## Creating an SD card image: step-by-step
29+
30+
### Setting up a Raspberry Pi
31+
32+
The official [Set up your SD card](https://projects.raspberrypi.org/en/projects/raspberry-pi-setting-up/2) guide is up to date as of 2 May 2024.
33+
34+
When using the Raspberry Pi Imager, select the Device and OS.
35+
36+
The OS selection should be `Raspberry Pi OS (other)` -> `Raspberry Pi OS Lite (64-bit)`.
37+
38+
![image alt >](../images/screenshots/imager-hero-shot.png)
39+
40+
Selecting the device:
41+
42+
![image alt >](../images/screenshots/imager-device-selection.png)
43+
44+
45+
Selecting the OS:
46+
47+
![](../images/screenshots/imager-OS-selection-1.png)
48+
49+
![](../images/screenshots/imager-OS-selection-2.png)
50+
51+
After this, select the SD card you would like to flash the image onto, then press `NEXT`.
52+
53+
![](../images/screenshots/imager-sd-card-selection.png)
54+
55+
The Imager will ask whether you want to apply any customisation; select `EDIT SETTINGS`.
56+
57+
![](../images/screenshots/imager-customiser-dialog.png)
58+
59+
This opens a pop-up window where you can define the following configuration options (the values below are examples), so that the OS is pre-configured on first boot.
60+
61+
1. Hostname: `CW24miniHPC`
62+
1. Username: `cw24`
63+
1. Password: `*****`
64+
1. WiFi SSID and Password: Enter your WiFi details
65+
66+
![](../images/screenshots/imager-os-config.png)
67+
68+
Then go to the `SERVICES` tab and enable SSH with password authentication (or, alternatively, add an SSH public key). To set up easy access to the Pi via an SSH key, see the [SSH setup guide](ssh-setup.md).
69+
70+
_TODO: Section on generating an ssh key-pair._
71+
72+
![](../images/screenshots/imager-pwd-setup.png)
73+
74+
75+
After saving this, select `YES` to apply the configuration.
76+
77+
![](../images/screenshots/imager-os-config-apply.png)
78+
79+
Confirm writing to the SD card (back up any data on the SD card first, as any existing data will be **LOST!**).
80+
81+
![](../images/screenshots/imager-confirm-sdcard-write.png)
82+
83+
# 3. Installing SLURM/HPC
84+
85+
## Setting up the miniHPC login node
86+
87+
- Log in to the Pi
88+
Use SSH, or log in at a local console if you have a monitor attached. Use the login details you set in the Imager above.
89+
90+
```bash
91+
ssh <USERNAME>@<IP-ADDRESS>
92+
```
93+
94+
In this example, the username would be `cw24`
995

10-
- Create an SD card (or USB drive if booting from USB) with Raspberry Pi Lite Os on it.
1196
- Do an update and a full-upgrade:
1297

1398
```bash
14-
sudo apt-get -y update
15-
sudo apt-get -y full-upgrade
99+
sudo apt update
100+
sudo apt full-upgrade
16101
```
17102

18-
- Install the following packages:
103+
- Install required dependencies.
19104

20105
```bash
21-
sudo apt-get install -y nfs-kernel-server lmod ansible slurm munge nmap \
106+
sudo apt install -y nfs-kernel-server lmod ansible slurm munge nmap \
22107
nfs-common net-tools build-essential htop screen vim python3-pip \
23108
dnsmasq slurm-wlm
24109
```
25110

111+
- Set up the cluster network
26112

27-
- Setup the network
113+
Place the following into `/etc/network/interfaces`:
28114

29-
Place the following into /etc/network/interfaces
30-
31-
```
115+
```bash
32116
auto eth0
33117
allow-hotplug eth0
34118
iface eth0 inet static
35119
address 192.168.5.101
36120
netmask 255.255.255.0
37121
source /etc/network/interfaces.d/*
38-
```
39-
40-
- Setup the WiFi
41-
42-
If you want to connect to the internet
43-
Run `sudo raspi-config`, go to System Options, Wireless LAN and enter your SSID and password.
44-
122+
```
45123

46124
- Modify the hostname
47125

48126
```bash
49-
echo node001 | sudo tee -a /etc/hostname
127+
echo pixie001 | sudo tee /etc/hostname
50128
```
51129

52130
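- Configure DHCP by entering the following in the file `/etc/dhcpd.conf`. These are dnsmasq options, and dnsmasq reads `/etc/dnsmasq.conf` by default, so use that file unless your setup points dnsmasq elsewhere: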
@@ -68,57 +146,148 @@ bogus-priv
68146
dhcp-range=192.168.5.102,192.168.5.200,255.255.255.0,12h
69147
```
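
The login node's static address (`192.168.5.101`) sits just below the DHCP pool (`192.168.5.102` to `192.168.5.200`). As a minimal sketch, not part of the original guide, the check below tests whether an address falls inside the pool before you assign it statically. The function names `ip_to_int` and `in_dhcp_pool` are hypothetical helpers introduced only for this illustration.

```bash
# Hypothetical helpers, not part of the guide.

# Convert a dotted-quad IPv4 address to a single integer.
# The function body runs in a subshell, so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
)

# Succeed if address $1 lies within the pool $2..$3 (inclusive).
in_dhcp_pool() (
    ip=$(ip_to_int "$1")
    first=$(ip_to_int "$2")
    last=$(ip_to_int "$3")
    [ "$ip" -ge "$first" ] && [ "$ip" -le "$last" ]
)

# The login node's static address should NOT be inside the pool.
if in_dhcp_pool 192.168.5.101 192.168.5.102 192.168.5.200; then
    echo "192.168.5.101 clashes with the DHCP pool"
else
    echo "192.168.5.101 is outside the DHCP pool"
fi
```

If you change the `dhcp-range` line, re-run the check with the new bounds.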
70148

71-
- Configure shared drives by addeding the following at the end of the file `/etc/exports`
149+
- Create a shared directory.
150+
151+
```bash
152+
sudo mkdir /sharedfs
153+
sudo chown nobody:nogroup -R /sharedfs
154+
sudo chmod 777 -R /sharedfs
155+
```
156+
157+
- Configure shared drives by adding the following at the end of the file `/etc/exports`
72158

73159
```bash
74160
/sharedfs 192.168.5.0/24(rw,sync,no_root_squash,no_subtree_check)
75-
/modules 192.168.5.0/24(rw,sync,no_root_squash,no_subtree_check)
76161
```
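
Appending to `/etc/exports` by hand is easy to do twice when a setup is re-run. The sketch below shows one way to add the line only if it is missing. `add_line_once` is a hypothetical helper, and the demo writes to a scratch file rather than the real `/etc/exports`.

```bash
# Hypothetical helper: append line $2 to file $1 only if it is not already
# present, so re-running the setup does not create duplicate entries.
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demo on a scratch file; for the real file, run as root against /etc/exports.
exports_file=$(mktemp)
line='/sharedfs 192.168.5.0/24(rw,sync,no_root_squash,no_subtree_check)'
add_line_once "$exports_file" "$line"
add_line_once "$exports_file" "$line"   # second call is a no-op
cat "$exports_file"
rm -f "$exports_file"
```

After changing the real `/etc/exports`, `sudo exportfs -ra` re-reads the file and applies the exports.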
77162

78-
- The `/etc/hosts` file should contain the following:
163+
- The `/etc/hosts` file should contain the following. Make sure to change all occurrences of `pixie` in the file to the name of your cluster:
79164

80165
```bash
81166
127.0.0.1 localhost
82167
::1 localhost ip6-localhost ip6-loopback
83168
ff02::1 ip6-allnodes
84169
ff02::2 ip6-allrouters
85170

86-
127.0.1.1 node001
171+
# This login node's hostname
172+
127.0.1.1 pixie001
87173

88-
192.168.5.102 node002
89-
192.168.5.103 node003
90-
192.168.5.104 node004
91-
192.168.5.105 node005
174+
# IP and hostname of compute nodes
175+
192.168.5.102 pixie002
92176
```
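
The example above lists a single compute node. For a larger cluster, the compute-node lines can be generated rather than typed. This is a sketch under an assumption: it follows the pattern visible in the example, where node `NNN` has the address `192.168.5.(100 + NNN)`. `gen_hosts` is a hypothetical helper name.

```bash
# Hypothetical helper: print "IP hostname" lines for compute nodes.
# Arguments: cluster name, first node number, last node number
# (plain integers such as 2 and 6, without leading zeros).
# Assumes node NNN has the address 192.168.5.(100 + NNN), as in the example.
gen_hosts() (
    name=$1
    n=$2
    while [ "$n" -le "$3" ]; do
        printf '192.168.5.%d %s%03d\n' "$((100 + n))" "$name" "$n"
        n=$((n + 1))
    done
)

# Entries for pixie002 to pixie004; paste the output into /etc/hosts.
gen_hosts pixie 2 4
```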
93177

178+
- Configure Slurm
94179

95-
- Install EESSI
180+
Add the following to `/etc/slurm/slurm.conf`. Change all occurrences of `pixie` in this file to the name of your cluster.
96181

97182
```
98-
mkdir eessi
99-
cd eessi
183+
SlurmctldHost=pixie001(192.168.5.101)
184+
MpiDefault=none
185+
ProctrackType=proctrack/cgroup
186+
#ProctrackType=proctrack/linuxproc
187+
ReturnToService=1
188+
SlurmctldPidFile=/run/slurmctld.pid
189+
SlurmctldPort=6817
190+
SlurmdPidFile=/run/slurmd.pid
191+
SlurmdPort=6818
192+
SlurmdSpoolDir=/var/lib/slurm/slurmd
193+
SlurmUser=slurm
194+
StateSaveLocation=/var/lib/slurm/slurmctld
195+
SwitchType=switch/none
196+
TaskPlugin=task/affinity
197+
InactiveLimit=0
198+
KillWait=30
199+
MinJobAge=300
200+
SlurmctldTimeout=120
201+
SlurmdTimeout=300
202+
Waittime=0
203+
SchedulerType=sched/backfill
204+
SelectType=select/cons_res
205+
SelectTypeParameters=CR_Core
206+
AccountingStorageType=accounting_storage/none
207+
# AccountingStoreJobComment=YES
208+
AccountingStoreFlags=job_comment
209+
ClusterName=pixie
210+
JobCompType=jobcomp/none
211+
JobAcctGatherFrequency=30
212+
JobAcctGatherType=jobacct_gather/none
213+
SlurmctldDebug=info
214+
SlurmctldLogFile=/var/log/slurm/slurmctld.log
215+
SlurmdDebug=info
216+
SlurmdLogFile=/var/log/slurm/slurmd.log
217+
PartitionName=pixiecluster Nodes=pixie[002-002] Default=YES MaxTime=INFINITE State=UP
218+
RebootProgram=/etc/slurm/slurmreboot.sh
219+
NodeName=pixie001 NodeAddr=192.168.5.101 CPUs=4 State=IDLE
220+
NodeName=pixie002 NodeAddr=192.168.5.102 CPUs=4 State=IDLE
221+
```
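
Renaming the cluster by hand means touching `SlurmctldHost`, `ClusterName`, `PartitionName`, and every `NodeName` line. A possible shortcut is a `sed` substitution, sketched below. `rename_cluster` is a hypothetical helper, and it assumes the names contain only letters, digits, and hyphens.

```bash
# Hypothetical helper: replace every occurrence of the old cluster name ($1)
# with the new one ($2) in text read from stdin.
# Assumes both names contain only letters, digits, and hyphens, so they are
# safe to place inside the sed expression.
rename_cluster() {
    sed "s/$1/$2/g"
}

# Demo on two lines taken from the configuration above.
rename_cluster pixie mycluster <<'EOF'
ClusterName=pixie
NodeName=pixie002 NodeAddr=192.168.5.102 CPUs=4 State=IDLE
EOF
```

Note that the substitution also rewrites the partition name (`pixiecluster` becomes `myclustercluster` in this demo), so review the result before installing it.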
222+
223+
- Restart Slurm
224+
225+
```bash
226+
sudo systemctl restart slurmctld
227+
```
228+
229+
- Install EESSI
230+
231+
```bash
232+
mkdir essi
233+
cd essi
100234
wget https://raw.githubusercontent.com/EESSI/eessi-demo/main/scripts/install_cvmfs_eessi.sh
101235
sudo bash ./install_cvmfs_eessi.sh
102236
echo "source /cvmfs/software.eessi.io/versions/2023.06/init/bash" | sudo tee -a /etc/profile
103237
```
104238

105-
- Create a shared directory
239+
- Install a client node
240+
241+
Flash another SD card for a Raspberry Pi. Boot it up with internet access and run the following:
106242

107243
```bash
108-
sudo mkdir /sharedfs
109-
sudo chown nobody:nogroup -R /sharedfs
110-
sudo chmod 777 -R /sharedfs
244+
sudo apt-get install -y slurmd slurm-client munge vim ntp ntpdate
111245
```
112246

113-
- configure slurm
114-
- slurm.conf
247+
- Shut down the client node, insert its SD card into a Linux laptop (using a USB SD card reader if needed), and take an image of the card:
115248

116-
- configure cgroup
117-
- cgroup.conf
118-
- cgroup_allowed_device_file.conf
249+
```bash
250+
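# /dev/mmcblk0 is the usual name for a built-in card reader; confirm yours with lsblk
sudo dd if=/dev/mmcblk0 of=node.img bs=4M status=progress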
251+
```
252+
253+
- Copy `node.img` to the home directory on the login node (the master Raspberry Pi).
254+
255+
256+
- Set up PXE booting
257+
258+
Download the pxe-boot scripts:
259+
260+
```bash
261+
git clone https://github.com/carpentriesoffline/pxe-boot.git
262+
cd pxe-boot
263+
./pxe-install
264+
```
265+
266+
Initialise a PXE node:
267+
```
268+
./pxe-add <serial number> ../node.img <IP address> <node name> <mac address>
269+
```
270+
271+
For example:
272+
```
273+
./pxe-add fa917c3a ../node.img 192.168.5.102 pixie002 dc:a6:32:af:83:d0
274+
```
275+
276+
This will create an entry named after the serial number in `/pxe-boot` and `/pxe-root`.
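
`pxe-add` needs the client's serial number and MAC address. One way to read them is on the client itself, as sketched below. This rests on an assumption suggested by the example value `fa917c3a`: that the expected serial is the last eight hex digits of the `Serial` line in `/proc/cpuinfo`. `short_serial` is a hypothetical helper, and `eth0` is assumed to be the wired interface.

```bash
# Hypothetical helper: extract the short serial number (last 8 hex digits)
# from cpuinfo-style text read from stdin.
short_serial() {
    awk '/^Serial/ { print substr($NF, length($NF) - 7) }'
}

# On a Raspberry Pi, print the real values; on other machines these
# checks fail quietly and nothing is printed.
if grep -q '^Serial' /proc/cpuinfo 2>/dev/null; then
    echo "serial: $(short_serial < /proc/cpuinfo)"
fi
if [ -r /sys/class/net/eth0/address ]; then
    echo "mac:    $(cat /sys/class/net/eth0/address)"
fi
```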
119277

120-
- configure munge
121-
- munge.key
278+
- Copy the Slurm config to the node filesystems
122279

123-
- disable wifi in compute nodes
124-
- /boot/firmware/config.txt
280+
```bash
281+
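# One cp per node: a bare glob as the destination breaks with more than one node
for d in /pxe-root/*/etc/slurm/; do sudo cp /etc/slurm/slurm.conf "$d"; done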
282+
```
283+
284+
285+
## Test PXE booting
286+
* Boot up a client
287+
* Run `sinfo` to see if the cluster is working
288+
You should see something like:
289+
290+
```bash
291+
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
292+
pixiecluster* up infinite 5 idle pixie[002-006]
293+
```
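
To check the result in a script rather than by eye, the `sinfo` table can be summarised. The sketch below sums the `NODES` column for rows whose `STATE` is `idle`, assuming the default column layout shown above. `count_idle` is a hypothetical helper.

```bash
# Hypothetical helper: sum the NODES column (4th field) for rows whose
# STATE (5th field) is "idle", skipping the header line.
# Reads sinfo-style output from stdin.
count_idle() {
    awk 'NR > 1 && $5 == "idle" { total += $4 } END { print total + 0 }'
}

# Demo on the sample output above; on the cluster, use: sinfo | count_idle
count_idle <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
pixiecluster* up infinite 5 idle pixie[002-006]
EOF
```

If the count is lower than the number of compute nodes you booted, check the state of the missing nodes in the full `sinfo` output.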
