@@ -4,49 +4,127 @@ title: "Setting up the miniHPC login node"
4
4
---
5
5
6
6
{% include sidebar.md %}
7
- # Setting up the miniHPC login node
8
- ## (Work in progress)
7
+ This is a step-by-step guide to setting up a miniHPC using Raspberry Pis.
8
+
9
+ # 1. Hardware requirements
10
+
11
+ ## Minimal requirements
12
+ - Raspberry Pi (RPi) 4 2GB+ single board computers (SBC): 1 for the head node, plus as many compute nodes as you want
13
+ - A multiport Netgear switch (as many ports as Raspberry Pis)
14
+ - 10BaseT Cat6 ethernet cables (1 per Raspberry Pi)
15
+ - Power supplies for each Raspberry Pi (alternatively: use a PoE switch to power all Raspberry Pis)
16
+ - An 8GB flash drive for shared storage
17
+ - A 32GB SD card to boot the main node from
18
+ - Cooling device (e.g. USB desktop fan)
19
+
20
+ ## Optional
21
+ - Example of casing:
22
+ - 3D printed DIN Rail stand
23
+ - 3D printed RPi cases
24
+
25
+ # 2. Initial configuration
26
+ _TODO From https://github.com/carpentriesoffline/CarpentriesOffline.github.io/blob/main/rpiimage_step_by_step.md_
27
+
28
+ ## Creating an SD card image: step-by-step
29
+
30
+ ### Setting up a Raspberry Pi
31
+
32
+ The official [Set up your SD card](https://projects.raspberrypi.org/en/projects/raspberry-pi-setting-up/2) guide is up to date as of 2 May 2024.
33
+
34
+ When using the Raspberry Pi Imager, select the Device and OS.
35
+
36
+ The OS selection should be `Raspberry Pi OS (other)` -> `Raspberry Pi OS Lite (64-bit)`.
37
+
38
+ ![image alt >](../images/screenshots/imager-hero-shot.png)
39
+
40
+ Selecting the device:
41
+
42
+ ![image alt >](../images/screenshots/imager-device-selection.png)
43
+
44
+
45
+ Selecting the OS:
46
+
47
+ ![](../images/screenshots/imager-OS-selection-1.png)
48
+
49
+ ![](../images/screenshots/imager-OS-selection-2.png)
50
+
51
+ After this, select the SD card you would like to flash the image onto, then press `NEXT`.
52
+
53
+ ![](../images/screenshots/imager-sd-card-selection.png)
54
+
55
+ The Imager will ask whether you want to apply any customisation; select `EDIT SETTINGS`.
56
+
57
+ ![](../images/screenshots/imager-customiser-dialog.png)
58
+
59
+ This will show a pop-up window where the following configuration options can be defined for your set-up (below are examples) such that your OS is pre-configured upon first boot.
60
+
61
+ 1. Hostname: `CW24miniHPC`
62
+ 1. Username: `cw24`
63
+ 1. Password: `*****`
64
+ 1. WiFi SSID and Password: Enter your WiFi details
65
+
66
+ ![](../images/screenshots/imager-os-config.png)
67
+
68
+ Then go to the `SERVICES` tab and enable SSH with password authentication (alternatively, add an SSH public key). If you would like to set up easy access to the Pi via an SSH key, please see [here](ssh-setup.md).
69
+
70
+ _TODO: Section on generating an SSH key pair._
71
+
72
+ ![](../images/screenshots/imager-pwd-setup.png)
73
+
74
+
75
+ After saving this, select `YES` to apply the configuration.
76
+
77
+ ![](../images/screenshots/imager-os-config-apply.png)
78
+
79
+ Confirm writing to the SD card (please back up any data on the SD card first; any existing data will be **LOST!**).
80
+
81
+ ![](../images/screenshots/imager-confirm-sdcard-write.png)
82
+
83
+ # Installing SLURM/HPC
84
+
85
+ ## Setting up the miniHPC login node
86
+
87
+ - Log in to the Pi
88
+ Use SSH, or log in at a local console if you have a monitor attached. Use the login details you set above.
89
+
90
+ ``` bash
91
+ ssh <USERNAME>@<IP-ADDRESS>
92
+ ```
93
+
94
+ In this example, the username would be ` cw24 `
9
95
10
- - Create an SD card (or USB drive if booting from USB) with Raspberry Pi Lite Os on it.
11
96
- Do an update and a full-upgrade:
12
97
13
98
``` bash
14
- sudo apt-get -y update
15
- sudo apt-get -y full-upgrade
99
+ sudo apt update
100
+ sudo apt full-upgrade
16
101
```
17
102
18
- - Install the following packages:
103
+ - Install required dependencies.
19
104
20
105
``` bash
21
- sudo apt-get install -y nfs-kernel-server lmod ansible slurm munge nmap \
106
+ sudo apt install -y nfs-kernel-server lmod ansible slurm munge nmap \
22
107
nfs-common net-tools build-essential htop net-tools screen vim python3-pip \
23
108
dnsmasq slurm-wlm
24
109
```
25
110
111
+ - Set up the cluster network
26
112
27
- - Setup the network
113
+ Place the following into `/etc/network/interfaces`
28
114
29
- Place the following into /etc/network/interfaces
30
-
31
- ```
115
+ ``` bash
32
116
auto eth0
33
117
allow-hotplug eth0
34
118
iface eth0 inet static
35
119
address 192.168.5.101
36
120
netmask 255.255.255.0
37
121
source /etc/network/interfaces.d/*
38
- ```
39
-
40
- - Setup the WiFi
41
-
42
- If you want to connect to the internet
43
- Run ` sudo raspi-config ` , go to System Options, Wireless LAN and enter your SSID and password.
44
-
122
+ ```
45
123
46
124
- Modify the hostname
47
125
48
126
``` bash
49
- echo node001 | sudo tee -a /etc/hostname
127
+ echo pixie001 | sudo tee /etc/hostname
50
128
```
51
129
52
130
- Configure dhcp by entering the following in the file ` /etc/dhcpd.conf `
@@ -68,57 +146,148 @@ bogus-priv
68
146
dhcp-range=192.168.5.102,192.168.5.200,255.255.255.0,12h
69
147
```
70
148
71
- - Configure shared drives by addeding the following at the end of the file ` /etc/exports `
149
+ - Create a shared directory.
150
+
151
+ ``` bash
152
+ sudo mkdir /sharedfs
153
+ sudo chown nobody:nogroup -R /sharedfs
154
+ sudo chmod 777 -R /sharedfs
155
+ ```
156
+
157
+ - Configure shared drives by adding the following at the end of the file ` /etc/exports `
72
158
73
159
``` bash
74
160
/sharedfs 192.168.5.0/24(rw,sync,no_root_squash,no_subtree_check)
75
- /modules 192.168.5.0/24(rw,sync,no_root_squash,no_subtree_check)
76
161
```
77
162
78
- - The ` /etc/hosts ` file should contain the following:
163
+ - The `/etc/hosts` file should contain the following. Make sure to change all occurrences of `pixie` in the file to the name of your cluster:
79
164
80
165
``` bash
81
166
127.0.0.1 localhost
82
167
::1 localhost ip6-localhost ip6-loopback
83
168
ff02::1 ip6-allnodes
84
169
ff02::2 ip6-allrouters
85
170
86
- 127.0.1.1 node001
171
+ # This login node's hostname
172
+ 127.0.1.1 pixie001
87
173
88
- 192.168.5.102 node002
89
- 192.168.5.103 node003
90
- 192.168.5.104 node004
91
- 192.168.5.105 node005
174
+ # IP and hostname of compute nodes
175
+ 192.168.5.102 pixie002
92
176
```
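With more than one compute node, the `/etc/hosts` entries follow a regular pattern, so they can be generated rather than typed. The following is a minimal sketch, assuming node number N maps to `192.168.5.(100+N)` as in the example above; the function name `gen_hosts` is hypothetical and not part of the guide.

```bash
# Print /etc/hosts lines for compute nodes named <prefix>NNN.
# Usage: gen_hosts PREFIX FIRST LAST   (plain node numbers, e.g. 2 5)
# Assumption: node N has the address 192.168.5.(100+N).
gen_hosts() {
    prefix=$1
    n=$2
    last=$3
    while [ "$n" -le "$last" ]; do
        printf '192.168.5.%d %s%03d\n' "$((100 + n))" "$prefix" "$n"
        n=$((n + 1))
    done
}

# Example: entries for pixie002 to pixie005.
gen_hosts pixie 2 5
```

Once the output looks right, it could be appended with `gen_hosts pixie 2 5 | sudo tee -a /etc/hosts`.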
93
177
178
+ - Configure Slurm
94
179
95
- - Install EESSI
180
+ Add the following to `/etc/slurm/slurm.conf`. Change all occurrences of `pixie` in this file to the name of your cluster.
96
181
97
182
```
98
- mkdir eessi
99
- cd eessi
183
+ SlurmctldHost=pixie001(192.168.5.101)
184
+ MpiDefault=none
185
+ ProctrackType=proctrack/cgroup
186
+ #ProctrackType=proctrack/linuxproc
187
+ ReturnToService=1
188
+ SlurmctldPidFile=/run/slurmctld.pid
189
+ SlurmctldPort=6817
190
+ SlurmdPidFile=/run/slurmd.pid
191
+ SlurmdPort=6818
192
+ SlurmdSpoolDir=/var/lib/slurm/slurmd
193
+ SlurmUser=slurm
194
+ StateSaveLocation=/var/lib/slurm/slurmctld
195
+ SwitchType=switch/none
196
+ TaskPlugin=task/affinity
197
+ InactiveLimit=0
198
+ KillWait=30
199
+ MinJobAge=300
200
+ SlurmctldTimeout=120
201
+ SlurmdTimeout=300
202
+ Waittime=0
203
+ SchedulerType=sched/backfill
204
+ SelectType=select/cons_res
205
+ SelectTypeParameters=CR_Core
206
+ AccountingStorageType=accounting_storage/none
207
+ # AccountingStoreJobComment=YES
208
+ AccountingStoreFlags=job_comment
209
+ ClusterName=pixie
210
+ JobCompType=jobcomp/none
211
+ JobAcctGatherFrequency=30
212
+ JobAcctGatherType=jobacct_gather/none
213
+ SlurmctldDebug=info
214
+ SlurmctldLogFile=/var/log/slurm/slurmctld.log
215
+ SlurmdDebug=info
216
+ SlurmdLogFile=/var/log/slurm/slurmd.log
217
+ PartitionName=pixiecluster Nodes=pixie[002-002] Default=YES MaxTime=INFINITE State=UP
218
+ RebootProgram=/etc/slurm/slurmreboot.sh
219
+ NodeName=pixie001 NodeAddr=192.168.5.101 CPUs=4 State=IDLE
220
+ NodeName=pixie002 NodeAddr=192.168.5.102 CPUs=4 State=IDLE
221
+ ```
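When the cluster grows, the `NodeName` and `PartitionName` lines at the end of `slurm.conf` have to be extended to match. The sketch below prints those lines for a range of compute nodes. It assumes the same numbering scheme as the example (node N at `192.168.5.(100+N)`, 4 CPUs per node, partition named `<prefix>cluster`); `gen_slurm_nodes` is a hypothetical helper, not something Slurm provides.

```bash
# Print slurm.conf NodeName lines plus a matching PartitionName line.
# Usage: gen_slurm_nodes PREFIX FIRST LAST
# Assumptions: node N is at 192.168.5.(100+N) and every node has 4 CPUs.
gen_slurm_nodes() {
    prefix=$1
    first=$2
    last=$3
    n=$first
    while [ "$n" -le "$last" ]; do
        printf 'NodeName=%s%03d NodeAddr=192.168.5.%d CPUs=4 State=IDLE\n' \
            "$prefix" "$n" "$((100 + n))"
        n=$((n + 1))
    done
    printf 'PartitionName=%scluster Nodes=%s[%03d-%03d] Default=YES MaxTime=INFINITE State=UP\n' \
        "$prefix" "$prefix" "$first" "$last"
}

# Example: compute nodes pixie002 to pixie004.
gen_slurm_nodes pixie 2 4
```

The output is meant to be reviewed and pasted into `slurm.conf` by hand, replacing the existing compute-node and partition lines.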
222
+
223
+ - Restart slurm
224
+
225
+ ``` bash
226
+ sudo systemctl restart slurmctld
227
+ ```
228
+
229
+ - Install EESSI
230
+
231
+ ``` bash
232
+ mkdir essi
233
+ cd essi
100
234
wget https://raw.githubusercontent.com/EESSI/eessi-demo/main/scripts/install_cvmfs_eessi.sh
101
235
sudo bash ./install_cvmfs_eessi.sh
102
236
echo " source /cvmfs/software.eessi.io/versions/2023.06/init/bash" | sudo tee -a /etc/profile
103
237
```
104
238
105
- - Create a shared directory
239
+ - Install a client node
240
+
241
+ Flash another SD card for a Raspberry Pi. Boot it up with internet access and run the following:
106
242
107
243
``` bash
108
- sudo mkdir /sharedfs
109
- sudo chown nobody:nogroup -R /sharedfs
110
- sudo chmod 777 -R /sharedfs
244
+ sudo apt-get install -y slurmd slurm-client munge vim ntp ntpdate
111
245
```
112
246
113
- - configure slurm
114
- - slurm.conf
247
+ - On a Linux laptop (with a built-in or USB SD card reader), take an image of this SD card:
115
248
116
- - configure cgroup
117
- - cgroup.conf
118
- - cgroup_allowed_device_file.conf
249
+ ``` bash
250
+ dd if=/dev/mmcblk0 of=node.img
251
+ ```
252
+
253
+ - Copy node.img to the master Raspberry Pi's home directory.
254
+
255
+
256
+ - Set up PXE booting
257
+
258
+ Download the pxe-boot scripts:
259
+
260
+ ``` bash
261
+ git clone https://github.com/carpentriesoffline/pxe-boot.git
262
+ cd pxe-boot
263
+ ./pxe-install
264
+ ```
265
+
266
+ Initialise a PXE node:
267
+ ```
268
+ ./pxe-add <serial number> ../node.img <IP address> <node name> <mac address>
269
+ ```
270
+
271
+ For example:
272
+ ```
273
+ ./pxe-add fa917c3a ../node.img 192.168.5.102 pixie002 dc:a6:32:af:83:d0
274
+ ```
275
+
276
+ This will create an entry with the serial number in /pxe-boot and /pxe-root.
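Registering several nodes means repeating `./pxe-add` once per Pi. One way to keep this manageable is to list the nodes in a text file and generate the commands from it. The sketch below is a dry run: it only prints the `./pxe-add` commands, using the argument order shown above. The file name `nodes.txt`, its column layout, and the function `pxe_add_all` are assumptions made for illustration.

```bash
# Print one ./pxe-add command per node listed in a text file (dry run).
# Usage: pxe_add_all NODE_LIST IMAGE
# Assumed NODE_LIST format, one node per line:
#   <serial number> <IP address> <node name> <mac address>
# Blank lines and lines starting with '#' are skipped.
pxe_add_all() {
    list=$1
    image=$2
    while read -r serial ip name mac; do
        case "$serial" in
            ''|'#'*) continue ;;
        esac
        echo "./pxe-add $serial $image $ip $name $mac"
    done < "$list"
}

# Example (from inside the pxe-boot directory):
#   pxe_add_all nodes.txt ../node.img
```

After checking the printed commands, they could be executed by piping the output to `bash`.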
119
277
120
- - configure munge
121
- - munge.key
278
+ - Copy the Slurm config to the node filesystems
122
279
123
- - disable wifi in compute nodes
124
- - /boot/firmware/config.txt
280
+ ``` bash
281
+ for d in /pxe-root/*/etc/slurm/; do cp /etc/slurm/slurm.conf "$d"; done
282
+ ```
283
+
284
+
285
+ ## Test PXE booting
286
+ * Boot up a client
287
+ * Run `sinfo` to see if the cluster is working
288
+ You should see something like:
289
+
290
+ ``` bash
291
+ PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
292
+ pixiecluster* up infinite 5 idle pixie[002-006]
293
+ ```
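To check the result without reading the table by eye, the idle node count can be extracted from the default `sinfo` output. This is a small sketch that assumes the column layout shown above (`NODES` in column 4, `STATE` in column 5); the function name `count_idle` is hypothetical.

```bash
# Sum the NODES column over all rows whose STATE is "idle".
# Reads sinfo's default table from standard input and skips the header row.
count_idle() {
    awk 'NR > 1 && $5 == "idle" { total += $4 } END { print total + 0 }'
}

# Demo using the sample output above; prints 5.
count_idle <<'EOF'
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
pixiecluster* up infinite 5 idle pixie[002-006]
EOF
```

On the login node this would be used as `sinfo | count_idle`, and the number should match the count of compute nodes you have booted.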