layout | title |
---|---|
static | Current Unknowns and Testing |
We need to test this system prior to using it in a workshop with learners. It may also be sensible to use it initially with a backup option of learners having accounts on an existing HPC system in case of failures. Current questions we have include:
{% include sidebar.md %}
- How reliable is a Pi HPC? Workshops commonly include between 20 and 40 learners. Raspberry Pis are relatively low-spec machines, especially the first-, second- and third-generation models compared with the latest Pi 4. It is unclear how many simultaneous users a Pi-based login node can support.
- Can our DHCP server support that many users? The default /24 block (192.168.1.1 to 192.168.1.254) has 254 usable host addresses, at least one of which is taken by the DHCP server itself, so the address space is not the constraint. In practice, client limits and the memory needed to manage leases (especially on a Pi) can still lead to address assignment failures.
- Can the Pi's WiFi interface handle the required traffic? Related to the above, it is unclear whether the Pi can sustain 40 or more clients connected through its WiFi interface at the same time.
- How many Pi nodes do we need in a cluster to handle all the Slurm jobs that will be queued and launched? This is also relevant for anyone implementing the Pi HPC in a Carpentries Offline lesson: to keep costs down, we should provide an estimate that relates the number of nodes to the number of learners.
- Will we need multiple login nodes? If a single login node slows down too much with too many learners connected, it may be necessary to create another login node.
- Will the USB shared storage be a bottleneck? USB shared storage can become a bottleneck under heavy use. Models earlier than the Raspberry Pi 4 are restricted by USB 2.0's 480 Mbps limit. Some throughput testing should be conducted. An external SSD could potentially be faster than a USB flash drive, particularly on the Pi 4's USB 3.0 ports.
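
As a rough starting point for the DHCP and storage questions above, the theoretical ceilings can be worked out before any hardware testing. The sketch below is illustrative only: the function names are hypothetical, it assumes a single address is reserved for the Pi acting as DHCP server, and it assumes an IPv4 block of /30 or larger. The USB figure is the nominal USB 2.0 signalling rate split evenly between learners, so real throughput will be lower.

```python
import ipaddress


def usable_client_addresses(cidr: str, reserved: int = 1) -> int:
    """Count the addresses left for DHCP clients in an IPv4 block.

    `reserved` is the number of addresses held back for static hosts.
    Assumption: only the Pi acting as DHCP server is reserved.
    Assumption: the block is /30 or larger.
    """
    network = ipaddress.ip_network(cidr, strict=True)
    # num_addresses includes the network and broadcast addresses,
    # neither of which can be leased to a client.
    hosts = network.num_addresses - 2
    return hosts - reserved


def usb2_share_mb_per_s(learners: int, link_mbps: float = 480.0) -> float:
    """Theoretical per-learner share of a USB 2.0 link in megabytes per second."""
    # Divide by 8 to convert megabits to megabytes, then split evenly.
    return link_mbps / 8 / learners


if __name__ == "__main__":
    print(usable_client_addresses("192.168.1.0/24"))  # 253
    print(usb2_share_mb_per_s(40))  # 1.5
```

With 40 learners the address pool is not the limiting factor, while the storage link leaves at most about 1.5 MB/s per learner if everyone reads or writes at once. Figures like these only bound the problem; the testing described above is still needed to find where the system actually fails.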