Protogonus is our Beowulf cluster, used for preparing, debugging and inspecting parallel applications that use MPI.
This journal gives an account of building it. Since we had not built a cluster before, we ran into many dead ends, most of them related to the difficulties of booting from USB; not all of them are detailed here.
All the events in this journal took place in early 2021, using software that was up to date at the time and somewhat outdated hardware. The days below are actual calendar days, but a day usually means just a couple of hours of work, sometimes even less. Only active days are enumerated; days with no work done are not shown.
An important lesson learnt was that a lot of outdated documents on this subject sit at the top of Google’s search results. It is also worth noting that skipping the hostname step included in other descriptions was not a good idea; this has since been remedied.
Preparatory work: cataloguing and some decisions
Preparatory work included gathering, cataloguing, and checking the hardware to be assigned to the FINAL Labs™ Protogonus HPC Cluster. This includes network components and the various boxes that are to act as nodes of the system.
We decided early to go with Ubuntu, even though other solutions, such as Rocks, are available. We believe that a more mainstream platform can mean access to more support material if needed. For a lean setup, Ubuntu Server was chosen.
Boundary & VPN
On Days 1 and 2 the boundary node was set up, ensuring that outside access to the dedicated HPC network is both secure and safe through a VPN tunnel. Day 2 ended with successful login to the network from an outside terminal and testing the egress and ingress of a dummy head node located within the Protogonus network.
Day 3. Connections from our development computers, primarily running Windows, were set up and tested from outside the Protogonus network down to the level of the Beowulf cluster’s nodes. Remote management is possible using, for example, PuTTY and SSH. We also installed, as a test, a number of basic software tools needed to manage the nodes in the cluster.
Setting up the actual location
Day 4. As part of another, unrelated, project, the future physical location of the nodes was prepared.
Identifying the performance of future nodes
Day 5. Benchmarking the nodes. We used PassMark’s PerformanceTest (for Windows 10) to get a better and more data-driven understanding of computer speeds. The tests crashed for some reason on three of the nodes. Nevertheless, they did help us pick the head node. This, according to the Beowulf HOWTO, should be the most powerful computer available.
Trickier than expected: installation
Day 6. Booting from USB required a lot of struggle and fiddling around with various ‘legacy boot’ and ‘secure boot’ BIOS settings, and also trying (in vain) UNetbootin. We also came across a description of how to netboot, which in the end we did not try. It eventually turned out that in the case of this specific computer, a Lenovo, a somewhat hidden ‘OS Optimized Boot’ setting also had to be disabled, as described here. Installing to GPT with Rufus as described here might also be useful (this is not the default setting).
Sometimes opening the BIOS settings might be tricky. This link might help if this is the case. It is also possible to boot into the BIOS from Windows through the Recovery menu in Settings.
It seems that secure boot should be disabled.
The milestone achieved on Day 6 thus was successfully booting from the USB stick. Ubuntu Server is not installed yet.
Day 7. Installing Ubuntu Server has been unsuccessful so far. Problems include simply booting from the USB stick, consistently booting the installed system, and eventually setting up Wi-Fi.
So far, we have tried installing Ubuntu Server on three nodes. Only one of them worked out, but even in that case the USB stick seems to need ‘repair’ following the install (as reported by Windows 10 on another box). Furthermore, the boot sequence is somehow corrupted and eventually ends in a diagnostic loop. Setting up Wi-Fi on this box has also proven unsuccessful, even though any missing components can probably be installed through a wired LAN.
It appears that disabling LVM and OpenSSH during the install might lead to a more robust process. SSH can be installed in a later step. Installing without the network cable plugged in also smooths out the installation.
Tomorrow will undoubtedly be a good opportunity to start again.
Day 8. Ubuntu Server was successfully installed on one of the boxes and the wired network can be accessed successfully. As detailed in one of the online Beowulf installation instructions, we chose a strong password (this will be used throughout the cluster so that management can be automated).
We have encountered these known problems:
- system bootOrder not found (on HP boxes Xeno’s answer regarding Customized Boot does work)
- cloud-init messages on the login screen (this is not a major problem following the first boot)
It also appears that Windows 10 is not happy about the USB stick once it is plugged back in following an install on another box.
Creating a partition from Windows is described here. When installing to a new partition, the target partition should be created from Windows but neither formatted nor mounted. The Ubuntu installer can then use it as the destination, as intended. In such cases two partitions are needed: a bigger one for the install (root) and a smaller one for boot.
Day 9. Getting the WIFI to work was a bit challenging. Eventually, by shuffling the devices around quite a few times, we succeeded. In the final setup, there is a wireless access point installed as part of the Protogonus boundary. Just to help others and for clarity, the right plug order is:
Main WAN modem&router to Protogonus Boundary Router: from LAN to WAN
Protogonus Boundary to Wifi Access: from LAN to LAN
To connect to WIFI (first you will need the wire):
- sudo apt update
- sudo apt install network-manager
- sudo nmcli dev wifi connect <mySSID> password <myPassword>
Completing Ubuntu installation
Day 10. Continuing the installation of the nodes.
All in all, the biggest challenge in our heterogeneous setup has been getting the USB boot to work, which included getting rid of secure boot and getting the legacy boot settings and the various drive partition combinations right.
All the nodes are installed now, apart from one. The single leftover box will need some tweaking as it has no Ethernet port, and it appears that Ubuntu Server needs wired internet to install Wi-Fi support. A few software-based workarounds and an Ethernet-to-USB dongle came up as ideas, but for now we want to move along and set up the Beowulf cluster. We will try to add this extra node later.
Day 11. SSH has been installed on all the nodes. The process is simple and is described here.
- sudo apt update
- sudo apt install openssh-server
- sudo ufw allow ssh
- sudo systemctl status ssh
To get the ip address of the target, use ip -4 address from Ubuntu. Then connecting from a computer external to the Protogonus LAN is relatively easy using the VPN tunnel through the cluster’s boundary.
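When the address is needed in a script rather than read off the screen, the one-record-per-line mode of ip (the -o option) is easier to parse. The following is a minimal sketch, assuming the usual layout of that output, in which the fourth field is address/prefix. The function name list_ipv4 and the canned sample lines are made up for illustration.

```shell
# Extract IPv4 addresses (without the /prefix) from `ip -4 -o addr show` output.
# Skips the loopback interface. list_ipv4 is a hypothetical helper name.
list_ipv4() {
    awk '$2 != "lo" { split($4, a, "/"); print a[1] }'
}

# On a real node you would run: ip -4 -o addr show | list_ipv4
# Canned sample input (illustrative values) so the sketch runs anywhere:
sample='1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: eth0    inet 192.168.1.100/24 brd 192.168.1.255 scope global eth0\       valid_lft forever preferred_lft forever'
printf '%s\n' "$sample" | list_ipv4
```
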
PuTTY is handy for Windows. During the initial login, a warning screen is displayed, as anticipated.
Let’s make this into a Beowulf with MPI
With this done, we are entering the Beowulf realm. Let us try and identify the next steps.
The document titled Beowulf HOWTO seems to be somewhat outdated, as hinted by its use of LAM, discussed in this Stack Overflow post. According to Beowulf.org, MPI is now the way to go. To add to the confusion caused by the outdated documents that Google brings up, what once was MPICH2 is now called MPICH. Open MPI is another implementation of MPI, as is MPICH. Installing Open MPI is simple using
- sudo apt-get install libopenmpi-dev
Installing MPICH is equally simple using
- sudo apt-get install mpich
The differences are discussed in a relatively old Stack Overflow thread, this Quora entry, and a reddit thread; together they suggest that the benefits and disadvantages of using one over the other are pretty much a wash. There are more Google hits for Open MPI than for MPICH.
Eventually we decided to go with OpenMPI for the Protogonus Cluster.
In addition to libopenmpi-dev, which seems to install a lot of things such as gfortran, installing openmpi-doc might also be useful. The Open MPI documentation can also be accessed online.
An important lesson learnt today was that there are an awful lot of outdated documents on this subject that are on the top of Google’s search results.
NFS, the Network File System
Day 12. More networking, in order to eventually install the Network File System, NFS, on the cluster. This will serve as a basis for communication between the head node and the worker nodes.
To make network navigation easier, we installed a few networking tools:
- sudo apt-get install nmap
- sudo apt install net-tools
- sudo apt install clustershell
These might help querying the network neighbourhood using
- sudo nmap -sP 192.168.1.0/24
or whatever your subnet ID is, and managing the nodes in bulk.
As another step, in preparation for the NFS setup and to make future management easier also with clush, we have set the nodes’ wifi connection to use fixed IP addresses:
- sudo nmcli con mod YourConnectionName ipv4.addresses xxx.xxx.xxx.xxx/24
- sudo nmcli con mod YourConnectionName ipv4.method "manual"
- sudo nmcli con mod YourConnectionName ipv4.dns "184.108.40.206 220.127.116.11"
- sudo nmcli con mod YourConnectionName ipv4.gateway YourDefaultGateway
- sudo nmcli con down YourConnectionName; sudo nmcli con up YourConnectionName
It is important to put the ; between the two commands in the last row so that both are issued in one go: the connection drops as soon as the first one runs, and the second one brings it back up so that you can reconnect through PuTTY. A sudo reboot is also a good solution.
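The steps above can also be generated by a small function, so that the same settings are applied consistently on every node. This is a sketch rather than what we ran at the time: the function name static_ip_cmds is made up, the connection name, gateway and DNS values in the example call are placeholders, and the function only prints the commands instead of executing them. It relies on nmcli accepting several property/value pairs in a single con mod call.

```shell
# Print the nmcli commands that pin a connection to a fixed IPv4 address.
# Usage: static_ip_cmds <connection> <address/prefix> <gateway> <dns servers>
# Nothing is executed here; the commands are only written to stdout.
static_ip_cmds() (
    con=$1 addr=$2 gw=$3 dns=$4
    printf 'sudo nmcli con mod "%s" ipv4.addresses %s ipv4.gateway %s ipv4.dns "%s" ipv4.method manual\n' \
        "$con" "$addr" "$gw" "$dns"
    # Down and up on one line, so the link comes back after the session drops.
    printf 'sudo nmcli con down "%s"; sudo nmcli con up "%s"\n' "$con" "$con"
)

# Example call with placeholder values (assumptions, not our real settings):
static_ip_cmds mywifi 192.168.1.101/24 192.168.1.1 192.168.1.1
```

After checking the printed commands, they can be run on the node by piping the output to sh.
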
The source we based our solution on was an askubuntu post, but the answers needed to be combined together.
nmcli has proven to be a relatively comfortable method of managing our wifi.
On the head node, the NFS server was installed.
- sudo apt-get install nfs-kernel-server
To properly access the worker nodes through clustershell, passwordless ssh login also needs to be implemented. This will be the objective of the next day.
Users, IPs, and passwordless login
It is worth noting at this point that skipping the hostname step included in other descriptions was not a good idea; this has since been remedied.
Day 13. We set each node to a fixed IP whose last octet is 100 for the head node and 100+i for the ith worker node. This makes cluster management somewhat easier. Earlier network settings caused an old IP address to persist for a while, but eventually the entire cluster was listed under these IPs.
The next step is to create a dedicated mpi user on each node. The user should have the same name, uid and password on each node.
- sudo adduser mpiuser --uid 911
We chose 911 as the uid because 999 was already taken on the head node (and perhaps elsewhere).
Let’s set up passwordless login for clush to work. The description is clear and the process is straightforward. We set up passwordless login for both the main user and the newly created mpi user, just in case.
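For reference, key-based login usually comes down to generating one key pair on the head node and copying the public key to every worker. The sketch below only prints the commands for the 100+i addressing scheme described above. The function name keysetup_cmds and the choice of an ed25519 key without a passphrase are our assumptions here, not necessarily what the description we followed used.

```shell
# Print the commands that set up key-based SSH login from the head node
# to workers 1..N, assuming worker i has the address <prefix>.(100+i).
# Nothing is executed here; the commands are only written to stdout.
keysetup_cmds() (
    user=$1 prefix=$2 count=$3
    # One key pair on the head node; -N '' means an empty passphrase.
    printf "ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519\n"
    i=1
    while [ "$i" -le "$count" ]; do
        # ssh-copy-id asks for the user's password once per node.
        printf 'ssh-copy-id -i ~/.ssh/id_ed25519.pub %s@%s.%d\n' \
            "$user" "$prefix" $((100 + i))
        i=$((i + 1))
    done
)

keysetup_cmds mpiuser 192.168.1 3
```
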
If everything works, the following command or the like should jog through your nodes and run whoami on each node, returning the response:
- clush -w 192.168.1.[101-150] -L whoami
-L makes sure that the output is sorted by node name (here by IP address), for clarity.
This is the first major and spectacular milestone of the setup process. Managing a number of computers at once through clush is an amazing experience.
To run sudo on the nodes, the only workaround we have found was:
- echo 'password' | clush -w .... sudo -S command
where you need to enter your sudo password. Perhaps cat passwordfile can be used instead of echo ‘password’ as detailed here. The ‘ can perhaps be omitted.
Day 14. Late yesterday it turned out that the power supply to some of the nodes was not reliable. It also turned out that the gateway had not been set properly when we moved to the fixed-IP layout: the nodes could talk to each other, but the cluster could not reach the internet. The command that solves the gateway problem has already been added to the list for Day 12 above, so if you followed the instructions there, you should not encounter this problem. Running these commands with sudo via clush was not successful, so we applied the changes node by node instead. Fortunately, passwordless login speeds things up a lot.
More NFS installation
Day 12’s NFS installation continues on the nodes with
- sudo apt-get install nfs-common
Sharing the NFS folder (the home folder of the mpiuser created for the MPI’s purposes) is done by adding the line
to /etc/exports with the command
- sudo nano /etc/exports
- sudo service nfs-kernel-server restart
- sudo ufw allow from 192.168.1.0/24
In the last command, use your own subnet.
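An entry in /etc/exports names the shared directory, the clients allowed to mount it, and the export options in parentheses. The sketch below builds such an entry. The function name exports_line and the default options rw,sync,no_subtree_check are assumptions for illustration, not a record of the entry we used.

```shell
# Build one /etc/exports entry: <directory> <client>(<options>)
# The default options below are an assumption, not our original entry.
exports_line() {
    printf '%s %s(%s)\n' "$1" "$2" "${3:-rw,sync,no_subtree_check}"
}

exports_line /home/mpiuser 192.168.1.0/24

# To append the entry on the head node and re-export without a full restart:
#   exports_line /home/mpiuser 192.168.1.0/24 | sudo tee -a /etc/exports
#   sudo exportfs -ra
```

Limiting the client field to the cluster subnet, rather than using a wildcard, keeps the share from being offered to machines outside the cluster.
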
You can then check if all is working using
- clush -w <worker node ip range> -L showmount -e <IP of the head node>
Then the shared folder needs to be mounted on each node; anything before the clush command is our workaround for the sudo password entering problem:
- echo <yourpassword> | clush -w <worker node ip range> -L sudo -S mount <IP of the head node>:/home/mpiuser /home/mpiuser
And then you can make sure that this works:
- sudo touch /home/mpiuser/test
creates a file called test on the head node, and this should be visible on each of the workers:
- clush -w <worker node ip range> -L ls /home/mpiuser
To mount the folder automatically on the worker nodes, the /etc/fstab file on each of them needs to be supplemented with
- <IP of the head node>:/home/mpiuser /home/mpiuser nfs
Use nano, same as above.
- sudo reboot
To test after reboot, you can use clush and ls, same as above.
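Editing /etc/fstab by hand on every worker invites duplicate or mistyped entries. A small helper that appends a line only when it is not already present makes the step repeatable. This is a sketch: append_once is a made-up name, the demonstration writes to a scratch file instead of /etc/fstab, the head node address follows the 100 scheme from Day 13, and the trailing defaults 0 0 fields are our addition for a complete entry.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line=$1 file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file; on a worker the target would be /etc/fstab,
# which needs root. The address and mount options are illustrative.
tmp=$(mktemp)
entry='192.168.1.100:/home/mpiuser /home/mpiuser nfs defaults 0 0'
append_once "$entry" "$tmp"
append_once "$entry" "$tmp"   # the second call changes nothing
cat "$tmp"
```
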
We did not use the host file, just the IP addresses. Perhaps this might lead to problems if the subnet changes at a later date, hence we might update the setup later (we have moved to the solution using hostfiles since). Also, perhaps it would be better to only use a subdirectory within the mpiuser folder, instead of all of it. We shall see, we shall see.
Installing Open MPI
And now for MPI. To install Open MPI, its man pages, and the make utility, you need to run the following commands:
- on the head and the workers: sudo apt-get install openmpi-bin
- on the head: sudo apt-get install openmpi-doc
- on the head: sudo apt-get install make
- on the head: sudo apt-get install g++
This installs a lot of things, including gfortran.
When copying test code found online, this makefile tweaking information might help.
For a C code (we chose helloworld as shown here), compilation using mpicc was straightforward. The executable in our example was a.out.
Running the executable on its own shows output from a single process; using
- mpirun ./a.out
runs through all the cores in the head node (this might be half the number of processes that a regular threaded program would show).
To run the code throughout the entire cluster, you need to create a file with your worker hosts (you can also include the head node, actually). This is just a simple list of IPs or host names. Then you can use
- mpirun -hostfile <name of your host file> ./a.out
and each node responds with its output.
This is quite an exciting moment.
At this point it is also clear that working with IP numbers instead of hostnames will not be sustainable or easy to maintain in the long run, and therefore we are going to eventually set up the host names and host files for the entire system.
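Until host names are in place, the hostfile can at least be generated from the addressing scheme instead of being typed by hand. The sketch below lists the head node (.100) followed by the workers (.101 and up), one address per line, which is the plain format mpirun accepts. The function name make_hostfile and the file name hosts.txt are made up.

```shell
# Print an Open MPI hostfile: the head node (.100) followed by N workers
# (.101, .102, ...), one address per line.
make_hostfile() (
    prefix=$1 workers=$2
    i=0
    while [ "$i" -le "$workers" ]; do
        printf '%s.%d\n' "$prefix" $((100 + i))
        i=$((i + 1))
    done
)

make_hostfile 192.168.1 3

# Typical use:
#   make_hostfile 192.168.1 3 > hosts.txt
#   mpirun -hostfile hosts.txt ./a.out
```
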
Build me a helloworld worthy of a cluster
In order to do something more cluster-like with our new system, we ran the Monte Carlo PI calculator. To build the program from this specific source, we had to change MPI_Intracomm to MPI_Comm within the code. This might have to do with Open MPI being used instead of MPICH, or perhaps the source is slightly older. The program apparently does nothing when run on its own; it seems to need mpirun even when running on a single node. With this code, speedup is not easy to estimate because there is randomness involved.
Something seems to be fishy though, because running the program only on the head node is much faster than when the entire cluster, including the head node, is invoked. The same happens when the program runs on a single node only. We are not going to investigate this, because the purpose of this part of the installation was to see if the process works.
Nevertheless, at this point it can be announced that the Protogonus Cluster is up and running.
And now, to conclude the process, let’s run something in Fortran. We chose the sequential prime counter code. Compilation does not need any tweaking this time around, and the speed increase is consistently detectable.
Possible future improvements
There are a lot of things that can be done.
This includes fine-tuning the Ubuntu setup across the board, eliminating the bottlenecks that slow the system down during boot or afterwards, improving security, and implementing hostnames in an Ubuntu hostfile.
Scripts can be created to make management and development more convenient.
Benchmarking also sounds interesting and, due to the inhomogeneous nature of the nodes, seems to be unavoidable.