Case Study: Clusters and Image Processing, Part II
The ImageLinks Case, Reviewed
To set up the Beowulf cluster, ImageLinks did the following:
- They obtained the necessary hardware for the 13 machines involved--12 slave (client) machines and one master (server). This hardware included:
- Twelve identical single-processor 650MHz PIII machines, each with 384MB of RAM and 36GB SCSI hard drives
- One dual-processor 650MHz PIII machine, identical to the others except for the aforementioned extra processor, 512MB of RAM, and a total of 150GB of SCSI hard drive space.
NOTE: You do not have to have identical hardware for clustered machines, but it helps.
- Once the machines were individually assembled, ImageLinks set up a rack-mount frame to place them in, as shown in Figure 1.
- The machines were placed in the frame as shown in Figure 2, forming a giant robot-like super machine out of a bunch of individual machines.
- The machines were networked using 100Mbps Ethernet cards, cables, and a switch.
- Linux can be installed on a cluster in a number of ways. ImageLinks installed Red Hat Linux on all the machines first.
NOTE: This would have been an excellent place to use Red Hat's Kickstart tool to automate the installation process. The folks at ImageLinks could have cloned the installation on all the slave machines, which would have saved them 11 installs. They could even have temporarily used the master machine as an NFS server to run the installation from, and used the scripting capabilities in Kickstart to remove the unneeded packages and add the Beowulf packages described in the next two steps.
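As a rough illustration of the Kickstart idea in the note above, the following sketch builds a partial Kickstart fragment as text. It is not a complete Kickstart file and not what ImageLinks used: the function name, the server address, the export path, and the RPM paths are all hypothetical, and the sketch assumes the added software is available as RPM packages.

```python
def render_kickstart_fragment(nfs_server, nfs_dir, remove_pkgs, add_rpms):
    """Return a partial Kickstart fragment as a string.

    All arguments are illustrative assumptions: the caller supplies the
    NFS server (for example, the master node), the exported install
    directory, packages to remove, and RPM files to add afterward.
    """
    lines = [
        "# Install from an NFS export (hypothetical server and path)",
        f"nfs --server {nfs_server} --dir {nfs_dir}",
        "",
        "# Commands below run on the new node after installation",
        "%post",
    ]
    if remove_pkgs:
        # rpm -e erases installed packages; dependencies may block removal
        lines.append("rpm -e " + " ".join(remove_pkgs))
    for rpm_path in add_rpms:
        # rpm -ivh installs a package file with verbose progress output
        lines.append(f"rpm -ivh {rpm_path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_kickstart_fragment(
        "192.168.1.1",                 # hypothetical master address
        "/mnt/redhat",                 # hypothetical export directory
        ["sendmail", "apache"],        # packages named in the article
        ["/mnt/extras/bproc.rpm"],     # hypothetical package file
    ))
```

Generating the fragment from one function keeps the 12 slave installs identical, which is the point of cloning the installation.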
- They removed all unnecessary software packages that the custom install had left in place. These included GUI components, Sendmail (email server) and related components, and Apache (Web server) components.
- They went to the Beowulf Software site (www.beowulf.org/software/software.html) and downloaded the following items and packages:
- BPROC: Beowulf Distributed Process Space--Provides a method of spanning a single process (program or otherwise) across multiple machines in the cluster. It also allows for starting a process on a specific slave from the master.
- Network device drivers--People building clusters typically need high-speed networking, with 100Mbps Ethernet being the minimum transfer speed sought. The Beowulf Software site has drivers for network hardware of this speed in case you don't already have them.
- Beowulf Ethernet Channel Bonding--Joins several Ethernet interfaces on each machine into one logical network channel and spreads traffic across them. This balances load on the network (not on the CPUs or RAM) and raises the bandwidth available between nodes.
- PVM-TCL--An addition to the Tool Command Language (TCL) that allows you to work directly with the Parallel Virtual Machine (PVM).
- Virtual Memory Pre-Pager--Used for kernel 2.0.x. It allows for loading data into memory slightly before it is necessary for access.
- LM78 hardware monitor driver--Used for kernels 2.0.x and 2.1.x. It provides an interface to the LM78 hardware monitoring chip through the /proc portion of the Linux file system, which is held in RAM. You can access information such as how hot your CPU and motherboard are, fan speed, power supply voltages, and more using this feature. You must have the chip installed to utilize this driver, however.
NOTE: Because ImageLinks used Red Hat in particular, I will point out that current versions of Red Hat Linux include clustering software as one of the package choices in the Custom install. This can save you a lot of downloading and additional installing. Distributions that focus on clustering are also available, such as Extreme Linux (www.extremelinux.org), which focuses on Beowulf clustering.
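Channel bonding, listed among the downloads above, works at the network level: outgoing traffic is spread over more than one Ethernet interface. The toy model below assumes a simple round-robin policy, where each packet goes out on the next interface in turn. The function name and interface names are illustrative only, and a real bonding driver does this inside the kernel rather than in user code.

```python
from collections import Counter
from itertools import cycle


def bond_round_robin(packets, interfaces):
    """Pair each packet with the next interface in turn (round-robin)."""
    assignment = []
    # zip stops when the packets run out; cycle repeats the interface list
    for packet, iface in zip(packets, cycle(interfaces)):
        assignment.append((packet, iface))
    return assignment


if __name__ == "__main__":
    plan = bond_round_robin(range(6), ["eth0", "eth1"])
    # Count how many packets each interface carries
    print(Counter(iface for _, iface in plan))
```

With two interfaces, each carries half of the packets, which is where the extra bandwidth comes from.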
- They ensured that the kernel source code and related compilation packages were installed. For example, with Red Hat Linux 6.2, this would entail checking for kernel-source, kernel-headers, egcs, make, cpp, and glibc-devel at the very least.
- They patched the kernel with bproc and perf using the patch command.
- They recompiled the kernel with the new pieces in place.
- They compiled and loaded the drivers they needed for their particular solution.
- They compiled the ifenslave.c source code file, which builds the ifenslave utility used to attach Ethernet interfaces to a bonded channel for network load balancing.
- They configured the cluster.
- They located libraries for their software that allow for the implementation of parallel computing. These libraries are detailed in Table 4.
Table 4 Libraries useful for building applications to use on clusters.
Message Passing Interface (MPI): This library is used for passing messages (data and synchronization signals) between the processes of a parallel-processing application.
(name not given): This library is used for enhancing the Linux kernel so that it can transparently run a cluster without any of the applications having to be altered.
Parallel Virtual Machine (PVM): This library is used for grouping machines running Unix variants or Windows NT to work together as a cluster.
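The message-passing style that libraries such as MPI and PVM support can be sketched on a single machine. The example below is only an analogy: it uses Python threads and queues in place of networked nodes, and the names worker and run_master are made up for illustration. A master sends work to each worker's inbox, and each worker sends its result back.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive numbers until a None sentinel arrives; reply with squares."""
    while True:
        message = inbox.get()      # blocking receive
        if message is None:        # sentinel: no more work for this worker
            break
        outbox.put((rank, message, message * message))  # send result back


def run_master(values, n_workers=3):
    """Send values to workers in turn and collect value -> square."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for index, value in enumerate(values):
        inboxes[index % n_workers].put(value)   # send work to one worker
    for inbox in inboxes:
        inbox.put(None)                         # tell each worker to stop
    for thread in threads:
        thread.join()
    collected = {}
    while not results.empty():                  # safe: all senders finished
        _rank, value, square = results.get()
        collected[value] = square
    return collected


if __name__ == "__main__":
    print(run_master([1, 2, 3, 4]))
```

On a real cluster the queues would be replaced by sends and receives over the network, but the master/worker exchange has the same shape.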
- They utilized these libraries to make their software "cluster capable" so that its processes could be divided between the many CPUs.
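To make "divided between the many CPUs" concrete, here is a minimal sketch of one common way to split image work: cut the image into bands of rows and hand each band to a separate worker. It is an assumption for illustration, not ImageLinks' actual code. It runs in threads on one machine rather than across cluster nodes, the pixel inversion stands in for real image processing, and the default of 12 workers simply mirrors the 12 slave machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(height, parts):
    """Return (start, stop) row ranges covering 0..height as evenly as possible."""
    base, extra = divmod(height, parts)
    bands, start = [], 0
    for index in range(parts):
        # The first `extra` bands take one additional row each
        stop = start + base + (1 if index < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def invert_band(image, band):
    """Invert 8-bit pixel values for the rows in one band."""
    start, stop = band
    return [[255 - pixel for pixel in row] for row in image[start:stop]]


def invert_image(image, workers=12):
    """Process each band in its own worker, then reassemble in order."""
    bands = split_rows(len(image), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves the order of bands, so rows come back in sequence
        pieces = list(pool.map(lambda band: invert_band(image, band), bands))
    return [row for piece in pieces for row in piece]


if __name__ == "__main__":
    print(split_rows(10, 3))
    print(invert_image([[0, 255], [10, 20], [30, 40]], workers=2))
```

Because each band is independent, no worker has to wait on another, which is what makes this kind of image processing a good fit for a cluster.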