How the Wolf Will Survive: Linux Supercomputing and Los Lobos
Gigaflops, Teraflops, and Linux
Seymour Cray is undoubtedly rolling over in his grave.
Linux once again proves its scalability and robustness as the University of New Mexico today announces its plan to build a virtual supercomputer using 256 IBM Netfinity servers, Red Hat Linux, and enhanced Beowulf clustering technology. The project is dubbed "Los Lobos," Spanish for "the wolves" (an unfortunate moniker, considering that Microsoft's Windows NT clustering technology is named Wolfpack). Funded by the National Science Foundation, it is the first of a planned group of three clusters that should eventually become one of the fastest computing environments ever created. Los Lobos should be operational sometime this summer. If the specs hold up in real-world performance (the system is expected to reach 375 gigaflops, or 375 billion floating-point operations per second), it will rank as the 24th fastest supercomputer in the world.
In terms of structure, the project will rely on a basic, unenhanced Linux distribution, Red Hat Linux, combined with several clustering and optimization tools, all of which will be returned to the community as Open Source software. "We'll be happy to work with the community on code for this class of machine," says Dr. Frank Gilfeather, executive director of high-performance computing at the University of New Mexico. "Most of what we're working on involves additions to management tools and development tools. We're not making any changes to the Linux kernel, though--we're using basic, unenhanced Red Hat Linux." In addition, Los Lobos will use the Maui Scheduler, which was developed at the UNM Maui High Performance Computing Center and ported from the IBM RS/6000 SP platform. The Maui Scheduler packs and schedules jobs so that the machine runs at full capacity.
Beowulf technology is also involved. "We view Beowulf as the genuine basis for all Linux supercomputing, and what we're working on is an extension of Beowulf," Gilfeather says. "Our approach to superclustering is an extension that involves mass storage, fast I/O, high speed visualization, and more. Again, our enhancements to Beowulf will be given back to the community."
As stated before, Los Lobos is one of three planned clusters; the other two will be located at Argonne National Laboratory and the University of Illinois. Gilfeather predicts that within a year there will be three 10-teraflop clusters, ready to scale further when IA-64 comes online in the near future.
"When we combine clusters, we'll have a virtual machine room," Gilfeather says. "We expect to be a valuable resource for researchers in a variety of disciplines: weather prediction, tornado forecasting, the total range of quantum physics. A researcher will submit a job, the job will seek a cluster where it can run most efficiently, and then the results will be sent to the researcher. We'll be the equivalent of a national power grid."
This approach to supercomputing, which has been taken in several other projects built around Beowulf technology (which we'll discuss later), is diametrically opposed to the traditional supercomputing pioneered by the likes of Seymour Cray (first at Control Data Corp. and then at Cray Research). In that model, a single, superoptimized unit relied on an elegant hardware design and a stripped-down operating system to achieve high performance. Few models or units were produced, leading to sky-high prices and a lack of choice for enterprises needing supercomputing power. (Once you did commit to a Cray, you got your choice of fabric for the seating surrounding the unit; tartan was reportedly the most popular.) By contrast, today's Linux supercomputers rely to some extent on brute CPU force and clustering software to achieve performance.
Gilfeather has grand plans for his Los Lobos Linux cluster, which he says will change the way universities and research institutions approach computing, in essence extending Open Source in a practical way. "The trend in academic computing is community code, where a community of scholars maintain a code base designed for a specific discipline," Gilfeather says. "To date you've seen large, complex and expensive code packages designed for specific disciplines, but we're going to open up the process."
Los Lobos is not the only Linux clustering solution in the works: the NOAA Forecast Systems Laboratory is building a weather forecasting system using 276 unmodified, off-the-shelf Compaq Alpha systems with 667 MHz processors and 512 MB of memory. (Check out the excellent story on this project from Linux Weekly News.)
For IBM, which won the Los Lobos contract in a competitive bidding process that also attracted bids from Hewlett-Packard and VA Linux Systems, "the project is proof of what IBM has been saying all along about Linux: it's a real technology that works," says John Patrick, vice president of Internet technology at IBM. "We have several customers that would benefit from high-end computing on a Linux cluster. Down the road, we believe that this will be important for e-business as well--as more and more customers deploy middleware and message queueing, they'll need the kind of power you'll find in a Linux cluster."
And the cost? The initial hardware investment is $1.5 million. Seymour Cray is definitely rolling over in his grave.
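To put that price in perspective, here is a rough back-of-the-envelope sketch that uses only the figures quoted above: 256 nodes, an expected 375 gigaflops, and $1.5 million in initial hardware. It assumes the projected 375-gigaflop figure actually holds up and ignores every cost other than the initial hardware (staff, power, software, facilities), so treat it as an illustration rather than a real price/performance benchmark. The function names are hypothetical helpers for this example only.

```python
# Back-of-the-envelope price/performance for Los Lobos, using only the
# figures quoted in the article. ASSUMPTION: the 375-gigaflop figure is a
# projection, not a measured benchmark, and only hardware cost is counted.
NODES = 256                     # IBM Netfinity servers
PEAK_GFLOPS = 375.0             # expected aggregate performance
HARDWARE_COST_USD = 1_500_000   # initial hardware investment


def per_node_gflops(total_gflops: float, nodes: int) -> float:
    """Average share of the aggregate rate contributed by each node."""
    return total_gflops / nodes


def dollars_per_gflop(cost_usd: float, total_gflops: float) -> float:
    """Hardware cost divided by aggregate performance."""
    return cost_usd / total_gflops


if __name__ == "__main__":
    print(f"{per_node_gflops(PEAK_GFLOPS, NODES):.2f} GFLOPS per node")
    print(f"${dollars_per_gflop(HARDWARE_COST_USD, PEAK_GFLOPS):,.0f} per GFLOPS")
```

Run as a script, this works out to roughly 1.46 gigaflops per node and about $4,000 of hardware per gigaflop of expected performance.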