SGI Busts into Linux with 64-Processor Scalability

Breaking the Eight-Processor Ceiling

  • January 7, 2003
  • By Brian Proffitt

There's been a persistent rumor in the world of high-performance computing that Linux does not scale well beyond eight processors. But Silicon Graphics, Inc. has blown that myth right out of the proverbial water with today's release of a new line of servers that can physically scale up to 12 processors and 96 GB of memory.

If that does not sound like a major breakthrough, then how about this one: SGI also announced the release of a supercluster product line that will scale up to 64 Itanium 2 processors and 512 GB of memory in a single node.

Using SGI's new hardware architecture, these 64-processor machines can be tied together to form superclusters that will eventually behave, in effect, as a single computer with up to 2,048 processors.

This may seem hard to believe if you listened to the hardware vendors who proclaimed left and right that Linux simply did not scale beyond the magic number of eight processors. Those familiar with Linux and its underpinnings in the kernel knew full well that nothing inherent in the operating system created this limitation. SGI's new Altix 3000 product line seems to prove this once and for all.

SGI, which has long been in the business of high-performance computing for technical and media clients with its IRIX operating system on MIPS processors, decided that it would capitalize on the same architecture the company uses to create 1,024-processor IRIX Origin 3000 servers. This architecture, known as NUMAlink, allows data to be shared between separate machines with only a negligible performance hit, compared to multiple processors on the same machine.

Addison Snell, Marketing Manager of High Performance Computing, explained it this way: on a local multi-processor machine, the average time that data takes to make a round trip across a switch between processors is about 200 nanoseconds. If a similar operation is made between a local processor and a remote, clustered processor, that time is up to 10 microseconds.

Using NUMAlink, that local-to-remote round trip is slashed down to just 50 nanoseconds--200 times faster than ordinary clustered switches. Because of the speed enhancements in connecting nodes together under the NUMAlink architecture, SGI has been able to bring a new trick to their Linux/Itanium machines: something they call global shared memory.
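The "200 times faster" figure follows directly from the latencies Snell quotes. A minimal sketch of that arithmetic, using only the numbers given above; the variable and function names are illustrative, not anything from SGI:

```python
# Round-trip latencies as quoted in the article, converted to nanoseconds.
LOCAL_SMP_NS = 200        # between processors on one local machine
CLUSTER_NS = 10_000       # local to remote over an ordinary cluster switch (up to 10 microseconds)
NUMALINK_NS = 50          # local to remote over NUMAlink

def speedup(baseline_ns, improved_ns):
    """How many times shorter the improved round trip is than the baseline."""
    return baseline_ns / improved_ns

# NUMAlink compared with an ordinary clustered switch.
numalink_vs_cluster = speedup(CLUSTER_NS, NUMALINK_NS)

# For context: how much slower an ordinary cluster hop is than a local hop.
cluster_vs_local = speedup(CLUSTER_NS, LOCAL_SMP_NS)

print(f"NUMAlink vs. ordinary cluster: {numalink_vs_cluster:.0f}x")
print(f"Ordinary cluster vs. local SMP: {cluster_vs_local:.0f}x slower")
```

Note that the 10-microsecond figure is quoted as an upper bound, so the 200x ratio is a best case for the comparison.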

One of the drawbacks to running a conventional cluster is that applications may need to be tuned to the limitations of multiple-OS memory management, Snell said. For instance, on a 4-node, 16-processor cluster, there will be four instances of an operating system--and each instance will have to deal with its own machine's memory separately.

If the application running on the cluster needs more memory than one single node in the cluster has, then it will have to be reprogrammed to adjust for the limitations of each node's memory capabilities.

With global memory management, each node's operating system instance still deals with the memory on that node--but if one node suddenly needs more memory, it can request--and get--memory capacity from one of the other nodes.
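The difference between the two memory models can be illustrated with a toy simulation. This is only a sketch of the idea described above, not SGI's implementation: the `Node` class, the `allocate` function, and the 8 GB per-node capacity are all assumptions made for the example.

```python
class Node:
    """One cluster node with its own OS instance and local memory (toy model)."""

    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb
        self.used_gb = 0

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


def allocate(nodes, home, request_gb, shared):
    """Try to satisfy a request made on `home`.

    Returns a dict {node name: GB granted}, or None if the request fails.
    With shared=False, only the home node's memory is available.
    With shared=True, the home node may borrow from the other nodes.
    """
    if not shared:
        if home.free_gb < request_gb:
            return None  # conventional cluster: the application must be reworked
        home.used_gb += request_gb
        return {home.name: request_gb}

    # Global shared memory: use home memory first, then borrow from peers.
    candidates = [home] + [n for n in nodes if n is not home]
    if sum(n.free_gb for n in candidates) < request_gb:
        return None
    grants = {}
    remaining = request_gb
    for node in candidates:
        take = min(node.free_gb, remaining)
        if take > 0:
            node.used_gb += take
            grants[node.name] = take
            remaining -= take
        if remaining == 0:
            break
    return grants


# A 4-node cluster, as in Snell's example; 8 GB per node is an assumed figure.
cluster = [Node(f"node{i}", 8) for i in range(4)]

# A 12 GB request on node0 does not fit in a conventional cluster...
conventional = allocate(cluster, cluster[0], 12, shared=False)

# ...but succeeds when node0 can borrow the shortfall from a neighbor.
global_shared = allocate(cluster, cluster[0], 12, shared=True)

print(conventional)
print(global_shared)
```

In the conventional case the request simply fails, which corresponds to the application having to be reprogrammed around each node's limit; in the shared case the shortfall is drawn from another node.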

"In a regular clustered system," Snell mused, "the nodes would be talking to each other like you and I are having this conversation. With this shared memory management, it would be like we could read each other's minds."

The advantages of this process are threefold, according to Product Manager Jason Pettit. First, applications that could not run on clusters before because of their memory requirements can now use clustered technology. Second, this solution eliminates the need for the inefficient overallocation of memory that some clusters carry just to avoid memory limitations. And finally, Pettit added, "it just flat out runs faster."

The Altix systems have also cranked up I/O throughput to 2 GB/second, which is another reason why large, memory-intensive applications will find a better home on these superclusters than on other cluster systems, Pettit said.

Pettit described some of the technical aspects of the Altix line. SGI is using a stock 2.4.19 kernel, with the appropriate patches that allow the kernel to use the hardware's chipset. Pettit emphasized that much of the work was done in cooperation with existing open-source projects, such as the Linux for Large Systems project, the Linux Scalability Project, and the work done by David Mosberger to port Linux to the IA-64 platform.

Because of this adherence to the IA-64 standards, any binary built for Linux on the 64-bit IA-64 platform can run on the Altix machines.
