NASA Boosts Linux SGI Supercomputer to 512 Processors
Double the Processors, Double the Performance?
The National Aeronautics and Space Administration (NASA) is doubling the number of processors in its Itanium 2 system from SGI, which was first announced on October 20 with 256 processors. Late last week, NASA began testing the climate, ocean, and fluid dynamics modeling system with 512 processors in a single system.
"At 6:50 pm on October 29, we booted the 512-processor system. So we are 'there,'" said Robert B. Ciotti, a NASA researcher, in an interview at the end of last week. "We're still working on a few issues, but we expect to have an operational system very soon."
NASA and SGI, co-developers of the Linux-enabled supermodeler, previously worked together on high-end shared memory systems built on SGI's MIPS processors and Irix OS.
"Climate and ocean modeling are big at NASA. We also look at computational fluid dynamics to see the flow around airplanes and rocket motors, for instance," Ciotti said.
SGI first tested Linux running on Intel processors internally, back during the days of Itanium 1, according to the NASA researcher. "That, however, was just a first try for SGI at doing high-end processing on an Intel architecture. The Itanium 1 didn't meet expectations. When you're doing lots of memory, you need tremendous amounts of memory bandwidth."
NASA perceives a number of advantages to running Linux on Itanium 2, he said. In terms of speed, the current Intel processor is "near the top of the list, if not the fastest," according to Ciotti.
"Linux is a lightweight form of Unix. It doesn't have all the tons of features that SGI and Sun (previously) added to their own versions of Unix, to make it more robust," he noted.
"The Linux code base is relatively small, but it gives the ability to progress to a large-scale system very quickly. People have access to the source code, which makes it easier to add features that will support the use of up to 512 processors for a single image," he added.
"We've been looking for support for lots of simultaneous transactions, as well as the ability to control memory allocation so that memory can either be unified or divided up between several users. To reduce contention, you need to be able to map parallel threads of execution, and the data associated with them, to memory so that they are located as near to each other physically as possible."
Yet doubling 256 processors to 512 has turned out to be a more difficult prospect than the earlier doubling of 128 processors to 256, he observed.
"The first (Itanium 2) system we used had 128 processors. We got very encouraging results, so we decided to go to 256," he elaborated. Based on previous experience with large-scale systems, the researchers anticipated challenges with using large numbers of file descriptors, as well as with running certain software modules, across 256 processors. "We were very surprised, however, that we were able to perform the linear scaling so well."
Moving to 512 processors, on the other hand, has been more than a matter of simply doubling system performance again. "Our assessment is that the system will support 512 processors. Will the applications scale, though? Can you get the system to behave well at 512 processors? Everything might be good till you get to 300 or 400 processors, and then you can hit a 'knee in the curve.'"
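One common way to reason about why doubling processors may not double performance is Amdahl's law, which the article itself does not cite. The Python sketch below is only an illustration under assumed numbers: the function names and the 0.1 percent serial fraction are hypothetical, not measurements from the NASA system. It also does not model lock contention or memory placement, which could make a real "knee" sharper than this idealized curve.

```python
def amdahl_speedup(processors: int, serial_fraction: float) -> float:
    """Idealized speedup for a fixed-size problem under Amdahl's law.

    serial_fraction is the share of the work that cannot run in parallel.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def parallel_efficiency(processors: int, serial_fraction: float) -> float:
    """Speedup divided by processor count (1.0 means perfect linear scaling)."""
    return amdahl_speedup(processors, serial_fraction) / processors


if __name__ == "__main__":
    # Hypothetical serial fraction chosen for illustration only.
    assumed_serial_fraction = 0.001
    for n in (128, 256, 512):
        speedup = amdahl_speedup(n, assumed_serial_fraction)
        eff = parallel_efficiency(n, assumed_serial_fraction)
        print(f"{n:4d} processors: speedup {speedup:7.1f}, efficiency {eff:.2f}")
```

Under this assumed serial fraction, each doubling of the processor count yields less than a doubling of speedup, which is consistent with the concern that results seen at 128 and 256 processors may not carry over unchanged to 512.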
"There are certain programming issues: locking mechanisms, event handling. These all have to do with memory allocation, in one form or another. I think we can work around them, though. Basically, we need to get the OS out of the way," he added.
In the SGI/NASA co-development effort, SGI brings "systems experience" to the table, according to Ciotti. "NASA brings expertise in applications and integration."