The Heart of the Penguin
The Heart of Darkness
In a small server room on the campus of UAB sits a kitchen supply rack with four big shelves, each of which holds four dual-processor Dell computers. These sixteen boxes are hooked together in a standard Beowulf cluster, so that all 32 CPUs work on the same computations in concert. Their administrator believes this provides a level of computation that rivals the performance he could get out of a supercomputer, at a fraction of the cost.
Dr. Andy Pollard is a biomedical engineer who is very familiar with supercomputers and mainframes. For much of the last 15 years, he has used them to perform the billions of calculations needed to simulate the effects of electrical fields on human heart tissue, in an effort to understand exactly what triggers a fibrillation event.
Pollard is now using what he describes as a "boilerplate Beowulf" configuration of machines that are all running Red Hat Linux 7.2 to run the computational software he uses. And one of the exciting things for him is that with the tools that come with Red Hat, he is already achieving 91 percent parallelism *practically out of the box*.
And this power is certainly needed. The research is targeted at three areas: observing the effect of electrical fields on heart tissue to learn how and why defibrillation works (and, in so doing, tracking down why fibrillations occur in the first place); learning in a more direct manner why fibrillations start; and following how a fibrillation event progresses from start to finish.
To simulate these events, models of heart cells are created in great detail and connected to each other (virtually) as if they were in a huge resistive network. Then, in the models, differential equations are used to determine how each cell might react when an ionic current is applied. It is in solving these differential equations that the real power of the Linux cluster is needed.
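As a rough illustration of what "cells in a resistive network, each governed by differential equations" can look like in code, here is a minimal sketch. It is not Pollard's software: the FitzHugh-Nagumo kinetics, the forward-Euler time step, the function names `step` and `make_sheet`, and every parameter value are assumptions chosen only to keep the example short.

```python
# Hypothetical sketch: a small sheet of cells coupled like a resistive network.
# The FitzHugh-Nagumo kinetics and all parameter values are assumptions made
# for illustration; the article does not name the cell model actually used.

def make_sheet(rows, cols, v0=-1.2, w0=-0.62):
    """Build voltage (v) and recovery (w) grids with every cell near rest."""
    v = [[v0] * cols for _ in range(rows)]
    w = [[w0] * cols for _ in range(rows)]
    return v, w


def step(v, w, dt=0.05, coupling=0.2, a=0.7, b=0.8, eps=0.08, stim=None):
    """Advance every cell one forward-Euler step; return the new (v, w) grids."""
    rows, cols = len(v), len(v[0])
    new_v = [row[:] for row in v]
    new_w = [row[:] for row in w]
    for i in range(rows):
        for j in range(cols):
            # Current from the four neighbours is proportional to the voltage
            # difference across each "resistor". Cells on the edge simply have
            # fewer neighbours (a no-flux boundary).
            flow = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    flow += v[ni][nj] - v[i][j]
            applied = stim[i][j] if stim is not None else 0.0
            # Two coupled differential equations per cell.
            dv = v[i][j] - v[i][j] ** 3 / 3.0 - w[i][j] + coupling * flow + applied
            dw = eps * (v[i][j] + a - b * w[i][j])
            new_v[i][j] = v[i][j] + dt * dv
            new_w[i][j] = w[i][j] + dt * dw
    return new_v, new_w
```

Every cell must be updated at every time step, and a realistic sheet holds far more cells with far more detailed equations than this toy version, which is why the workload grows so quickly and why it divides naturally across many CPUs.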
To give a rough idea of just how much computing muscle is needed, Pollard described the amount of real time it would take to run a heart simulation. With one CPU, simulating a 1 ms event on a two-dimensional sheet of heart cells (measuring 2 cm x 2 cm) would take about 450 seconds. Given that a good simulation should cover about 1-2 seconds of a simulated event, the computational time jumps to 125-250 hours (roughly 5-10 days).
With the current cluster setup, Pollard estimates that 1 ms of simulation can be done in 50 seconds, which immediately cuts the time for a full one-second simulated run to about 13.9 hours.
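The arithmetic behind these run times is simple enough to check directly. The helper below is only a sketch; the function name `run_hours` is made up for this example, and the inputs are the per-millisecond figures Pollard quotes.

```python
def run_hours(seconds_per_simulated_ms, simulated_seconds):
    """Wall-clock hours needed to simulate an event of the given length."""
    simulated_ms = simulated_seconds * 1000
    return seconds_per_simulated_ms * simulated_ms / 3600


# Figures quoted in the article: 450 s per simulated ms on one CPU,
# 50 s per simulated ms on the current cluster.
single_cpu_short = run_hours(450, 1)   # one simulated second on one CPU
single_cpu_long = run_hours(450, 2)    # two simulated seconds on one CPU
cluster_run = run_hours(50, 1)         # one simulated second on the cluster
```

Here `single_cpu_short` and `single_cpu_long` come out to 125 and 250 hours, and `cluster_run` to just under 13.9 hours, matching the numbers above.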
Thanks to the open-source nature of Linux, Pollard has already identified areas within Red Hat that can be tweaked to improve the performance of the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) software. These tweaks, coupled with improvements to the application software, may get the parallelism number up to 98 percent--which could reduce the computational time for 1 ms of simulated time to 20 seconds and bring the time for a full one-second event down to a mere five and a half hours.
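One way to see why a few points of parallelism matter so much is Amdahl's law. The sketch below assumes that the "parallelism" percentage means the fraction of the work that can run in parallel; the article does not define the term, so treat this reading, and the function name `amdahl_speedup`, as assumptions.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Ideal speedup when only part of the work can be spread across CPUs."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


CPUS = 32                 # sixteen dual-processor machines
SINGLE_CPU_SECONDS = 450  # quoted single-CPU time per simulated millisecond

# Estimated seconds per simulated millisecond at each parallelism level.
est_at_91 = SINGLE_CPU_SECONDS / amdahl_speedup(0.91, CPUS)
est_at_98 = SINGLE_CPU_SECONDS / amdahl_speedup(0.98, CPUS)
```

Under that assumption, the estimates come to roughly 53 and 23 seconds per simulated millisecond, in the same range as the 50 and 20 seconds Pollard cites. The serial remainder dominates: moving from 91 to 98 percent more than doubles the ideal speedup on the same 32 CPUs.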
With this kind of performance, Pollard could lengthen the simulated times of his events or increase the area of virtual cells being tested, and gain far more data in the same amount of time it used to take him to run far simpler experiments.
Nor does his program have to settle for this level of performance.
In the past, Pollard had to rely on proprietary hardware and software on mainframes and supercomputers to do the work for these research programs. As with many such projects in academia, grants are the financial source for everything. Often, the $50,000 expense for a new mainframe could be justified at the very first stages of getting grants for a project. But once the project was underway, Pollard could very rarely get money to upgrade these proprietary systems--leaving him and his students stuck using computers that grew more obsolete as the months passed.
Now, Pollard says, this kind of problem no longer affects him. "The thing that I think is great," he said, "is that this is the first solution that provides real independence." Pollard went on to explain that because he is using machines with ordinary Intel CPUs, his initial hardware costs are much lower than they would be for a mainframe. Now he can obtain the same kind of performance for around $10,000--a number that is much easier to get from a grant.
And, because the hardware number is so much lower, Pollard feels free to ask for new hardware on subsequent rounds of funding. With the ability to roll out machines or even just processors on a regular basis, Pollard is assured he can keep his work running on relatively cutting-edge hardware.
"Now all I have to do is assume these machines over just the lifetime of the grant," Pollard said.
Some of this upgradeability is due to more than just cost. Because of the standard nature of Red Hat, Pollard can shift his applications to the new or upgraded machines with ease. In the past, variations between different flavors of UNIX, for example, would delay porting his apps from older to newer machines as he had to tweak his software to play well with the new operating system.
This standardized platform served Pollard well even before he installed his cluster. During the viability testing for the initial grant proposal for this cluster, he was able to go from system to system around the campus and get performance numbers from other Red Hat machines.
Red Hat also helped him reduce system maintenance costs. In the past, a dedicated support staff member would have to handle many (if not all) administrative tasks on the mainframe Pollard used.
"I don't need a systems person to do everything like before," he said, adding that he now can handle most of the administrative tasks with Red Hat by himself, which saves his grant money for something else--like more students to think of more ways to find a solution to the problem of fibrillation.