Tolerating Fault in an Intolerant World
The Limitations of Clustering

Brian Proffitt
Monday, December 23, 2002 09:49:05 AM
Buzzwords are as common in the Linux world as they are in the realm of
proprietary software. This truism is certainly highlighted by the
recent popularity of such terms as "thin versus fat," "scaling up
versus scaling out," and "consolidation versus clustering."
Clustering itself is a buzzword that is popular in all aspects of IT--it is not
something that is native to Linux, though Linux is being adopted for
clustering use at a very fast pace. That is mainly because Linux is
ideally suited for multi-server parallel processing, usually right out
of the box. Couple that with Linux's affordable licensing fees (read:
little to none), and you have a strong recipe for success.
Clustering is usually used for one of two specific computational
needs: high performance or high availability. It is important to make
this distinction early on, because clustering may not be the universal
elixir that it is often made out to be.
High-performance computing is ideally suited for organizations that
need to crunch a lot of numbers in as little time as
possible. These include problems such as the mapping of cardiac tissue
impulses being done at the University of Alabama at Birmingham, or
the geological work done by companies in the oil and gas
industry. These are problems that have a lot of data and a lot of ways
the data can fit together--the kind people used to haul out a Cray
supercomputer to handle.
Clustering is suited for high-performance work because by adding more
and more processors ("nodes") to a cluster, you can get some serious
processing power that approaches (and surpasses) supercomputer-level
computational speeds for a fraction of the cost.
High-availability computing is something else that clusters can be
used for. If you have a lot of transaction handling that
needs 24/7/365 uptime, clusters are good because if one node fails,
then the load will automatically be handled by the other nodes until
the faulty node can be repaired.
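
To make the failover idea concrete, here is a minimal sketch in Python. It is a toy simulation, not how any particular clustering product works: the Node and Cluster classes, the node names, and the round-robin policy are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical cluster node that counts the requests it has served."""
    name: str
    healthy: bool = True
    handled: int = 0


@dataclass
class Cluster:
    """Toy round-robin dispatcher that skips nodes marked as failed."""
    nodes: list
    _next: int = 0

    def dispatch(self, request):
        # Try each node at most once, starting from the round-robin cursor.
        for offset in range(len(self.nodes)):
            index = (self._next + offset) % len(self.nodes)
            node = self.nodes[index]
            if node.healthy:
                node.handled += 1
                self._next = (index + 1) % len(self.nodes)
                return node.name
        raise RuntimeError("no healthy nodes left in the cluster")


def simulate():
    """Run a few requests before, during, and after a node failure."""
    cluster = Cluster([Node("node-a"), Node("node-b"), Node("node-c")])
    before = [cluster.dispatch(i) for i in range(3)]
    cluster.nodes[1].healthy = False   # node-b fails
    during = [cluster.dispatch(i) for i in range(4)]
    cluster.nodes[1].healthy = True    # node-b is repaired
    after = [cluster.dispatch(i) for i in range(3)]
    return before, during, after
```

While node-b is marked as failed, the dispatcher simply routes around it, and the surviving nodes pick up its share of the requests. A real high-availability cluster also has to detect the failure and preserve in-flight transactions, which this sketch leaves out.
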
This sounds very good, and it is. But there are some challenges to
making this all work smoothly. For instance, not all software can run
on a cluster. It's not something that you just bring up on the screen,
type "go," and expect to take full advantage of the parallelism that
makes a cluster really shine. There is a significant amount of work
that needs to be done to re-tool an application to run in a cluster.
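
As a rough illustration of what that re-tooling involves, the Python sketch below takes a simple serial calculation and restructures it into the partition, compute, and combine steps that a parallel version needs. The function names are hypothetical, and worker threads merely stand in for cluster nodes here; a real cluster application would typically ship the chunks to other machines through a message-passing layer.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares_serial(values):
    """Original single-machine version: one loop over all the data."""
    total = 0
    for v in values:
        total += v * v
    return total


def split(values, parts):
    """Cut the data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares_parallel(values, nodes=4):
    """Re-tooled version: partition, compute partial sums, then combine."""
    chunks = split(values, nodes)
    # Worker threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(sum_of_squares_serial, chunks))
    return sum(partials)
```

Even for a calculation this trivial, the parallel version needs extra code to divide the data and to merge the partial results. Applications whose steps depend on one another are much harder to carve up this way, which is where the significant amount of work comes from.
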
For high-performance work, this sort of thing is a necessary
evil. After all, you were likely going to have to port the application
to a new platform anyway, and porting to a clustered Linux farm is
still a lot easier than porting to a proprietary mainframe OS.
But having to do this sort of work for a high-availability cluster is
a notion that one major computer manufacturer is challenging. In fact,
this company is turning the whole notion of using clusters for
high-availability computing on its ear.
The company is Japan's NEC, which is currently promoting something it
calls "Unstoppable Linux"--a deliberate send-up of Oracle's
"Unbreakable Linux." That's because unlike Oracle, which is
approaching high availability from the traditional clustering
direction, NEC is using what it terms a "Fault Tolerant Linux"
hardware/software combination to bring users high availability.