Tolerating Fault in an Intolerant World
The Limitations of Clustering
Buzzwords are as common in the Linux world as they are in the realm of proprietary software. This truism is certainly highlighted by the recent popularity of such terms as "thin versus fat," "scaling up versus scaling out," and "consolidation versus clustering."
Clustering itself is a buzzword that is popular in all aspects of IT--it is not something that is unique to Linux, though Linux is being adopted for clustering use at a very fast pace. That is mainly because Linux is well suited for multi-server parallel processing, usually right out of the box. Couple that with Linux's affordable licensing fees (read: little to none), and you have a strong recipe for success.
Clustering is usually used for one of two specific computational needs: high performance and high availability. It is important to make this distinction early on, because clustering may not be the universal elixir that it is often made out to be.
High-performance computing is ideally suited for organizations that need to crunch a lot of numbers in as little time as possible. Examples include the cardiac tissue impulse mapping being done at the University of Alabama at Birmingham, or the geological work done by companies in the oil and gas industry. These are problems that have a lot of data and a lot of ways the data can fit together--the kind people used to haul out a Cray supercomputer to handle.
Clustering is suited for high-performance work because by adding more and more processors ("nodes") to a cluster, you can get some serious processing power that approaches (and surpasses) supercomputer-level computational speeds for a fraction of the cost.
High-availability computing is the other thing that clusters can be used for. If you have a lot of transaction handling that needs 24/365 uptime, clusters are good because if one processor fails, the load will automatically be handled by the other nodes until the faulty processor can be repaired.
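To make the failover idea concrete, here is a minimal sketch in Python. It is a toy model, not how any particular clustering product works: the Cluster class, its method names, and the round-robin dispatch policy are all assumptions made up for illustration.

```python
class Cluster:
    """Toy high-availability cluster (hypothetical, for illustration only)."""

    def __init__(self, node_names):
        # Every node starts out healthy.
        self.healthy = {name: True for name in node_names}
        self._next = 0  # round-robin cursor

    def mark_failed(self, name):
        self.healthy[name] = False

    def mark_repaired(self, name):
        self.healthy[name] = True

    def dispatch(self, transaction):
        """Return the name of the node chosen to handle the transaction.

        Failed nodes are skipped, so the surviving nodes absorb the load.
        """
        names = list(self.healthy)
        for _ in range(len(names)):
            name = names[self._next % len(names)]
            self._next += 1
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy nodes available")


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.mark_failed("node-b")
handled_by = [cluster.dispatch(txn) for txn in range(4)]
print(handled_by)  # node-b never appears while it is marked failed
```

A real cluster also has to detect the failure in the first place and deal with transactions that were in flight when the node went down, which is where much of the complexity lives.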
This sounds very good, and it is. But there are some challenges to making this all work smoothly. For instance, not all software can run on a cluster. It's not something that you just bring up on the screen, type "go," and expect to take full advantage of the parallelism that makes a cluster really shine. There is a significant amount of work that needs to be done to re-tool an application to run in a cluster.
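A small sketch shows the flavor of that re-tooling. The serial function below is the "before" picture; the parallel version has to split the data, farm out the pieces, and combine the partial results. Everything here is an assumption for illustration: the function names are invented, and a local thread pool merely stands in for cluster nodes, which in practice would involve a message-passing or job-scheduling layer.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(values):
    """The original, serial computation."""
    return sum(v * v for v in values)


def split(values, parts):
    """Cut the input into at most `parts` contiguous chunks."""
    size = max(1, -(-len(values) // parts))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, workers=4):
    """Re-tooled version: split the work, run the chunks, combine the results.

    The thread pool is only a stand-in for the nodes of a cluster.
    """
    chunks = split(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, chunks)
        return sum(partials)


data = list(range(1000))
assert parallel_sum_of_squares(data) == sum_of_squares(data)
```

This example is easy because each chunk can be processed without knowing anything about the others. Applications whose steps depend on one another need far more surgery, which is the work the paragraph above is talking about.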
For high-performance work, this sort of thing is a necessary evil. After all, you were likely going to have to port the application to a new platform anyway, and porting to a clustered Linux farm is still a lot easier than porting to a proprietary mainframe OS.
But having to do this sort of work for a high-availability cluster is a notion that one major computer manufacturer is challenging. In fact, this company is turning the whole notion of using clusters for high-availability computing on its ear.
The company is Japan's NEC, which is currently promoting something it calls "Unstoppable Linux"--a deliberate send-up of Oracle's "Unbreakable Linux." That's because unlike Oracle, which is approaching high availability from the traditional clustering direction, NEC is using what it terms a "Fault Tolerant Linux" hardware/software combination to bring users high availability.