Case Study: Clusters and Image Processing, Part I - page 2
After lunch one day, Mark and some of his colleagues decided to take the plunge. Rather than convincing management to take the risk, they went and bought components for a Linux box using one of their credit cards. This box was set up within their own network. Then Dave Burken and Ken Melero began the six-month side project of porting (converting) all 5,000 source code files so they would compile properly under Linux.
The sheer immensity of porting 5,000 files helps to explain why the Linux version of their program took so long to produce. There was another issue to deal with, however. The C++ compiler (gcc) that came with Red Hat Linux 5.2 did not fully support a feature that the ImageLinks team had used heavily--templates. For those not familiar with C++, a template provides a way of writing reusable source code. You write a generic class or function once, leaving one or more types as parameters, and then use that template with specific types. When a compiler that supports templates comes across source code that uses a template with a particular type, it plugs that type into the template definition and generates the concrete class or function at compile time.
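For readers who have not seen a template, a minimal sketch may help. It is not taken from the ImageLinks code; the names `ImageBuffer` and `mean_value` are hypothetical and chosen only for illustration. The sketch assumes a simple row-major image buffer whose pixel type is left as a template parameter, so the same source serves both 8-bit and floating-point images.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generic image buffer (illustration only, not ImageLinks code).
// The pixel type is a template parameter, so the class is written once and
// the compiler generates a concrete version for each type it is used with.
template <typename Pixel>
class ImageBuffer {
public:
    ImageBuffer(std::size_t width, std::size_t height)
        : width_(width), height_(height), data_(width * height, Pixel{}) {}

    // Access a pixel by column (x) and row (y); storage is row-major.
    Pixel& at(std::size_t x, std::size_t y) {
        assert(x < width_ && y < height_);
        return data_[y * width_ + x];
    }
    const Pixel& at(std::size_t x, std::size_t y) const {
        assert(x < width_ && y < height_);
        return data_[y * width_ + x];
    }

    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

private:
    std::size_t width_;
    std::size_t height_;
    std::vector<Pixel> data_;
};

// Function template: computes the mean pixel value for any pixel type that
// can be converted to double. Returns 0.0 for an empty image.
template <typename Pixel>
double mean_value(const ImageBuffer<Pixel>& img) {
    const std::size_t count = img.width() * img.height();
    if (count == 0) {
        return 0.0;
    }
    double sum = 0.0;
    for (std::size_t y = 0; y < img.height(); ++y) {
        for (std::size_t x = 0; x < img.width(); ++x) {
            sum += static_cast<double>(img.at(x, y));
        }
    }
    return sum / static_cast<double>(count);
}
```

Declaring `ImageBuffer<float>` in one place and `ImageBuffer<unsigned char>` in another makes the compiler generate two separate classes from the single definition, and calling `mean_value` on each produces two separate functions. A compiler with incomplete template support cannot perform this step reliably, which is why heavily templated code was hard to port.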
Because the compiler did not have full template support, Dave and Ken had to try to work around this issue--not a simple task when there are 5,000 source files involved! When Red Hat 6 came out, their work became a lot easier. The version of gcc that shipped with this release supported C++ templates. Suddenly, the work went much faster.
After a total of six months' work, the port was complete. The hypothesis for this experiment was that performance on the SGI and on Intel Linux would be similar. The team thought the PC might even be slightly slower. However, if performance was comparable enough, the savings in licenses and hardware would be large enough to justify the performance hit.
Imagine their surprise when they compiled the 5,000 source files and discovered that what took 12 hours to compile on an SGI Indigo 2 took only two hours on an Intel Linux box!
This immense software package is used to analyze satellite and digital aerial photos and create detailed maps that are remarkably free of traditional distortion problems. It has to handle all the numerical data in floating-point format for the best accuracy in the results.
The next step of the initial experiment involved handing the software an image and seeing how long it took to process that image. The hypothesis was that the Linux boxes would perform close to how the SGI boxes did, or perhaps a little slower.
Once again, Linux performed far better than expected. The details are examined in the next section.
The results were so dramatic that it was obvious to ImageLinks that using Linux gave them some sort of advantage over their previous setup. As much as Linux proponents might like to just take this evidence at face value, it is educational to take a look at why Linux turned out to be, by far, the superior solution in this situation.
The folks at ImageLinks feel that two different issues contributed to their results. First is the compiler (gcc), which produces tighter final code than the compiler on the SGI. Direct evidence of this is that the binary version of the application produced by gcc is almost half the size of the one produced by the SGI compiler. A smaller binary also contributes to the program running faster, and it leaves more RAM available for other applications and for loading data.
Another reason for the faster performance was the move from high-end workstations to commodity computing, which refers to using computers that most people can afford. Changing to PCs from a vendor-locked solution--where ImageLinks could not purchase upgrades or peripherals without voiding its warranty unless it did so directly from SGI--brought the price of each component down, so faster parts were economical. Specifically, ImageLinks could suddenly afford faster CPUs, hard drives, CD-ROM drives, SCSI interfaces, and more RAM.