Hardware Notes: Hard Drive Benchmarking With iozone
Modern machines: clockspeed is only part of the picture
Try this experiment some time on an unsuspecting geek friend:
Ask your unwitting subject about the performance of his or her newest computer, and I bet you'll be treated to a litany of specs: how many megahertz or gigahertz the processor and graphics chip are, how much memory is on the motherboard and the graphics board, what kind and size of cache the CPU uses, etc. But nowhere in this recital will your friend tell you anything specific about the hard drives in the system--no data transfer rates, no platter RPM values, no drive cache sizes, etc. For many computer users, even the hardest of the hard core, disk drives are "just there," and any detail beyond the drive being EIDE or SCSI, or (possibly) UDMA 66 or UDMA 100 or some particular sub-flavor of SCSI, is a mystery.
Modern desktop computers are so fast in all respects that we often have the luxury of ignoring performance issues. (People like me who started with a 1MHz processor have no trouble remembering how we lusted after every extra slice of CPU speed.) But if you're in a position where you need or want to maximize your computer's speed, all but the most compute-bound tasks will likely benefit from a faster disk drive system. Which raises the question: Aside from reading specs (which are really just a first cousin to statistics in terms of veracity), how do you evaluate the performance of a disk drive?
My favorite tool in this area is iozone, a relatively small but full-featured open source program for benchmarking disk systems. iozone was written by William Norcott and Don Capps, with contributions credited in the source code from several other people, and it can be built for over two dozen operating systems and versions, including Linux, Windows (32-bit), several BSDs, and Solaris.
One of the annoying but true facts of life in benchmarking is that you have to decide up front exactly what you want to measure, and then take great pains to make sure that's what you really measure. Nowhere in computer system performance work is that more true than in disk drives, where the response time you see from disk activity is a function of a whole gaggle of factors. Besides the characteristics of the drive itself (head seek time, rotational and data transfer speed), you have to contend with the operating system, too, and its penchant for caching disk contents, and in some cases rescheduling pending disk I/O to use the disk more efficiently. (One of the most fascinating results I've seen in this area involved running two mainframe operating systems, one atop the other, when I worked at IBM. The host OS was VM, which is so good at scheduling disk I/O that the guest OS actually ran slightly faster on top of VM than it did on the bare metal.)
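To make the caching point concrete, here is a minimal Python sketch (not part of iozone, and not how iozone works internally) that times the same write twice: once letting the operating system buffer the data, and once with the clock running until os.fsync() returns. The function names, file size, and record size are my own assumptions for illustration. Note that if your temporary directory lives on a RAM-backed filesystem, neither number says anything about a physical drive.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, record_bytes, sync):
    """Write total_bytes to path in record_bytes chunks; return MB/s.

    If sync is True, the elapsed time includes os.fsync(), so the OS has
    been asked to push the data to the device before the clock stops.
    If sync is False, the data may still be sitting in the OS cache.
    """
    record = b"\0" * record_bytes
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            f.write(record)
            written += record_bytes
        if sync:
            f.flush()              # empty Python's own buffer first
            os.fsync(f.fileno())   # then ask the OS to flush its cache
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero
    return (written / (1024 * 1024)) / elapsed


def compare_cached_vs_synced(total_bytes=16 * 1024 * 1024,
                             record_bytes=64 * 1024):
    """Return (MB/s without fsync, MB/s with fsync) for a scratch file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        cached = timed_write(path, total_bytes, record_bytes, sync=False)
        synced = timed_write(path, total_bytes, record_bytes, sync=True)
    finally:
        os.remove(path)
    return cached, synced


if __name__ == "__main__":
    cached, synced = compare_cached_vs_synced()
    print(f"without fsync: {cached:.1f} MB/s")
    print(f"with fsync:    {synced:.1f} MB/s")
```

Whatever gap you see between the two figures comes from the operating system, not the drive, which is exactly why you need to decide in advance which of the two you are trying to measure.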
So, what do you want to measure? The performance of the bare drive, or how it works in the context of a real-world setting, including your choice of Linux distro, applications, and usage pattern, or something in between? There are arguments to be made for and against each of these approaches, and like almost all interesting questions, there is no single right answer; it all boils down to figuring out what kind of data is most useful to you, and then making sure you get it as accurately as possible.