Back to article
Optimizing Hard Drives For Maximum Speed in Linux
Scheduling and Fair Queuing
May 12, 2009
Imagine a trailer hooked on to the back of an Indy car - the effect on performance would be similar to what happens when you hook up hard drives to a server. That's because a hard drive is a slow, delicate, mechanical component attached to a computer system that is otherwise made up of solid-state electronics built for speed.
To get data on or off a disk, a drive head has to swing into the correct position and wait for the right part of the disk to come around. This can take hundredths of a second - many times longer than it takes to access data stored in RAM, for example. As a result, a disk I/O subsystem can be a huge data bottleneck, and significant improvements in the overall performance of the server can be achieved if the effects of this bottleneck can be minimized.
To understand how this might be done, let's take a look at how data is moved to and from a disk.
Put simply, the I/O subsystem accepts a stream of requests to read or write data to and from a disk which it holds in a queue. To help speed things up, it usually merges read or write requests together if they are close to each other in the queue, and if they involve the same area of the disk.
Read requests are generally given higher priority than write requests because a given process will have to wait for the read data it has requested before it can continue, while it won't have to wait for the result of a write.
The subsystem will also usually detect when data is being read sequentially from the disk and use a technique called "read-ahead" to read and cache the disk block following the one it has been asked to read. This can reduce seek time during sequential reads, although it does nothing to speed up reads to other random parts of the disk, and it is switched off when the subsystem detects random (i.e., non-sequential) reads.
After this simple merging has been carried out, the I/O subsystem could process requests in the queue in the order they are arrive at the head of the queue, but usually it doesn't. Instead it tries to speed things up by using a scheduler algorithm to decide the order in which requests should be processed. At the most basic level a scheduler will reorder requests to minimize the traveling the disk heads have to carry out, as this mechanical process is relatively time consuming. So reads in one area of the disk will be grouped, then reads in the next area, and so on.
To illustrate what scheduler algorithms do, it's helpful to think of elevators as an analogy. When an elevator arrives at the ground floor of a busy office block, many people get in and press the buttons for the floors to which they want to go, in a random order. Clearly it would make no sense to travel to the fifth floor, then the 40th, then the second, and then the 39th, just because that was the order in which the buttons were pressed. A far more efficient strategy is to stop at the floors in ascending order: the second, fifth, 39th and finally the 40th. By traveling in a single direction instead of backing up on itself, the elevator can drop everyone off at their floors in the minimum time.
But it turns out that this type of scheduling is not always optimal for an I/O subsystem (or for an elevator); it actually has a number of flaws. Going back to the elevator analogy, people working on the very highest floors would have to waste a lot of time, stopping at tens or potentially hundreds of intermediate floors, every time they traveled from the lobby to their office. For that reason many large buildings have some elevators that don't serve the lower floors at all, instead they go straight to the upper levels before stopping. The upper-floor workers benefit, but at the cost of the lower-floor workers, who now have fewer elevators that go to their floors.