Using RAID in Linux
The Mysteries of RAID
When you look at the installation documents for any of the popular Linux distributions, you will find only a few mentions of the term RAID, typically in passages such as "you will need RAID only if you are a very professional systems administrator and you already know what you are doing." Even in the latest documentation for the latest Linux releases, this is likely all you will see about RAID. That is one big reason I think we should move past this barrier and show that RAID can be used by "normal" people.
RAID stands for "Redundant Array of Inexpensive Disks." This seems rather self-explanatory, except for that strange word "inexpensive." In practice, it usually just refers to common PC hard disks, either SCSI or IDE. Some additional explanation is necessary, however. "Array" simply means multiple units. It is perhaps the most significant term in the acronym: for the owner of just one hard drive, however huge, RAID is absolutely useless. Also, the word "redundant" is not an entirely accurate label; as you'll see, it is not as simple as that. First, let's describe the forms RAID can take.
Hardware vs. Software RAID
Hardware RAID means that you are the proud owner of some hardware device and you are applying RAID concepts to it. There are a lot of such devices, from a simple controller card all the way up to a big strong-box with many cables and hard drives inside. What they have in common is that the device itself takes care of data organization on the hard drives, backup copies, "hot" replacement, and other internal RAID tasks, asking the operating system of your PC for only one thing: the corresponding driver. As a rule, all disks of such an array are seen by the OS as one or more "virtual" disk devices, much like any other common drive.
Linux supports a large number of hardware RAID controllers, and almost every serious server vendor tries to make sure its favorite controller is supported. We should mention cards such as the AMI MegaRAID and its Hewlett-Packard and Compaq clones, the IBM ServeRAID, and controllers by Mylex and DPT. These devices are highly recommended. It should also be pointed out that many serious RAID devices are not seen by the system as SCSI or IDE disks at all. Usually, once you have the proper controller driver, you can work with Linux as usual, because hardware RAID is completely separated from your system. Of course, you can have some problems when first installing your system: not all controllers are detected automatically during installation, and not all of them are supported by the default kernel. Sometimes you have to rebuild the kernel and even temporarily run the system from a separate "normal" disk. But this happens only in rare and difficult cases.
A financial word of caution about the popular "IDE-RAID" cards is in order here. Such cards are sometimes integrated right onto motherboards, but the price of this option is rather high.
Sometimes, when the term "hardware" is applied to such devices, it really describes something like the popular and cheap WinModem: a card that only works in combination with a software driver. In the case of the WinModem, this driver exists only for Microsoft software, and users of other OSes can't use it. Nevertheless, since physically these "RAID controllers" are nothing more than multi-channel IDE controllers, nothing prevents us from using them in Linux to set up software RAID.
Software RAID is organized by the OS itself and needs nothing except some additional CPU time to support it. And CPU time is the cheapest resource we have.
There is a myth that hardware RAID is always better than software RAID. Hardware sellers like this myth very much, for reasons you can well understand. We also hear the same line of thinking from system administrators, though nobody knows why. Nevertheless, the myth is largely cancelled out by the fact that hardware nowadays usually becomes obsolete before it wears out, and yesterday's beautiful RAID device with seemingly perfect properties is just lost money tomorrow. Also, hardware RAID devices, as a rule, are incompatible with each other: once a controller or some specific disk breaks, it has to be replaced with exactly the same model from exactly the same vendor, or you are in for a serious headache. Beyond that, software RAID works with the partitions of a hard drive, not with whole disks, which provides more flexibility. In some cases, well-constructed and organized software RAID works much faster than hardware RAID, and at lower cost.
With these factors in mind, let's learn more about software RAID. When we refer to "RAID" in the following sections, we will actually be talking about "software RAID."
Also, we should stipulate that for all of the techniques described in this article, you need a fairly recent Linux system, with a kernel from the latest 2.2 or 2.4 releases or later, and raidtools version higher than 0.90. With earlier versions, it is not a trivial task and requires a lot of manual work. One last thing: everything described below assumes that you have Red Hat Linux 6.2 or later, or a Red Hat-based distribution.
What are We Fighting For?
And, by the way, what exactly do we want from RAID? In general, two things: high speed or high reliability, and, if we can, both. Some RAID levels achieve high speed by performing input/output operations in parallel on several different disk devices. Several kinds of RAID increase reliability by keeping additional information that helps restore data after a disk crash.
For example, suppose we need a "fast" RAID system. First, note that RAID can parallelize data streams only across physical devices, so the partitions in a "fast" RAID system need to be on different hard drives. If you are using IDE-RAID, be sure to remove all slave devices! Any slave device will slow down data exchange for the other device on its cable, because IDE cannot serve both devices on one cable at full speed at the same time. For "reliable" RAID systems, keep the IDE-RAID caveat above in mind too, though for another reason. Even with SCSI or another type of interface, don't place too many devices on one bus. For example, with a 40 MB/s Ultra Wide SCSI interface and hard drives that each sustain 10-12 MB/s, there is no point in placing more than 3-4 such disks on that cable.
Let's discuss "reliable" RAID a bit more, and what that term actually means. You should never think that software RAID will protect you from all software problems and errors, or that it eliminates the need to back up your system. Nothing could be further from the truth. RAID is a low-level function: any data corruption caused by the system is invisible to it and will simply be duplicated onto the additional disks. The same holds true for any kind of disk error that the controller cannot detect.
You also shouldn't use RAID as a substitute for an APC uninterruptible power supply: once the power goes off, some disk transactions could be at different stages of completion, and after the next reboot the array will be out of sync. To minimize the probability of such trouble, some hardware RAIDs come with backup batteries. So basically, here's what you should know: "reliable" RAID can keep your data safe only when disk hardware errors are detected properly, and how much protection it offers depends on the "level" of RAID, which we discuss in the next section.
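The SCSI rule of thumb from this section is plain division: the number of disks worth putting on one bus is roughly the bus bandwidth divided by the sustained rate of one disk. Here is a minimal Python sketch using the Ultra Wide SCSI figures (a 40 megabytes per second bus, disks sustaining 10-12 megabytes per second); the function name `max_disks_per_bus` is ours, for illustration only, and the model ignores protocol overhead.

```python
import math


def max_disks_per_bus(bus_mb_s: float, disk_mb_s: float) -> int:
    """Estimate how many disks one bus can feed at full speed.

    Rule of thumb only: ignores protocol overhead and assumes every
    disk streams at its sustained rate at the same time.
    """
    if bus_mb_s <= 0 or disk_mb_s <= 0:
        raise ValueError("bandwidths must be positive")
    # Round down: a fraction of a disk's bandwidth cannot host another disk.
    return math.floor(bus_mb_s / disk_mb_s)


if __name__ == "__main__":
    # 40 MB/s Ultra Wide SCSI bus, disks sustaining 10-12 MB/s each.
    for disk_rate in (10, 12):
        print(f"{disk_rate} MB/s disks: {max_disks_per_bus(40, disk_rate)} per bus")
```

This reproduces the "3-4 disks" figure: three for the faster disks, four for the slower ones.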
Counting... 0, 1, 4, 5!
Linux supports these RAID levels very well. It also supports combinations of them, for example the popular 0+1, which is sometimes called RAID 10. There is also Linear mode, also known as "partitioned volumes." There is plenty of literature and documentation about the different RAID levels, so we will go over them only briefly.
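As a quick orientation, the levels differ mainly in how much space they leave you and how many disk failures they can survive. The Python sketch below is a simplified model, assuming n partitions of equal size, two-way mirrors for RAID 10, and no superblock overhead; the function and the level strings are ours, not part of raidtools.

```python
from typing import NamedTuple


class LevelInfo(NamedTuple):
    usable_gb: float          # space visible on the /dev/mdN device
    survivable_failures: int  # disks that can fail with no data loss (guaranteed)


def raid_capacity(level: str, n_disks: int, disk_gb: float) -> LevelInfo:
    """Usable size and guaranteed fault tolerance for n equal partitions.

    Simplified model: equal partition sizes, two-way mirrors for RAID 10,
    superblock overhead ignored.
    """
    if n_disks < 2:
        raise ValueError("an array needs at least two partitions")
    if level in ("linear", "0"):
        # Concatenation or striping: all of the space, no redundancy.
        return LevelInfo(n_disks * disk_gb, 0)
    if level == "1":
        # Every partition holds a full copy of the data.
        return LevelInfo(disk_gb, n_disks - 1)
    if level in ("4", "5"):
        if n_disks < 3:
            raise ValueError("RAID 4/5 needs at least three partitions here")
        # One partition's worth of space goes to parity.
        return LevelInfo((n_disks - 1) * disk_gb, 1)
    if level == "10":
        if n_disks < 4 or n_disks % 2:
            raise ValueError("RAID 10 here needs an even number (>= 4) of partitions")
        # Striped mirror pairs: half the raw space; one failure is always safe.
        return LevelInfo(n_disks // 2 * disk_gb, 1)
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    for lvl in ("linear", "0", "1", "5", "10"):
        info = raid_capacity(lvl, 4, 9.0)
        print(f"level {lvl:>6}: {info.usable_gb:5.1f} GB usable, "
              f"survives {info.survivable_failures} failure(s)")
```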
What Are We Keeping There?
This is a good question. First of all: what don't we need to keep on an array? There is no sense in keeping swap there, especially in a RAID 0 or RAID 5 configuration. Linux can put its swap on ordinary disks and will handle the swap space better itself. For example, the /etc/fstab configuration can look like this:

    /dev/sda2   swap   swap   defaults,pri=1   0 0
    /dev/sdb2   swap   swap   defaults,pri=1   0 0
    /dev/sdc2   swap   swap   defaults,pri=1   0 0
    /dev/sdd2   swap   swap   defaults,pri=1   0 0

This means that partitions /dev/sda2 through /dev/sdd2 are all used for swap with equal priority, and the system will balance the load across them itself. The only exception to this approach is RAID 1: in this case, mirroring the swap partitions can make your system more resilient. If a disk crashes, the computer will continue to work with the swap space on the mirror.
Should we place the root file system on the array and/or try to boot from it? I don't know the proper answer, and it's a never-ending dispute among system administrators. From my point of view, there is no benefit in such a configuration, only the possible harm of not being able to boot at all. In any case, it's something of a moot point, since currently there is no way to boot from any RAID level except RAID 1. Therefore, if you want (for some reason) to keep the root file system on any other level of array, you will need to create a separate /boot partition for loading the kernel.
Also, I don't think it's a good idea to keep /usr on RAID 0 or RAID 5, because if the array fails or breaks down, you can easily lose all the useful system tools, and without them you will have really big problems trying to restore your system. There are also the file systems /home, /opt, /var, /tmp, /usr/local and others to consider.
When planning RAID, remember that UNIX file systems like /home, /opt, and /usr/local usually hold "slow-changing" data, while file systems like /var hold "fast-changing" data. As for /tmp, we don't need to care about it at all after a system crash. So I recommend RAID 5 as the best choice for /home, /opt, and /usr/local, and RAID 0 or RAID 10 for /var. Remember, every decision about your RAID configuration should come from your system's goals and common sense.
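To make the swap example at the start of this section more concrete: according to the swapon(2) documentation, areas with different priorities are used highest priority first, and when several areas share the highest priority, pages are allocated among them round-robin. The Python sketch below only models that policy; it is not kernel code, the device names come from the fstab example, and treating allocation page by page is a simplification.

```python
from collections import Counter
from itertools import cycle


def allocate_swap_pages(areas: dict[str, int], n_pages: int) -> Counter:
    """Model round-robin page allocation across equal-priority swap areas.

    `areas` maps a device name to its pri= value. Only the highest-priority
    areas are used here; the real kernel moves on to lower-priority areas
    once the higher ones are full, which this sketch does not model.
    """
    if not areas:
        raise ValueError("no swap areas given")
    top = max(areas.values())
    preferred = [dev for dev, pri in areas.items() if pri == top]
    placed: Counter = Counter()
    # Hand out pages to the equal-priority areas in turn.
    for dev, _ in zip(cycle(preferred), range(n_pages)):
        placed[dev] += 1
    return placed


if __name__ == "__main__":
    fstab_swap = {"/dev/sda2": 1, "/dev/sdb2": 1, "/dev/sdc2": 1, "/dev/sdd2": 1}
    print(allocate_swap_pages(fstab_swap, 1000))
```

With four areas at pri=1, each receives a quarter of the pages, which is the "balance the load on them itself" behavior described above.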
Setting It Up
The easiest way to create a RAID array is to do it from the graphical installer while installing a new Linux distribution. In Red Hat, the utility named Disk Druid suits our needs. You can create RAID partitions as easily as ordinary partitions, then combine them into one array and set its level. That's all! However, Disk Druid is sometimes too "clever" and suggests a placement of partitions on the disks that goes completely against what any system administrator would want.
If this has happened to you, you can easily create the partitions yourself with a command-line partitioning tool instead. If you don't want to reinstall a distribution, partitioning by hand may be the best way to start working with RAID anyway. Still, as with everything else in Linux, you get the most out of RAID by editing its configuration file. So, with your favorite text editor, create the file /etc/raidtab and type something like this for RAID Linear mode:

    raiddev /dev/md0              # RAID device name
    raid-level linear             # linear mode
    nr-raid-disks 2               # number of disks used
    chunk-size 32                 # has no effect in linear mode
    persistent-superblock 1
    # list of partitions and their positions in the array
    device /dev/sdb6              # partition name
    raid-disk 0                   # disk number in the array
    device /dev/sdc5              # ... and so on
    raid-disk 1                   # ...

To create this array, we just need to execute:

    mkraid /dev/md0

After that, we can check /proc/mdstat to confirm that the array is working. The array can be started with this command:

    raidstart /dev/md0

and stopped with this one:

    raidstop /dev/md0

Easy, isn't it? Once the array is created, the device /dev/md0 can hold file systems as usual, like any other disk in the system. After a reboot, this device will be started automatically, without any raidstart (or raidstop) needed. You don't need to change the initialization scripts; you don't need to touch anything at all!
For RAID 0, the file /etc/raidtab can look like this:

    raiddev /dev/md0              # as above
    raid-level 0
    nr-raid-disks 2
    persistent-superblock 1
    chunk-size 4                  # here the size matters; see the notes below
    # the device list is the same as in the example above
    device /dev/sdb6
    raid-disk 0
    device /dev/sdc5
    raid-disk 1

In this case, the chunk-size argument means the stripe size in kilobytes. For best performance (at least in this configuration), the partitions should be roughly the same size. The default value is 4 KB, but a larger value, about 32 KB, will give better performance. Ideally it should be close to the size of a disk cylinder.
With the caches in modern hard drives, though, the best value can vary, and it is sometimes closer to the size of the drive's cache. Creating, starting, and stopping the array work exactly as described above.
The /etc/raidtab for RAID 1 will read:

    raiddev /dev/md
    raid-level 1
    nr-raid-disks 2
    nr-spare-disks 1
    chunk-size 4                  # doesn't matter here
    persistent-superblock 1       # as usual
    device /dev/sdb6
    raid-disk 0
    device /dev/sdc5
    raid-disk 1
    # "hot reserve" (spare) drives
    device /dev/sdd5
    spare-disk 0

If one of the "mirror" disks fails while a "hot reserve" disk is available, the spare automatically takes the place of the broken disk, and its contents are reconstructed in the background from the surviving disk in the array.
Finally, for RAID 5, the /etc/raidtab file will read:

    raiddev /dev/md0
    raid-level 5
    nr-raid-disks 3
    nr-spare-disks 1
    persistent-superblock 1
    parity-algorithm left-symmetric   # the recommended choice
    chunk-size 128                    # a good starting value
    device /dev/sda3
    raid-disk 0
    device /dev/sdb1
    raid-disk 1
    device /dev/sdc1
    raid-disk 2
    device /dev/sdd5                  # spare disk
    spare-disk 0

The spare disk works here just as in RAID 1. Array performance depends on chunk-size, so here you should use a larger value than for RAID 0; 128-256 KB usually gives good results.
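To see what chunk-size actually controls, it helps to look at how striping maps addresses onto the member disks. The Python sketch below models a simplified RAID 0 layout with equal-sized members: the array is cut into chunk-size pieces that are dealt out to the disks in turn. It illustrates the idea only and is not the md driver's code; the function name and the byte-offset interface are assumptions for this example.

```python
def raid0_map(offset: int, chunk_kb: int, n_disks: int) -> tuple[int, int]:
    """Map a byte offset in a RAID 0 array to (raid-disk index, byte offset on it).

    Simplified model: equal-sized members, chunks assigned round-robin.
    """
    chunk = chunk_kb * 1024
    chunk_no, within = divmod(offset, chunk)  # which chunk, and where inside it
    disk = chunk_no % n_disks                 # chunks rotate across the disks
    stripe = chunk_no // n_disks              # complete rows of chunks before this one
    return disk, stripe * chunk + within


if __name__ == "__main__":
    # chunk-size 4 on two disks, as in the RAID 0 /etc/raidtab example.
    for off_kb in (0, 4, 8, 13):
        disk, disk_off = raid0_map(off_kb * 1024, chunk_kb=4, n_disks=2)
        print(f"array offset {off_kb:>2} KB -> raid-disk {disk}, offset {disk_off // 1024} KB")
```

Roughly speaking, a larger chunk-size keeps more of each request on a single disk, while a smaller one spreads even modest requests across all members; which is faster depends on the request sizes your workload produces.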
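It is important to remember that when you format the file system with the mke2fs command, there is a special argument, -R stride=, which tells mke2fs the RAID chunk size measured in file system blocks. You specify it this way:

    mke2fs -b 4096 -R stride=32 /dev/md0

With 4096-byte blocks, stride=32 corresponds to a 128 KB chunk-size, as in the RAID 5 example. Use this option only for RAID levels 0, 4, or 5; for Linear mode and RAID 1, it doesn't make any sense.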
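The stride value is simply the chunk size expressed in file system blocks. The small Python helpers below (hypothetical names, for illustration) compute it and build the matching command line; with 4096-byte blocks, the 128 KB chunk-size from the RAID 5 example gives stride=32. Newer versions of mke2fs spell this option -E stride= rather than -R stride=, so check the man page of the version you have.

```python
def ext2_stride(chunk_kb: int, block_size: int = 4096) -> int:
    """Return the mke2fs stride: the RAID chunk size measured in file system blocks."""
    chunk_bytes = chunk_kb * 1024
    if chunk_bytes % block_size:
        raise ValueError("chunk size should be a multiple of the block size")
    return chunk_bytes // block_size


def mke2fs_command(device: str, chunk_kb: int, block_size: int = 4096) -> str:
    """Build (but do not run) an mke2fs command line in the -R stride= form."""
    stride = ext2_stride(chunk_kb, block_size)
    return f"mke2fs -b {block_size} -R stride={stride} {device}"


if __name__ == "__main__":
    print(mke2fs_command("/dev/md0", chunk_kb=128))  # RAID 5 example
    print(mke2fs_command("/dev/md0", chunk_kb=4))    # RAID 0 example
```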
Recovering RAID, Hot Upgrades, and Some Final Cautions
Usually this is the best method to take for recovery.
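Whatever recovery procedure you follow, the first step is noticing that an array has lost a disk. In /proc/mdstat, each redundant array shows a status such as [2/2] [UU]: the first pair is the configured versus working disk count, and in the second field each U is a working member while _ marks a missing one. The Python sketch below scans mdstat-style text for degraded arrays. The sample text is made up for illustration, and the exact layout of /proc/mdstat varies between kernel versions, so treat the parsing as an assumption to verify on your own system.

```python
import re

ARRAY_RE = re.compile(r"^(md\d+)\s*:")                  # e.g. "md0 : active raid1 ..."
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")  # e.g. "[3/2] [UU_]"


def degraded_arrays(mdstat_text: str) -> dict[str, str]:
    """Return {array: status flags} for arrays with fewer working disks than configured."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)  # status appears on a following line
            continue
        status = STATUS_RE.search(line)
        if current and status:
            total, working = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            if working < total or "_" in flags:
                degraded[current] = flags
            current = None
    return degraded


SAMPLE = """\
Personalities : [raid1] [raid5]
md0 : active raid1 sdc5[1] sdb6[0]
      1048512 blocks [2/2] [UU]

md1 : active raid5 sdc1[2](F) sdb1[1] sda3[0]
      2097024 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""

if __name__ == "__main__":
    # On a real system: degraded_arrays(open("/proc/mdstat").read())
    print(degraded_arrays(SAMPLE))
```

Running such a check periodically and mailing any non-empty result is one simple way to hear about a failed disk while the array can still be repaired.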
"Hot" upgrades refer to replacing a broken hard disk "on the fly," without stopping the server. This is a very useful ability, especially for servers where even a little downtime could mean big trouble. It is usually offered by expensive hardware controllers, but nothing prevents us from using it with software RAID. But if your RAID is IDE, forget about it: it's impossible. You can destroy a drive through unstable power or just by switching it on or off while the machine is running, because the interface has no such "bug trap" at all. Beyond that, the IDE devices must be rescanned, and usually this can only be done by the PC's BIOS during booting. With SCSI drives it's a bit harder; but with special cables, disks, and cutoff points, plus a powerful controller, you can achieve a hot upgrade. Before doing anything, though, read through the vendor's hardware documentation, and check with the device's support team if the documentation isn't clear. Finally, here are some very definite don'ts when working with RAID arrays:
In general, RAID is not that scary once you look into it more deeply. Nevertheless, whatever RAID setup you choose, always remember one simple precept: always back up your files!