An In-Depth Look at Reiserfs - page 3
Included in the Linux kernel
Modern filesystem designs, such as OS/2's HPFS, NT's NTFS, and Linux's popular ext2, do a very good job of implementing the things discussed in the previous section. If you have a system crash, it may take a great deal of time to check the metadata during bootup but the odds are good that you will still have all your files when it's done. As Linux begins to take on more complex applications, on larger servers, and with less tolerance for downtime, there is a need for more sophisticated filesystems that do an even better job of protecting data and metadata. The journaled filesystems now available for Linux are the answer to this need.
It's important to note here that we are talking about journaled filesystems in general. There are a number of such systems available, including "xfs" from Silicon Graphics, "Reiserfs" from The Naming System Venture, "ext3" currently hosted at Red Hat, and "Journaled File System" from IBM. In this article, I use "journaled filesystem" in lower case to mean the generic type of system, as opposed to the capitalized version that refers specifically to IBM's software. You can find links to all of these projects in the references attached to this article.
No matter which journaled filesystem is used, there are certain principles that always apply. The term "journaled" means that the filesystem maintains a log or record of what it is doing to the main data areas of the disk, so that if a crash occurs it can re-create anything that was lost. That can be a little confusing, so let's take a closer look at this process.
In public speaking classes, there is an old saying that goes, "Tell them what you're going to tell them, then tell them, then tell them what you told them." This is similar to what the journal does in a filesystem. When the system is about to alter the metadata, it first makes an entry in the journal saying, "Here is what I'm going to change." Then it makes the change. Finally, it goes back to the journal and either marks that change as "completed" or simply deletes the journal entry entirely. There are variations on this sequence, and other ways to accomplish the same thing, but this simplified view will suffice for our purposes.
The idea is that the system can crash at any point in this process but that such a crash won't have lasting effect. If the crash happens before the first journal entry, then the original data is still on the disk. You lost your new changes, but you didn't lose the file in its previous state. If the crash happens during the actual disk update, you still have the journal entry showing what was supposed to have happened. So when the system reboots, it can simply replay the journal entries and complete the update that was interrupted, or it can back out a partially completed update to restore the file's previous state. In either case, you have valid data and not a trashed partition.
These concepts are familiar to anyone who works with SQL databases with their transaction logic. Replaying and completing an operation that was interrupted is called "roll forward" and backing out such an operation to its previous, consistent state is called "roll back." Ideas that were developed to prevent lost data in SQL databases are also valuable on regular mass storage devices. That is the real benefit of journaled filesystems.