ext3 or ReiserFS? Hans Reiser Says Red Hat's Move Is Understandable
Red Hat's Decision is Conservative, Not Radical
Red Hat's decision to employ ext3 as the default filesystem in its upcoming release has sparked considerable interest among technically savvy Linux users. But ext3 is neither the only journaling filesystem available to users of modern Linux kernels nor, in many ways, the best. Yet it has attributes that make it an attractive first step for a large distribution, chief among them backward compatibility.
A brief and incomplete explanation for those who have not followed journaling filesystem development:
The traditional Linux filesystem, ext2, is best suited to fairly small files on fairly small drives. As drives and the files stored on them have grown, performance has suffered in several ways. Some of the cost lies in reaching data on the drive, as wasted space ("slack") and fragmentation have grown. Some lies in recovery time after a power failure or other improper shutdown: enduring an e2fsck check of a one-gigabyte drive is easy, but the same check on a 40-gigabyte drive can be very time consuming, because e2fsck must examine the filesystem's metadata across the entire partition. And some lies in the bitmaps ext2 uses to keep track of free blocks and inodes, which are satisfactory for small drives with few files but inefficient on the large drives commonly employed today. Hence, journaling filesystems.
These record pending changes to the filesystem in a dedicated log, called a journal, so that restarting after an improper shutdown requires only replaying that journal to restore a consistent state, instead of scanning the entire drive. Additionally, depending on their design, journaling filesystems can make more efficient use of drive space and speed up reads and writes across a wide variety of file sizes. Some of the newer designs also allocate space for file metadata dynamically rather than fixing it when the partition is formatted, and, combined with a volume manager, they offer the potential of growing partitions and spanning drives in a single logical volume, so the system administrator needn't guess at appropriate partition sizes at installation time. A journaling filesystem becomes essential as programs and their data files (and the drives that hold them) grow huge.
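To make the recovery idea concrete, here is a minimal Python sketch of a write-ahead journal under a drastically simplified model: the "disk" is a dictionary keyed by block number, and each transaction is a list of block writes followed by a commit record. The names `ToyJournal` and `replay` are hypothetical, and real journals (including the one ext3 uses) add descriptor blocks, sequence numbers, and on-disk layout details omitted here. What the sketch does show is the key property: recovery reads only the journal, and a transaction whose commit record never reached the journal is discarded.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Hypothetical in-memory model of a write-ahead journal.

    Block writes are appended as ("write", txid, block, data) records, and a
    transaction counts as durable only once its ("commit", txid) record has
    been appended after them.
    """
    records: list = field(default_factory=list)

    def log_transaction(self, txid, writes, commit=True):
        # Log every block write of the transaction first...
        for block, data in writes:
            self.records.append(("write", txid, block, data))
        # ...then the commit record. commit=False simulates a crash that
        # happened before the commit record reached the disk.
        if commit:
            self.records.append(("commit", txid))


def replay(journal, disk):
    """Recover by applying committed transactions in journal order.

    Only journal records are examined; no other block is scanned.
    Writes from a transaction with no commit record are dropped.
    """
    pending = {}
    for record in journal.records:
        if record[0] == "write":
            _, txid, block, data = record
            pending.setdefault(txid, []).append((block, data))
        elif record[0] == "commit":
            for block, data in pending.pop(record[1], []):
                disk[block] = data
    return disk


if __name__ == "__main__":
    disk = {0: "old inode", 1: "old bitmap"}
    journal = ToyJournal()
    journal.log_transaction(1, [(0, "new inode"), (1, "new bitmap")])
    journal.log_transaction(2, [(0, "half-written inode")], commit=False)
    print(replay(journal, disk))  # {0: 'new inode', 1: 'new bitmap'}
```

A real filesystem would also checkpoint the journal, freeing its space once the logged changes have reached their home locations on disk; that step is left out of the sketch.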
Linux does not have a journaling filesystem. It has four. Well, three and a half:
- The Reiser filesystem, named after Hans Reiser, is probably the best known of this new class of filesystems. It has been working and in relatively wide use for more than a year, and it is the filesystem recommended by the installation program of SuSE 7.1 and 7.2.
- JFS, developed by IBM and made available to the Linux world, is designed with high throughput in mind. After a series of betas that began in February 2000, its 1.0 release became available at the end of June.
- XFS is the Silicon Graphics, Inc., journaling filesystem, also made available for Linux. It, too, offers all the features of full-blown journaling filesystems.
- ext3 is the "half" journaling filesystem mentioned above. Why half? It is a layer atop the traditional ext2 filesystem that does keep a journal of disk activity, so recovery from an improper shutdown is much quicker than with ext2 alone. But because it is tied to ext2, it inherits some of the older system's limitations and does not exploit the full potential of the purpose-built journaling filesystems. That is not entirely bad, though: ext3 partitions use the same on-disk format as ext2, so backing out to the old system (by choice, or if the journal were to become corrupted) is extremely simple.
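The compatibility point is visible in the filesystem's superblock. The Python sketch below assumes the standard ext2 superblock layout (superblock at byte 1024 of the partition, magic number 0xEF53 at offset 56, and the feature words s_feature_compat and s_feature_incompat at offsets 92 and 96); the helpers `classify_superblock` and `make_image` are hypothetical. ext3 marks its journal with a "compatible" feature bit, which an ext2 driver may safely ignore, and sets an "incompatible" needs-recovery bit only while the journal holds unreplayed changes. That is why a cleanly unmounted ext3 partition can be mounted as ext2, while one left mid-transaction cannot until the journal has been replayed (for example by e2fsck).

```python
import struct

SUPERBLOCK_OFFSET = 1024        # ext2/ext3 superblock starts 1 KiB into the partition
EXT2_MAGIC = 0xEF53             # s_magic: little-endian u16 at superblock offset 56
COMPAT_HAS_JOURNAL = 0x0004     # s_feature_compat bit: filesystem has an ext3 journal
INCOMPAT_RECOVER = 0x0004       # s_feature_incompat bit: journal must be replayed


def classify_superblock(image: bytes) -> str:
    """Hypothetical helper: classify a raw partition image by its feature bits."""
    (magic,) = struct.unpack_from("<H", image, SUPERBLOCK_OFFSET + 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    # s_feature_compat and s_feature_incompat are consecutive little-endian u32s.
    compat, incompat = struct.unpack_from("<II", image, SUPERBLOCK_OFFSET + 92)
    if not compat & COMPAT_HAS_JOURNAL:
        return "ext2"
    if incompat & INCOMPAT_RECOVER:
        # An unknown incompat bit makes an ext2-only driver refuse to mount.
        return "ext3-needs-recovery"
    # Only a compat bit is set, so an ext2 driver can mount it as plain ext2.
    return "ext3-clean"


def make_image(compat: int = 0, incompat: int = 0) -> bytes:
    """Hypothetical test helper: a fake image holding only the fields read above."""
    image = bytearray(2048)
    struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 56, EXT2_MAGIC)
    struct.pack_into("<II", image, SUPERBLOCK_OFFSET + 92, compat, incompat)
    return bytes(image)


if __name__ == "__main__":
    print(classify_superblock(make_image()))                    # ext2
    print(classify_superblock(make_image(COMPAT_HAS_JOURNAL)))  # ext3-clean
    print(classify_superblock(make_image(COMPAT_HAS_JOURNAL, INCOMPAT_RECOVER)))
```

On a real system one would read the first 2 KiB of the partition itself (as root, from a device such as the hypothetical /dev/hda1) instead of calling `make_image`.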
Red Hat's adoption of ext3 is a first, tentative step toward a journaling filesystem. When the company's plans became known with its release of the second beta of its upcoming release, Michael K. Johnson, chief of the company's kernel hackers, was quick to provide a rationale.
"Why do you want to migrate from ext2 to ext3? Four main reasons: availability, data integrity, speed, and easy transition," he wrote. Availability, he pointed out, involves quick recovery from a system interruption rather than enduring e2fsck taking the long way around. The journaling provided by ext3 also makes data corruption less likely. "Despite writing some data more than once, ext3 is often faster (higher throughput) than ext2 because ext3's journaling optimizes hard drive head motion," he wrote. Perhaps the determining factor, though, was Johnson's fourth reason.
"It is easy to change from ext2 to ext3 and gain the benefits of a robust journaling filesystem, without reformatting," he said. "That's right, no need to do a long, tedious, and error-prone backup, reformat, restore operation in order to experience the advantages of ext3."
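In practice, the conversion Johnson describes comes down to two steps: adding a journal to the existing ext2 filesystem with `tune2fs -j` (run against the partition's device), and changing that partition's type from ext2 to ext3 in /etc/fstab. Neither step reformats the partition. The Python sketch below covers only the second, text-editing step; `switch_fstab_type` is a hypothetical helper, not a Red Hat tool, and it assumes a conventional whitespace-separated fstab.

```python
def switch_fstab_type(fstab_text: str, mount_point: str,
                      old_type: str = "ext2", new_type: str = "ext3") -> str:
    """Hypothetical helper: change one fstab entry's filesystem type.

    Comment lines, blank lines, and entries for other mount points are
    kept verbatim; the edited entry is rejoined with tabs.
    """
    lines = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 3 and not fields[0].startswith("#")
        if is_entry and fields[1] == mount_point and fields[2] == old_type:
            fields[2] = new_type
            line = "\t".join(fields)
        lines.append(line)
    # Preserve a trailing newline if the input had one.
    trailer = "\n" if fstab_text.endswith("\n") else ""
    return "\n".join(lines) + trailer


if __name__ == "__main__":
    fstab = (
        "# device   mount  type  options   dump pass\n"
        "/dev/hda1  /      ext2  defaults  1 1\n"
        "/dev/hda5  /home  ext2  defaults  1 2\n"
    )
    # Only the /home entry changes; the comment and root entry are untouched.
    print(switch_fstab_type(fstab, "/home"), end="")
```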
Johnson said that Red Hat's choice was not meant to disparage any of the other new filesystems; it was simply the most sensible one for the biggest commercial distribution right now. Indeed, the developers of the various journaling filesystems have themselves gone to considerable lengths to avoid the kind of holy war that frequently erupts among backers of rival projects that perform similar functions.
"I personally think filesystems should be rewritten from scratch every 5 years, but there are lots of people who think quite differently on this," said Hans Reiser, for whom the Reiser filesystem is named, in an email interview yesterday. "Reiser4 is going to have a completely new core engine, and quite a lot of people think that we should just make lots of tweaks to what we have instead. It is extremely expensive, risky, and just plain hard work, for us to do that core engine rewrite, and yet I think it just has to be done. I could give you lots of logical reasons why we are doing it, but those aren't the real reasons why we rewrite when other filesystems don't. People just have different styles, and fortunately both styles work in their way, each with different effects and benefits."
While pointing to benchmarks that demonstrate a substantial speed increase when using the Reiser filesystem as opposed to ext3, Reiser said there's sense in Red Hat's more circumspect approach.
"ext3 is in its way an excellent filesystem written by very talented programmers, and the upgrade path is surely easy for users and distro alike," he said. "The upgrade path issue really makes it a conservative rather than crazy decision for RedHat; I can easily understand their decision."
Just as the many distributions let users select the one that best suits their needs, the multiplicity of journaling filesystems is, Reiser said, a sign of health, letting users choose one that matches their knowledge and comfort level. His own development effort aims to push the technology to the maximum.
"Reiser4 is designed to be highly extensible thanks to DARPA's funding us to do plugins. There are lots of semantic enhancements like inheritance and auditing coming down the pipe in version 4. We want, in our small way, to help make Linux not just another Unix, but something novel and cutting edge. This is the main reason users should find Reiser4 of interest. Not every distro is attracted to pushing past traditions though, and the beauty of Linux is that users get to choose what distro they need. I think that Microsoft is going to heat up the race for semantic innovation in the filesystem namespace in the next few years. We are going to try to innovate faster than Microsoft in the filesystem namespace enrichment arena, and I hope you will wish us luck in it."
There is an enormous body of highly technical literature explaining not just the superiority but the inevitability of journaling filesystems. While not a full journaling filesystem in the strictest sense, ext3 provides a painless way for nontechnical users to enjoy some of the benefits of the new high-power systems while keeping one hand on all that is familiar. But it seems clear that, as storage, code, and user data grow in size, and as storage options become more flexible, today's cutting edge will be in universal use tomorrow, and ext2 and its derivatives will take their place in history: an honored place, to be sure, but history nonetheless. For now, users can choose to dive in head first, dip their toes, or remain entirely ashore.