Using the InterMezzo Distributed Filesystem

By: Bill von Hagen
Monday, August 12, 2002 11:53:57 AM EST
URL: http://www.linuxplanet.com/linuxplanet/reports/4368/1/

Getting Connected in a Disconnected World

All distributed filesystems provide access to storage that is located on remote systems, known as servers, and enable authorized users to transparently read and write data there over the network. Standard Linux file and directory security in distributed environments is enforced through the use of a network-oriented authentication mechanism such as Kerberos, LDAP, NIS, or even the careful synchronization of password and group entries in the case of extremely small networks.

The introductory article in this series, "Modern Distributed Filesystems for Linux; An Introduction," discussed the fact that various distributed filesystems are designed to solve specific types of problems. This article discusses InterMezzo, one of the more interesting modern Linux distributed filesystems, which is primarily designed to support disconnected operation. Disconnected operation is the interesting case in which you need to work on files that are located in a distributed filesystem while you are not actually connected to the network.

Though this initially sounds like an impossibility, disconnected operation is a fact of life in network-centric mobile computing environments today. Many people use laptops or other portable systems as their primary computing platforms. These systems typically access files stored on a network while they are in use on one's desktop, but their users also need to be able to continue working on these same files at home or on the road in order to maximize productivity.

The trivial solution to this problem is to manually copy the files you are working on from the distributed filesystem onto your laptop's local disk. This is not only time-consuming, but makes it easy for you to overlook some critical file. A better solution to this problem is a distributed filesystem such as InterMezzo, which transparently caches the files that you are working on from the InterMezzo server. In this case, caching means that InterMezzo makes a copy of the files that you are working on, stores them on a local partition that has been configured for use as a cache, and then synchronizes changes to the cached files with the original file on the InterMezzo server. If you disconnect from the network, your system contains a copy of your files, on which you can continue working. When you reconnect to the network, InterMezzo automatically updates the original version of the file with your changes. This mechanism is known as "synchronization."

InterMezzo's synchronization mechanism is analogous to the "Offline Files" or "Briefcase" mechanisms provided in most modern versions of Microsoft Windows--except, of course, that InterMezzo is free and you can use it on a true multi-user system such as Linux.

History of InterMezzo

InterMezzo is a relatively young distributed filesystem with a focus on high availability, flexible replication of directories, disconnected operation, and persistent caching. InterMezzo is an Open Source project that is available under the GPL. The primary InterMezzo web site is http://www.inter-mezzo.org.

InterMezzo has a distinguished family tree in distributed filesystem terms. InterMezzo was inspired by Carnegie Mellon University's Coda distributed filesystem, another popular distributed filesystem that is available for Linux and which will be discussed in a subsequent article in this series. InterMezzo is not based on the Coda source code (which is freely available), but has a completely new codebase. The father of InterMezzo, Peter Braam, was the head of the Coda project at CMU for several years before moving on with InterMezzo and other advanced computing projects. As a further twist, Coda itself began life as a branch from the source code of AFS 2.0, another popular distributed filesystem whose Open Source version, OpenAFS, will also be discussed in a subsequent article in this series. This conceptually rich family tree gives InterMezzo a long, well-established intellectual pedigree and enables InterMezzo to take advantage of years of conceptual development in AFS and Coda without inheriting any parts of their large, complex codebases.

InterMezzo is becoming more and more popular. Articles on InterMezzo have appeared in Linux Format magazine and on the web at Byte.com. One fortunate side effect of its growing popularity and ongoing development is that installing and configuring InterMezzo is easier than ever. InterMezzo has been a part of the main Linux kernel source since kernel version 2.4.5, and comes pre-compiled as a loadable kernel module in many recent out-of-the-box Linux distributions, including Red Hat 7.3. As discussed in the next section, recent changes to the components of InterMezzo and their installation and configuration process have obsoleted the procedures described in earlier articles, and have also turned installing InterMezzo into a 15-minute administrative task.

Components of InterMezzo

On both clients and servers, InterMezzo uses an existing filesystem to store its data, and tracks filesystem changes in "hidden" directories in those filesystems. (These directories are hidden in the classic Unix/Linux sense, meaning that their names begin with a period.) InterMezzo data is typically stored in journaling filesystems, though ext2 filesystems can also be used. Using a journaling filesystem as the underpinnings of InterMezzo filesystems helps guarantee both that changes to the files in the InterMezzo filesystem(s) won't be lost, and that InterMezzo's log of file changes will always be as accurate as possible, even after a spontaneous reboot. Rather than providing yet another journaling filesystem (YAJF), InterMezzo relies on a set of wrapper functions that enable it to take advantage of existing journaling filesystems such as ext3, JFS, and ReiserFS. Work on using XFS with InterMezzo is still in progress.

Software-wise, InterMezzo basically consists of kernel components and a server process known as InterSync that synchronizes files on InterMezzo client and server systems. An instance of InterSync runs on each InterMezzo client, keeping cached files on the client system synchronized with the contents of the InterMezzo server's exported filesystem. The InterMezzo server maintains a record of changes made to any files in its exported filesystem(s) in a file known as a "kernel modification log" that is stored in the exported InterMezzo filesystem. Each client's InterSync process periodically polls the server, retrieves this file, and scans it for records related to cached files. If a record indicating a change to a cached file is found, the client's InterSync process retrieves a fresh copy of the file from the server. All communication between client and server in the latest version of InterMezzo is done using the HTTP protocol.
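The scan step described above can be pictured with a small shell sketch. This is a conceptual illustration only: the format of the real kernel modification log is not described in this article, so the sketch assumes a hypothetical plain-text stand-in with one changed path per line, and the function name files_to_refetch is made up for the example.

```shell
# Conceptual sketch only. The real kernel modification log is NOT
# assumed to look like this; both input files are hypothetical
# plain-text lists with one path per line.
#   $1: stand-in for the server's change log (changed paths)
#   $2: list of paths currently held in the client cache
# Prints each cached path that appears in the change log, once.
files_to_refetch() {
    grep -Fx -f "$2" "$1" | sort -u
}
```

A client following this model would then fetch a fresh copy of each printed path from the server over HTTP.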

Synchronizing multiple clients with the contents of an InterMezzo server's exported filesystem(s) in one direction only is useful if you simply want to propagate files and directories to client systems in an essentially read-only fashion. However, in real life, it's most useful to synchronize files in both directions. You can therefore also run an InterSync server on the InterMezzo server in order to do active synchronization. Active synchronization guarantees that changes to files on the server are propagated to clients, but also guarantees that changes to cached files on client systems are propagated back to the server. Changes to files on the server are then propagated out to all other clients in the standard fashion.

Early releases of InterMezzo were notable because the synchronization server that they used was written in Perl. Though operating-system-level servers written in scripting languages such as Perl are fairly rare, this rapid prototyping approach made it easy to quickly explore a variety of synchronization mechanisms. Earlier InterMezzo synchronization mechanisms relied on the standard Unix/Linux rsync application and protocol rather than HTTP. Using HTTP not only makes InterMezzo compatible with an existing, widely-used protocol that is already available on many servers, but also enables InterMezzo to take advantage of existing HTTP-oriented security mechanisms such as SSL. Because HTTP is the foundation of the Web, it is also likely that increasing amounts of security will be available for HTTP communications, all of which InterMezzo can therefore get for free.

The next few sections describe how to install and start InterMezzo. All of the commands in these sections must be executed as the root user. These sections do not discuss security issues, since InterMezzo itself doesn't do anything special with security. The InterMezzo client and server filesystems described in the next few sections are created as publicly writable filesystems for experimentation purposes. You can impose and enforce security on them using standard Linux security mechanisms that are outside the scope of this article.

Requirements for InterMezzo

Installing and using InterMezzo requires at least two computers running Linux, one to act as an InterMezzo file server, and at least one to act as a client of that server. For optimal experimentation with InterMezzo, you should have an unused partition on the server and on each client, which will serve as the main InterMezzo filesystem and the client's local cache, respectively. Since few of the laptops that I've ever used have had spare partitions sitting around, the instructions in the next few sections also explain how to use a loopback filesystem as a client cache when experimenting with InterMezzo, but an actual partition is much more robust and therefore preferable.

Installing and using InterMezzo and related software packages requires the following:

InterSync depends on the glib2 package, and also requires headers provided with ghttpd. If you have to build InterSync, you should build and install ghttpd first so that InterSync can find the mandatory header files. If you are building InterSync, a directory or symbolic link named /usr/src/linux must exist that points to or contains the kernel source code on your system. InterSync also requires header files from your Linux kernel distribution.

Kernel Configuration

All of the systems on which you want to run InterMezzo must have InterMezzo support in the kernel, either compiled into the kernel itself or available as a loadable kernel module (LKM). (A pre-compiled InterMezzo LKM is provided with the out-of-the-box kernel for some Linux distributions, such as Red Hat 7.3.) Built-in InterMezzo support is available in all kernel versions newer than 2.4.5, but can be added to many older kernels by applying patches available on the Web or by building an InterMezzo module from source. This article focuses on activating and using InterMezzo in Linux 2.4 kernels that contain InterMezzo support.

You can determine if an InterMezzo LKM is already available for your kernel by issuing the command "insmod intermezzo" as the root user. If an InterMezzo module is available, this command will display the message "Using XXXXX", where XXXXX is the full pathname of the InterMezzo module. If no InterMezzo module is available, you'll see a message like "insmod: intermezzo: no module by that name found". If no LKM is available, there is no easy way to determine if InterMezzo support is compiled into your kernel other than trying to mount an InterMezzo filesystem, as described later in this article. If you see the message "mount: fs type intermezzo not supported by kernel", you will need to build a loadable kernel module for InterMezzo, as described in the next paragraph. If you could successfully load the InterMezzo LKM, you can skip the next paragraph and proceed to

If your kernel does not already contain active InterMezzo support and no InterMezzo LKM is already available, you will need to have installed the kernel source code for the version of the kernel that your system is running. If installed, it should be located in a subdirectory of your /usr/src directory that has the same name as your kernel version. If it is not installed, a package containing the kernel source code for your Linux distribution should be available on your distribution disks.

Once you've located the kernel source code, execute the command make xconfig to display the Linux kernel's X Window System-based configuration mechanism. The InterMezzo option is located in the "File Systems" panel's "Network File Systems" panel. Select "m" beside the "InterMezzo file system support (experimental, replicating fs)" entry to build this as a loadable kernel module, and exit from xconfig, saving your changes. You can then execute the make dep, make modules, and make modules_install commands. You will need to do this on both your client and server systems. Once the module is compiled and installed, you can then use the insmod command described earlier in this section to verify that the module compiled correctly and can be loaded on your system.
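As an additional check that is not part of the procedure above, Linux lists the filesystem types currently registered with the running kernel (built in, or provided by a module that is already loaded) in /proc/filesystems. The following is a minimal sketch; the function name fs_type_registered is made up for this example, and its optional second argument exists only so that the check can be tried against a sample file.

```shell
# Succeeds if the named filesystem type is currently registered with
# the running kernel. Each line of /proc/filesystems ends with a type
# name, optionally preceded by "nodev".
#   $1: filesystem type to look for
#   $2: file to read (defaults to /proc/filesystems)
fs_type_registered() {
    fstype="$1"
    listing="${2:-/proc/filesystems}"
    awk -v t="$fstype" '$NF == t { found = 1 } END { exit !found }' "$listing"
}

if fs_type_registered intermezzo; then
    echo "intermezzo is registered with this kernel"
else
    echo "intermezzo is not registered; load or build the module"
fi
```

A type provided by a module only shows up here after the module has been loaded, so run the insmod command first.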

If you don't have a spare partition on your client system(s) that you can dedicate to InterMezzo, the kernel running on each of those systems must also have support for the "loopback device". You can build loopback device support as a module by selecting "m" beside the "Block Devices" panel's "Loopback Device Support" option in make xconfig. If you need to add loopback device support to your kernel, you should do this at the same time that you select the InterMezzo module, and then build both modules at the same time by using the make commands that were described earlier in this section.

Installing or Building Other Software

Once InterMezzo support is available to your system's kernel, you can then install or build the other packages required by InterMezzo. As with kernel support for InterMezzo, you must install or build the packages described in this section on both your server and client systems.

First, install the ghttpd package or verify that /usr/sbin/httpd is already installed on your system. If it is already installed on your system, skip to the next paragraph. If your Linux distribution is an RPM-based system, install the ghttpd package that was discussed in the "Requirements for InterMezzo" section of this article using the command rpm -U libghttp-1.0.9-1.i386.rpm. If you need to build ghttpd from source, extract the contents of the libghttp-1.0.9.tar.gz archive that was discussed in the "Requirements for InterMezzo" section of this article using the tar zxvf libghttp-1.0.9.tar.gz command. You can then cd to the libghttp-1.0.9 directory and build and install ghttpd using the ./configure, make, and make install commands as the root user. This will install ghttpd in subdirectories of /usr/local.

Next, install InterSync. If your Linux distribution is an RPM-based system, install the InterSync package that was discussed in the "Requirements for InterMezzo" section of this article using the command rpm -U intersync-0.9.4-1.i386.rpm. If you need to build InterSync from source, extract the contents of the intersync-0.9.4.tar.gz archive that was discussed in the "Requirements for InterMezzo" section of this article using the tar zxvf intersync-0.9.4.tar.gz command. You can then cd to the intersync-0.9.4 directory and build and install InterSync using the ./configure, make, and make install commands as the root user.

If you install the InterSync RPM, a script for automatically starting InterMezzo will be installed in your system's startup directory, which is typically /etc/rc.d/init.d on Linux distributions such as Red Hat. Automatically starting InterMezzo on client and server systems and mandatory modifications that you must make to the startup file are discussed in the appropriate sections of this article.

Starting InterMezzo and Exporting a Filesystem on the Server

First, you must create or identify the journaling filesystem that will hold the InterMezzo data that you plan to export from the server. This section uses an ext3 filesystem as an example, though you can also use any journaling filesystem or even an ext2 filesystem in a pinch. Using an ext2 filesystem is not as robust as using a journaling filesystem because it increases the possibility that InterMezzo synchronization data and log information may be lost if your system crashes.

If you have a free ext3 partition that you want to use for InterMezzo, skip to the next paragraph. If not, create an ext3 filesystem on an unused partition by executing the command mke2fs -j partition-name, where partition-name is the name of your partition.

Next, create the mount point for the InterMezzo filesystem using the command mkdir -p /exports/server. This can be anywhere, but /exports/server is the most commonly-used location. After creating the mount point, mount the InterMezzo filesystem using the command mount -t intermezzo partition-name /exports/server. For the purposes of this article, change the mode of the root of the filesystem to 777 (making it publicly writable) by using the command chmod 777 /exports/server. You would not want to do this in most production environments.

If you are going to be experimenting with InterMezzo for a while, you should add an appropriate entry for this filesystem to the end of your system's /etc/fstab file so that your system automatically mounts this filesystem each time it boots.
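The article does not spell out the /etc/fstab entry, so the following is only a sketch of what such a line might look like. The helper name fstab_entry is made up, and the "defaults" mount options and the "0 0" dump and fsck fields are assumptions rather than values taken from the InterMezzo documentation.

```shell
# Print a candidate /etc/fstab line for an InterMezzo filesystem.
#   $1: device (a placeholder such as partition-name)
#   $2: mount point
# ASSUMPTION: "defaults" options and "0 0" (no dump, no boot-time
# fsck) are illustrative choices, not documented requirements.
fstab_entry() {
    printf '%s\t%s\tintermezzo\tdefaults\t0 0\n' "$1" "$2"
}

fstab_entry partition-name /exports/server
```

Review the printed line before appending it to /etc/fstab by hand.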

Next, start the InterSync synchronization server using the command intersync /exports/server&. You should then copy some sample file(s) and/or directories into this filesystem, so that you can verify that things are working once you start your client system, as described in the next section.
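The server-side steps in this section can be collected into one small script. The sketch below only prints each command unless DRY_RUN=0 is set, because mke2fs destroys any existing data on the partition. The DRY_RUN and EXPORT_DIR variables and the run and setup_server helpers are names made up for this example.

```shell
#!/bin/sh
# Sketch of the server-side steps described in this section.
# Commands are only printed by default; set DRY_RUN=0 (as root)
# to actually run them.
DRY_RUN="${DRY_RUN:-1}"
EXPORT_DIR="${EXPORT_DIR:-/exports/server}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# $1 is the partition to use (shown as partition-name in the text).
setup_server() {
    device="$1"
    run /sbin/mke2fs -j "$device"      # skip if it is already ext3
    run mkdir -p "$EXPORT_DIR"
    run mount -t intermezzo "$device" "$EXPORT_DIR"
    run chmod 777 "$EXPORT_DIR"        # for experimentation only
    # InterSync is started in the background, as in the text.
    echo "+ intersync $EXPORT_DIR &"
    if [ "$DRY_RUN" = "0" ]; then
        intersync "$EXPORT_DIR" &
    fi
}
```

Calling setup_server partition-name prints the five commands in order, which makes it easy to review them before running anything for real.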

If you installed InterSync on a Red Hat system from the RPM described earlier in this article, a startup file for InterSync was installed in your system's startup directory. On InterMezzo server systems, you can automatically start InterSync (and load the InterMezzo loadable kernel module) each time you boot your system by creating a symbolic link to the file /etc/rc.d/init.d/intersync in the startup directory for your system's default run level (which is listed in the file /etc/inittab). For example, on a Red Hat Linux system that starts run level 5 by default, you could do this by creating a symbolic link named /etc/rc.d/rc5.d/S99intersync that points to this file.
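The default run level and the matching symbolic link can be worked out with a short sketch like the following. It assumes the classic SysV /etc/inittab format, in which the default run level appears on a line such as id:5:initdefault:. The function names are made up for this example, and the second function only prints the ln command so that you can review it first.

```shell
# Print the default run level from an inittab-style file.
#   $1: file to read (defaults to /etc/inittab)
default_runlevel() {
    awk -F: '!/^#/ && $3 == "initdefault" { print $2; exit }' \
        "${1:-/etc/inittab}"
}

# Print (without running it) the command that creates the startup
# link for the given run level.
startup_link_command() {
    level="$1"
    echo "ln -s /etc/rc.d/init.d/intersync /etc/rc.d/rc${level}.d/S99intersync"
}
```

For example, running startup_link_command "$(default_runlevel)" as root on a system whose default run level is 5 prints the command for creating /etc/rc.d/rc5.d/S99intersync.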

Before attempting to automatically start InterMezzo on server systems, you must edit the file /etc/rc.d/init.d/intersync and supply a correct value for the CACHE environment variable. Though something of a misnomer, on server systems the CACHE variable should contain the InterMezzo directory that your server is exporting, which would be /exports/server if you followed the suggestions given in this section.

Mounting a Remote InterMezzo Filesystem on the Client

The procedure for importing an InterMezzo filesystem on a client system is very similar to the procedure described in the previous section.

First, you must create or identify the journaling filesystem that will hold the cached InterMezzo data that you plan to import from the server onto your client. This section uses an ext3 filesystem as an example, though you can also use any journaling filesystem or even an ext2 filesystem in a pinch. Using an ext2 filesystem is not as robust as using a journaling filesystem because it increases the possibility that InterMezzo synchronization data and log information may be lost if your system crashes.

If you have a free ext3 partition that you want to use for InterMezzo, skip ahead to the instructions for creating its mount point and mounting the client's cache filesystem. If not, create an ext3 filesystem on an unused partition by executing the command mke2fs -j partition-name, where partition-name is the name of your partition.

If you do not have a dedicated partition to use for testing InterMezzo, you can create a 10 MB loopback device filesystem for testing purposes by executing commands like the following (where filename is the name of the file that you want to contain the loopback filesystem):


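 # Create a 10 MB file of zeros (10240 blocks of 1024 bytes each)
 dd if=/dev/zero of=filename bs=1024 count=10240
 # Attach the file to the first loopback device
 /sbin/losetup /dev/loop0 filename
 # Create an ext3 (journaled) filesystem on the loopback device
 /sbin/mke2fs -j /dev/loop0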

NOTE: If you have created a loopback filesystem for use with InterMezzo, you should use /dev/loop0 as the name of your partition in the remainder of this section. Also, using a small loopback device for your cache limits the number of files from the fileserver that you can keep in the cache. Optimally, the size of your client cache and the size of the filesystem that InterMezzo exports from the server should be the same.

Next, create the mount point for the InterMezzo filesystem using the command mkdir -p /imports/server. This can be anywhere, but /imports/server is the most commonly-used location. After creating the mount point, mount the InterMezzo filesystem using the command mount -t intermezzo partition-name /imports/server. If you are using a loopback device for testing purposes, the command should be mount -t vintermezzo partition-name /imports/server. For the purposes of this article, change the mode of the root of the filesystem to 777 (making it publicly writable) by using the command chmod 777 /imports/server. You would not want to do this in most production environments.

If you are going to be experimenting with InterMezzo for a while, you should add an appropriate entry for this filesystem to the end of your system's /etc/fstab file so that your system automatically mounts this filesystem each time it boots. If you are using a loopback device for longer-term testing, you will need to add a command such as /sbin/losetup /dev/loop0 filename to your system's startup scripts so that the mapping between the loopback device and the file that contains the loopback filesystem is done before the system attempts to mount it.

Finally, start the InterSync synchronization server using a command like intersync --server="server-name" /imports/server &, where server-name is the name of your InterMezzo server. After a few seconds, you should be able to list the /imports/server directory and see that the files which you copied into /exports/server on the InterMezzo fileserver are now available in the /imports/server directory.
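Rather than listing the directory repeatedly by hand, you could poll for one of the sample files that you copied onto the server. The following is a minimal sketch; the function name wait_for_path and the file name sample.txt used below are made up for this example.

```shell
# Wait until a path exists, checking once per second.
#   $1: path to wait for
#   $2: timeout in seconds (defaults to 30)
# Returns 0 if the path appears in time, non-zero otherwise.
wait_for_path() {
    path="$1"
    remaining="${2:-30}"
    while [ "$remaining" -gt 0 ]; do
        [ -e "$path" ] && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    [ -e "$path" ]
}
```

For example, wait_for_path /imports/server/sample.txt 30 && echo "synchronized" reports success once the hypothetical sample.txt has arrived in the client cache.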

If you installed InterSync on a Red Hat system from the RPM described earlier in this article, a startup file for InterSync was installed in your system's startup directory. On InterMezzo client systems, you can automatically start InterSync (and load the InterMezzo loadable kernel module) each time you boot your system by creating a symbolic link to the file /etc/rc.d/init.d/intersync in the startup directory for your system's default run level (as shown in the file /etc/inittab). For example, on a Red Hat Linux system that starts run level 5 by default, you could do this by creating a symbolic link named /etc/rc.d/rc5.d/S99intersync that points to this file.

Before attempting to automatically start InterMezzo on client systems, you must also edit the file /etc/rc.d/init.d/intersync and supply correct values for the CLIENT_OPTS and CACHE environment variables. The CLIENT_OPTS variable should contain the string --server="servername", where servername is the name of your InterMezzo server, with the name of the server surrounded by double quotation marks. The CACHE variable should contain the name of your system's InterMezzo cache directory, which is /imports/server if you followed the suggestions given in this section.
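The embedded double quotation marks in CLIENT_OPTS are easy to lose when editing the file. The sketch below assumes that the startup script sets these variables with ordinary shell assignments, which this article does not confirm, and uses the placeholder servername from the text.

```shell
# ASSUMPTION: the startup script uses plain shell assignments.
# Single quotes keep the inner double quotes as part of the value.
CLIENT_OPTS='--server="servername"'
CACHE=/imports/server

# Show the resulting values exactly as the shell stores them.
echo "CLIENT_OPTS=$CLIENT_OPTS"
echo "CACHE=$CACHE"
```

Replace servername with the name of your InterMezzo server, leaving both sets of quotation marks in place.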

Troubleshooting

The cost of Linux and the fact that it is continually being enhanced are two of the best features of Linux. One unfortunate side effect of the number of different Linux distributions and associated software and kernel versions is that this makes it difficult for articles such as this one to guarantee that kernel-level procedures, such as loading and using a new type of filesystem, will work in every case.

InterMezzo is actively being used and is quite stable, but is continually being enhanced and optimized. If you encounter problems with the InterMezzo loadable kernel module that is provided with your kernel distribution, the easiest solution is to obtain the source code for the latest version and compile and install it on your system.

The InterMezzo development project, like hundreds of other useful collaborative Open Source development projects, is hosted at SourceForge. The InterMezzo project page is http://sourceforge.net/projects/intermezzo. You can get the latest version of the InterMezzo source code from there by using the following commands when connected to the Internet:


cvs -d:pserver:anonymous@cvs.intermezzo.sourceforge.net:/cvsroot/intermezzo login
cvs -z3 -d:pserver:anonymous@cvs.intermezzo.sourceforge.net:/cvsroot/intermezzo co izo

The first command logs you in anonymously to the InterMezzo project's CVS server, while the second retrieves the source code for the izo source directory, from which you can build the InterMezzo loadable kernel module.

Once you've retrieved the source code, make sure that a directory or symbolic link named /usr/src/linux exists, pointing to the kernel source code on your system. Next, change directory to the izo directory that you retrieved earlier and execute the following commands:


sh autogen.sh
./configure
make install

After executing these commands, you should reboot your system if you had already loaded the InterMezzo module during the current session. You can then repeat the commands described in the sections of this article on starting the InterMezzo client or server, depending on where you experienced the problem.

Wrapping Up

InterMezzo is a great distributed filesystem that is extremely useful for people who need to be able to access their files while they are not connected to the network, but still want to be able to resynchronize with a central file server when they reconnect.

For complete, up-to-the-minute information about InterMezzo and versions of the software that it requires, see the InterMezzo web site at http://www.inter-mezzo.org. For up-to-the-minute access to the latest versions of the InterMezzo source code, you can use the anonymous CVS access mechanism described earlier in this article, and build it yourself.

InterMezzo is a relatively light-weight distributed filesystem that uses the standard HTTP protocols for its synchronization mechanism, and depends on standard Linux security to control access to the files that it distributes. It is easy to set up on a desktop system and provides an ideal mechanism for users to synchronize actively-used files on the network with copies on laptops, making "disconnected operation" painless on Linux. The Coda distributed filesystem also supports disconnected operation under Linux, and is the subject of the next article in this series, which will not only discuss Coda but will compare and contrast the administrative requirements of the two distributed filesystems.

Bill von Hagen has written for Linux Magazine, Maximum Linux, Linux Format, Mac Home, Mac Tech, and various Linux and Macintosh-related online publications. He is the author of books on SGML, Linux Filesystems, and Red Hat Linux, and is the co-author of a book on Mac OS X. He is the Content Manager for TimeSys Corporation.

Copyright Jupitermedia Corp. All Rights Reserved.