The Coda Distributed Filesystem for Linux
Introduction to Coda
The initial article in this series provided an overview of the basic principles of distributed filesystems, and highlighted several of the most popular and up-and-coming distributed filesystems that are available for Linux today. The previous articles explored the InterMezzo distributed filesystem and explained how to install and configure a simple InterMezzo client and server. This article explores the Coda distributed filesystem, which provided much of the inspiration for InterMezzo and is also readily available for Linux.

Coda is a well-established distributed filesystem that was developed at Carnegie Mellon University, is actively in use there, and is still under active development. Coda began life as a variant of the AFS distributed filesystem (version 2) from Carnegie Mellon University, but has since taken on a complete life of its own. Led by M. Satyanarayanan, the Coda filesystem project is focused on specific distributed filesystem functionality required for mobile computing, such as support for disconnected operation. As explained in the article on InterMezzo, "disconnected operation" is the term used to describe the situation where a system that is ordinarily a part of a networked, distributed filesystem is used without being connected to a network.

Due to its heritage, Coda shares a basic set of terminology and features with AFS. (The Open Source version of AFS, OpenAFS, will be discussed in the next article in this series.) Coda provides a number of features that make it an excellent, high-performance distributed filesystem. Beyond its focus on mobile and disconnected operation, one of Coda's most significant features is its extensive use of caching. Caching means that copies of files or portions of files retrieved from Coda servers are preserved on Coda clients as long as they can be verified to match the master data stored on the Coda server. This is therefore known as "client-side caching".
Client-side caching reduces the amount of time that it takes to restart a Coda client by minimizing the amount of data that needs to be transmitted over the network after a Coda client is restarted. It's a fact of computer life that people tend to work on the same files and in the same directories--these change over time, of course, but the files you are working on today are probably more-or-less the same ones that you worked on yesterday. Client-side caching reduces network communication and minimizes client restart times, but doesn't always guarantee that the files that you need to work on are present on the client. In a networked environment, this is fine--the client system can simply retrieve the file from the file server on which it is located.

Given Coda's focus on disconnected operation, Coda also provides command-line commands that let users manipulate the contents of the cache, guaranteeing that specific files and directories will be present in the client's cache. Coda's "hoard" command enables you to anticipate being disconnected from the network by preloading your system with cached copies of specific files and directories. This function is typically used before you disconnect a laptop from the network prior to working in a disconnected fashion for some period of time. An example of using the hoard command is provided later in this article.
What's in a Name?
Distributed filesystems provide a number of administrative advantages, most of which were highlighted in the first article in this series. For real users, the biggest advantages of distributed filesystems are that they give you access to a greater amount of storage than most desktop or laptop systems contain, and that they can be accessed from any authorized workstation. However, accessing distributed filesystems from different systems can be confusing if the filesystem doesn't provide a single way of accessing those files.

Anyone who has ever used Windows shared filesystems and directories (usually referred to as "shares") is probably familiar with the frustration of finding that different drive letters are often mapped to different shares on different computers. For example, the filesystem mapped to the drive letter F: on one system is often different from the filesystem mapped to the same drive letter on another. This makes it difficult to tell other people how to find a specific file or directory--sending email asking your business partners to review the file F:\projects\startup\bus_plan.txt will only frustrate prospective business partners if your drive F: and theirs are mapped to two different shares.

Distributed filesystems such as Coda (and AFS) eliminate this problem by providing a single way of accessing the distributed filesystem that is the same on all computer systems. The entire Coda distributed filesystem on any workstation is mounted under the directory /coda. (Not surprisingly, AFS uses the /afs directory for the same purpose.) This is referred to as a "global name space" because it means that all Coda systems in each administrative domain share the exact same view of the Coda distributed filesystem. Users can therefore locate familiar files and directories that are stored in Coda from any machine through the same pathname.
If your home directory is /coda/home/wvh, that same directory is available from any other system in the same administrative Coda domain by using the same pathname. As explained later in this article, installing a Coda client provides an instant example of this by enabling you to access a special, open shared directory at CMU as soon as you start Coda on your system.

Another significant advantage of Coda is that it provides support for a much richer set of file and directory permissions than standard Linux/Unix systems do. These permissions are known as Access Control Lists (ACLs). Standard Linux file and directory permissions let you restrict access based on the owner of a file and a group that is associated with that file or directory, and let you set default permissions for everyone else. In Coda, as in AFS, you can set explicit permissions for any number of individuals or existing groups. You can also create your own groups, add users to them, and then grant special sets of permissions to the group as a whole. Access control lists are not unique to Coda and AFS--they are also available in journaling filesystems such as XFS. Work to add support for access control lists is also underway for standard Linux filesystems such as ext2 and ext3.
The best example of the advantage of Access Control Lists is the
flexibility that they provide when working on special projects. If you
are using a standard Linux system, you could ask your system
administrator to create a Linux group for the project, add all of the
project members to that group, and then ask him or her to create a
directory owned by that group. When working on files in that
directory, users would have to use the Linux

In contrast, when using Coda ACLs, any user can create their own shared directory in the Coda filesystem, create a project group, and assign users to that group. All Coda users who are added to the group instantly have access to those files because Coda users can simultaneously belong to multiple groups. Any files and subdirectories created in the project directory automatically inherit the permissions of the parent directory. No current or future intervention from a system administrator is required--the Coda user who owns the group can add or remove members at any time.
Support for the Coda filesystem is present in the source code for all
2.4 kernels and comes pre-compiled as a loadable kernel module (LKM)
in many recent out-of-the-box Linux distributions, including Red Hat
7.3. An easy way to see if support for Coda is compiled into your
kernel is to examine the file /proc/filesystems after booting your
system. If an entry for Coda is present in this file, support for Coda
is compiled into your kernel. If not, use the

If your kernel does not already contain active Coda support and no Coda LKM is already available, you will need to have installed the kernel source code for the version of the kernel that your system is running. If installed, it should be located in a subdirectory of your /usr/src directory that has the same name as your kernel version. If it is not installed, a package containing the kernel source code for your Linux distribution should be available on your distribution disks.
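The /proc/filesystems check described above can also be scripted. The following is a minimal sketch, not part of the Coda distribution: the function name has_filesystem is hypothetical, and the only assumption about the file format is that the filesystem name is the last field on each line.

```shell
#!/bin/sh
# has_filesystem is a hypothetical helper, not a Coda or Linux command.
# It succeeds if the named filesystem type appears in a file that uses
# the /proc/filesystems layout, where each line is "[nodev]<TAB>name",
# so the filesystem name is always the last field.
has_filesystem() {
    fs_name=$1
    fs_list=${2:-/proc/filesystems}
    awk -v name="$fs_name" '$NF == name { found = 1 } END { exit !found }' "$fs_list"
}

if has_filesystem coda; then
    echo "Coda support is active in the running kernel"
else
    echo "Coda support is not active in the running kernel"
fi
```

The optional second argument exists only so that the function can be tried against a saved copy of the file; on a live system the default of /proc/filesystems is what you want.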
Once you've located the kernel source code, execute the command
Installing a Coda Client
The source code and pre-compiled binaries for Coda are readily available from Carnegie Mellon University. For convenience's sake, the examples in this section install a Coda client from Red Hat Package Manager (RPM) files on a Red Hat 7.3 Linux system. Installing a Coda client requires that you download (or compile) and install four packages: lwp, rpc2, rvm, and coda-debug-client.
After downloading these packages, you can install them (as the root user) using the following commands:

    su
    Your-Root_Password
    rpm -U lwp-1.8-1.i386.rpm
    rpm -U rpc2-1.13-1.i386.rpm
    rpm -U rvm-1.6-1.i386.rpm
    rpm -U coda-debug-client-5.3.19-1.i386.rpm

At this point, all of the software required for the Coda client is installed but the Coda filesystem is not yet active. You can verify this by listing the /coda directory, which was created for you during the installation of the client:

    ls /coda
    NOT_REALLY_CODA

If Coda has been installed but is not running, the /coda directory contains a file named NOT_REALLY_CODA to let you know that the Coda filesystem is not mounted. Finally, use the Coda initialization script to start Coda's cache manager (named Venus) and mount the Coda filesystem:

    /etc/rc.d/init.d/venus.init start
    Starting venus: done.
    Date: Sun 09/15/2002
    00:15:54 /usr/coda/LOG size is 549376 bytes
    00:15:54 /usr/coda/DATA size is 2193368 bytes
    00:15:54 Loading RVM data
    00:15:54 Last init was Sun Sep 8 19:27:00 2002
    00:15:54 Last shutdown was clean
    00:15:54 starting VDB scan
    00:15:54 2 volume replicas
    00:15:54 1 replicated volumes
    00:15:54 0 CML entries allocated
    00:15:54 0 CML entries on free-list
    00:15:54 starting FSDB scan (833, 20000) (25, 75, 4)
    00:15:54 781 cache files in table (6824 blocks)
    00:15:54 52 cache files on free-list
    00:15:54 starting HDB scan
    00:15:54 3 hdb entries in table
    00:15:54 0 hdb entries on free-list
    00:15:54 Getting Root Volume information...
    00:15:54 Venus starting...
    00:15:54 /coda now mounted.

A number of messages display in the window or terminal session from which you started Coda. These provide status information about Coda's startup sequence and should conclude with a message stating that the Coda filesystem has been mounted on the directory /coda.
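Because the NOT_REALLY_CODA marker is only visible while the Coda filesystem is not mounted, a script can use it to decide whether Coda is active. The following is a minimal sketch under that assumption; coda_is_mounted is a hypothetical helper name, not a command supplied by Coda.

```shell
#!/bin/sh
# coda_is_mounted is a hypothetical helper, not part of Coda.
# It succeeds when the mount point exists and the NOT_REALLY_CODA
# marker file (visible only while Coda is not mounted) is absent.
coda_is_mounted() {
    mount_point=${1:-/coda}
    [ -d "$mount_point" ] && [ ! -e "$mount_point/NOT_REALLY_CODA" ]
}

if coda_is_mounted; then
    echo "/coda is mounted"
else
    echo "/coda is not mounted; start venus first"
fi
```

A check like this is a convenience for scripts only. The startup messages from Venus remain the more informative place to look when something goes wrong.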
Distributed filesystem clients aren't all that interesting without a server that they can talk to, so the Coda project kindly provides a sample server, running at Carnegie Mellon University, that Coda clients connect to by default. After installing the Coda client and starting Coda, you can list the contents of /coda to verify that Coda is actually working:

    ls /coda
    ftp.coda playground
    cd /coda/playground
    ls
    Documentation file george MUHAHA test www.brabanten.com

The "playground" directory is a publicly writable directory in which you can create files to verify that everything is working correctly. If you examine some of the files that are already present in this directory, you'll see some excited comments from previous explorers of the Coda distributed filesystem who were similarly impressed that "it just works."
Installing a Coda Server
Installing the software for a Coda server is very similar to installing a Coda client, requiring two additional packages: rvm-tools and coda-debug-server.
In addition to these packages, you will also need a spare partition to hold the files that your server will be making available to clients through the /coda directory. You don't need to install the Coda client software on a Coda server unless you are also using that server as a desktop system and want to be able to access /coda from that system. For simplicity's sake, the examples in this article don't install a Coda client on the server. After downloading these packages and the packages listed in the previous section of this article, you can install them (as the root user) using the following commands:

    su
    Your-Root_Password
    rpm -U lwp-1.8-1.i386.rpm
    rpm -U rpc2-1.13-1.i386.rpm
    rpm -U rvm-1.6-1.i386.rpm
    rpm -U rvm-tools-1.6-1.i386.rpm
    rpm -U coda-debug-server-5.3.19-1.i386.rpm

The next section explains how to configure and start a Coda server.
Configuring a Coda Server
Configuring a Coda server is much simpler than configuring similar distributed filesystem servers, because the Coda server package includes a shell script that prompts you for all of the information required. Before executing the script in this section, you should format and mount a disk partition that Coda can use for storing the data exported by your server. This should be a standard ext2 partition. The transactional nature of writes to journaling filesystems may cause synchronization problems, so you should not host Coda server data on a journaling filesystem partition (such as ext3). The standard Coda (and AFS) convention for distributed filesystem data partitions is to mount them at locations such as /vicepa (Vice partition A), /vicepb (Vice partition B), and so on. Vice is the original name of the distributed filesystem used by AFS, the original parent project of Coda.

In addition to a partition on which to store your Coda data, you will also need to specify two files or partitions to be used for logging. Coda stores log information in one of these and log metadata (summary information about log data) in the other. If you plan to run a Coda server for purposes other than experimentation, using partitions rather than files will provide higher performance. For simplicity's sake, the examples in this section use files, which I've stored in the /usr/coda/logs directory that I created for this purpose.
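Before running the setup script, it can be worth confirming that the data partition is mounted and is really ext2, as recommended above. The following is a minimal sketch that reads the kernel's mount table; the helper names mount_fs_type and check_vice_partition are hypothetical, and the sketch assumes the usual /proc/mounts layout of device, mount point, and filesystem type in the first three fields.

```shell
#!/bin/sh
# mount_fs_type and check_vice_partition are hypothetical helpers,
# not commands supplied by Coda.

# Print the filesystem type of a mount point, using a table in the
# /proc/mounts layout (device, mount point, type, options, ...).
# If the mount point is listed more than once, the last entry wins.
mount_fs_type() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" '$2 == mp { type = $3 } END { if (type != "") print type }' "$mounts_file"
}

# Succeed only if the mount point is mounted with type ext2.
check_vice_partition() {
    fs_type=$(mount_fs_type "$1" "${2:-/proc/mounts}")
    case $fs_type in
        ext2) echo "$1: ext2"; return 0 ;;
        "")   echo "$1: not mounted"; return 1 ;;
        *)    echo "$1: $fs_type, not the ext2 partition recommended here"; return 1 ;;
    esac
}
```

Running "check_vice_partition /vicepa" after mounting the partition reports its filesystem type and returns a non-zero status for anything other than ext2.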
The following is a complete transcript of running Coda's vice-setup
script. To make this more readable, responses to the script are
red. The example provides explicit answers to all of the
prompts, even where default values are available by pressing
Your first (or only, in this case) Coda server is usually the master
server. The master server provides volume location and other
information to Coda clients. The text strings that you enter as random
tokens for various types of authentication used by Coda ("foobar" in the
previous examples) can be any sequence of characters.
The serverid number and rootvolume names can be anything that you
want. Production Coda configurations typically use a standard
numbering and naming convention.
The UID and username that you specify in this section do not have to be
those of existing Linux users. They are internal to the authentication
mechanism used by Coda. You'll need to remember the username that you
specified as the Coda system administrator, since you will need to
authenticate as this user whenever you perform any privileged Coda
operation, such as creating accounts, setting ACLs in portions of the
Coda filesystem owned by the system administrator, and so on.
The previous section specifies where Coda should store internal status
information. These can be either files or existing disk partitions. In
production Coda environments, log data and Coda transaction metadata
(internal recoverable virtual memory information about Coda data
transfers, lookups, file status, and so on) should be stored in disk
partitions to improve performance. When experimenting, using files
instead of partitions is standard practice.
As explained at the beginning of this section, the partition you
specify in this section should already exist and should have been
mounted before running the Coda server's setup script.
The final messages from the Coda server setup script explain what you
need to do in order to actually start the Coda server now that its
configuration is complete. All of the applications referenced in this
section were installed in the directory /usr/sbin as part of installing
the Coda server package, so this directory must be located in the root
user's path to refer to them by anything other than a full pathname.
The following is a transcript of starting the processes required by
Coda:
The name of the host that you specify as an argument to the updateclnt
command must be the name of the host on which you're installing this Coda
server. The updateclnt server synchronizes system binaries between
file servers and control machines--in this case, since both are the
same, it is conceptually extraneous, but is still required by the
system.
The last step in bringing your Coda server to life is to actually
create a replicated, read/write Coda volume on the partition you set
aside for Coda data. The following example shows this command and its
output. You should type the command exactly as it is specified in the
final output of the Coda server setup script, since this uses the
values you specified during the installation process.
At this point, your Coda server is running, but no Coda client knows
how to connect to it--Coda clients need to know which master
server(s) control the administrative domain that they're associated
with. The next section explains how to modify a Coda client (whose
installation was explained earlier in this article) so that it will
talk to your Coda server.
Connecting to a specified Coda server is easy, thanks to clever
scripts provided by the Coda project. After installing a Coda client
and server, as explained earlier in this article, log in on the Coda
client system and use the venus-setup script.
The venus-setup script modifies the file /etc/services (if necessary)
in order to add entries for the network services used by Coda, and
modifies the client configuration file /etc/coda/venus.conf to reflect
the Coda server name and cache size.
To actually connect to the server, you will need to start (or restart)
Coda's cache manager so that it connects to the correct server. You
can use the Coda initialization script /etc/rc.d/init.d/venus.init to
do this, as in the following example:
The first of these terminates any instance of the cache manager that
may already be running on the client; the second starts a fresh
instance. Your client is now communicating with your Coda server--congratulations, you're running Coda!
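The stop-then-start sequence described above can be wrapped in a small function. This is a sketch only: restart_venus is a hypothetical name, the "start" argument is the one used earlier in this article, and "stop" is assumed to follow the usual init script convention.

```shell
#!/bin/sh
# restart_venus is a hypothetical wrapper, not a command supplied by Coda.
# The first call terminates any running cache manager; the second starts
# a fresh instance that reads the updated client configuration.
# Assumption: the init script accepts "stop" as well as "start".
restart_venus() {
    init_script=${1:-/etc/rc.d/init.d/venus.init}
    "$init_script" stop
    "$init_script" start
}
```

Run as root, "restart_venus" with no arguments uses the initialization script path given in this article.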
If you've followed the instructions in the previous section, your
Coda client is now talking to your Coda server, and you should be able
to list the /coda directory on the client ("ls /coda") and see any
files that are located on the data partition that the server is
exporting. If you see output like that in the following example, your
client and server aren't talking correctly:
In this case, you should check your logs (/var/log/messages, for
system problems, and /vice/srv/SrvLog, for Coda-specific
problems). Unfortunately, debugging every possible problem is outside
the scope of this article, but Coda usually just works.
If you can list the /coda directory on your client and see no files,
the natural thing to do is to want to create a file there and make
sure that you can see it from any and all Coda clients that are
communicating with your server. To do this, you must authenticate to
Coda, which will give you the privileges that you need to actually
create files in a standard Coda filesystem. The Coda filesystem
exported by CMU for Coda testing is specially configured to enable
anyone to read and write there, which--as you might hope--is not the
default configuration of a Coda filesystem. The following shows an
unauthenticated attempt to create a file in /coda:
A complete discussion of Coda authentication is outside the scope of
this article. Coda provides the
Authenticating to Coda can be done manually using Coda's
After authenticating to Coda (known as "acquiring tokens" in
Coda-speak), you should now be able to create a file in the
distributed Coda filesystem, as in the following example.
In this case, no news is good news--the file was correctly copied to
the /coda partition, which is the client's mount point for the Coda
partition that is being exported by your server.
The Coda distributed filesystem is designed for use regardless of
whether or not you're connected to a network. As mentioned earlier,
this seems somewhat paradoxical--how can you connect to a
network-oriented filesystem when you're not connected to a network?
To solve this problem, Coda enables you to make sure that your Coda
client has cached copies of any files that are stored in Coda but
which you need to access when you're not connected to the
network. This is known as "hoarding" and is done using Coda's hoard
command. The files that you want to hoard must be listed in a text
file, known as a "hoard file", that you supply as an argument to the
hoard command using its "-f" option. A sample section of a hoard file
is the following:
Each line contains three sections: the command that the hoard command
should execute (in this example, 'a', meaning to add the file to the
client's hoard), the name of a file or directory that you want to add
to the hoard, and a priority and associated attributes for that
file. In this example, 600 is a standard priority for hoarding files
(the priority determines which files must be hoarded if the number of
requested files exceeds the cache size), and the "d+" attribute means
to cache this directory and any of its subdirectories.
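The three-part line format described above is easy to generate from a script. The following is a minimal sketch: hoard_add_line is a hypothetical helper, the paths are illustrative only, and the sketch assumes that the priority and attribute are joined with a colon (as in "600:d+"), so check the hoard documentation for your Coda release before relying on it.

```shell
#!/bin/sh
# hoard_add_line is a hypothetical helper that prints one hoard-file entry.
# Arguments: pathname, priority (default 600), optional attribute such as d+.
# Assumption: priority and attribute are written as "priority:attribute".
hoard_add_line() {
    path=$1
    priority=${2:-600}
    attrs=$3
    if [ -n "$attrs" ]; then
        printf 'a %s %s:%s\n' "$path" "$priority" "$attrs"
    else
        printf 'a %s %s\n' "$path" "$priority"
    fi
}

# Illustrative paths only; substitute directories that exist in your /coda tree.
hoard_add_line /coda/home/wvh/projects 600 d+   # prints: a /coda/home/wvh/projects 600:d+
hoard_add_line /coda/home/wvh/.bashrc           # prints: a /coda/home/wvh/.bashrc 600
```

Redirecting the output of several such calls into a file produces a hoard file that can be passed to the hoard command with its "-f" option.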
If you're not sure which files you're actually using, you can use Coda's
"spy" command to monitor the traffic between your client and a
specific server and write a list of these files to standard output, as
in the following example:
After terminating the spy process and editing its output file to use
the correct syntax for the hoard command, you can then use this file
to hoard whatever you're working on before disconnecting from the
network, as in the following example:
Executing this command will display information about each file and/or
directory that is being cached on your client.
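The manual editing step described above can be partly automated. The following is a sketch under a stated assumption: that the saved spy output contains one pathname per line. The function name paths_to_hoard is hypothetical, and the generated entries use only the "a" command and a priority, so attributes such as "d+" still have to be added by hand where needed.

```shell
#!/bin/sh
# paths_to_hoard is a hypothetical filter, not a command supplied by Coda.
# It reads pathnames (one per line, an assumption about the saved spy
# output) on standard input, keeps only those under /coda, removes
# duplicates, and prints one hoard "a" (add) entry per pathname.
paths_to_hoard() {
    priority=${1:-600}
    grep '^/coda/' | sort -u | awk -v prio="$priority" '{ print "a " $0 " " prio }'
}
```

For example, "paths_to_hoard 600 < spy.out > my.hoard" would turn a saved list named spy.out into a hoard file named my.hoard; both file names are placeholders.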
Disconnecting from a network is easy--you can simply shut down your
machine and unplug it. However, in order to safely disconnect from the
network and ensure that any file changes that are currently in your
client's cache are synchronized with the server, you should use Coda's
You can now work on the files that you've hoarded on your machine
without being connected to the network. When you're ready or able to
reconnect to the network, plug your system back into the network and
execute the
This will reconnect your client to its default server and begin the
process of integrating your changes into the same files on the Coda
server. If no one else has modified those files, the files will be
invisibly synchronized. If any conflicts arise, such as if someone
else has modified the same files that you did, you can use Coda's
Coda is a functional distributed filesystem that is relatively easy to
install, configure, and use. As explained earlier in this article,
Coda is designed as a distributed filesystem that you can use when
you're connected to the network, quickly configure for use when you're
not connected to the network (as explained in the previous section),
and automatically synchronize back to the networked filesystem when
you reconnect to the network.
This article explained the highlights of installing Coda clients and
servers. As with any distributed filesystem, there are many
administrative issues that were glossed over. For a complete
discussion of using and administering Coda, see the official Coda
documentation that is available at http://coda.cs.cmu.edu.
The next article in this series discusses using OpenAFS, which is the
Open Source version of the AFS filesystem that was Coda's original
parent project. Different filesystems are designed to address
different issues--as we'll see in the next article, OpenAFS is a
popular distributed filesystem that benefits from the AFS filesystem's
years of research and commercial use and testing. OpenAFS is a very
secure, stable, and powerful distributed filesystem that is actively
used in hundreds of commercial and research installations all over the
world.