The Coda Distributed Filesystem for Linux
Introduction to Coda
The initial article in this series provided an overview of the basic principles of distributed filesystems, and highlighted several of the most popular and up-and-coming distributed filesystems that are available for Linux today. The previous articles explored the InterMezzo distributed filesystem and explained how to install and configure a simple InterMezzo client and server. This article explores the Coda distributed filesystem that provided much of the inspiration for InterMezzo and which is also readily available for Linux.
Coda is a well-established distributed filesystem that was developed at Carnegie Mellon University, where it is in active use and still under active development. Coda began life as a variant of version 2 of the AFS distributed filesystem, also from Carnegie Mellon University, but has since taken on a complete life of its own. Led by M. Satyanarayanan, the Coda filesystem project focuses on the distributed filesystem functionality required for mobile computing, such as support for disconnected operation. As explained in the article on InterMezzo, "disconnected operation" is the term used to describe the situation where a system that is ordinarily part of a networked, distributed filesystem is used without being connected to a network.
Due to its heritage, Coda shares a basic set of terminology and features with AFS. (The Open Source version of AFS, OpenAFS, will be discussed in the next article in this series.) Coda provides a number of features that make it an excellent, high-performance distributed filesystem. Beyond its focus on mobile and disconnected operation, one of Coda's most significant features is its extensive use of caching. Caching means that copies of files, or portions of files, retrieved from Coda servers are preserved on Coda clients for as long as they can be verified to match the master data stored on the Coda server. Because the copies are kept on the client, this is known as "client-side caching". Client-side caching reduces the time that it takes to restart a Coda client by minimizing the amount of data that must be transmitted over the network after the restart. It's a fact of computer life that people tend to work on the same files and in the same directories. These change over time, of course, but the files you are working on today are probably more or less the same ones that you worked on yesterday.
Client-side caching reduces network communication and minimizes client restart times, but it doesn't guarantee that the files you need to work on are present on the client. In a networked environment, this is fine: the client system can simply retrieve the file from the file server on which it is located. Given Coda's focus on disconnected operation, Coda also provides commands that let users manipulate the contents of the cache, guaranteeing that specific files and directories will be present in the client's cache. Coda's "hoard" command enables you to anticipate being disconnected from the network by preloading your system with cached copies of specific files and directories. This function is typically used just before you disconnect a laptop from the network to work in a disconnected fashion for some period of time. An example of using the hoard command is provided later in this article.
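The interplay between client-side caching, cache validation, and hoarding described above can be illustrated with a small toy model. The Python sketch below is not Coda code and does not reflect Coda's actual protocol or the syntax of the hoard command. The class names (ToyServer, ToyClient), the version-number check used for validation, and the file paths are all assumptions made purely for illustration.

```python
class ToyServer:
    """Hypothetical stand-in for a file server: path -> (version, data)."""

    def __init__(self):
        self.files = {}

    def store(self, path, data):
        # Each write bumps the version so clients can detect stale copies.
        old_version = self.files.get(path, (0, b""))[0]
        self.files[path] = (old_version + 1, data)

    def version(self, path):
        return self.files[path][0]

    def fetch(self, path):
        return self.files[path]


class ToyClient:
    """Hypothetical client with a validated cache and a hoard operation."""

    def __init__(self, server):
        self.server = server
        self.connected = True
        self.cache = {}    # path -> (version, data)
        self.fetches = 0   # number of full transfers from the server

    def read(self, path):
        if not self.connected:
            # Disconnected operation: only cached files are available.
            if path in self.cache:
                return self.cache[path][1]
            raise FileNotFoundError(f"{path} is not cached; client is disconnected")
        cached = self.cache.get(path)
        # Reuse the cached copy only if it still matches the server's version.
        if cached is not None and cached[0] == self.server.version(path):
            return cached[1]
        self.cache[path] = self.server.fetch(path)
        self.fetches += 1
        return self.cache[path][1]

    def hoard(self, paths):
        # Preload the cache while still connected, before going offline.
        for path in paths:
            self.read(path)


if __name__ == "__main__":
    server = ToyServer()
    server.store("/coda/docs/report.txt", b"draft")
    client = ToyClient(server)
    client.hoard(["/coda/docs/report.txt"])
    client.connected = False
    print(client.read("/coda/docs/report.txt"))
```

In this model, a repeated read of an unchanged file costs only a version check, which is the saving that client-side caching provides, while a file that was never read or hoarded is simply unavailable once the client is disconnected.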