4 Archiving Tools for Linux Server Admins
tar and zip
October 11, 2010
There are all kinds of fancy backup applications, ranging from free tools to complicated and expensive suites. But it's still hard to beat the speed, simplicity, and flexibility of the old standbys.
tar has been around approximately forever. It bundles a set of files into a single larger archive (making them easier to share with other people), but it does no compression, so on its own it doesn't save space. The tar command does, however, have command-line options for seamlessly handling tar archives compressed with either gzip or bzip2 (-z and -j, respectively; see below).
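To see that tar on its own only bundles files, here's a small sketch using Python's standard tarfile module rather than the tar command (the file names and contents are made-up examples). Mode "w" writes a plain tar stream, roughly what tar -cf produces, while "w:gz" adds a gzip layer over the whole stream, roughly what tar -czf does.

```python
import io
import tarfile

# Hypothetical sample data: three small, very repetitive "log files".
files = {f"logs/app{i}.log": b"GET /index.html 200\n" * 500 for i in range(3)}

def make_tar(mode: str) -> bytes:
    """Build a tar archive in memory. "w" is plain tar; "w:gz" gzips the whole stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)  # the header: name, size, mode, owner, ...
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

payload = sum(len(data) for data in files.values())
plain = make_tar("w")
gzipped = make_tar("w:gz")

print(f"file data: {payload} bytes")
print(f"tar:       {len(plain)} bytes")    # bigger than the data: headers plus block padding
print(f"tar.gz:    {len(gzipped)} bytes")  # only the gzip layer actually saves space

# Reading back: the default read mode detects the compression by itself.
with tarfile.open(fileobj=io.BytesIO(gzipped)) as tar:
    names = tar.getnames()
```

The plain archive comes out slightly larger than the files it holds, because every member gets a 512-byte header and its data is padded out to whole 512-byte blocks.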
When tar creates an archive, it concatenates the files, each preceded by a header containing its metadata: the file name, owner, permissions, and any link information. The header fields are stored as ASCII text for portability, so the archive carries all of this file information along with the file data.
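The header layout is easy to inspect directly. This sketch, again using Python's tarfile module, writes a POSIX ustar archive and then decodes a few fields of the first 512-byte header by hand; the offsets used (name at 0, mode at 100, size at 124, owner name at 265) are those of the ustar header format. The file names, owner, and permissions are hypothetical.

```python
import io
import tarfile

# Build a tiny POSIX ustar archive in memory: one regular file plus one symlink.
# The names, owner, and permissions are made-up examples.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    data = b"hello\n"
    info = tarfile.TarInfo("notes.txt")
    info.size = len(data)
    info.mode = 0o644
    info.uname, info.gname = "alice", "staff"
    tar.addfile(info, io.BytesIO(data))

    link = tarfile.TarInfo("latest.txt")
    link.type = tarfile.SYMTYPE      # link information lives in the header
    link.linkname = "notes.txt"
    tar.addfile(link)                # a symlink has no data blocks

raw = buf.getvalue()
header = raw[:512]  # the first 512-byte header block describes notes.txt

def field(offset: int, length: int) -> str:
    """Decode one NUL-terminated ASCII field of a ustar header."""
    return header[offset:offset + length].split(b"\0", 1)[0].decode("ascii")

print("name :", field(0, 100))
print("mode :", field(100, 8))   # permissions as ASCII octal digits
print("size :", field(124, 12))  # file size in bytes, also ASCII octal
print("owner:", field(265, 32))

# tarfile parses the same headers back into TarInfo objects.
with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
    members = tar.getmembers()
for member in members:
    print(member.name, oct(member.mode), member.uname or "-", member.linkname or "-")
```

Because the numbers are written as octal text rather than binary integers, the header reads the same on any machine regardless of word size or byte order.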
One occasional problem when extracting a tarball arises if it was created so that it extracts directly into the working directory rather than into its own subdirectory. At best this is untidy; at worst it can overwrite existing files. Avoid the problem by running tar -tf file.tar to list the archive's contents before extracting it; if necessary, you can then move the tarball into a new, clean directory and extract it there.
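The check-before-extracting habit can also be scripted. What follows is a minimal sketch, not a standard tool: is_self_contained and safe_extract are hypothetical helper names, and the rule they apply (an archive is tidy if every entry sits under one top-level directory) is an assumption about what counts as untidy. When the rule fails, the archive is extracted into a brand-new directory instead.

```python
import io
import os
import tarfile
import tempfile
from pathlib import PurePosixPath

def is_self_contained(tar: tarfile.TarFile) -> bool:
    """True if every member sits under one top-level directory (as `tar -tf` would show)."""
    tops = set()
    for member in tar.getmembers():
        parts = PurePosixPath(member.name).parts  # "./a/b" -> ("a", "b")
        if not parts:
            continue  # the archive root itself, e.g. "."
        if len(parts) == 1 and not member.isdir():
            return False  # a loose file would land directly in the destination
        tops.add(parts[0])
    return len(tops) == 1

def safe_extract(archive_path: str, dest: str) -> str:
    """Extract into dest, or into a fresh subdirectory of dest if the tarball is untidy."""
    with tarfile.open(archive_path) as tar:
        if is_self_contained(tar):
            target = dest
        else:
            # Hypothetical naming: "<archive base name>-<random suffix>", always new and empty.
            prefix = os.path.basename(archive_path).split(".")[0] + "-"
            target = tempfile.mkdtemp(prefix=prefix, dir=dest)
        # The "data" filter (Python 3.12+, backported to some earlier releases)
        # additionally rejects absolute paths and links pointing outside target.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(target, **extra)
    return target

def build(path: str, names: list) -> None:
    """Write a small test tarball of one-byte files with the given names."""
    with tarfile.open(path, "w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = 1
            tar.addfile(info, io.BytesIO(b"x"))

with tempfile.TemporaryDirectory() as work:
    messy, tidy = os.path.join(work, "messy.tar"), os.path.join(work, "tidy.tar")
    build(messy, ["README", "setup.cfg"])                 # would extract "flat"
    build(tidy, ["project/README", "project/setup.cfg"])  # has its own directory
    messy_target = safe_extract(messy, work)
    tidy_target = safe_extract(tidy, work)
    results = {
        "messy went into a new dir": messy_target != work
        and os.path.isfile(os.path.join(messy_target, "README")),
        "tidy extracted in place": tidy_target == work
        and os.path.isfile(os.path.join(work, "project", "README")),
        "no loose README in work dir": not os.path.exists(os.path.join(work, "README")),
    }
print(results)
```

Creating the fallback directory with mkdtemp guarantees it is new and empty, so nothing already on disk can be overwritten by a flat tarball.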
zip provides both compression and archiving: you can compress multiple files into a single archive. The format has been around since 1989, and implementations exist on numerous platforms, so it's one of the most portable options (especially if you need to open your archive on a Windows box).
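Python's standard zipfile module writes the same format, which makes for a quick sketch of zip doing both jobs at once; the file names and contents below are placeholders.

```python
import io
import zipfile

# Placeholder files to bundle; any bytes would do.
files = {
    "report.txt": b"quarterly numbers\n" * 200,
    "notes/todo.txt": b"- rotate logs\n- check backups\n" * 50,
}

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)  # compressed with DEFLATE and added as one archive member
archive = buf.getvalue()

# Reopen the archive, report per-member sizes, and read everything back.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    infos = zf.infolist()
    restored = {name: zf.read(name) for name in zf.namelist()}
for info in infos:
    print(f"{info.filename}: {info.file_size} -> {info.compress_size} bytes")
```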
Zip archives store a central directory, listing each file's name and metadata, at the end of the file. This makes listing an archive quick, because only the directory needs to be read rather than the whole file.
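Because the central directory sits at the end, a reader can jump straight to it. This sketch parses it by hand to show the idea: it finds the End of Central Directory record (signature PK\x05\x06), follows its offset back to the directory, and walks the per-file headers without touching any file data. It's simplified under stated assumptions (no ZIP64 extensions, no multi-disk archives, no stray signature bytes inside an archive comment); for real work, zipfile.ZipFile.namelist() does this for you.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # End of Central Directory record signature
EOCD_FMT = "<4s4H2LH"     # 22-byte record (ZIP64 archives not handled)
CDH_SIG = b"PK\x01\x02"   # central directory file header signature
CDH_FMT = "<4s6H3L5H2L"   # fixed 46-byte part of each central directory header

def list_from_central_directory(data: bytes):
    """List entries by reading only the directory at the end of the archive."""
    eocd = data.rfind(EOCD_SIG)  # search from the end; assumes no stray signature in a comment
    if eocd < 0:
        raise ValueError("not a zip archive (no End of Central Directory record)")
    _, _, _, _, total, cd_size, cd_offset, _ = struct.unpack_from(EOCD_FMT, data, eocd)

    entries, pos = [], cd_offset
    for _ in range(total):
        (sig, _made_by, _needed, flags, method, _time, _date, _crc,
         csize, usize, name_len, extra_len, comment_len, *_rest) = struct.unpack_from(CDH_FMT, data, pos)
        if sig != CDH_SIG:
            raise ValueError("corrupt central directory")
        start = pos + struct.calcsize(CDH_FMT)
        encoding = "utf-8" if flags & 0x800 else "cp437"  # flag bit 11 marks UTF-8 names
        name = data[start:start + name_len].decode(encoding)
        entries.append((name, usize, csize, method))
        pos = start + name_len + extra_len + comment_len  # skip to the next header
    return entries, cd_offset, cd_size

# Demo on a small in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"alpha " * 100)
    zf.writestr("b/c.txt", b"gamma " * 100)
archive = buf.getvalue()

entries, cd_offset, cd_size = list_from_central_directory(archive)
print(f"central directory: {cd_size} bytes starting at offset {cd_offset} of {len(archive)}")
for name, usize, csize, method in entries:
    print(f"{name}: {usize} -> {csize} bytes (method {method})")
```

Only the last few dozen bytes and the directory itself are read; the compressed file data in front of them is never decoded.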
Compression is not actually required for zip archives, although it is normally used. The most common method is DEFLATE, which combines duplicate-string elimination (replacing a repeated string with a back-reference to its earlier occurrence) with Huffman coding (short codes for common symbols and longer ones for rarer symbols). Each file is compressed separately, rather than the archive being compressed as a whole. This makes it easy to get at individual files, but it limits how far the archive can shrink: redundancy shared between files can't be exploited, and the per-file metadata is not compressed, so an archive of many small files won't shrink as much as an archive of a few large ones.
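The effect of per-file compression shows up when you compare archive sizes. This sketch builds three archives of the same hypothetical data set (200 small, near-identical files): a zip with ZIP_STORED (no compression, which the format allows), a zip with ZIP_DEFLATED, and, as a contrasting case that compresses the whole stream at once, a gzipped tar. Exact byte counts depend on the zlib build, so none are quoted here.

```python
import io
import tarfile
import zipfile

# Hypothetical data set: 200 small files that share almost all of their content.
line = b"2010-10-11 backup job finished on host web01 with status OK\n"
files = {f"status/{i:03d}.txt": line * 2 for i in range(200)}

def zip_size(compression: int) -> int:
    """Size of a zip holding all files; each member is compressed on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())

def tar_gz_size() -> int:
    """Size of a tar.gz: one gzip stream over all headers and data together."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())

raw = sum(len(data) for data in files.values())
stored = zip_size(zipfile.ZIP_STORED)      # compression is optional in zip
deflated = zip_size(zipfile.ZIP_DEFLATED)  # the usual DEFLATE method
targz = tar_gz_size()
print(f"raw data {raw}, zip stored {stored}, zip deflated {deflated}, tar.gz {targz}")
```

On data like this, the tar.gz comes out far smaller than either zip, because gzip can reuse matches across file boundaries while zip starts every member from scratch. The deflated zip can even end up larger than the raw data, since each of the 200 members carries its own uncompressed local and central directory headers.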