Linux Powers Giant Database: 400TB of Climate Data and Counting
Linux and Climate Change
From a software point of view, the main advantage of CERA-2, which integrates data and metadata, is the flexibility of its schema: it's easy to add new data with entirely different models. Each CERA-2 entry is structured as a set of data blocks centred around a metadata block. If you're into your databases, you can find detailed information on the available model structures online.
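To make the "blocks around a metadata block" idea concrete, here is a minimal Python sketch. It is not the real CERA-2 schema: every class and field name below (`MetadataBlock`, `DataBlock`, `Entry`, `storage`, `extra`) is hypothetical, chosen only to show one entry holding a single metadata block plus any number of data blocks, with a free-form `extra` dictionary standing in for the schema's flexibility.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MetadataBlock:
    """Hypothetical descriptive metadata shared by all data blocks in an entry."""
    title: str
    model: str              # illustrative: the model that produced the data
    variables: list[str]


@dataclass
class DataBlock:
    """Hypothetical block of data; 'storage' notes where its payload lives."""
    name: str
    storage: str            # illustrative values only, e.g. "disk" or "tape"
    payload_size_bytes: int


@dataclass
class Entry:
    """One metadata block at the centre, any number of data blocks around it."""
    metadata: MetadataBlock
    blocks: list[DataBlock] = field(default_factory=list)
    # Free-form fields so data from new, differently structured models can be added
    extra: dict[str, Any] = field(default_factory=dict)

    def add_block(self, block: DataBlock) -> None:
        """Attach another data block to this entry."""
        self.blocks.append(block)

    def total_size(self) -> int:
        """Sum of the payload sizes of all attached data blocks."""
        return sum(b.payload_size_bytes for b in self.blocks)


# Usage: build an entry and attach two blocks stored in different places
entry = Entry(MetadataBlock("Example run", "hypothetical-model", ["tas", "pr"]))
entry.add_block(DataBlock("tas_monthly", "disk", 1_000))
entry.add_block(DataBlock("pr_monthly", "tape", 2_500))
print(entry.total_size())  # 3500
```

Keeping the metadata in one block and the bulk data in separate blocks means a catalogue search only needs to touch the small metadata records, which fits the article's point that metadata and data are integrated but accessed differently.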
Data can be accessed via the web, but not all of it can be kept on disk: less frequently used data are kept on tape and retrieved only when needed. This is a nuisance for the scientists, since fetching data from tape takes longer, so the aim is to keep as much of the commonly used data on disk as possible.
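The disk-plus-tape arrangement behaves like a cache in front of a slow archive. The sketch below models it with a least-recently-used (LRU) eviction policy; LRU is an assumption made for illustration, not WDCC's documented policy, and the `TieredStore` class and its attributes are hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical two-tier store: a small LRU 'disk' cache in front of a slow 'tape' archive."""

    def __init__(self, tape: dict[str, bytes], disk_capacity: int) -> None:
        self.tape = tape                        # the archive holds everything
        self.disk: OrderedDict[str, bytes] = OrderedDict()  # most recently used item is last
        self.disk_capacity = disk_capacity
        self.tape_reads = 0                     # counts the slow accesses

    def get(self, key: str) -> bytes:
        if key in self.disk:
            self.disk.move_to_end(key)          # fast path: mark as most recently used
            return self.disk[key]
        data = self.tape[key]                   # slow path: stage the item from tape
        self.tape_reads += 1
        self.disk[key] = data
        if len(self.disk) > self.disk_capacity:
            self.disk.popitem(last=False)       # evict the least recently used item
        return data


# Usage: with room for two items, the third distinct request evicts the stalest one
store = TieredStore({"a": b"1", "b": b"2", "c": b"3"}, disk_capacity=2)
store.get("a")
store.get("b")
store.get("a")   # served from disk
store.get("c")   # staged from tape; evicts "b"
print(store.tape_reads)  # 3
```

Under this assumed policy, frequently requested data naturally stays on disk, which matches the stated goal of keeping commonly used data there while rarely used data waits on tape.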
Catalogue metadata is accessed via JavaServer Pages and servlets, so no client-side software is required. Anyone can search the catalogue anonymously, but you need an account to download the actual data. The WDCC also provides a collection of data processing tools for users to work with once they've downloaded their data. Here too, it's overwhelmingly UNIX-like machines that are expected and supported. The CDO (Climate Data Operators) processing tool runs on Solaris, Linux, and Mac OS X (whose Darwin core borrows heavily from FreeBSD), and the NCO (netCDF Operators) command-line tools are packaged for Linux and Mac OS X as well. Another package, the Climate Data Analysis Tools (CDAT), is written in Python and supported primarily on Solaris, Mac OS X, and various Linux distributions. The climate science community is clearly heavily Linux- and FOSS-based!
It's an impressive project, and it keeps growing as more models are run and more research data are produced. It's good to know that open source is helping to tackle some of the biggest problems out there.