Linux Powers Giant Database: 400TB of Climate Data and Counting
Scientists Like Open Source Software

Juliet Kemp
Monday, November 9, 2009 11:18:33 AM
From a software point of view, the main advantage of CERA-2, which
integrates data and metadata, is the flexibility of its schema. It's easy to
add new data with entirely different models. Each CERA-2 entry is a set of
data blocks centred around a metadata block. If you're into your databases,
you can find detailed information on the model structures online.
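To make the "blocks centred around a metadata block" idea concrete, here is a minimal Python sketch. It is not the CERA-2 schema: the `Entry` class, the block names (`coverage`, `instrument`), and all field names are hypothetical, chosen only to show how very different kinds of data can hang off the same metadata core.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Entry:
    """A catalogue entry: one central metadata block plus optional attached blocks.

    Hypothetical illustration only; the real CERA-2 model is documented online.
    """
    metadata: Dict[str, Any]
    blocks: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def attach(self, name: str, content: Dict[str, Any]) -> None:
        # New kinds of block can be added without changing the Entry class.
        self.blocks[name] = content


# Two entries with entirely different attached blocks share the same core.
model_run = Entry(metadata={"title": "Example model run"})
model_run.attach("coverage", {"start_year": 1950, "end_year": 2000})

station_data = Entry(metadata={"title": "Example station series"})
station_data.attach("instrument", {"kind": "thermometer"})
```

The point of the design is that adding a new kind of data means adding a new block, not reworking the core that every entry shares.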
Data can be accessed via the web, but not all of the available data can be
kept on disk: the less frequently used data are kept on tape and accessed
only when needed. This is of course a nuisance for the scientists, as it
means access takes longer, and the aim is to have as much as possible of the
commonly-used data on disk.
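The disk-versus-tape trade-off can be sketched as a small cache. This is an illustration under stated assumptions, not a description of how the WDCC actually manages its storage: the `TieredStore` class, the least-recently-used eviction rule, and the dataset names are all hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Keep recently used datasets on 'disk'; fetch everything else from 'tape'.

    Hypothetical sketch: eviction is least-recently-used, which is an assumption.
    """

    def __init__(self, disk_capacity):
        self.disk_capacity = disk_capacity
        self.disk = OrderedDict()  # dataset name -> data, least recently used first
        self.tape_reads = 0        # counts the slow accesses

    def _read_from_tape(self, name):
        # Stand-in for a slow tape retrieval.
        self.tape_reads += 1
        return f"contents of {name}"

    def get(self, name):
        if name in self.disk:
            # Fast path: already on disk, mark as most recently used.
            self.disk.move_to_end(name)
            return self.disk[name]
        # Slow path: stage the dataset from tape onto disk.
        data = self._read_from_tape(name)
        self.disk[name] = data
        if len(self.disk) > self.disk_capacity:
            # Disk is full: drop the least recently used dataset.
            self.disk.popitem(last=False)
        return data


store = TieredStore(disk_capacity=2)
for name in ["a", "b", "a", "c", "a", "b"]:
    store.get(name)
```

In this run the frequently requested dataset `a` is read from tape only once, while `b` has to be staged twice because it was evicted in between.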
Catalogue metadata is accessed via Java Server Pages and servlets (avoiding
any requirement for client-side software). Anyone can access the data searches
anonymously, but you need an account to download the actual data. The
WDCC also provides a collection of data processing tools which users can run
once they've downloaded their data. Here too, it's overwhelmingly UNIX-type
machines that are expected and supported. The CDO climate data processing
tool runs on Solaris, Linux, and Mac OS X (which is partly FreeBSD-based
under the hood), and the NCO command-line tools are available as Linux and
Mac OS X packages as well. Another tool set, the Climate Data Analysis Tools,
is written in Python and supported primarily on Solaris, Mac OS X, and
various Linuxes. The scientific climate community is obviously heavily
Linux- and FOSS-based!
It's an impressive project, and getting bigger all the time as more
models are run and more research data is produced. It's good to know that
open source is contributing to work on the big problems out there.