Linux Powers Giant Database: 400TB of Climate Data and Counting
Linux and Climate Change

Juliet Kemp
Monday, November 9, 2009 11:18:33 AM
Climate change is perhaps the biggest issue facing humanity today –
but one vital part of the puzzle is to calculate what exactly the future
possibilities are. The Millennium Simulations project – now almost a
year old – aims to tackle that issue, generating data which will in due
course be used by the Intergovernmental Panel on Climate Change, the
international scientific body which is generally regarded as the authoritative
voice on the effects and risks of climate change. The Millennium Simulations
project creates models of the earth's climate, looking at the influence of human
activity (including carbon generation and changes in land use), together with
natural activity such as volcanoes, to project future climatic changes. It's
the first model which uses an interactive carbon cycle, and it also has
various sub-models available, looking at land, ocean, and atmosphere.
As you might guess, this project generates a lot of data (over 50 TB
to date) with repeated runs under various conditions. There are numerous
other climate modelling projects out there, as well, all generating large
quantities of data which is useful to climate scientists internationally.
Storage of and access to this data is handled by the World Data Center for
Climate (WDCC), part of the World Data System. It's an international archive
and data distribution center
for climate research data (based at Model and Data, in Hamburg, with
the co-operation of the German Climate Computing Centre). Its aim, as with
all the WDS datacenters, is to give the international scientific community
the best possible access to as much data as is possible.
The WDCC's main focus is on climate modelling and data products, rather
than on raw data. Several research projects, including the IPCC's Data
Distribution Centre, use the WDCC to help distribute data between project
partners, and to users globally, both during and after the projects. This
makes it an incredibly important resource for scientists working on climate
modelling and trying to predict potential future climate changes and their
effects. And it's all based on UNIX/Linux.
Underlying the WDCC's data archive is the CERA-2 (Climate and Environmental
Retrieving and Archiving) database system, one of the world's largest
databases. As of late 2008, it held around 400 TB of data and had nearly a
thousand named users. Running on NEC and Solaris machines, it's a federated
Oracle database connected to a set of StorageTek tape libraries hosted at
DKRZ (the tape libraries themselves have a 60 PB capacity, but they store
data other than that belonging to the WDCC). The database itself is a
distributed system across multiple NEC machines running Linux.